andygrove opened a new issue, #2146: URL: https://github.com/apache/datafusion-comet/issues/2146
### What is the problem the feature request solves?

Recommendations from Claude Code:

### 1. Immediate Actions

1. **Add Explicit Synchronization**: Replace timing-based batch cleanup with explicit synchronization mechanisms
2. **Implement Proper Lifecycle Tracking**: Add reference counting or explicit lifetime management for shared buffers
3. **Add Validation**: Implement runtime checks for pointer validity before FFI operations

### 2. Medium-term Improvements

1. **Memory Pool Integration**: Better integrate with Arrow's memory pools to track FFI transfers
2. **Error Recovery**: Add robust error handling for FFI failures and partial cleanup
3. **Testing**: Add stress tests specifically for concurrent access patterns

### 3. Long-term Considerations

1. **Alternative FFI Mechanisms**: Consider newer Arrow FFI mechanisms that provide better lifetime guarantees
2. **Zero-Copy Optimizations**: Investigate ways to reduce copying in the non-zero offset case
3. **Monitoring**: Add metrics to track FFI-related memory usage and potential issues

## Testing Considerations

To properly test Arrow FFI memory safety:

1. **Stress Testing**: Run concurrent operations with memory pressure
2. **Valgrind/AddressSanitizer**: Use memory debugging tools on native code
3. **JVM Memory Profiling**: Monitor for memory leaks using JVM profilers
4. **Error Injection**: Test error handling during FFI operations
5. **Platform Testing**: Verify behavior on different architectures and alignment requirements

## Conclusion

While Comet's Arrow FFI implementation is generally well-architected, there are several areas where memory safety could be improved. The most significant risk is the potential for use-after-free conditions in concurrent scenarios. The codebase shows awareness of these issues through extensive comments and defensive programming, but additional synchronization mechanisms would provide stronger guarantees.
The false positive memory leak detection issue, while not a safety risk per se, could mask real problems and should be addressed through better integration with Arrow's memory management systems.

### Describe the potential solution

_No response_

### Additional context

_No response_
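
As a rough illustration of the "Add Explicit Synchronization", "Implement Proper Lifecycle Tracking", and "Add Validation" items above, the following Rust sketch shows one way a shared buffer could refuse to be freed while consumers still hold it, and refuse to be used after it has been freed. It is not taken from the Comet codebase: `TrackedBuffer`, `BorrowGuard`, and `BufferError` are hypothetical names, and a plain `Vec<u8>` stands in for whatever memory would actually cross the FFI boundary.

```rust
use std::sync::{Arc, Mutex};

/// Errors reported instead of silently touching freed memory.
#[derive(Debug, PartialEq)]
pub enum BufferError {
    /// The buffer was already released; any further use would be use-after-free.
    AlreadyReleased,
    /// Release was requested while this many consumers still hold the buffer.
    StillBorrowed(usize),
}

/// Shared state: the payload plus a count of active consumers.
struct State {
    data: Option<Vec<u8>>, // `None` once the buffer has been released
    borrows: usize,
}

/// Hypothetical handle for a buffer shared across an FFI boundary.
#[derive(Clone)]
pub struct TrackedBuffer {
    state: Arc<Mutex<State>>,
}

/// Held by a consumer for as long as it reads the buffer.
pub struct BorrowGuard {
    state: Arc<Mutex<State>>,
}

impl TrackedBuffer {
    pub fn new(data: Vec<u8>) -> Self {
        let state = State { data: Some(data), borrows: 0 };
        TrackedBuffer { state: Arc::new(Mutex::new(state)) }
    }

    /// Validate that the buffer is still live, then register a consumer.
    pub fn acquire(&self) -> Result<BorrowGuard, BufferError> {
        let mut s = self.state.lock().unwrap();
        if s.data.is_none() {
            return Err(BufferError::AlreadyReleased);
        }
        s.borrows += 1;
        Ok(BorrowGuard { state: Arc::clone(&self.state) })
    }

    /// Explicit release: succeeds only when no consumer holds a guard.
    pub fn release(&self) -> Result<(), BufferError> {
        let mut s = self.state.lock().unwrap();
        if s.data.is_none() {
            return Err(BufferError::AlreadyReleased);
        }
        if s.borrows > 0 {
            return Err(BufferError::StillBorrowed(s.borrows));
        }
        s.data = None; // drops the Vec and frees the memory
        Ok(())
    }
}

impl BorrowGuard {
    /// Length of the underlying payload (0 if it is somehow gone).
    pub fn len(&self) -> usize {
        let s = self.state.lock().unwrap();
        s.data.as_ref().map_or(0, |d| d.len())
    }
}

impl Drop for BorrowGuard {
    fn drop(&mut self) {
        // Unregister the consumer; ignore a poisoned lock rather than panic in drop.
        if let Ok(mut s) = self.state.lock() {
            s.borrows -= 1;
        }
    }
}
```

In this sketch, a producer that calls `release()` while a `BorrowGuard` is outstanding gets `Err(BufferError::StillBorrowed(1))` rather than freeing the memory, and a consumer that calls `acquire()` after a successful release gets `Err(BufferError::AlreadyReleased)`. A real implementation would also have to account for the JVM side of the transfer, which this example does not model.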