davidzollo commented on PR #9894: URL: https://github.com/apache/seatunnel/pull/9894#issuecomment-3699333491
1. The current implementation in `LanceSinkWriter.write()` creates a new `RootAllocator`, opens the Dataset, and commits a transaction for **every single row**. This is an anti-pattern for batch processing: it will result in extremely low throughput and the creation of thousands of tiny file fragments.
2. The `BufferAllocator` and the `Dataset` connection should be initialized once at the class level and reused throughout the writer's lifecycle, rather than being created and closed for every record.

What do you think?
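
To make the suggestion concrete, here is a minimal sketch of the shape I have in mind. It deliberately does not use the real Lance or Arrow APIs: `BatchCommitter` and `BufferedSinkWriter` are hypothetical names invented for illustration, and `BatchCommitter` only stands in for "the allocator plus the opened Dataset, created once". The point is just the lifecycle: open once, buffer rows, commit per batch, flush and close at the end.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in for the long-lived resources (allocator + dataset).
 * One commit() call is assumed to append one fragment in one transaction.
 */
interface BatchCommitter<T> extends AutoCloseable {
    void commit(List<T> rows);

    @Override
    void close();
}

/** Buffers rows and commits them in batches, reusing a single committer. */
final class BufferedSinkWriter<T> implements AutoCloseable {
    private final BatchCommitter<T> committer; // created once by the caller, reused for every batch
    private final int batchSize;
    private final List<T> buffer;

    BufferedSinkWriter(BatchCommitter<T> committer, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.committer = committer;
        this.batchSize = batchSize;
        this.buffer = new ArrayList<>(batchSize);
    }

    /** Adds one row; commits only when the buffer reaches batchSize. */
    void write(T row) {
        buffer.add(row);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Commits whatever is buffered as a single batch; no-op when empty. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        committer.commit(new ArrayList<>(buffer)); // hand over a copy, then reuse the buffer
        buffer.clear();
    }

    /** Flushes the remaining rows, then releases the shared resources exactly once. */
    @Override
    public void close() {
        try {
            flush();
        } finally {
            committer.close();
        }
    }
}
```

With a batch size of 3, writing 7 rows produces three commits (3, 3, and 1 rows) instead of seven. In the actual connector the flush would presumably also need to be triggered from the checkpoint path so buffered rows are not lost, but that depends on how the sink's commit semantics are wired up in this PR.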
