Fenil-v commented on issue #907: URL: https://github.com/apache/arrow-java/issues/907#issuecomment-3517677803
> ### Describe the usage question you have. Please include as many useful details as possible.
>
> I have ~ 20KB objects that I need to write to Parquet efficiently from Java. In C++, C#, and Python there's a direct/bulk Arrow-Parquet write (e.g. WriteTable / write_table) that avoids row-by-row iteration, but in Java I only see row-by-row paths via RecordConsumer or internal/unstable column writers. Questions:
>
> 1. Is there a supported bulk/columnar Arrow-Parquet write API in Java (e.g, VectorSchemaRoot → Parquet) that avoids row-by-row calls?
> 2. If not, why is Java limited to row-by-row writes today? Any roadmap for feature parity with C++/Python/C#?
> 3. For now, what's the recommended optimization path to write 20KB objects at high throughput from Java (without JNI), or is JNI/Dataset the recommended route?
> 4. Any best practices (batch sizing, encodings, writer settings) to mitigate the row-by-row overhead?
>
> ### Component(s)
>
> Java

[@pitrou](https://github.com/pitrou) any idea on this? Any help would be a lifesaver for me @julienledem

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
