Hello, I could take a shot at the Java one if you like?
I'm actually working in the codebase at the moment on something related that I was going to offer as a PR once it's ready. We use the Java Arrow library as the core of our data service: the VectorSchemaRoot (VSR) is our intermediate representation, and we translate to/from various formats and across various storage backends. We really need non-blocking data reads to make that efficient and scalable, so I've made alternate implementations of the Readers where you can feed in data as a series of ByteBuffer objects instead of calling loadNextBatch(). For streams this means feeding in bytes and buffering until a batch is available; for files we read the block info from the footer and then feed in buffers (slices) for each block. I was able to reuse all the same serialization helpers etc.

Does this sound useful? If it does then I can raise a PR for Arrow when it's done. No worries if not, and we just keep the non-blocking readers in our own codebase. They're not a lot of code either way.

Happy to take a shot at the row counts after that, weekend time probably. If I sketched out a draft PR would you be happy to take a look and tell me if I'm on the right lines?

Kind regards,
Martin Traverse
Technical Architect
UKI Risk
Tel: +44 7305 120 791
Email: martin.trave...@accenture.com
My regular office hours are 10:00 - 18:30 UK time, Monday - Thursday

-----Original Message-----
From: Weston Pace <weston.p...@gmail.com>
Sent: 28 March 2023 17:35
To: dev@arrow.apache.org
Subject: [External] Re: row counts in footer of IPC file format

I suspect the next step will be to create two implementations and create test files for the integration test suite. These will be required before we can vote on this. Are either of you interested in contributing an implementation (C++, Rust, Java, and Go have been the usual suspects in the past but JS or C# should be viable too)?
In the past, once an implementation & test files have been created for one language, it has been easier to drum up a volunteer to create a second implementation.
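
As a rough illustration of the "feed in bytes and buffer until a batch is available" idea from the reply at the top of this message, here is a minimal Java sketch. It is not the implementation described there and it is not Arrow API: `FrameAssembler` and `feed` are hypothetical names, and the framing is simplified to a 4-byte little-endian length prefix. A real Arrow IPC stream reader would instead have to parse each encapsulated message's metadata to learn the body length.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical push-style assembler (not part of the Arrow Java library).
 * Callers hand over whatever bytes have arrived; complete frames are returned
 * as soon as enough bytes have been buffered, so no call ever blocks on I/O.
 * Assumed framing: [int32 little-endian payload length][payload bytes].
 */
final class FrameAssembler {

    private static final int PREFIX_BYTES = 4;

    // Bytes received so far that do not yet form a complete frame (kept in write mode).
    private ByteBuffer pending = ByteBuffer.allocate(1024).order(ByteOrder.LITTLE_ENDIAN);

    /** Accepts the next chunk and returns every frame completed by it (possibly none). */
    List<ByteBuffer> feed(ByteBuffer chunk) {
        ensureCapacity(chunk.remaining());
        pending.put(chunk);

        List<ByteBuffer> frames = new ArrayList<>();
        pending.flip(); // switch to read mode
        while (pending.remaining() >= PREFIX_BYTES) {
            // Peek at the length prefix without consuming it.
            int length = pending.getInt(pending.position());
            if (length < 0) {
                throw new IllegalStateException("Negative frame length: " + length);
            }
            if (pending.remaining() - PREFIX_BYTES < length) {
                break; // payload not fully received yet; wait for more chunks
            }
            pending.position(pending.position() + PREFIX_BYTES);
            byte[] payload = new byte[length];
            pending.get(payload);
            frames.add(ByteBuffer.wrap(payload));
        }
        pending.compact(); // keep leftover bytes and return to write mode
        return frames;
    }

    /** Grows the pending buffer when the incoming chunk would not fit. */
    private void ensureCapacity(int extra) {
        if (pending.remaining() >= extra) {
            return;
        }
        int needed = pending.position() + extra;
        ByteBuffer bigger = ByteBuffer
                .allocate(Math.max(needed, pending.capacity() * 2))
                .order(ByteOrder.LITTLE_ENDIAN);
        pending.flip();
        bigger.put(pending);
        pending = bigger;
    }
}
```

A caller would invoke `feed` from whatever non-blocking I/O callback delivers bytes and decode each returned frame as it appears. The file case described above differs in that block offsets and lengths are already known from the footer, so each block can be handed over as a pre-sized slice without this kind of accumulation.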