Hi Rob,

This is going to be a bit wasteful because of the extra buffer padding, etc., but in any case: the thing you are missing is a function to concatenate arrays, which can be used to build a record batch concatenation function. A relevant JIRA is https://issues.apache.org/jira/browse/ARROW-549
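
To make the idea concrete, here is a rough sketch of how batch concatenation can be layered on top of array concatenation. It deliberately uses plain std::vector stand-ins rather than Arrow's array types: Column, Batch, ConcatenateColumns and ConcatenateBatches are hypothetical names for illustration, not Arrow API. It also assumes a single int64 value type and ignores validity bitmaps and offsets buffers, which real arrays would need handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT Arrow types.
// A Column is one contiguous chunk of values, and a Batch is a set of
// equal-length columns, loosely mirroring a record batch.
using Column = std::vector<std::int64_t>;

struct Batch {
  std::vector<Column> columns;

  std::size_t num_rows() const {
    return columns.empty() ? 0 : columns.front().size();
  }
};

// Concatenate several chunks of one column into a single contiguous chunk.
Column ConcatenateColumns(const std::vector<const Column*>& chunks) {
  std::size_t total = 0;
  for (const Column* chunk : chunks) total += chunk->size();

  Column out;
  out.reserve(total);  // size the output once instead of growing per chunk
  for (const Column* chunk : chunks) {
    out.insert(out.end(), chunk->begin(), chunk->end());
  }
  return out;
}

// Build batch concatenation on top of column concatenation.
// Assumes every batch has the same number of columns (same schema).
Batch ConcatenateBatches(const std::vector<Batch>& batches) {
  Batch out;
  if (batches.empty()) return out;

  const std::size_t num_columns = batches.front().columns.size();
  out.columns.reserve(num_columns);
  for (std::size_t c = 0; c < num_columns; ++c) {
    // Gather the c-th column of every batch, then merge them in order.
    std::vector<const Column*> chunks;
    chunks.reserve(batches.size());
    for (const Batch& batch : batches) {
      assert(batch.columns.size() == num_columns);
      chunks.push_back(&batch.columns[c]);
    }
    out.columns.push_back(ConcatenateColumns(chunks));
  }
  return out;
}
```

With batches of size 1, running something like this once at startup leaves a single chunk per column, so looking up row i becomes a direct index instead of a walk over every chunk.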

- Wes

On Wed, Oct 31, 2018 at 1:50 PM Ambalu, Robert <[email protected]> wrote:
>
> Hey, I'm trying to figure out how to merge multiple record batches in order to
> optimize overly-chunked tables.
> A bit of background here... we have a process that is streaming table rows
> with a batch size of 1 (because we want to ensure updates are written out in
> case of a crash). We also have some code that reads this table on startup.
> Our reading code has logic to access a specific row of a table, which this
> startup code does. To access a specific row you need to iterate through all
> chunks to find the right one. We're hitting a bottleneck on this specific
> file since it has a chunk size of 1. The simplest solution for us would be to
> merge all the chunked data into one chunk on startup when we read in the
> Arrow file. We've tried to find a way to do this using the Arrow C++ library
> / documents but can't seem to find a clean approach.
> Is there any clean way to do this? Any other possible suggestions?
>
> Side note - we did notice there's some method called
> "RechunkArraysConsistently". We couldn't find much info on it, but if that
> somehow ensures all chunks are of the same size and we can re-chunk the
> columns, then row access would be a quick calc (if all chunks are the same
> size, computing chunk / row in chunk is quick).
>
> Thanks
> - Rob
