Hi Rob,

This is going to be a bit wasteful because of the extra buffer
padding, etc., but in any case: the thing you are missing is a
function to concatenate arrays, which can be used to make a record
batch concatenation function. A relevant JIRA is
https://issues.apache.org/jira/browse/ARROW-549
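
To illustrate the idea of building record batch concatenation out of array
concatenation, here is a rough sketch in plain C++. It deliberately does not
use the Arrow API: Chunk, Batch, ConcatenateChunks and ConcatenateBatches are
hypothetical stand-ins made up for this example, and a real implementation
would also have to deal with validity bitmaps, offsets and nested types.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins, NOT Arrow types: a "chunk" is a contiguous run of
// int64 values, and a "batch" holds one chunk per column.
using Chunk = std::vector<std::int64_t>;

struct Batch {
  std::vector<Chunk> columns;  // assumed: every column has the same length
};

// Concatenate the chunks of a single column into one contiguous chunk.
Chunk ConcatenateChunks(const std::vector<const Chunk*>& chunks) {
  std::size_t total = 0;
  for (const Chunk* c : chunks) total += c->size();

  Chunk out;
  out.reserve(total);  // size the output once instead of growing per chunk
  for (const Chunk* c : chunks) {
    out.insert(out.end(), c->begin(), c->end());
  }
  return out;
}

// Batch concatenation built from array concatenation, column by column.
// Assumes all batches share the same number (and meaning) of columns.
Batch ConcatenateBatches(const std::vector<Batch>& batches) {
  Batch out;
  if (batches.empty()) return out;

  const std::size_t num_columns = batches.front().columns.size();
  out.columns.reserve(num_columns);
  for (std::size_t col = 0; col < num_columns; ++col) {
    std::vector<const Chunk*> chunks;
    chunks.reserve(batches.size());
    for (const Batch& b : batches) chunks.push_back(&b.columns[col]);
    out.columns.push_back(ConcatenateChunks(chunks));
  }
  return out;
}
```

After merging, every column is a single contiguous chunk, so accessing row i
no longer requires walking the list of chunks.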

- Wes
On Wed, Oct 31, 2018 at 1:50 PM Ambalu, Robert
<[email protected]> wrote:
>
> Hey, I'm trying to figure out how to merge multiple record batches in order
> to optimize overly-chunked tables.
> A bit of background here: we have a process that is streaming table rows
> with a batch size of 1 (because we want to ensure updates are written out in
> case of a crash). We also have some code that reads this table on startup.
> Our reading code has logic to access a specific row of a table, which this
> startup code does. To access a specific row you need to iterate through all
> chunks to find the right one. We're hitting a bottleneck on this specific
> file since it has a chunk size of 1. The simplest solution for us would be to
> merge all the chunked data into one chunk on startup when we read in the
> Arrow file. We've tried to find a way to do this using the Arrow C++ library
> and documentation but can't seem to find a clean approach.
> Is there any clean way to do this? Any other possible suggestions?
>
> Side note: we did notice there's a method called
> "RechunkArraysConsistently". We couldn't find much info on it, but if it
> somehow ensures all chunks are of the same size and we can re-chunk the
> columns, then row access would be a quick calculation (if all chunks are the
> same size, computing the chunk and the row within the chunk is quick).
>
>
> Thanks
> - Rob
