Yes, each RecordBatch overwrites the previous one.
The idea is:
 - you load a batch
 - you process it (probably writing the output to other vectors)
 - you load the next one into the same vectors.
So you iterate over the data, one RecordBatch at a time, limiting the
amount of memory you use.
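
For example, here is a minimal sketch of that loop on the Java side.
(The "batches" iterable and "process" helper are placeholders I made up
for illustration, and the package path of ArrowRecordBatch varies
between Arrow versions.)

    // assumes root is a VectorSchemaRoot matching the schema
    VectorLoader loader = new VectorLoader(root);
    for (ArrowRecordBatch batch : batches) {
      loader.load(batch);  // replaces the contents of root's vectors
      process(root);       // consume root.getRowCount() rows now,
                           // before the next load overwrites them
      batch.close();       // release the batch's buffers
    }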
The order of batches should stay the same.
The validator should deal with multiple record batches.
You can probably just read the same number of rows from your JSON as each
record batch contains, and compare?
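
Something like this, following the pattern the Arrow integration tests
use (JsonFileReader's package path also varies between versions, and
this sketch assumes the JSON file was written with the same batch
boundaries as the Arrow data):

    // jsonFile, allocator, root, loader, batches as in the loop above
    try (JsonFileReader jsonReader = new JsonFileReader(jsonFile, allocator)) {
      Schema jsonSchema = jsonReader.start();
      Validator.compareSchemas(jsonSchema, root.getSchema());
      for (ArrowRecordBatch batch : batches) {
        loader.load(batch);  // load the next Arrow batch into root
        try (VectorSchemaRoot jsonRoot = jsonReader.read()) {
          Validator.compareVectorSchemaRoot(root, jsonRoot);
        }
        batch.close();
      }
    }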

On Wed, Feb 8, 2017 at 3:47 PM, Bryan Cutler <[email protected]> wrote:

> Hi All,
>
> I'm currently working on SPARK-13534 and trying to validate converted data
> for testing purposes.  The data can be broken up into multiple
> ArrowRecordBatches that each have a number of rows (same columns) and I
> need to concat these, and compare with a JSON file by calling
> Validator.compareVectorSchemaRoot.  On repeated calls to
> VectorLoader.load,
> each record batch seems to overwrite the previous, but maybe I'm missing
> something.  Is this possible to do on the Java side of Arrow?  It could
> happen that the order of batches gets mixed up, so maybe this is not a good
> way to validate anyway.
>
> Thanks,
> Bryan
>



-- 
Julien
