Thanks for the clarification on the usage, Julien. I think for my testing it's best to ensure only one ArrowRecordBatch is created, which will be easier to validate against the JSON data. Thanks!
On Wed, Feb 8, 2017 at 6:05 PM, Julien Le Dem <[email protected]> wrote:

> Yes, each RecordBatch overwrites the previous.
> The idea is:
> - you load a batch
> - you process it (probably writing the output to other vectors)
> - you load the next one in the same vectors.
> So you iterate on the data, one RecordBatch at a time, limiting the
> amount of memory you use.
> The order of batches should stay the same.
> The validator should deal with multiple record batches.
> You can probably just read the same number of rows as the record batch
> at a time from your json and compare?
>
> On Wed, Feb 8, 2017 at 3:47 PM, Bryan Cutler <[email protected]> wrote:
>
> > Hi All,
> >
> > I'm currently working on SPARK-13534 and trying to validate converted
> > data for testing purposes. The data can be broken up into multiple
> > ArrowRecordBatches that each have a number of rows (same columns), and
> > I need to concatenate these and compare with a JSON file by calling
> > Validator.compareVectorSchemaRoot. On repeated calls to
> > VectorLoader.load, each record batch seems to overwrite the previous,
> > but maybe I'm missing something. Is this possible to do on the Java
> > side of Arrow? It could happen that the order of batches gets mixed
> > up, so maybe this is not a good way to validate anyway.
> >
> > Thanks,
> > Bryan
>
>
> --
> Julien
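
For reference, a minimal sketch in Java of the batch-at-a-time loop Julien describes, built from the classes named in this thread (VectorLoader, ArrowRecordBatch, VectorSchemaRoot, Validator.compareVectorSchemaRoot). The import paths follow the Arrow Java layout of the time and may differ across versions, and readExpectedBatch is a hypothetical helper standing in for however the matching rows get read out of the JSON file; treat this as a sketch under those assumptions, not the exact API.

import java.util.List;

import org.apache.arrow.vector.VectorLoader;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.schema.ArrowRecordBatch;
import org.apache.arrow.vector.util.Validator;

public class BatchAtATimeValidation {

  // Validate each record batch as it is loaded. Every call to
  // VectorLoader.load overwrites the vectors from the previous batch,
  // so the comparison must happen before the next load.
  public static void validate(VectorSchemaRoot root,
                              List<ArrowRecordBatch> batches) throws Exception {
    VectorLoader loader = new VectorLoader(root);
    for (ArrowRecordBatch batch : batches) {
      loader.load(batch);  // replaces the previous batch's data in root
      // Hypothetical helper: reads the same number of rows from the
      // JSON file into a fresh VectorSchemaRoot for comparison.
      try (VectorSchemaRoot expected = readExpectedBatch(root.getRowCount())) {
        // Throws if the actual vectors differ from the expected ones.
        Validator.compareVectorSchemaRoot(expected, root);
      }
      batch.close();
    }
  }

  // Hypothetical, not part of Arrow: left unimplemented in this sketch.
  private static VectorSchemaRoot readExpectedBatch(int rowCount) {
    throw new UnsupportedOperationException("sketch only");
  }
}

The point the sketch makes concrete is that the comparison has to happen inside the loop: once the next load() runs, the previous batch's values are gone from the root, which is why concatenating batches after the fact doesn't work.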
