Thanks for the clarification on the usage, Julien. I think for my testing it's best to ensure only one ArrowRecordBatch is created, which will be easier to validate against the JSON data. Thanks!
On Wed, Feb 8, 2017 at 6:05 PM, Julien Le Dem <[email protected]> wrote:

> Yes, each RecordBatch overwrites the previous.
> The idea is:
> - you load a batch
> - you process it (probably writing the output to other vectors)
> - you load the next one in the same vectors.
> So you iterate on the data, one RecordBatch at a time, limiting the
> amount of memory you use.
> The order of batches should stay the same.
> The validator should deal with multiple record batches.
> You can probably just read the same number of rows as the record batch
> at a time from your json and compare?
>
> On Wed, Feb 8, 2017 at 3:47 PM, Bryan Cutler <[email protected]> wrote:
>
> > Hi All,
> >
> > I'm currently working on SPARK-13534 and trying to validate converted
> > data for testing purposes. The data can be broken up into multiple
> > ArrowRecordBatches that each have a number of rows (same columns), and
> > I need to concatenate these and compare with a JSON file by calling
> > Validator.compareVectorSchemaRoot. On repeated calls to
> > VectorLoader.load, each record batch seems to overwrite the previous,
> > but maybe I'm missing something. Is this possible to do on the Java
> > side of Arrow? It could happen that the order of batches gets mixed
> > up, so maybe this is not a good way to validate anyway.
> >
> > Thanks,
> > Bryan
>
>
> --
> Julien
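
For reference, a minimal sketch in Java of the batch-at-a-time loop Julien describes, built from the classes named in this thread (VectorLoader, ArrowRecordBatch, VectorSchemaRoot, Validator.compareVectorSchemaRoot). The import paths follow the Arrow Java layout of the time and may differ across versions, and readExpectedBatch is a hypothetical helper standing in for however the matching rows get read out of the JSON file; treat this as a sketch under those assumptions, not the exact API.

import java.util.List;

import org.apache.arrow.vector.VectorLoader;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.schema.ArrowRecordBatch;
import org.apache.arrow.vector.util.Validator;

public class BatchAtATimeValidation {

  // Validate each record batch as it is loaded. Every call to
  // VectorLoader.load overwrites the vectors from the previous batch,
  // so the comparison must happen before the next load.
  public static void validate(VectorSchemaRoot root,
                              List<ArrowRecordBatch> batches) throws Exception {
    VectorLoader loader = new VectorLoader(root);
    for (ArrowRecordBatch batch : batches) {
      loader.load(batch);  // replaces the previous batch's data in root
      // Hypothetical helper: reads the same number of rows from the
      // JSON file into a fresh VectorSchemaRoot for comparison.
      try (VectorSchemaRoot expected = readExpectedBatch(root.getRowCount())) {
        // Throws if the actual vectors differ from the expected ones.
        Validator.compareVectorSchemaRoot(expected, root);
      }
      batch.close();
    }
  }

  // Hypothetical, not part of Arrow: left unimplemented in this sketch.
  private static VectorSchemaRoot readExpectedBatch(int rowCount) {
    throw new UnsupportedOperationException("sketch only");
  }
}

The point the sketch makes concrete is that the comparison has to happen inside the loop: once the next load() runs, the previous batch's values are gone from the root, which is why concatenating batches after the fact doesn't work.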
