Hi all,
I am looking for a quick way to look up the total row count of a data set
stored in Arrow’s random access file format using the Java API. Basically, a
quicker way to do this:
// reader is an instance of ArrowFileReader
VectorSchemaRoot root = reader.getVectorSchemaRoot();
List<ArrowBlock> blocks = reader.getRecordBlocks();
int nRows = 0;
for (ArrowBlock block : blocks) {
    // Populates root with the contents of this block
    reader.loadRecordBatch(block);
    nRows += root.getRowCount();
}
My understanding is that the snippet above loads the entire data set instead
of just the block headers.
To give you some context, I am looking into using Arrow for IPC between a JVM
and a Python interpreter using a custom data format and PyArrow/Pandas
respectively. While the streaming API might be a better tool for this job, I
started out using files to keep things simple.
Any help would be greatly appreciated – maybe I just missed the right bit of
documentation.
Thanks,
Michael