Hi Li,

Thanks for the explanation! I'll keep the code as is for now (and an eye on ARROW-3283).
As you pointed out, I'll need another solution for streaming the table over a socket anyway. To clarify, my code does read the actual data in a second pass. However, doing so without knowing how many rows to expect is very expensive.

Thanks again,
Michael

> On 21. Sep 2018, at 16:31, Li Jin <ice.xell...@gmail.com> wrote:
>
> Hi Michael,
>
> I think ArrowFileReader takes a SeekableByteChannel, so it would be possible to
> read only the metadata for each record batch and skip the data. However, that is
> not implemented.
>
> If the input channel is not seekable (for example, a socket channel), then you
> need to read the body of each record batch to get to the next one, so my hunch
> is that performance will be similar whether you read the record batch body into
> a VectorSchemaRoot or just read the raw bytes.
>
> If you don't assume your input data is always going to be seekable, I am not
> sure there is a quicker way to do this.
>
> On Fri, Sep 21, 2018 at 9:33 AM Michael Knopf <mkn...@rapidminer.com> wrote:
>
>> Hi all,
>>
>> I am looking for a quick way to look up the total row count of a data set
>> stored in Arrow's random-access file format using the Java API. Basically,
>> a quicker way to do this:
>>
>> // The reader is an instance of ArrowFileReader
>> List<ArrowBlock> blocks = reader.getRecordBlocks();
>> VectorSchemaRoot root = reader.getVectorSchemaRoot();
>> int nRows = 0;
>> for (ArrowBlock block : blocks) {
>>     reader.loadRecordBatch(block);
>>     nRows += root.getRowCount();
>> }
>>
>> My understanding is that the above snippet loads the entire data set
>> instead of just the block headers.
>>
>> To give you some context, I am looking into using Arrow for IPC between a
>> JVM and a Python interpreter, using a custom data format and PyArrow/Pandas
>> respectively. While the streaming API might be a better tool for this job,
>> I started out using files to keep things simple.
>>
>> Any help would be greatly appreciated – maybe I just missed the right bit
>> of documentation.
>>
>> Thanks,
>> Michael
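P.S. For anyone finding this thread later: below is a rough, untested sketch of the metadata-only approach Li described, i.e. seeking to each block and parsing only the flatbuffer message header to get the batch's row count, without reading the body. It assumes the 4-byte little-endian metadata-size prefix at each block offset (the encapsulated message layout of Arrow at the time; later versions prepend a continuation marker), and it uses the generated flatbuffer classes in org.apache.arrow.flatbuf. The class name RowCounter is mine, and the block list would come from ArrowFileReader.getRecordBlocks().

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.SeekableByteChannel;
import java.util.List;

import org.apache.arrow.flatbuf.Message;
import org.apache.arrow.flatbuf.MessageHeader;
import org.apache.arrow.flatbuf.RecordBatch;
import org.apache.arrow.vector.ipc.message.ArrowBlock;

public final class RowCounter {

    /**
     * Sums the row counts of all record batches by reading only each
     * block's metadata (a few hundred bytes), never its body.
     */
    public static long countRows(SeekableByteChannel channel, List<ArrowBlock> blocks)
            throws IOException {
        long rows = 0;
        for (ArrowBlock block : blocks) {
            // Seek to the start of the encapsulated message for this batch.
            channel.position(block.getOffset());
            ByteBuffer buf = ByteBuffer.allocate(block.getMetadataLength())
                    .order(ByteOrder.LITTLE_ENDIAN);
            while (buf.hasRemaining()) {
                if (channel.read(buf) < 0) {
                    throw new IOException("Unexpected end of channel");
                }
            }
            buf.flip();
            buf.getInt(); // skip the 4-byte metadata-size prefix
            Message message = Message.getRootAsMessage(buf.slice());
            if (message.headerType() == MessageHeader.RecordBatch) {
                RecordBatch batch = (RecordBatch) message.header(new RecordBatch());
                rows += batch.length(); // row count is stored in the batch metadata
            }
        }
        return rows;
    }
}
```

Note this only helps for a seekable channel; over a socket you would still have to consume each body to reach the next message, as Li said.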