jiacai2050 opened a new issue, #2916: URL: https://github.com/apache/arrow-rs/issues/2916
**Which part is this question about**

API Usage & Perf

**Describe your question**

I created two benchmarks based on the [example code](https://docs.rs/parquet/latest/parquet/arrow/async_reader/index.html), and in my environment this is what I got:

- `ParquetRecordBatchReader` took 4s
- `ParquetRecordBatchStream` took 5s

The tested data is:

- total rows: 40935755
- row groups: 4998

This is the schema of the parquet file:

```
message arrow_schema {
  required int64 tsid (INTEGER(64,false));
  required int64 enddate (TIMESTAMP(MILLIS,false));
  optional int64 id;
  optional int64 code;
  optional binary source (STRING);
  optional int64 innercode;
  optional int64 del;
  optional int64 jsid;
  optional int64 updatetime (TIMESTAMP(MILLIS,false));
  optional double weight;
}
```

**Additional context**

I dug into Parquet's source code and found that both paths call `build_array_reader` to read the parquet file, so the difference is likely above this layer.
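For reference, the two read paths being compared look roughly like the sketch below. This is a minimal reconstruction based on the linked `async_reader` example, not the exact benchmark code; the file path, crate versions, and the trivial row-count consumption are placeholders.

```rust
// Assumed dependencies (versions are placeholders):
// parquet = { version = "26", features = ["async"] }, arrow = "26",
// tokio = { version = "1", features = ["full"] }, futures = "0.3"
use std::time::Instant;

use arrow::record_batch::RecordBatch;
use futures::TryStreamExt;
use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;
use parquet::arrow::async_reader::ParquetRecordBatchStreamBuilder;

const PATH: &str = "/tmp/data.parquet"; // placeholder path

/// Sync path: iterate batches from a ParquetRecordBatchReader.
fn read_sync() -> usize {
    let file = std::fs::File::open(PATH).unwrap();
    let reader = ParquetRecordBatchReaderBuilder::try_new(file)
        .unwrap()
        .build()
        .unwrap();
    // Consume every batch, counting rows only.
    reader.map(|batch| batch.unwrap().num_rows()).sum()
}

/// Async path: collect batches from a ParquetRecordBatchStream.
async fn read_async() -> usize {
    let file = tokio::fs::File::open(PATH).await.unwrap();
    let stream = ParquetRecordBatchStreamBuilder::new(file)
        .await
        .unwrap()
        .build()
        .unwrap();
    let batches: Vec<RecordBatch> = stream.try_collect().await.unwrap();
    batches.iter().map(|batch| batch.num_rows()).sum()
}

#[tokio::main]
async fn main() {
    let t = Instant::now();
    println!("sync:  {} rows in {:?}", read_sync(), t.elapsed());

    let t = Instant::now();
    println!("async: {} rows in {:?}", read_async().await, t.elapsed());
}
```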
