[ https://issues.apache.org/jira/browse/PARQUET-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17788448#comment-17788448 ]
ASF GitHub Bot commented on PARQUET-2374:
-----------------------------------------

wgtmac commented on PR #1187:
URL: https://github.com/apache/parquet-mr/pull/1187#issuecomment-1821108258

   > For the object stores, things to measure are
   >
   > * time to open() and close() a file
   > * time for a read after a backwards seek
   > * time for a read after a forwards seek.
   > * how many reads actually took place
   > * for vector IO, whatever gets picked up there
   > * were errors reported and retried, or throttling events
   > * number of underlying GET requests

   CMIW, it seems that these stats can be collected solely at the input stream level.

> Add metrics support for parquet file reader
> -------------------------------------------
>
>                 Key: PARQUET-2374
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2374
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>    Affects Versions: 1.13.1
>            Reporter: Parth Chandra
>            Priority: Major
>
> ParquetFileReader is used by many engines - Hadoop, Spark among them. These
> engines report various metrics to measure performance in different
> environments and it is usually useful to be able to get low level metrics out
> of the file reader and writers.
> It would be very useful to allow a simple interface to report the metrics.
> Callers can then implement the interface to record the metrics in any
> subsystem they choose.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
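
To make the "simple interface" idea from the issue description concrete, here is a minimal Java sketch of a callback that a reader could call and that an engine could implement. Every name in it (`ReaderMetricsCallback`, `InMemoryMetrics`, `MetricsDemo.simulateRead`, and the metric keys) is hypothetical and chosen only for illustration. None of it is taken from parquet-mr or from PR #1187, and the API actually proposed there may look quite different.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical callback interface; NOT the parquet-mr API.
interface ReaderMetricsCallback {
    // Adds delta to a named counter, e.g. bytes read or number of reads.
    void addCount(String name, long delta);

    // Records elapsed time in nanoseconds for a named operation.
    void addDurationNanos(String name, long nanos);
}

// One possible caller-side implementation: keeps running totals in memory.
// An engine could instead forward these values to its own metrics subsystem.
final class InMemoryMetrics implements ReaderMetricsCallback {
    private final Map<String, LongAdder> totals = new ConcurrentHashMap<>();

    @Override
    public void addCount(String name, long delta) {
        totals.computeIfAbsent(name, k -> new LongAdder()).add(delta);
    }

    @Override
    public void addDurationNanos(String name, long nanos) {
        // Durations are stored under a separate key suffix (an assumption of this sketch).
        totals.computeIfAbsent(name + ".nanos", k -> new LongAdder()).add(nanos);
    }

    // Returns the accumulated total for a key, or 0 if nothing was recorded.
    long total(String name) {
        LongAdder adder = totals.get(name);
        return adder == null ? 0L : adder.sum();
    }
}

final class MetricsDemo {
    // Stand-in for a reader code path; a real reader would perform I/O here.
    static void simulateRead(ReaderMetricsCallback callback, int bytes) {
        long start = System.nanoTime();
        callback.addCount("bytes.read", bytes);
        callback.addCount("read.calls", 1);
        callback.addDurationNanos("read", System.nanoTime() - start);
    }

    public static void main(String[] args) {
        InMemoryMetrics metrics = new InMemoryMetrics();
        simulateRead(metrics, 4096);
        simulateRead(metrics, 1024);
        System.out.println(metrics.total("bytes.read") + " bytes in "
                + metrics.total("read.calls") + " reads");
    }
}
```

In this sketch the reader depends only on the interface, so each engine can decide where the numbers go. Stream-level figures such as seek times or GET request counts, as discussed in the comment above, would presumably still come from the input stream implementation rather than from a callback like this.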