[ https://issues.apache.org/jira/browse/PARQUET-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17785951#comment-17785951 ]
ASF GitHub Bot commented on PARQUET-2374:
-----------------------------------------

wgtmac commented on code in PR #1187:
URL: https://github.com/apache/parquet-mr/pull/1187#discussion_r1392855060


##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageReadStore.java:
##########
@@ -199,7 +206,7 @@ public DataPage visit(DataPageV1 dataPageV1) {

       @Override
       public DataPage visit(DataPageV2 dataPageV2) {
-        if (!dataPageV2.isCompressed() && offsetIndex == null && null == blockDecryptor) {
+        if (!dataPageV2.isCompressed() && offsetIndex == null && null == blockDecryptor) {

Review Comment:
   Could you please revert this unnecessary style change here and below?

##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageReadStore.java:
##########
@@ -237,21 +246,22 @@ public DataPage visit(DataPageV2 dataPageV2) {
           }
         } else {
           if (null != blockDecryptor) {
-            pageBytes = BytesInput.from(
-                blockDecryptor.decrypt(pageBytes.toByteArray(), dataPageAAD));
+            pageBytes = BytesInput.from(

Review Comment:
   ditto

##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetMetricsCallback.java:
##########
@@ -0,0 +1,31 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.hadoop;
+
+/**
+ * a simple interface to pass bask metric values by name to any implementation. Typically an

Review Comment:
   ```suggestion
    * a simple interface to pass basic metric values by name to any implementation. Typically an
   ```

##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReaderMetrics.java:
##########
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.hadoop;
+
+public enum ParquetFileReaderMetrics {

Review Comment:
   The alignment looks inconsistent. Could you fix that?
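For context on the two new files above: the callback is a name-keyed reporting hook and the enum supplies the metric names. A minimal sketch of that shape follows; the method names, signatures, and comments are illustrative assumptions for this discussion, not the exact API under review in PR #1187.

```java
// Sketch only: names and signatures are assumptions, not the code under review.
package org.apache.parquet.hadoop;

public interface ParquetMetricsCallback {
  // report a counter-style value, keyed by metric name
  void setValueLong(String metricName, long value);

  // report a gauge-style value such as throughput in MB/sec
  void setValueDouble(String metricName, double value);

  // report an elapsed time measured with System.nanoTime()
  void setDuration(String metricName, long nanos);
}
```

A name-keyed interface of this kind keeps parquet-mr free of any dependency on a particular metrics library; an enum such as ParquetFileReaderMetrics can then hold the well-known metric names that the reader reports.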
##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java:
##########
@@ -1870,6 +1886,23 @@ public void readAll(SeekableInputStream f, ChunkListBuilder builder) throws IOEx
     }
   }

+  private void setReadMetrics(long startNs) {
+    if (metricsCallback != null) {
+      long totalFileReadTimeNs = System.nanoTime() - startNs;
+      double sizeInMb = ((double) length) / (1024 * 1024);
+      double timeInSec = ((double) totalFileReadTimeNs) / 1000_0000_0000L;
+      double throughput = sizeInMb / timeInSec;
+      LOG.debug(
+          "Parquet: File Read stats: Length: {} MB, Time: {} secs, throughput: {} MB/sec ",
+          sizeInMb,
+          timeInSec,
+          throughput);

Review Comment:
   ```suggestion
         LOG.debug(
             "Parquet: File Read stats: Length: {} MB, Time: {} secs, throughput: {} MB/sec ",
             sizeInMb,
             timeInSec,
             throughput);
   ```

##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageReadStore.java:
##########
@@ -237,21 +246,22 @@ public DataPage visit(DataPageV2 dataPageV2) {
           }
         } else {
          if (null != blockDecryptor) {
-            pageBytes = BytesInput.from(
-                blockDecryptor.decrypt(pageBytes.toByteArray(), dataPageAAD));
+            pageBytes = BytesInput.from(
+                blockDecryptor.decrypt(pageBytes.toByteArray(), dataPageAAD));
           }
           if (dataPageV2.isCompressed()) {
             int uncompressedSize = Math.toIntExact(
                 dataPageV2.getUncompressedSize()
                     - dataPageV2.getDefinitionLevels().size()
                     - dataPageV2.getRepetitionLevels().size());
+            long start = System.nanoTime();
             pageBytes = decompressor.decompress(pageBytes, uncompressedSize);
+            setDecompressMetrics(pageBytes, start);

Review Comment:
   Do we need to include decompression of dictionary page?


> Add metrics support for parquet file reader
> -------------------------------------------
>
>                 Key: PARQUET-2374
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2374
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>    Affects Versions: 1.13.1
>            Reporter: Parth Chandra
>            Priority: Major
>
> ParquetFileReader is used by many engines - Hadoop, Spark among them. These
> engines report various metrics to measure performance in different
> environments and it is usually useful to be able to get low level metrics out
> of the file reader and writers.
> It would be very useful to allow a simple interface to report the metrics.
> Callers can then implement the interface to record the metrics in any
> subsystem they choose.
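The issue description asks for an interface that callers implement to route metrics into their own subsystem. As a rough illustration of what such a caller-side implementation could look like (reusing the assumed method names from the sketch above, not the PR's actual API), an engine might do no more than accumulate values in memory:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical caller-side implementation that accumulates values in memory;
// a real engine (Hadoop, Spark, ...) would forward them to its own metrics registry.
public class InMemoryParquetMetrics implements ParquetMetricsCallback {
  private final Map<String, LongAdder> totals = new ConcurrentHashMap<>();

  @Override
  public void setValueLong(String metricName, long value) {
    totals.computeIfAbsent(metricName, k -> new LongAdder()).add(value);
  }

  @Override
  public void setValueDouble(String metricName, double value) {
    // sketch: keep only the integral part of gauge-style values
    totals.computeIfAbsent(metricName, k -> new LongAdder()).add((long) value);
  }

  @Override
  public void setDuration(String metricName, long nanos) {
    totals.computeIfAbsent(metricName, k -> new LongAdder()).add(nanos);
  }

  // convert an accumulated duration back to seconds for reporting
  public double seconds(String metricName) {
    LongAdder adder = totals.get(metricName);
    return adder == null ? 0.0 : adder.sum() / 1_000_000_000.0;
  }
}
```

The `seconds` helper shows the usual nanosecond-to-second conversion (divide by 1,000,000,000) an implementation would apply when turning recorded durations into throughput figures such as MB/sec.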