[
https://issues.apache.org/jira/browse/DRILL-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15056901#comment-15056901
]
ASF GitHub Bot commented on DRILL-4152:
---------------------------------------
Github user adeneche commented on a diff in the pull request:
https://github.com/apache/drill/pull/298#discussion_r47572637
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/PageReader.java ---
@@ -196,7 +220,15 @@ public boolean next() throws IOException {
// TODO - figure out if we need multiple dictionary pages, I believe it may be limited to one
// I think we are clobbering parts of the dictionary if there can be multiple pages of dictionary
do {
+ long start=inputStream.getPos();
+ timer.start();
pageHeader = dataReader.readPageHeader();
+ long timeToRead = timer.elapsed(TimeUnit.MICROSECONDS);
+ this.updateStats(pageHeader, "Page Header Read", start, timeToRead, 0,0);
+ logger.trace("ParquetTrace,{},{},{},{},{},{},{},{}","Page Header Read","",
+ this.parentColumnReader.parentReader.hadoopPath,
+ this.parentColumnReader.columnDescriptor.toString(), start, 0, 0, timeToRead);
+ timer.reset();
--- End diff --
same here
> Add additional logging and metrics to the Parquet reader
> --------------------------------------------------------
>
> Key: DRILL-4152
> URL: https://issues.apache.org/jira/browse/DRILL-4152
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Parquet
> Reporter: Parth Chandra
> Assignee: Deneche A. Hakim
>
> In some cases, the Parquet reader appears to be the bottleneck when reading
> from the file system. RWSpeedTest is able to read 10x faster than the Parquet
> reader, so reading from disk is not the issue. This issue is to add more
> instrumentation to the Parquet reader so speed bottlenecks can be better
> diagnosed.
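
For readers following the diff, the pattern under review (time a single read, then record the elapsed microseconds) can be sketched in isolation. This is only a minimal illustration, not Drill's code: the class `ReadTimer` and its methods are hypothetical names, and it uses `System.nanoTime()` rather than the `timer` object that appears in the diff.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical helper (not part of Drill): wraps one read operation,
// measures how long it took, and accumulates simple counters.
public class ReadTimer {
    private final AtomicLong totalMicros = new AtomicLong();
    private final AtomicLong calls = new AtomicLong();

    // Runs the operation and records its elapsed time in microseconds,
    // even if the operation throws.
    public <T> T time(Supplier<T> op) {
        long startNanos = System.nanoTime();
        try {
            return op.get();
        } finally {
            long elapsedNanos = System.nanoTime() - startNanos;
            totalMicros.addAndGet(TimeUnit.NANOSECONDS.toMicros(elapsedNanos));
            calls.incrementAndGet();
        }
    }

    public long totalMicros() {
        return totalMicros.get();
    }

    public long calls() {
        return calls.get();
    }
}
```

Wrapping the measurement in a helper like this keeps the start/elapsed/reset bookkeeping in one place, which may be one way to avoid repeating the same block around every read call.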