Aman Sinha created DRILL-3973:
---------------------------------
Summary: Add profiling for the time spent in metadata operations and planning
Key: DRILL-3973
URL: https://issues.apache.org/jira/browse/DRILL-3973
Project: Apache Drill
Issue Type: Improvement
Components: Metadata, Query Planning & Optimization
Affects Versions: 1.2.0
Reporter: Aman Sinha
Assignee: Mehant Baid
To determine where time is spent during metadata operations and query
planning (which includes partition pruning), we need to add more profiling:
- The time to read the Parquet metadata from the Parquet files is already
logged, but the same needs to be done when the metadata is read from the
cache file.
- The analysis of whether a column is a candidate partition column, done by
comparing its min/max values, should be profiled.
- ParquetGroupScan.init() needs finer-grained timings.
- The places where getFileStatusList() is called need to be profiled, since
this is an expensive operation for a large number of files (hundreds of
thousands).
- PruneScanRule: currently the profile timings are for each batch of files.
We need finer-grained timings that separately cover the interpreter
evaluation of the filter, the analysis of the filter condition, etc.
- Add instrumentation around the places where affinity analysis is done.
Such profiling is needed to understand excessively long planning times when
a large number of files is present.
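As a rough illustration of the kind of per-phase timing listed above, here is
a minimal sketch in plain Java. PhaseTimer and the phase names are
hypothetical and are not part of Drill. The sketch assumes nothing about
Drill's own logging or profiling facilities; it only accumulates wall-clock
time per named phase so that repeated calls (for example, one per batch of
files) add up.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical helper (not a Drill class): accumulates elapsed time per named phase. */
final class PhaseTimer {
  // Insertion-ordered so the report lists phases in the order they first ran.
  private final Map<String, Long> nanosByPhase = new LinkedHashMap<>();
  private final Map<String, Integer> callsByPhase = new LinkedHashMap<>();

  /** Runs the work, adds its elapsed time to the named phase, and returns its result. */
  <T> T time(String phase, Supplier<T> work) {
    long start = System.nanoTime();
    try {
      return work.get();
    } finally {
      // Recorded even if the work throws, so failed phases still show up.
      long elapsed = System.nanoTime() - start;
      nanosByPhase.merge(phase, elapsed, Long::sum);
      callsByPhase.merge(phase, 1, Integer::sum);
    }
  }

  /** Total nanoseconds recorded for a phase, or 0 if it never ran. */
  long totalNanos(String phase) {
    return nanosByPhase.getOrDefault(phase, 0L);
  }

  /** Number of times a phase was timed, or 0 if it never ran. */
  int calls(String phase) {
    return callsByPhase.getOrDefault(phase, 0);
  }

  /** One line per phase: name, total milliseconds, and call count. */
  String report() {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, Long> e : nanosByPhase.entrySet()) {
      sb.append(String.format("%s: %d ms over %d call(s)%n",
          e.getKey(),
          TimeUnit.NANOSECONDS.toMillis(e.getValue()),
          callsByPhase.get(e.getKey())));
    }
    return sb.toString();
  }
}
```

A caller would wrap each step, e.g. timer.time("filterEvaluation", () ->
evaluate(batch)), where evaluate is a placeholder for the real work, and emit
timer.report() once at the end of planning.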