Aman Sinha created DRILL-3973:
---------------------------------

             Summary: Add profiling for the time spent in metadata operations 
and planning
                 Key: DRILL-3973
                 URL: https://issues.apache.org/jira/browse/DRILL-3973
             Project: Apache Drill
          Issue Type: Improvement
          Components: Metadata, Query Planning & Optimization
    Affects Versions: 1.2.0
            Reporter: Aman Sinha
            Assignee: Mehant Baid


In order to determine where time is spent during metadata operations and query 
planning (which includes partition pruning), we need to add more profiling: 
  - the time to read the Parquet metadata from the Parquet files is already 
logged, but the same needs to be done when the metadata is read from the cache 
file.
  - the analysis of whether a column is a candidate partition column by 
comparing the min/max values should be profiled. 
  - ParquetGroupScan.init() needs some finer granularity timings
  - The places where getFileStatusList() is called need to be profiled, since 
this is an expensive operation for a large number of files (hundreds of 
thousands). 
  - PruneScanRule: currently the profile timings are collected for each batch 
of files. We need finer-grained timings that separately cover interpreter 
evaluation of the filter, analysis of the filter condition, etc. 
  - Add instrumentation around the places where affinity analysis is done. 

Such profiling is needed to understand excessively long planning times when 
a large number of files is present. 
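
As a rough illustration of the kind of instrumentation requested above, the 
sketch below accumulates elapsed wall-clock time per named phase. It is not 
Drill code: the PhaseTimer class and the phase labels are hypothetical, and it 
assumes Java 8 or later for the lambda syntax. 

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of Drill): accumulates elapsed time and
 * call counts per named phase so that repeated calls can be summed.
 */
final class PhaseTimer {
  private final Map<String, Long> nanosByPhase = new LinkedHashMap<>();
  private final Map<String, Integer> callsByPhase = new LinkedHashMap<>();

  /** Runs the given work, records its duration under the phase name, and returns its result. */
  <T> T time(String phase, Supplier<T> work) {
    long start = System.nanoTime();
    try {
      return work.get();
    } finally {
      // Record the timing even if the work throws.
      long elapsed = System.nanoTime() - start;
      nanosByPhase.merge(phase, elapsed, Long::sum);
      callsByPhase.merge(phase, 1, Integer::sum);
    }
  }

  /** Total nanoseconds recorded for the phase, or 0 if it was never timed. */
  long totalNanos(String phase) {
    return nanosByPhase.getOrDefault(phase, 0L);
  }

  /** Number of times the phase was timed. */
  int calls(String phase) {
    return callsByPhase.getOrDefault(phase, 0);
  }

  /** One line per phase, in first-seen order, suitable for a debug log. */
  String summary() {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, Long> e : nanosByPhase.entrySet()) {
      sb.append(String.format("%s: %d call(s), %d ms%n",
          e.getKey(), callsByPhase.get(e.getKey()), e.getValue() / 1_000_000));
    }
    return sb.toString();
  }
}
```

A caller could wrap each expensive step, for example 
timer.time("getFileStatusList", () -> listFiles()), where listFiles() stands in 
for the real call, and then write timer.summary() to the log once planning 
finishes. Summing per phase rather than logging each call keeps the output 
readable when a step runs once per file or per batch. 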



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
