Alexander Behm created IMPALA-6625:

             Summary: Skip dictionary and collection conjunct assignment for non-Parquet scans.
                 Key: IMPALA-6625
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
    Affects Versions: Impala 2.11.0, Impala 2.10.0, Impala 2.9.0
            Reporter: Alexander Behm

In HdfsScanNode.init() we try to assign dictionary and collection conjuncts 
even for non-Parquet scans. Such predicates only make sense for Parquet scans, 
so there is no point in collecting them for other scans.

The current behavior is undesirable because:
* init() can be substantially slower, because assigning dictionary filters may 
involve evaluating exprs in the backend (BE), which can be expensive
* the explain plan of non-Parquet scans may have a section "parquet dictionary 
predicates" which is confusing/misleading

Relevant code snippet from HdfsScanNode:
  public void init(Analyzer analyzer) throws ImpalaException {
    conjuncts_ = orderConjunctsByCost(conjuncts_);


    // compute scan range locations with optional sampling
    Set<HdfsFileFormat> fileFormats = computeScanRangeLocations(analyzer);
    if (fileFormats.contains(HdfsFileFormat.PARQUET)) { // <--- assignment should go in here
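
The proposed guard could be sketched roughly as follows. This is a minimal, self-contained illustration under stated assumptions, not the actual Impala code: ScanNodeSketch, FileFormat, dictionaryChecks, and explain() are hypothetical stand-ins for HdfsScanNode, HdfsFileFormat, the BE expr evaluation cost, and the explain output. Only dictionary conjuncts are modeled; collection conjunct assignment would presumably sit inside the same guard.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for HdfsScanNode; not the real Impala class.
class ScanNodeSketch {
  // Hypothetical subset of file formats, for illustration only.
  enum FileFormat { TEXT, AVRO, PARQUET }

  final List<String> conjuncts;
  final List<String> dictionaryConjuncts = new ArrayList<>();
  // Counts the stand-in "expensive" checks (assumed to model BE expr evaluation).
  int dictionaryChecks = 0;

  ScanNodeSketch(List<String> conjuncts) {
    this.conjuncts = conjuncts;
  }

  // Only assign Parquet-specific conjuncts when a Parquet file is scanned.
  void init(Set<FileFormat> fileFormats) {
    if (fileFormats.contains(FileFormat.PARQUET)) {
      assignDictionaryConjuncts();
    }
  }

  // Stand-in for the costly assignment; here every conjunct qualifies.
  private void assignDictionaryConjuncts() {
    for (String conjunct : conjuncts) {
      dictionaryChecks++;
      dictionaryConjuncts.add(conjunct);
    }
  }

  // The Parquet section appears only if dictionary conjuncts were assigned.
  String explain() {
    StringBuilder sb = new StringBuilder("predicates: " + conjuncts);
    if (!dictionaryConjuncts.isEmpty()) {
      sb.append("\nparquet dictionary predicates: ").append(dictionaryConjuncts);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    ScanNodeSketch textScan = new ScanNodeSketch(Arrays.asList("x = 1"));
    textScan.init(EnumSet.of(FileFormat.TEXT));
    System.out.println(textScan.explain());
  }
}
```

With the guard in place, a text-only scan in this sketch performs no dictionary checks and its explain output has no "parquet dictionary predicates" section, which addresses both points listed above.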

