[jira] [Commented] (DRILL-4279) The plan is either confusing or could lead to execution problem, when no columns is required from SCAN

ASF GitHub Bot (JIRA) Sun, 17 Jan 2016 21:33:07 -0800

    [ 
https://issues.apache.org/jira/browse/DRILL-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15104198#comment-15104198
 ]


ASF GitHub Bot commented on DRILL-4279:
---------------------------------------

GitHub user jinfengni opened a pull request:

    https://github.com/apache/drill/pull/328

    Drill 2517: Apply directory-based partition pruning before reading files in 
planning.

    1. Ran the pre-commit tests and unit tests. Some queries in the pre-commit 
suites have changed plans. Most of the changed plans look better than before. 
The only exceptions are cases caused by an existing issue (DRILL-4279), 
where the * column is used together with SKIP_ALL mode. That happens when the 
filter is applied and then removed, as in the following query:
    
    SELECT count(*) from T1 where dir0 = 1990 and dir1 = 'Q1'.
    
    2. I need to figure out how to cherry-pick Adam's patch in DRILL-2517, since 
that's the initial work on this issue, although there are quite big changes 
from that patch.  
    
    @amansinha100 , could you please take a look and give some initial review 
comments? Thanks!
     

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jinfengni/incubator-drill DRILL-2517

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/328.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #328
    
----
commit bc5e13972d116f41ed9441b49a6781b2b602c2fd
Author: Mehant Baid <[email protected]>
Date:   2015-11-11T06:26:26Z

    DRILL-2571: (Prototype from Mehant) Move directory based partition pruning 
to logical phase.

commit 1b05e372ee7193308bea420302bdd0e259193e3a
Author: Jinfeng Ni <[email protected]>
Date:   2016-01-08T18:28:53Z

    DRILL-2517: Move directory-based partition pruning to Calcite logical 
planning phase.
    
    1) Make directory-based pruning rule both work in calcite logical and drill 
logical planning phase.
    
    2) Only apply directory-based pruning in logical phase when there is no 
metadata cache.
    
    3) Make FileSelection constructor public, since FileSelection.create() 
would modify selectionRoot.

----


> The plan is either confusing or could lead to execution problem, when no 
> columns is required from SCAN
> ------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-4279
>                 URL: https://issues.apache.org/jira/browse/DRILL-4279
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>
> When a query does not require any specific column to be returned from SCAN, for 
> instance,
> {code}
> Q1:  select count(*) from T1;
> Q2:  select 1 + 100 from T1;
> Q3:  select  1.0 + random() from T1; 
> {code}
> Drill's planner would use a ColumnList with the * column, plus a SKIP_ALL mode. 
> However, the mode is not serialized / deserialized. This leads to two 
> problems.
> 1) The EXPLAIN plan is confusing, since there is no way to distinguish a 
> "SELECT *" query from this SKIP_ALL mode. 
> For instance, 
> {code}
> explain plan for select count(*) from dfs.`/Users/jni/work/data/yelp/t1`;
> 00-03          Project($f0=[0])
> 00-04            Scan(groupscan=[EasyGroupScan 
> [selectionRoot=file:/Users/jni/work/data/yelp/t1, numFiles=2, columns=[`*`], 
> files= ... 
> {code} 
> 2) If the query is executed in distributed / parallel mode, the missing 
> serialization of the mode would mean that some fragments fetch all the 
> columns while other fragments skip all the columns. That will cause an 
> execution error.
> For instance, changing slice_target to force the query to be executed in 
> multiple fragments triggers an execution error. 
> {code}
> select count(*) from dfs.`/Users/jni/work/data/yelp/t1`;
> org.apache.drill.common.exceptions.UserRemoteException: DATA_READ ERROR: 
> Error parsing JSON - You tried to start when you are using a ValueWriter of 
> type NullableBitWriterImpl.
> {code}
> Directory "t1" contains just two yelp JSON files. 
> Ideally, I think when no columns are required from SCAN, the explain plan 
> should show an empty column list. The SKIP_ALL mode together with the star 
> (*) column seems confusing and error-prone. 
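
A minimal sketch of the failure mode described above, under stated assumptions: 
ScanSpec, Mode.READ_ALL and the include_mode flag are hypothetical names used 
only for illustration and are not Drill's actual classes or API. The sketch 
models a scan description whose mode is dropped during serialization, so the 
receiving side falls back to reading all columns, and it shows that an empty 
column list survives the round trip without relying on the mode at all.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    SKIP_ALL = "SKIP_ALL"  # name taken from the issue: no columns are read
    READ_ALL = "READ_ALL"  # hypothetical name for the plain "SELECT *" case


@dataclass
class ScanSpec:
    """Hypothetical stand-in for a scan's column list plus its mode."""
    columns: list
    mode: Mode = Mode.READ_ALL

    def to_json(self, include_mode: bool) -> str:
        # include_mode=False models the bug: only the columns are written.
        payload = {"columns": self.columns}
        if include_mode:
            payload["mode"] = self.mode.value
        return json.dumps(payload)

    @classmethod
    def from_json(cls, text: str) -> "ScanSpec":
        data = json.loads(text)
        # A missing mode silently falls back to the default.
        mode = Mode(data["mode"]) if "mode" in data else Mode.READ_ALL
        return cls(columns=data["columns"], mode=mode)


# What the planner intends for "select count(*) from T1".
planned = ScanSpec(columns=["*"], mode=Mode.SKIP_ALL)

# Mode lost on the wire: the receiver cannot tell this from "SELECT *".
remote_lossy = ScanSpec.from_json(planned.to_json(include_mode=False))

# Mode serialized: both sides agree on SKIP_ALL.
remote_full = ScanSpec.from_json(planned.to_json(include_mode=True))

# Alternative suggested in the issue: an empty column list needs no mode.
remote_empty = ScanSpec.from_json(ScanSpec(columns=[]).to_json(include_mode=False))
```

In this model, remote_lossy carries the star column with the default mode while 
the planner's copy still says SKIP_ALL, which is the kind of disagreement 
between fragments that the issue describes. The empty-column-list variant 
avoids the problem because the intent is carried by the serialized data itself.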



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
