[
https://issues.apache.org/jira/browse/DRILL-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15104198#comment-15104198
]
ASF GitHub Bot commented on DRILL-4279:
---------------------------------------
GitHub user jinfengni opened a pull request:
https://github.com/apache/drill/pull/328
Drill 2517: Apply directory-based partition pruning before reading files in
planning.
1. Ran the pre-commit tests and unit tests. Some queries in the pre-commit
suites now produce different plans, and most of the changed plans look better
than before. The only exception is the set of cases affected by an existing
issue (DRILL-4279), where the * column is used together with SKIP_ALL mode.
That happens when the filter is applied and then removed, as in the following
query:
SELECT count(*) from T1 where dir0 = 1990 and dir1 = 'Q1'.
2. I need to figure out how to cherry-pick Adam's patch in DRILL-2517, since
that is the initial work on this issue, although this pull request changes
quite a lot relative to that patch.
@amansinha100 , could you please take a look and give some initial review
comments? Thanks!
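
For readers less familiar with the idea, directory-based pruning for a query such as the one in item 1 can be sketched roughly as follows. This is only an illustration, not the rule in this pull request: the class DirPruningSketch, the method prune, and the sample paths are all hypothetical, and dir0 / dir1 are compared as plain strings against the first two directory levels below the selection root. The point it shows is that files are filtered by path alone, before any file is read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not Drill's pruning rule.
final class DirPruningSketch {

    // Keeps the paths under selectionRoot whose first two directory levels
    // (dir0, dir1) equal the requested values. Only the path strings are
    // inspected; no file is opened.
    static List<String> prune(String selectionRoot, List<String> paths,
                              String dir0, String dir1) {
        String prefix = selectionRoot.endsWith("/") ? selectionRoot : selectionRoot + "/";
        List<String> kept = new ArrayList<>();
        for (String path : paths) {
            if (!path.startsWith(prefix)) {
                continue; // not under the selection root
            }
            // parts = [dir0, dir1, ..., fileName]
            String[] parts = path.substring(prefix.length()).split("/");
            if (parts.length >= 3 && parts[0].equals(dir0) && parts[1].equals(dir1)) {
                kept.add(path);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up layout: <root>/<year>/<quarter>/<file>
        List<String> paths = Arrays.asList(
            "/data/T1/1990/Q1/a.json",
            "/data/T1/1990/Q2/b.json",
            "/data/T1/1991/Q1/c.json");
        // Mirrors: where dir0 = 1990 and dir1 = 'Q1'
        System.out.println(prune("/data/T1", paths, "1990", "Q1"));
    }
}
```

With the made-up layout above, only the file under 1990/Q1 survives, so a scan built from the pruned list never touches the other two files.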
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jinfengni/incubator-drill DRILL-2517
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/drill/pull/328.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #328
----
commit bc5e13972d116f41ed9441b49a6781b2b602c2fd
Author: Mehant Baid <[email protected]>
Date: 2015-11-11T06:26:26Z
DRILL-2571: (Prototype from Mehant) Move directory based partition pruning
to logical phase.
commit 1b05e372ee7193308bea420302bdd0e259193e3a
Author: Jinfeng Ni <[email protected]>
Date: 2016-01-08T18:28:53Z
DRILL-2517: Move directory-based partition pruning to Calcite logical
planning phase.
1) Make directory-based pruning rule both work in calcite logical and drill
logical planning phase.
2) Only apply directory-based pruning in logical phase when there is no
metadata cache.
3) Make FileSelection constructor public, since FileSelection.create()
would modify selectionRoot.
----
> The plan is either confusing or could lead to execution problem, when no
> columns is required from SCAN
> ------------------------------------------------------------------------------------------------------
>
> Key: DRILL-4279
> URL: https://issues.apache.org/jira/browse/DRILL-4279
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Reporter: Jinfeng Ni
> Assignee: Jinfeng Ni
>
> When a query does not require any specific column to be returned from SCAN,
> for instance,
> {code}
> Q1: select count(*) from T1;
> Q2: select 1 + 100 from T1;
> Q3: select 1.0 + random() from T1;
> {code}
> Drill's planner would use a ColumnList containing the * column, plus a
> SKIP_ALL mode. However, the MODE is not serialized / deserialized. This leads
> to two problems.
> 1) The EXPLAIN plan is confusing, since there is no way to distinguish a
> "SELECT *" query from this SKIP_ALL mode.
> For instance,
> {code}
> explain plan for select count(*) from dfs.`/Users/jni/work/data/yelp/t1`;
> 00-03 Project($f0=[0])
> 00-04 Scan(groupscan=[EasyGroupScan
> [selectionRoot=file:/Users/jni/work/data/yelp/t1, numFiles=2, columns=[`*`],
> files= ...
> {code}
> 2) If the query is executed in distributed / parallel mode, the missing
> serialization of the mode means that some fragments fetch all the columns,
> while other fragments skip all the columns. That causes an execution
> error.
> For instance, changing slice_target to force the query to be executed in
> multiple fragments triggers the execution error.
> {code}
> select count(*) from dfs.`/Users/jni/work/data/yelp/t1`;
> org.apache.drill.common.exceptions.UserRemoteException: DATA_READ ERROR:
> Error parsing JSON - You tried to start when you are using a ValueWriter of
> type NullableBitWriterImpl.
> {code}
> Directory "t1" just contains two yelp JSON files.
> Ideally, I think when no columns are required from SCAN, the explain plan
> should show an empty column list. Combining the SKIP_ALL mode with the star
> (*) column seems confusing and error-prone.
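
The serialization gap described in the issue can be reproduced in miniature with a toy model. Everything here is an assumption made for illustration: ScanColumnsSketch, its Mode enum (only the name SKIP_ALL comes from the issue; READ_ALL is invented), and the comma-separated wire format are not Drill's actual classes or plan format. The sketch only shows why dropping the mode makes a count(*) scan indistinguishable from a SELECT * scan once it has been serialized.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model, not Drill's classes: a scan's column list plus a mode flag.
final class ScanColumnsSketch {
    // SKIP_ALL is named in the issue; READ_ALL is an invented counterpart.
    enum Mode { READ_ALL, SKIP_ALL }

    final List<String> columns;
    final Mode mode;

    ScanColumnsSketch(List<String> columns, Mode mode) {
        this.columns = columns;
        this.mode = mode;
    }

    // Lossy serialization: only the column names are written, the mode is dropped.
    String serialize() {
        return String.join(",", columns);
    }

    // The reader cannot recover the mode, so it falls back to READ_ALL
    // (an assumed default, chosen only for this sketch).
    static ScanColumnsSketch deserialize(String text) {
        return new ScanColumnsSketch(Arrays.asList(text.split(",")), Mode.READ_ALL);
    }

    public static void main(String[] args) {
        ScanColumnsSketch countStar =
            new ScanColumnsSketch(Arrays.asList("`*`"), Mode.SKIP_ALL);
        ScanColumnsSketch selectStar =
            new ScanColumnsSketch(Arrays.asList("`*`"), Mode.READ_ALL);

        // Both scans serialize to the same text, which is why the plans look alike.
        System.out.println(countStar.serialize().equals(selectStar.serialize()));

        // A remote reader that deserializes the count(*) scan ends up in a
        // different mode than the one the planner intended.
        System.out.println(deserialize(countStar.serialize()).mode);
    }
}
```

In this toy model a party that keeps the original object still sees SKIP_ALL while a party that deserializes it sees READ_ALL, which mirrors the fragment mismatch described in problem 2.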