[ 
https://issues.apache.org/jira/browse/DRILL-5524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16017193#comment-16017193
 ] 

Arina Ielchiieva commented on DRILL-5524:
-----------------------------------------

Calcite has a ProjectRemoveRule that can be added to the Drill rule set so that 
the project stage is removed when it is not needed.
But there is a problem with implicit columns. For example, take a star query 
with an implicit column:  select *, fqn from t.
At the scan stage Drill passes the list of columns to retrieve. But when there 
is a star in the query, Drill assumes that the other columns named in the query 
will be retrieved anyway, so it simplifies the column list to "columns=[`*`]".
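
A minimal sketch of that simplification, for illustration only. The class and 
method names are hypothetical and this is not the actual Drill code; it only 
shows how the explicit request for fqn is lost once the star is seen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration, not Drill's real planner code.
class StarColumnSketch {

    // If the query contains a star, collapse the whole list to just the star.
    // Any other column named in the query (including implicit ones) is dropped
    // from the list handed to the scan.
    static List<String> simplifyScanColumns(List<String> requested) {
        if (requested.contains("*")) {
            return Collections.singletonList("*");
        }
        return new ArrayList<>(requested);
    }

    public static void main(String[] args) {
        // select *, fqn from t
        System.out.println(simplifyScanColumns(Arrays.asList("*", "fqn")));
        // select a, b from t (no star, so the list is kept as-is)
        System.out.println(simplifyScanColumns(Arrays.asList("a", "b")));
    }
}
```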

At this point we don't know whether the implicit columns are needed or not, so 
we add them anyway.
https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/ImplicitColumnExplorer.java#L143

And if they are not needed, we filter them out during the project stage.
https://github.com/apache/drill/blob/0dc237e3161cf284212cc63f740b229d4fee8fdf/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectRecordBatch.java#L357
https://github.com/apache/drill/blob/0dc237e3161cf284212cc63f740b229d4fee8fdf/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectRecordBatch.java#L379

But when ProjectRemoveRule removes the project stage, the implicit columns show 
up in the query output. This rule is already used in the JDBC plugin, and there 
is a corresponding bug (DRILL-4903). 
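
The flow described above can be sketched roughly as follows. This is a toy 
model, not Drill code: the class, the methods and the table columns a and b are 
made up, rows are plain maps, and it assumes that the scan adds every implicit 
file column (fqn, filepath, filename, suffix) for a star query.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of the scan -> project flow, not Drill's real operators.
class ImplicitColumnFlowSketch {

    static final List<String> IMPLICIT =
        Arrays.asList("fqn", "filepath", "filename", "suffix");

    // Scan for a star query: table columns plus all implicit columns
    // (assumption), because the scan only sees columns=[`*`].
    static Map<String, String> scan(Map<String, String> tableRow) {
        Map<String, String> out = new LinkedHashMap<>(tableRow);
        for (String name : IMPLICIT) {
            out.put(name, "<" + name + ">"); // placeholder value
        }
        return out;
    }

    // Project: drop the implicit columns the query did not ask for.
    static Map<String, String> project(Map<String, String> row,
                                       Set<String> requestedImplicit) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : row.entrySet()) {
            boolean implicit = IMPLICIT.contains(e.getKey());
            if (!implicit || requestedImplicit.contains(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> tableRow = new LinkedHashMap<>();
        tableRow.put("a", "1");
        tableRow.put("b", "x");

        Map<String, String> scanned = scan(tableRow);

        // select *, fqn from t, with the project stage in place
        System.out.println(project(scanned, Collections.singleton("fqn")).keySet());
        // same query with the project stage removed: every implicit column leaks
        System.out.println(scanned.keySet());
    }
}
```

With the project stage the result has columns a, b and fqn; without it the 
client would also see filepath, filename and suffix.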

So before applying this rule, we need to make sure that the problem with 
implicit columns is resolved.
For example, we could forbid using implicit columns in star queries, or include 
the implicit column in the column list even if a star is present -> 
columns=[`*`, `fqn`].
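
A rough sketch of the second option, under the assumption that the scan could 
then populate only the implicit columns it finds in its column list. All names 
here are hypothetical and this is not a proposed patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of keeping implicit columns next to the star.
class ScanColumnListSketch {

    static final List<String> IMPLICIT =
        Arrays.asList("fqn", "filepath", "filename", "suffix");

    // Keep the star, but also keep any implicit column the query names.
    static List<String> scanColumns(List<String> requested) {
        if (!requested.contains("*")) {
            return new ArrayList<>(requested);
        }
        List<String> out = new ArrayList<>();
        out.add("*");
        for (String column : requested) {
            if (IMPLICIT.contains(column) && !out.contains(column)) {
                out.add(column);
            }
        }
        return out;
    }

    // Implicit columns the scan would have to add for this column list.
    static List<String> implicitToPopulate(List<String> scanColumns) {
        List<String> out = new ArrayList<>();
        for (String column : scanColumns) {
            if (IMPLICIT.contains(column)) {
                out.add(column);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // select *, fqn from t
        List<String> columns = scanColumns(Arrays.asList("*", "fqn"));
        System.out.println(columns);
        System.out.println(implicitToPopulate(columns));
        // select * from t: no implicit column is requested, so none is added
        System.out.println(implicitToPopulate(scanColumns(Arrays.asList("*"))));
    }
}
```

With the implicit columns visible in the scan's column list, the scan output 
would already match what the query asked for, so removing a no-op project would 
not expose extra columns.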

> Remove no-op projects from query plan
> -------------------------------------
>
>                 Key: DRILL-5524
>                 URL: https://issues.apache.org/jira/browse/DRILL-5524
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Priority: Minor
>
> Consider a very simple query using the mock data source:
> {code}
> SELECT id_i, name_s10 FROM `mock`.`employees_10K`
> {code}
> This just says to create two columns: one int, one varchar of length 10, and 
> fill them with random data to create 10,000 records.
> The query simply passes the columns directly from the input to the client.
> Yet, the query plan includes a "no-op" project:
> {code}
>   "graph" : [ {
>     "pop" : "mock-scan",
>     "@id" : 2, ...
>   }, {
>     "pop" : "project",
>     "@id" : 1,
>     "exprs" : [ {
>       "ref" : "`id_i`",
>       "expr" : "`id_i`"
>     }, {
>       "ref" : "`name_s10`",
>       "expr" : "`name_s10`"
>     } ], ...
>   }, {
>     "pop" : "screen",
>     "@id" : 0, ...
>   } ]
> }
> {code}
> When executed, the project operator generates code that does nothing:
> {code}
> public class ProjectorGen0 extends ProjectorTemplate {
>     public void doEval(int inIndex, int outIndex)
>         throws SchemaChangeException
>     { }
>     public void doSetup(FragmentContext context, RecordBatch incoming, RecordBatch outgoing)
>         throws SchemaChangeException
>     { }
> }
> {code}
> Yet, the project code still insists on stepping through each row, despite the 
> fact that the code does nothing per record:
> {code}
>       for (i = startIndex; i < startIndex + recordCount; i++, firstOutputIndex++) {
>         try {
>           doEval(i, firstOutputIndex);
>         } ...
>       }
> {code}
> The request is to both:
> 1. Skip the per-record loop if all transfers are at the vector level, and
> 2. Omit the entire project step if nothing changes.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
