[
https://issues.apache.org/jira/browse/DRILL-7451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16978120#comment-16978120
]
Paul Rogers commented on DRILL-7451:
------------------------------------
It appears that the actual behavior is a bit more complex. Run the same test as
above, with the same query, but now mark the plugin as *not* supporting
projection push-down. In this case we get two projects. This suggests that the
project above is added for a different reason, but it is still trivial and
should be removed.
Logical plan with scan project pushdown disabled:
{code:json}
"graph" : [ {
  "pop" : "DummyGroupScan",
  "@id" : 3,
  "columns" : [ "`**`" ],
  "userName" : "progers",
  "cost" : {
    "memoryCost" : 1.6777216E7,
    "outputRowCount" : 10000.0
  }
}, {
  "pop" : "project",
  "@id" : 2,
  "exprs" : [ {
    "ref" : "`a`",
    "expr" : "`a`"
  }, {
    "ref" : "`b`",
    "expr" : "`b`"
  }, {
    "ref" : "`c`",
    "expr" : "`c`"
  } ],
  "child" : 3,
  "outputProj" : true,
  "initialAllocation" : 1000000,
  "maxAllocation" : 10000000000,
  "cost" : {
    "memoryCost" : 1.6777216E7,
    "outputRowCount" : 10000.0
  }
}, {
  "pop" : "project",
  "@id" : 1,
  "exprs" : [ {
    "ref" : "`a`",
    "expr" : "`a`"
  }, {
    "ref" : "`b`",
    "expr" : "`b`"
  }, {
    "ref" : "`c`",
    "expr" : "`c`"
  } ],
  "child" : 2,
  "outputProj" : true,
  "initialAllocation" : 1000000,
  "maxAllocation" : 10000000000,
  "cost" : {
    "memoryCost" : 1.6777216E7,
    "outputRowCount" : 10000.0
  }
}, {
  "pop" : "screen",
  "@id" : 0,
  "child" : 1,
  "initialAllocation" : 1000000,
  "maxAllocation" : 10000000000,
  "cost" : {
    "memoryCost" : 1.6777216E7,
    "outputRowCount" : 10000.0
  }
} ]
{code}
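To make "trivial" concrete, here is a minimal sketch of the kind of check I have in mind. It is not Drill's or Calcite's actual rule: the class and method names are hypothetical, and the project expressions and the child's output columns are modeled as plain strings copied from the plan JSON. A project counts as trivial only when it passes through exactly the child's columns, in the same order, under the same names.

```java
import java.util.List;

// Hypothetical stand-in for one entry of a project's "exprs" list.
final class NamedExpr {
  final String ref;   // output column name
  final String expr;  // expression that produces it

  NamedExpr(String ref, String expr) {
    this.ref = ref;
    this.expr = expr;
  }
}

// Hypothetical helper, for illustration only.
final class TrivialProjectCheck {
  // True when the project copies the child's columns unchanged:
  // same count, same order, and each expression is a bare reference
  // to the same-named child column.
  static boolean isTrivial(List<NamedExpr> exprs, List<String> childColumns) {
    if (exprs.size() != childColumns.size()) {
      return false;
    }
    for (int i = 0; i < exprs.size(); i++) {
      NamedExpr e = exprs.get(i);
      if (!e.ref.equals(e.expr) || !e.expr.equals(childColumns.get(i))) {
        return false;
      }
    }
    return true;
  }
}
```

Under this definition the upper project (@id 1) is trivial, because its child (@id 2) already emits a, b and c. The lower project is not, since the scan below it only reports the wildcard column.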
> Planner inserts project node even if scan handles project push-down
> -------------------------------------------------------------------
>
> Key: DRILL-7451
> URL: https://issues.apache.org/jira/browse/DRILL-7451
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Paul Rogers
> Priority: Minor
>
> I created a "dummy" storage plugin for testing. The test does a simple query:
> {code:sql}
> SELECT a, b, c FROM dummy.myTable
> {code}
> The first test is to mark the plugin's group scan as supporting projection
> push down. However, Drill still creates a projection node in the logical plan:
> {code:json}
> "graph" : [ {
>   "pop" : "DummyGroupScan",
>   "@id" : 2,
>   "columns" : [ "`**`" ],
>   "userName" : "progers",
>   "cost" : {
>     "memoryCost" : 1.6777216E7,
>     "outputRowCount" : 10000.0
>   }
> }, {
>   "pop" : "project",
>   "@id" : 1,
>   "exprs" : [ {
>     "ref" : "`a`",
>     "expr" : "`a`"
>   }, {
>     "ref" : "`b`",
>     "expr" : "`b`"
>   }, {
>     "ref" : "`c`",
>     "expr" : "`c`"
>   } ],
>   "child" : 2,
>   "outputProj" : true,
>   "initialAllocation" : 1000000,
>   "maxAllocation" : 10000000000,
>   "cost" : {
>     "memoryCost" : 1.6777216E7,
>     "outputRowCount" : 10000.0
>   }
> }, {
>   "pop" : "screen",
>   "@id" : 0,
>   "child" : 1,
>   "initialAllocation" : 1000000,
>   "maxAllocation" : 10000000000,
>   "cost" : {
>     "memoryCost" : 1.6777216E7,
>     "outputRowCount" : 10000.0
>   }
> } ]
> {code}
> There is [a comment in the
> code|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushProjectIntoScanRule.java#L109]
> that suggests the project should be removed:
> {code:java}
> // project above scan may be removed in ProjectRemoveRule for
> // the case when it is trivial
> {code}
> As shown in the example, the project is trivial. There is a subtlety: it may
> be that the scan, unknown to the planner, produces additional columns, say
> {{d}} and {{e}}, which the project operator is needed to remove.
> If this is the reason the project remains, perhaps we can add a flag of some
> kind by which the group scan can declare that it not only handles projection
> but also will not insert additional columns. At that point, the project is
> completely unnecessary in this case.
> This is not a functional bug; just a performance issue: we exercise the
> machinery of the project operator to do exactly nothing.
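The flag suggested in the description could be sketched roughly as follows. Everything here is an assumption for illustration: the interface, its method names, and the helper are hypothetical and are not part of Drill's group scan API. The idea is that the planner may drop the project only when the scan both accepts the pushed-down column list and promises to emit nothing beyond it.

```java
import java.util.List;

// Hypothetical view of a group scan; not Drill's GroupScan interface.
interface ProjectingScan {
  // The scan accepts a pushed-down column list.
  boolean supportsProjectPushDown();

  // Hypothetical flag: the scan emits exactly the requested columns,
  // with no extras such as d or e.
  boolean producesOnlyProjectedColumns();

  // Columns the planner pushed into the scan.
  List<String> projectedColumns();
}

// Simple implementation standing in for the "dummy" test plugin.
final class DummyScan implements ProjectingScan {
  private final boolean pushDown;
  private final boolean exact;
  private final List<String> columns;

  DummyScan(boolean pushDown, boolean exact, List<String> columns) {
    this.pushDown = pushDown;
    this.exact = exact;
    this.columns = columns;
  }

  @Override public boolean supportsProjectPushDown() { return pushDown; }
  @Override public boolean producesOnlyProjectedColumns() { return exact; }
  @Override public List<String> projectedColumns() { return columns; }
}

// Hypothetical planner-side decision.
final class ProjectElision {
  // The project can go only if it is an identity over the scan's columns
  // and the scan guarantees that it adds no columns of its own.
  static boolean canDropProject(ProjectingScan scan,
                                List<String> projectRefs,
                                List<String> projectExprs) {
    return scan.supportsProjectPushDown()
        && scan.producesOnlyProjectedColumns()
        && projectRefs.equals(projectExprs)
        && projectExprs.equals(scan.projectedColumns());
  }
}
```

With the flag set to false the project stays in place, which keeps the current behavior as the safe default for plugins that do not opt in.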