[jira] [Commented] (DRILL-1647) Flatten operator cannot be used twice in a select clause

Jinfeng Ni (JIRA) Fri, 07 Nov 2014 17:27:07 -0800

    [ https://issues.apache.org/jira/browse/DRILL-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14203097#comment-14203097 ]


Jinfeng Ni commented on DRILL-1647:
-----------------------------------

I tried a slightly modified query and checked the plan with the rule 
enabled. 

test("select flatten(z), flatten(l) from cp.`/jsoninput/input2_modified.json`");

It seems that the Project operators below the Flatten operators are producing 
some unnecessary columns. 

Drill Physical : 
00-00    Screen: rowcount = 2.0, cumulative cost = {12.2 rows, 44.2 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 448
00-01      Project(EXPR$0=[$2], EXPR$1=[$3]): rowcount = 2.0, cumulative cost = {12.0 rows, 44.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 447
00-02        DrillFlatten: rowcount = 2.0, cumulative cost = {10.0 rows, 36.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 446
00-03          Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$1]): rowcount = 2.0, cumulative cost = {8.0 rows, 34.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 445
00-04            DrillFlatten: rowcount = 2.0, cumulative cost = {6.0 rows, 18.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 444
00-05              Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$0]): rowcount = 2.0, cumulative cost = {4.0 rows, 16.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 443
00-06                Scan(groupscan=[EasyGroupScan [selectionRoot=/jsoninput/input2_modified.json, numFiles=1, columns = [SchemaPath [`z`], SchemaPath [`l`]]]]): rowcount = 2.0, cumulative cost = {2.0 rows, 4.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 442

POP 03 and POP 05 both produce more columns than necessary.  When a column is 
referenced multiple times, ProjectRecordBatch does a transfer for the first 
reference and a copy for each remaining reference. This means that if the 
column is a complex type holding a reasonably large amount of data, the 
additional references hurt performance because of the copying, and those 
extra copies end up being pruned by the top Project operator anyway.  
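
To make the transfer-versus-copy cost concrete, here is a minimal Java sketch. 
It is not Drill code: the class and method names (TransferCopyCount, count) are 
hypothetical, and it models only the rule as stated in this comment, i.e. the 
first reference to an input column is a transfer and every later reference to 
the same column is a copy. The input lists are the column indexes taken from 
the Project operators at 00-05 and 00-03 in the plan.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; this is not part of ProjectRecordBatch.
final class TransferCopyCount {

    /**
     * Counts transfers and copies for a Project's input column references.
     * Assumption (from the comment above): the first reference to a column
     * is a transfer, each additional reference to that column is a copy.
     *
     * @return a two-element array {transfers, copies}
     */
    static int[] count(List<Integer> inputRefs) {
        Set<Integer> seen = new HashSet<>();
        int transfers = 0;
        int copies = 0;
        for (int ref : inputRefs) {
            if (seen.add(ref)) {
                transfers++;   // first time this column is referenced
            } else {
                copies++;      // column already referenced: data is copied
            }
        }
        return new int[] {transfers, copies};
    }

    public static void main(String[] args) {
        // 00-05: Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$0])
        int[] p05 = count(Arrays.asList(0, 1, 0));
        // 00-03: Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$1])
        int[] p03 = count(Arrays.asList(0, 1, 2, 1));
        System.out.println("00-05: transfers=" + p05[0] + ", copies=" + p05[1]);
        System.out.println("00-03: transfers=" + p03[0] + ", copies=" + p03[1]);
    }
}
```

Under that assumption, each of the two Projects performs one copy of a complex 
column (`z` at 00-05, `l` at 00-03), and 00-03 additionally carries `$0` 
through even though nothing above it uses that column.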

Can you see if it's possible to remove those unnecessary projected columns just 
under each Flatten?

One minor comment: Can you override explainTerms() for DrillFlattenPrel, so 
that it would show its input expression in the EXPLAIN result? It will make it 
easier to understand the execution plan. 




> Flatten operator cannot be used twice in a select clause
> --------------------------------------------------------
>
>                 Key: DRILL-1647
>                 URL: https://issues.apache.org/jira/browse/DRILL-1647
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Jason Altekruse
>            Assignee: Jinfeng Ni
>         Attachments: 
> Drill-1647-multiple-flattens-complex-expression-rewriting.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
