[
https://issues.apache.org/jira/browse/DRILL-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14203097#comment-14203097
]
Jinfeng Ni commented on DRILL-1647:
-----------------------------------
I tried a slightly modified query and checked the plan after the rule is
enabled.
test("select flatten(z), flatten(l) from cp.`/jsoninput/input2_modified.json`");
It seems that the Project operators below the Flatten operators are producing
some unnecessary columns.
Drill Physical :
00-00    Screen: rowcount = 2.0, cumulative cost = {12.2 rows, 44.2 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 448
00-01      Project(EXPR$0=[$2], EXPR$1=[$3]): rowcount = 2.0, cumulative cost = {12.0 rows, 44.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 447
00-02        DrillFlatten: rowcount = 2.0, cumulative cost = {10.0 rows, 36.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 446
00-03          Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$1]): rowcount = 2.0, cumulative cost = {8.0 rows, 34.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 445
00-04            DrillFlatten: rowcount = 2.0, cumulative cost = {6.0 rows, 18.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 444
00-05              Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$0]): rowcount = 2.0, cumulative cost = {4.0 rows, 16.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 443
00-06                Scan(groupscan=[EasyGroupScan [selectionRoot=/jsoninput/input2_modified.json, numFiles=1, columns = [SchemaPath [`z`], SchemaPath [`l`]]]]): rowcount = 2.0, cumulative cost = {2.0 rows, 4.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 442
POP 00-03 and POP 00-05 both produce more columns than necessary. When a column
is referenced multiple times, ProjectRecordBatch does a transfer for the first
reference and a copy for each remaining reference. If the column is of a
complex type holding a fairly large amount of data, those extra references hurt
performance because of the copying, and the copied columns end up being pruned
by the top Project operator anyway.
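To make the cost concrete, here is a toy model of the transfer-vs-copy rule described above. It is a hypothetical sketch, not ProjectRecordBatch itself: each projected expression is assumed to be a plain reference to an input column index, and the per-column value counts are made-up numbers standing in for a large complex column.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the rule above; NOT Drill's ProjectRecordBatch.
// inputRefs[i] is the input column referenced by output expression i.
class TransferCopyModel {

  // "transfer" for the first reference to each input column,
  // "copy" for every later reference to the same column.
  static List<String> actions(int[] inputRefs) {
    Set<Integer> seen = new HashSet<>();
    List<String> result = new ArrayList<>();
    for (int ref : inputRefs) {
      result.add(seen.add(ref) ? "transfer" : "copy");
    }
    return result;
  }

  // Values copied for one batch; valuesPerColumn[i] is an assumed count of
  // values held by input column i (large for complex types).
  static long copiedValues(int[] inputRefs, long[] valuesPerColumn) {
    Set<Integer> seen = new HashSet<>();
    long copied = 0;
    for (int ref : inputRefs) {
      if (!seen.add(ref)) {
        copied += valuesPerColumn[ref];
      }
    }
    return copied;
  }

  public static void main(String[] args) {
    int[] pop05 = {0, 1, 0}; // 00-05: Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$0])
    System.out.println(actions(pop05));                                     // [transfer, transfer, copy]
    System.out.println(copiedValues(pop05, new long[] {1_000_000L, 10L}));  // 1000000
  }
}
```

In this model, every value copied for a duplicate reference is pure overhead when the top Project later drops that column.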
Can you see if it's possible to remove those unnecessary projected columns just
under each Flatten?
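One possible shape for that trimming, sketched under simplifying assumptions: the Project's expressions are plain input references, and we already know which of its output positions the parent (the Flatten) actually uses. The class `ProjectTrimmer` and its method are hypothetical, not Drill planner code. It keeps one output per distinct needed input column and returns a remapping so the parent's references can be rewritten to the new positions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch: trim a Project whose expressions are plain input refs.
class ProjectTrimmer {

  static final class Trimmed {
    final List<Integer> inputRefs;     // input column for each kept output
    final Map<Integer, Integer> remap; // old output position -> new position

    Trimmed(List<Integer> inputRefs, Map<Integer, Integer> remap) {
      this.inputRefs = inputRefs;
      this.remap = remap;
    }
  }

  // inputRefs[i] is the input column of output i; usedOutputs are the output
  // positions the parent operator references.
  static Trimmed trim(int[] inputRefs, SortedSet<Integer> usedOutputs) {
    List<Integer> kept = new ArrayList<>();
    Map<Integer, Integer> inputToNewPos = new LinkedHashMap<>();
    Map<Integer, Integer> remap = new LinkedHashMap<>();
    for (int out : usedOutputs) {
      int input = inputRefs[out];
      Integer pos = inputToNewPos.get(input);
      if (pos == null) {          // first time this input column is needed
        pos = kept.size();
        kept.add(input);
        inputToNewPos.put(input, pos);
      }
      remap.put(out, pos);        // duplicate references collapse onto one slot
    }
    return new Trimmed(kept, remap);
  }

  public static void main(String[] args) {
    // 00-03: Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$1])
    Trimmed t = trim(new int[] {0, 1, 2, 1}, new TreeSet<>(List.of(0, 1, 2, 3)));
    System.out.println(t.inputRefs + " " + t.remap); // [0, 1, 2] {0=0, 1=1, 2=2, 3=1}
  }
}
```

Even when every output is still referenced, the duplicate `$1` collapses onto a single slot, so only transfers remain; the parent's references must then be rewritten through `remap`.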
One minor comment: can you override explainTerms() for DrillFlattenPrel so that
it shows its input expression in the EXPLAIN output? That will make the
execution plan easier to understand.
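In Calcite (called Optiq when this was written), a rel node typically does this by overriding `explainTerms(RelWriter pw)`, calling `super.explainTerms(pw)` and chaining `.item(term, value)`. The runnable sketch below only mimics that pattern with stand-in classes (`MiniRelWriter`, `MiniRel`, `MiniFlattenPrel` are hypothetical, not Calcite or Drill types), and the term name `flattenField` and the input `$1` are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for Calcite's RelWriter: collects "term=[value]" pairs.
class MiniRelWriter {
  private final List<String> terms = new ArrayList<>();

  MiniRelWriter item(String term, Object value) {
    terms.add(term + "=[" + value + "]");
    return this; // allows chaining, like RelWriter.item(...)
  }

  String render(String relTypeName) {
    return relTypeName + "(" + String.join(", ", terms) + ")";
  }
}

// Stand-in for a rel node base class: reports no extra terms by default.
class MiniRel {
  MiniRelWriter explainTerms(MiniRelWriter pw) {
    return pw;
  }
}

// Hypothetical DrillFlattenPrel-like node that exposes its flatten input.
class MiniFlattenPrel extends MiniRel {
  private final String toFlatten; // a RexNode in a real planner; a string here

  MiniFlattenPrel(String toFlatten) {
    this.toFlatten = toFlatten;
  }

  @Override
  MiniRelWriter explainTerms(MiniRelWriter pw) {
    // Same shape as a Calcite override: super.explainTerms(pw).item(...)
    return super.explainTerms(pw).item("flattenField", toFlatten);
  }

  String explain() {
    return explainTerms(new MiniRelWriter()).render("DrillFlatten");
  }

  public static void main(String[] args) {
    System.out.println(new MiniFlattenPrel("$1").explain()); // DrillFlatten(flattenField=[$1])
  }
}
```

With such an override, the two DrillFlatten lines in the plan above would each name the column they flatten instead of printing a bare `DrillFlatten:`.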
> Flatten operator cannot be used twice in a select clause
> --------------------------------------------------------
>
> Key: DRILL-1647
> URL: https://issues.apache.org/jira/browse/DRILL-1647
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Jason Altekruse
> Assignee: Jinfeng Ni
> Attachments:
> Drill-1647-multiple-flattens-complex-expression-rewriting.patch
>
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)