[
https://issues.apache.org/jira/browse/SPARK-9357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652390#comment-14652390
]
Herman van Hovell commented on SPARK-9357:
------------------------------------------
+1 for removing this.
The {{AlgebraicAggregate}} part of the new UDAF interfaces uses {{JoinedRow}} a
lot in very performance-critical sections. I ran into this while benchmarking
for the SPARK-8641 ticket. After some profiling I found that a significant
amount of time is spent in {{JoinedRow}}; it causes a major performance
regression in some cases.
I have been doing some experimentation:
* After a discussion with [~yhuai] I tried removing the branching in the joined
row. This improved the situation by 10%.
* I created specialized {{BoundReference}}s which bind directly to the
{{row1}} or {{row2}} values of the {{JoinedRow}} (I exposed those through
getters). This worked wonders. Performance is much better in most cases now.
In the end I think it is best to explicitly support a {{JoinProjection}} which
takes a left and a right row as input and produces an output row, and to change
{{SparkPlan}} and code generation (CG) accordingly. I think we'd still need
{{JoinedRow}} in the interpreted case though.
I can turn the POC I have made into a PR for discussion.
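The difference between the two approaches can be sketched with simplified stand-in types. This is only an illustration under stated assumptions, not the POC itself: {{Row}}, {{JoinedRowSketch}}, {{LeftRef}}, {{RightRef}}, {{JoinProjectionSketch}} and {{JoinProjections.bind}} are hypothetical names and are not the Catalyst classes. The joined view decides on every read which side an ordinal belongs to, while the bound references make that decision once, at bind time.

```scala
// Simplified stand-ins for illustration only; these are NOT the Catalyst classes.
final case class Row(values: Array[Any]) {
  def get(i: Int): Any = values(i)
  def length: Int = values.length
}

// A view over two rows: every read branches on the ordinal.
final class JoinedRowSketch(val row1: Row, val row2: Row) {
  def get(i: Int): Any =
    if (i < row1.length) row1.get(i) else row2.get(i - row1.length)
}

// Hypothetical bound references whose side (left or right) is fixed at bind time.
sealed trait SideRef { def eval(left: Row, right: Row): Any }
final case class LeftRef(ordinal: Int) extends SideRef {
  def eval(left: Row, right: Row): Any = left.get(ordinal)
}
final case class RightRef(ordinal: Int) extends SideRef {
  def eval(left: Row, right: Row): Any = right.get(ordinal)
}

// Hypothetical JoinProjection: takes a left and a right row, produces an output row.
final class JoinProjectionSketch(refs: Seq[SideRef]) {
  def apply(left: Row, right: Row): Row =
    Row(refs.map(_.eval(left, right)).toArray)
}

object JoinProjections {
  // Translate joined-row ordinals into side-specific references, once.
  def bind(ordinals: Seq[Int], leftWidth: Int): JoinProjectionSketch =
    new JoinProjectionSketch(ordinals.map { i =>
      if (i < leftWidth) LeftRef(i) else RightRef(i - leftWidth)
    })
}

object Demo {
  def main(args: Array[String]): Unit = {
    val left = Row(Array[Any](1, "a"))
    val right = Row(Array[Any](2.5))
    // Select joined ordinals 2 and 0; the side is resolved here, not per read.
    val proj = JoinProjections.bind(Seq(2, 0), leftWidth = left.length)
    println(proj(left, right).values.mkString(", ")) // prints "2.5, 1"
  }
}
```

In this sketch the only ordinal comparison happens inside {{bind}}; evaluating the projection against each pair of rows then reads straight from the left or the right row.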
> Remove JoinedRow
> ----------------
>
> Key: SPARK-9357
> URL: https://issues.apache.org/jira/browse/SPARK-9357
> Project: Spark
> Issue Type: Umbrella
> Components: SQL
> Reporter: Reynold Xin
>
> JoinedRow was introduced to join two rows together, in aggregation (join key
> and value), joins (left, right), window functions, etc.
> It aims to reduce the amount of data copied, but incurs branches when the row
> is actually read. Given all the fields will be read almost all the time
> (otherwise they get pruned out by the optimizer), the branch predictor cannot
> do anything about those branches.
> I think a better way is just to remove this thing and materialize the row
> data directly.
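As a rough illustration of what materializing the row data directly could look like, here is a minimal sketch in which plain arrays stand in for rows. The {{Materialize.concat}} helper is a hypothetical name and not part of the Spark API. It trades one copy per joined pair for branch-free reads afterwards.

```scala
// Illustrative only: plain arrays stand in for rows; not the Spark API.
object Materialize {
  // Copy both inputs into one contiguous array so later reads need no branch.
  def concat(left: Array[Any], right: Array[Any]): Array[Any] = {
    val out = new Array[Any](left.length + right.length)
    System.arraycopy(left, 0, out, 0, left.length)
    System.arraycopy(right, 0, out, left.length, right.length)
    out
  }
}
```

After the copy, field {{i}} of the joined data is simply {{out(i)}}, with no check on which input it came from.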