[
https://issues.apache.org/jira/browse/SPARK-9357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702603#comment-14702603
]
Cheng Hao commented on SPARK-9357:
----------------------------------
JoinedRow is probably quite efficient for cases like:
{code}
CREATE TABLE a AS
SELECT * FROM t1 JOIN t2 ON t1.key = t2.key AND t1.col1 < t2.col1;
{code}
This holds when t1 and t2 are large tables with many columns and most of the
joined records are filtered out by the condition t1.col1 < t2.col1, because
the fields of the rejected records never need to be copied.
Maybe we can create a multi-way (n-ary) JoinedRow instead of the binary
JoinedRow. Any thoughts?
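
To make the trade-off concrete, here is a minimal Java sketch of such an n-ary
joined row. It is only an illustration under assumptions: the names Row,
ArrayRow, MultiJoinedRow and joinAndFilter are hypothetical and are not Spark's
actual classes, and a plain nested-loop join stands in for a real join
operator. The join condition is evaluated on a view that references the input
rows, and fields are copied only for rows that pass the filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; none of these types are Spark's actual classes. */
final class JoinedRowSketch {

    /** Minimal read-only row abstraction (an assumption, not Spark's InternalRow). */
    interface Row {
        int numFields();
        Object get(int ordinal);
    }

    /** A fully materialized row backed by an array. */
    static final class ArrayRow implements Row {
        private final Object[] values;
        ArrayRow(Object... values) { this.values = values; }
        public int numFields() { return values.length; }
        public Object get(int ordinal) { return values[ordinal]; }
    }

    /** N-ary joined view: references the child rows instead of copying their fields. */
    static final class MultiJoinedRow implements Row {
        private final Row[] children;
        // offsets[i] is the first ordinal of children[i]; the last entry is the total field count.
        private final int[] offsets;

        MultiJoinedRow(Row... children) {
            this.children = children;
            this.offsets = new int[children.length + 1];
            for (int i = 0; i < children.length; i++) {
                offsets[i + 1] = offsets[i] + children[i].numFields();
            }
        }

        public int numFields() { return offsets[children.length]; }

        public Object get(int ordinal) {
            // This lookup is the per-read branching cost that the issue description refers to.
            for (int i = 0; i < children.length; i++) {
                if (ordinal < offsets[i + 1]) {
                    return children[i].get(ordinal - offsets[i]);
                }
            }
            throw new IndexOutOfBoundsException("ordinal " + ordinal);
        }

        /** Copies every field into a new array-backed row. */
        ArrayRow materialize() {
            Object[] out = new Object[numFields()];
            for (int i = 0; i < out.length; i++) {
                out[i] = get(i);
            }
            return new ArrayRow(out);
        }
    }

    /**
     * Nested-loop join on ordinal 0 that keeps pairs where the left value at
     * ordinal 1 is less than the right value at ordinal 1. Both are assumed to be Integers.
     */
    static List<Row> joinAndFilter(List<? extends Row> left, List<? extends Row> right) {
        List<Row> out = new ArrayList<>();
        for (Row l : left) {
            for (Row r : right) {
                if (!l.get(0).equals(r.get(0))) {
                    continue;
                }
                MultiJoinedRow joined = new MultiJoinedRow(l, r); // no field copying here
                int leftCol = (Integer) joined.get(1);
                int rightCol = (Integer) joined.get(l.numFields() + 1);
                if (leftCol < rightCol) {
                    out.add(joined.materialize()); // copy only the rows that pass the filter
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> t1 = Arrays.asList(new ArrayRow(1, 10, "a"), new ArrayRow(2, 50, "b"));
        List<Row> t2 = Arrays.asList(new ArrayRow(1, 20, "x"), new ArrayRow(2, 30, "y"));
        List<Row> result = joinAndFilter(t1, t2);
        System.out.println("rows kept: " + result.size());
    }
}
```

With a binary JoinedRow, a three-way join would nest one view inside another,
so each read may go through two levels of indirection. The flat offsets array
above keeps it to one level, at the price of a short scan per read.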
> Remove JoinedRow
> ----------------
>
> Key: SPARK-9357
> URL: https://issues.apache.org/jira/browse/SPARK-9357
> Project: Spark
> Issue Type: Umbrella
> Components: SQL
> Reporter: Reynold Xin
>
> JoinedRow was introduced to join two rows together, in aggregation (join key
> and value), joins (left, right), window functions, etc.
> It aims to reduce the amount of data copied, but incurs branches when the row
> is actually read. Given all the fields will be read almost all the time
> (otherwise they get pruned out by the optimizer), the branch predictor cannot
> do anything about those branches.
> I think a better way is just to remove this thing and materialize the row
> data directly.