[jira] [Commented] (SPARK-9357) Remove JoinedRow

Cheng Hao (JIRA) Tue, 18 Aug 2015 23:53:20 -0700

    [ https://issues.apache.org/jira/browse/SPARK-9357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702603#comment-14702603 ]


Cheng Hao commented on SPARK-9357:
----------------------------------

JoinedRow is probably highly efficient in cases like:

{code}
CREATE TABLE a AS SELECT * FROM t1 JOIN t2 ON t1.key = t2.key AND t1.col1 < t2.col1;
{code}
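That is, when t1 and t2 are large tables with many columns and most of the
joined records are filtered out by t1.col1 < t2.col1, JoinedRow lets the
predicate be evaluated without first copying both input rows into a new row.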
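A minimal Java sketch of why a binary JoinedRow-style view pays off here. It assumes a hypothetical `Row` interface, `ArrayRow`, and `JoinedView` (not Spark's actual `InternalRow`/`JoinedRow` classes), and it uses a plain nested-loop join on ordinal 0 for brevity where Spark would typically use a hash or sort-merge join. The point it illustrates: the non-equi predicate reads through one reused view, and only the pairs that survive it are copied into a new row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LazyJoinFilter {
    // Hypothetical minimal row abstraction for this sketch; not Spark's InternalRow API.
    interface Row {
        int numFields();
        Object get(int ordinal);
    }

    static final class ArrayRow implements Row {
        final Object[] values;
        ArrayRow(Object... values) { this.values = values; }
        public int numFields() { return values.length; }
        public Object get(int ordinal) { return values[ordinal]; }
    }

    // Binary JoinedRow-style view: no field is copied, but every get() branches on the ordinal.
    static final class JoinedView implements Row {
        private Row left;
        private Row right;

        JoinedView with(Row l, Row r) { left = l; right = r; return this; }
        public int numFields() { return left.numFields() + right.numFields(); }
        public Object get(int ordinal) {
            int n = left.numFields();
            return ordinal < n ? left.get(ordinal) : right.get(ordinal - n);
        }

        // Materializes the current view into a new flat row.
        ArrayRow copy() {
            Object[] out = new Object[numFields()];
            for (int i = 0; i < out.length; i++) out[i] = get(i);
            return new ArrayRow(out);
        }
    }

    // Nested-loop join on ordinal 0 (key) plus the predicate left.col1 < right.col1
    // (ordinal 1 on each side). The view is reused, so candidate pairs allocate nothing;
    // only pairs that pass the predicate are copied.
    static List<Row> joinAndFilter(List<Row> t1, List<Row> t2) {
        List<Row> out = new ArrayList<>();
        JoinedView view = new JoinedView();
        for (Row l : t1) {
            for (Row r : t2) {
                if (!l.get(0).equals(r.get(0))) continue;
                view.with(l, r);
                int rightCol1 = l.numFields() + 1;
                if ((Integer) view.get(1) < (Integer) view.get(rightCol1)) {
                    out.add(view.copy());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> t1 = Arrays.asList(new ArrayRow(1, 5, "a"), new ArrayRow(2, 9, "b"));
        List<Row> t2 = Arrays.asList(new ArrayRow(1, 7, "x"), new ArrayRow(2, 3, "y"));
        for (Row row : joinAndFilter(t1, t2)) {
            System.out.println(Arrays.toString(((ArrayRow) row).values));
        }
    }
}
```

With many columns and a selective predicate, the copy in `copy()` runs only for the few surviving pairs, which is the saving the comment describes.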

Maybe we could create a multi-nary JoinedRow instead of the binary JoinedRow.
Any thoughts?
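One possible shape for a multi-nary JoinedRow, sketched in Java under the same kind of hypothetical `Row` interface (`NaryJoinedRow` and `with` are illustrative names, not Spark API). Nesting binary JoinedRows adds one ordinal comparison per nesting level on every read; this sketch instead precomputes, once from the child widths, which child and which local ordinal each global ordinal maps to.

```java
import java.util.Arrays;

public class MultiJoinedRow {
    // Hypothetical minimal row abstraction for this sketch; not Spark's InternalRow API.
    interface Row {
        int numFields();
        Object get(int ordinal);
    }

    static final class ArrayRow implements Row {
        final Object[] values;
        ArrayRow(Object... values) { this.values = values; }
        public int numFields() { return values.length; }
        public Object get(int ordinal) { return values[ordinal]; }
    }

    // Hypothetical N-ary view over several child rows. Ordinals are resolved through
    // tables built once from the child widths, so get() does not walk a chain of
    // "ordinal < leftWidth" comparisons as nested binary JoinedRows would.
    static final class NaryJoinedRow implements Row {
        private final int[] childOf;  // global ordinal -> index of the child row
        private final int[] localOf;  // global ordinal -> ordinal inside that child
        private final Row[] children;

        NaryJoinedRow(int... widths) {
            int total = Arrays.stream(widths).sum();
            childOf = new int[total];
            localOf = new int[total];
            int ordinal = 0;
            for (int c = 0; c < widths.length; c++) {
                for (int j = 0; j < widths[c]; j++) {
                    childOf[ordinal] = c;
                    localOf[ordinal] = j;
                    ordinal++;
                }
            }
            children = new Row[widths.length];
        }

        // Rebinds the view to new child rows without copying any field values.
        // Assumes each child's numFields() matches the width given to the constructor.
        NaryJoinedRow with(Row... rows) {
            if (rows.length != children.length) {
                throw new IllegalArgumentException("expected " + children.length + " rows");
            }
            System.arraycopy(rows, 0, children, 0, rows.length);
            return this;
        }

        public int numFields() { return childOf.length; }

        public Object get(int ordinal) {
            return children[childOf[ordinal]].get(localOf[ordinal]);
        }
    }

    public static void main(String[] args) {
        NaryJoinedRow row = new NaryJoinedRow(2, 1, 3);
        row.with(new ArrayRow(1, "a"), new ArrayRow(true), new ArrayRow(7L, 8L, 9L));
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.numFields(); i++) sb.append(row.get(i)).append(' ');
        System.out.println(sb.toString().trim()); // prints "1 a true 7 8 9"
    }
}
```

The lookup tables trade the comparison branch for two extra array loads per read; whether that beats a binary JoinedRow in practice depends on the JIT and the access pattern, so it would need benchmarking.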

> Remove JoinedRow
> ----------------
>
>                 Key: SPARK-9357
>                 URL: https://issues.apache.org/jira/browse/SPARK-9357
>             Project: Spark
>          Issue Type: Umbrella
>          Components: SQL
>            Reporter: Reynold Xin
>
> JoinedRow was introduced to join two rows together in aggregation (join key
> and value), joins (left, right), window functions, etc.
> It aims to reduce the amount of data copied, but incurs branches when the row
> is actually read. Given that all the fields will be read almost all the time
> (otherwise they get pruned out by the optimizer), the branch predictor cannot
> do anything about those branches.
> I think a better way is just to remove this thing and materialize the row
> data directly.
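The alternative proposed in the description, materializing the row directly, can be sketched in Java as follows. Rows are modeled as plain `Object[]` arrays purely for illustration (Spark's actual row formats differ), and `materialize`/`joinedGet` are hypothetical helper names. `materialize` pays one copy per output row; afterwards every read is a single array index, whereas the JoinedRow-style `joinedGet` takes a left/right branch on every access.

```java
import java.util.Arrays;

public class MaterializedJoin {
    // Copies the left and right field values into one contiguous array,
    // so later reads are a plain index with no left/right branch.
    static Object[] materialize(Object[] left, Object[] right) {
        Object[] out = new Object[left.length + right.length];
        System.arraycopy(left, 0, out, 0, left.length);
        System.arraycopy(right, 0, out, left.length, right.length);
        return out;
    }

    // For comparison: what a binary JoinedRow-style read does on every access.
    static Object joinedGet(Object[] left, Object[] right, int ordinal) {
        return ordinal < left.length ? left[ordinal] : right[ordinal - left.length];
    }

    public static void main(String[] args) {
        Object[] key = {42};          // e.g. an aggregation's grouping key
        Object[] value = {"x", 3.5};  // e.g. the corresponding value fields
        Object[] flat = materialize(key, value);
        for (int i = 0; i < flat.length; i++) {
            // Both access paths return the same value; only their cost profile differs.
            if (!flat[i].equals(joinedGet(key, value, i))) {
                throw new IllegalStateException("mismatch at ordinal " + i);
            }
        }
        System.out.println(Arrays.toString(flat)); // prints "[42, x, 3.5]"
    }
}
```

Which side wins depends on how many joined rows are read versus discarded: when nearly every field of every row is read, the single up-front copy avoids the per-read branch, which is the argument made above.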



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
