[ https://issues.apache.org/jira/browse/HIVE-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748522#action_12748522 ]
Ning Zhang commented on HIVE-790:
---------------------------------
I measured the performance of UnionOperator.processOp() with sync vs. no-sync.
Surprisingly, the sync version performs slightly better. Here's the query:
insert overwrite table tmp_nzhang_ad_union select * from (select * from
nzhang_ad_imps_2_lazysimple union all select * from
nzhang_ad_imps_2_lazysimple) s;
The table nzhang_ad_imps_2_lazysimple has 180k rows and is about 100 MB. I ran
the query twice for each test and took the wallclock time
(end_time - begin_time) from each mapper's log.
Sync:
mappers of 1st MapRed job: avg over all mappers of two runs: 3.75025 sec
mappers of 2nd MapRed job: avg over all mappers of two runs: 5.152 sec
No-sync:
mappers of 1st MapRed job: avg over all mappers of two runs: 4.1065 sec
mappers of 2nd MapRed job: avg over all mappers of two runs: 5.252 sec
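
To put the averages above in relative terms, here is a small sketch that computes how much faster the sync runs were as a fraction of the no-sync time. The class and method names (SyncTiming, relativeGain) are made up for illustration; only the four numbers come from the measurements.

```java
// Hypothetical helper, not part of Hive: compares the averaged mapper times.
class SyncTiming {
    // Fraction by which the sync run is faster, relative to the no-sync run.
    static double relativeGain(double noSyncSec, double syncSec) {
        return (noSyncSec - syncSec) / noSyncSec;
    }

    public static void main(String[] args) {
        // Averages copied from the measurements in this comment.
        System.out.printf("1st job: %.1f%%%n", 100 * relativeGain(4.1065, 3.75025));
        System.out.printf("2nd job: %.1f%%%n", 100 * relativeGain(5.252, 5.152));
    }
}
```

This works out to roughly 8.7% for the 1st job and 1.9% for the 2nd. With only two runs per test, differences of this size may well be within run-to-run noise.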
> race condition related to ScriptOperator + UnionOperator
> --------------------------------------------------------
>
> Key: HIVE-790
> URL: https://issues.apache.org/jira/browse/HIVE-790
> Project: Hadoop Hive
> Issue Type: Bug
> Reporter: Zheng Shao
> Assignee: Ning Zhang
> Attachments: Hive-790.patch, Hive-790_2.patch, Hive-790_3.patch
>
>
> ScriptOperator uses a second thread to output the rows to the children
> operators. In a corner case that contains a union, 2 threads might
> output data into the same operator hierarchy and cause race conditions.
> {code}
> CREATE TABLE tablea (cola STRING);
> SELECT *
> FROM (
>   SELECT TRANSFORM(cola)
>   USING 'cat'
>   AS cola
>   FROM tablea
>   UNION ALL
>   SELECT cola as cola
>   FROM tablea
> ) a;
> {code}
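
For readers unfamiliar with the pattern, the situation in the description can be sketched in plain Java: two threads push rows into one shared operator, and a synchronized method serializes the updates. This is a minimal illustration, assuming "sync" means Java's synchronized keyword; UnionSketch and its methods are hypothetical stand-ins, not Hive's actual UnionOperator code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an operator fed by two parent branches.
class UnionSketch {
    // ArrayList is not thread-safe; concurrent add() calls can lose rows or throw.
    private final List<String> forwarded = new ArrayList<>();

    // synchronized: only one thread at a time may touch the shared list.
    synchronized void processOp(String row) {
        forwarded.add(row);
    }

    synchronized int rowCount() {
        return forwarded.size();
    }

    // Starts two producer threads (like the script output thread and the
    // main thread) and returns how many rows reached the shared operator.
    static int runTwoProducers(int rowsPerThread) {
        UnionSketch union = new UnionSketch();
        Runnable producer = () -> {
            for (int i = 0; i < rowsPerThread; i++) {
                union.processOp("row-" + i);
            }
        };
        Thread scriptThread = new Thread(producer, "script-output");
        Thread mainThread = new Thread(producer, "main-branch");
        scriptThread.start();
        mainThread.start();
        try {
            scriptThread.join();
            mainThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for producers", e);
        }
        return union.rowCount();
    }

    public static void main(String[] args) {
        // With synchronized in place this prints exactly 2 * rowsPerThread.
        System.out.println(runTwoProducers(100_000));
    }
}
```

Removing synchronized from processOp() makes the result nondeterministic: rows may be dropped or the list may throw during a resize, which is the kind of race the issue describes.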