[ 
https://issues.apache.org/jira/browse/HIVE-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748522#action_12748522
 ] 

Ning Zhang commented on HIVE-790:
---------------------------------

Measured the performance of UnionOperator.processOp() with and without 
synchronization (sync vs. no-sync). Surprisingly, the sync version performs 
slightly better. Here's the query:

insert overwrite table tmp_nzhang_ad_union select * from (select * from 
nzhang_ad_imps_2_lazysimple union all select * from 
nzhang_ad_imps_2_lazysimple) s;

The table nzhang_ad_imps_2_lazysimple has 180k rows and is about 100MB. I ran 
the query twice for each test and took the wallclock time 
(end_time-begin_time) from the mappers' logs.

Sync:
mappers of 1st MapRed job: avg over all mappers of two runs: 3.75025 sec
mappers of 2nd MapRed job: avg over all mappers of two runs: 5.152 sec

No-sync:
mappers of 1st MapRed job: avg over all mappers of two runs: 4.1065 sec
mappers of 2nd MapRed job: avg over all mappers of two runs: 5.252 sec
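
To put "a little bit better" into numbers, here is a small sketch that turns 
the averages above into a relative slowdown of no-sync versus sync. The class 
and method names are made up for illustration; only the four timings come from 
the measurements above.

```java
// Illustrative helper, not part of Hive: relative slowdown of no-sync vs. sync.
final class SyncVsNoSync {
    // Returns how much slower no-sync is, as a percentage of the sync time.
    static double slowdownPercent(double syncSec, double noSyncSec) {
        return (noSyncSec - syncSec) / syncSec * 100.0;
    }

    public static void main(String[] args) {
        // Averages reported above (seconds per mapper, two runs each).
        System.out.printf("1st MapRed job: %.1f%%%n", slowdownPercent(3.75025, 4.1065));
        System.out.printf("2nd MapRed job: %.1f%%%n", slowdownPercent(5.152, 5.252));
    }
}
```

That works out to roughly 9.5% for the 1st job and 1.9% for the 2nd. With only 
two runs per configuration, a gap of this size may be within run-to-run noise.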


> race condition related to ScriptOperator + UnionOperator
> --------------------------------------------------------
>
>                 Key: HIVE-790
>                 URL: https://issues.apache.org/jira/browse/HIVE-790
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Zheng Shao
>            Assignee: Ning Zhang
>         Attachments: Hive-790.patch, Hive-790_2.patch, Hive-790_3.patch
>
>
> ScriptOperator uses a second thread to output rows to its child 
> operators. In a corner case that contains a union, 2 threads might 
> output data into the same operator hierarchy and cause race conditions.
> {code}
> CREATE TABLE tablea (cola STRING);
> SELECT *
> FROM (
>     SELECT TRANSFORM(cola)
>     USING 'cat'
>     AS cola
>     FROM tablea
>   UNION ALL
>     SELECT cola as cola
>     FROM tablea
> ) a;
> {code}
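
The kind of race described above can be sketched in plain Java. This is a 
minimal illustration under assumptions, not Hive's actual code: 
CountingOperator, processOpNoSync and processOpSync are hypothetical stand-ins 
for an operator that receives rows from two parent threads, and the row count 
stands in for whatever shared state the real operator hierarchy keeps.

```java
// Illustrative only: two threads pushing rows into one shared operator.
final class UnionSyncSketch {
    /** Hypothetical stand-in for an operator fed by several parents. */
    static final class CountingOperator {
        private long rows = 0;

        // Unsynchronized: rows++ is a read-modify-write, so concurrent
        // callers can overwrite each other's update and lose rows.
        void processOpNoSync(Object row) { rows++; }

        // Synchronized: only one thread at a time updates the shared state.
        synchronized void processOpSync(Object row) { rows++; }

        synchronized long rows() { return rows; }
    }

    // Feeds rowsPerThread rows from each of `threads` threads into one
    // operator and returns the number of rows the operator recorded.
    static long run(boolean sync, int threads, int rowsPerThread) {
        CountingOperator op = new CountingOperator();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < rowsPerThread; i++) {
                    if (sync) {
                        op.processOpSync("row");
                    } else {
                        op.processOpNoSync("row");
                    }
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join(); // wait for every producer to finish
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return op.rows();
    }

    public static void main(String[] args) {
        System.out.println("sync:    " + run(true, 2, 1_000_000));
        System.out.println("no-sync: " + run(false, 2, 1_000_000));
    }
}
```

With the synchronized variant the count always equals threads * rowsPerThread. 
The unsynchronized variant can come up short, although whether it does on a 
given run depends on thread scheduling.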

