[ 
https://issues.apache.org/jira/browse/HIVE-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748262#action_12748262
 ] 

Ning Zhang commented on HIVE-790:
---------------------------------

@zheng, I'll fix the comment and the test query.

As for the new state, maybe "FINISH" is not a good name for it, but I think we 
need two states, since there are two different situations when an operator has 
two or more parents: 
 1) close() has been called on this operator, but that does not guarantee that 
close() has also been called on all its child operators (the FINISH state)
 2) close() has been called on this operator and on all its children (the CLOSE 
state).

The current code sets the state to CLOSE at the end of the function, which 
means all its children (and eventually all descendants) are closed. So it 
follows the second semantics. What you proposed is the first semantics; to 
implement it we need to move the statement that sets the state to CLOSE to the 
beginning of the close() function (just after the check that returns if the 
state is already CLOSE). 

We need both states. If we have just one state (CLOSE) and assign it at the 
beginning, and the operator has two parents, then when the first parent calls 
close(), this operator sets its state to CLOSE and just returns without calling 
close() on any of its children (since the other parent has not been closed). 
When the second parent calls close(), it just returns since its state is 
already CLOSE. So none of the children end up being closed. We should not 
remove the CLOSE state check at the beginning, since that may cause an operator 
to be closed multiple times.
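
To make this failure concrete, here is a minimal single-threaded sketch. It is 
not the actual Hive Operator code: the class SingleStateOp, its fields and the 
link() helper are made up for illustration, and the plan is just two parents 
feeding one union operator that has one child.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an operator; not the real Hive Operator class.
class SingleStateOp {
    enum State { INIT, CLOSE }

    final String name;
    final List<SingleStateOp> parents = new ArrayList<>();
    final List<SingleStateOp> children = new ArrayList<>();
    State state = State.INIT;
    int cleanupCount = 0; // number of times this operator's own cleanup ran

    SingleStateOp(String name) {
        this.name = name;
    }

    static void link(SingleStateOp parent, SingleStateOp child) {
        parent.children.add(child);
        child.parents.add(parent);
    }

    void close() {
        if (state == State.CLOSE) {
            return; // guard against closing the same operator twice
        }
        state = State.CLOSE; // the only state, assigned at the beginning
        for (SingleStateOp p : parents) {
            if (p.state != State.CLOSE) {
                return; // another parent is still open, so do nothing yet
            }
        }
        cleanupCount++;
        for (SingleStateOp c : children) {
            c.close();
        }
    }

    public static void main(String[] args) {
        SingleStateOp p1 = new SingleStateOp("script");
        SingleStateOp p2 = new SingleStateOp("select");
        SingleStateOp union = new SingleStateOp("union");
        SingleStateOp child = new SingleStateOp("sink");
        link(p1, union);
        link(p2, union);
        link(union, child);

        p1.close(); // union is marked CLOSE but returns because p2 is open
        p2.close(); // union.close() now returns at the guard
        System.out.println(child.cleanupCount); // prints 0: sink never closed
    }
}
```

After both parents have closed, the union operator is in the CLOSE state even 
though neither its own cleanup nor its child's close() has ever run.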

We also cannot use just the CLOSE state as it is in the current implementation, 
since the CLOSE state is set at the end of the close() function. When a parent 
calls this operator's close(), the parent's own state is not yet CLOSE. So we 
end up just returning and never closing the child operators. If we have the 
FINISH state and this state is set at the beginning of close(), then whenever a 
parent calls close(), the parent is in the FINISH state, and this operator can 
check for it and treat FINISH the same as CLOSE except that this operator 
hasn't returned yet. 
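
A rough sketch of the two-state idea, under the same assumptions as before: it 
is single-threaded, ignores synchronization, and the class TwoStateOp with its 
fields and link() helper is hypothetical, not the Hive Operator API. One 
further assumption is that FINISH is assigned right after the parent check, so 
that an operator still waiting on a parent is never reported as finished.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an operator; not the real Hive Operator class.
class TwoStateOp {
    enum State { INIT, FINISH, CLOSE }

    final String name;
    final List<TwoStateOp> parents = new ArrayList<>();
    final List<TwoStateOp> children = new ArrayList<>();
    State state = State.INIT;
    int cleanupCount = 0; // number of times this operator's own cleanup ran

    TwoStateOp(String name) {
        this.name = name;
    }

    static void link(TwoStateOp parent, TwoStateOp child) {
        parent.children.add(child);
        child.parents.add(parent);
    }

    void close() {
        if (state == State.CLOSE) {
            return; // guard against closing the same operator twice
        }
        for (TwoStateOp p : parents) {
            // FINISH counts as done: the parent is inside its own close().
            if (p.state != State.FINISH && p.state != State.CLOSE) {
                return; // some parent has not started closing yet
            }
        }
        state = State.FINISH; // close() was called; children not closed yet
        cleanupCount++;
        for (TwoStateOp c : children) {
            c.close();
        }
        state = State.CLOSE; // this operator and all its children are closed
    }

    public static void main(String[] args) {
        TwoStateOp p1 = new TwoStateOp("script");
        TwoStateOp p2 = new TwoStateOp("select");
        TwoStateOp union = new TwoStateOp("union");
        TwoStateOp child = new TwoStateOp("sink");
        link(p1, union);
        link(p2, union);
        link(union, child);

        p1.close(); // union waits: p2 is still INIT
        p2.close(); // union sees p1 = CLOSE, p2 = FINISH and closes
        System.out.println(child.cleanupCount); // prints 1
    }
}
```

With the same plan as in the single-state case, the union operator stays open 
after the first parent closes and is closed exactly once, together with its 
child, when the second parent calls close().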


> race condition related to ScriptOperator + UnionOperator
> --------------------------------------------------------
>
>                 Key: HIVE-790
>                 URL: https://issues.apache.org/jira/browse/HIVE-790
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Zheng Shao
>            Assignee: Ning Zhang
>         Attachments: Hive-790.patch
>
>
> ScriptOperator uses a second thread to output the rows to the child 
> operators. In a corner case that involves a union, 2 threads might be 
> outputting data into the same operator hierarchy, causing race conditions.
> {code}
> CREATE TABLE tablea (cola STRING);
> SELECT *
> FROM (
>     SELECT TRANSFORM(cola)
>     USING 'cat'
>     AS cola
>     FROM tablea
>   UNION ALL
>     SELECT cola as cola
>     FROM tablea
> ) a;
> {code}
