[jira] [Commented] (HIVE-17073) Incorrect result with vectorization and SharedWorkOptimizer

Matt McCline (JIRA) Tue, 11 Jul 2017 10:24:39 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16082593#comment-16082593
 ]


Matt McCline commented on HIVE-17073:
-------------------------------------

[~jcamachorodriguez] Thank you for jumping in with a solution.

The invariant for a VectorizedRowBatch is that the selected array is always 
allocated.

For efficiency, I think we want to pre-allocate a saveSelected array of 
VectorizedRowBatch.DEFAULT_SIZE elements in initializeOp.  When # children > 1, 
re-allocate that save array *only* if vrb.size is greater than the current 
array size.  Use System.arraycopy into and out of saveSelected instead of 
Arrays.copyOf, since the latter method allocates a new object.
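
A rough sketch of the save/restore pattern described above, not the actual 
patch. To keep it self-contained, Batch is a stand-in for the three 
VectorizedRowBatch fields involved (selected, selectedInUse, size), and 
SelectedSaver with its save/restore methods is a hypothetical helper; in the 
operator the scratch array would be created in initializeOp.

```java
// Illustrative only: Batch is NOT the Hive class, just a stand-in for the
// three VectorizedRowBatch fields this pattern touches.
final class SaveSelectedSketch {

    // Mirrors VectorizedRowBatch.DEFAULT_SIZE.
    static final int DEFAULT_SIZE = 1024;

    static final class Batch {
        int[] selected = new int[DEFAULT_SIZE]; // invariant: always allocated
        boolean selectedInUse;
        int size;
    }

    /** Hypothetical helper that owns the reusable scratch array. */
    static final class SelectedSaver {
        // Pre-allocated once (in the operator: during initializeOp).
        private int[] saveSelected = new int[DEFAULT_SIZE];
        private int savedSize;
        private boolean savedSelectedInUse;

        /** Call before forwarding the batch to a child. */
        void save(Batch vrb) {
            savedSize = vrb.size;
            savedSelectedInUse = vrb.selectedInUse;
            if (vrb.selectedInUse) {
                // Grow only when the batch is larger than the scratch array.
                if (saveSelected.length < vrb.size) {
                    saveSelected = new int[vrb.size];
                }
                // Copies into the existing array; no allocation per batch.
                System.arraycopy(vrb.selected, 0, saveSelected, 0, vrb.size);
            }
        }

        /** Call after a child returns, before forwarding to the next child. */
        void restore(Batch vrb) {
            vrb.size = savedSize;
            vrb.selectedInUse = savedSelectedInUse;
            if (savedSelectedInUse) {
                System.arraycopy(saveSelected, 0, vrb.selected, 0, savedSize);
            }
        }
    }
}
```

With more than one child, the operator would call save once, then restore 
after each child except the last, so every child sees the same size, 
selectedInUse flag, and selected entries.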

> Incorrect result with vectorization and SharedWorkOptimizer
> -----------------------------------------------------------
>
>                 Key: HIVE-17073
>                 URL: https://issues.apache.org/jira/browse/HIVE-17073
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>    Affects Versions: 3.0.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>         Attachments: HIVE-17073.patch
>
>
> We get incorrect result with vectorization and multi-output Select operator 
> created by SharedWorkOptimizer. It can be reproduced in the following way.
> {code:title=Correct}
> select count(*) as h8_30_to_9
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_278";
> OK
> 2
> {code}
> {code:title=Correct}
> select count(*) as h9_to_9_30
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_255";
> OK
> 2
> {code}
> {code:title=Incorrect}
> select * from (
>   select count(*) as h8_30_to_9
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_278") s1
> join (
>   select count(*) as h9_to_9_30
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_255") s2;
> OK
> 2     0
> {code}
> Problem seems to be that some ds in the batch row need to be re-initialized 
> after they have been forwarded to each output.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
