[ 
https://issues.apache.org/jira/browse/HIVE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13830247#comment-13830247
 ] 

Remus Rusanu commented on HIVE-5817:
------------------------------------

I think the only real problem operator is JOIN. It is not necessarily ‘one VC per 
operator’ but more like ‘one VC per query region’, where a query region is defined 
by boundaries between different VS requirements (basically different result 
shapes). An operator like JOIN clearly introduces such a boundary, and 
the interesting part is that it needs two vectorization contexts: one for its 
input(s) and one for its output. So it would be more along the lines that, 
during vectorization, each operator takes a VC (for its input, provided by its 
parent operator) and gives out a VC for its output, for its child operators to 
consume. Most operators would give out the same VC they get as input (i.e. they 
do not change shape). And there is serialization too, which is handled 
separately (as properties added to the Map).
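
A rough sketch of the "VC in, VC out" idea, to make the proposal concrete. This is not the actual Hive code: Vc, VectorizableOp, FilterOp and JoinOp are hypothetical, simplified names invented for illustration, and the real VectorizationContext carries much more state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a vectorization context:
// it only maps internal column names to batch column indexes.
final class Vc {
    private final Map<String, Integer> columnIndex = new HashMap<>();

    Vc(List<String> columnNames) {
        for (int i = 0; i < columnNames.size(); i++) {
            columnIndex.put(columnNames.get(i), i);
        }
    }

    int indexOf(String columnName) {
        Integer idx = columnIndex.get(columnName);
        if (idx == null) {
            throw new IllegalArgumentException("Unknown column: " + columnName);
        }
        return idx;
    }
}

// Hypothetical contract: an operator receives its parent's VC and
// returns the VC its children should consume.
interface VectorizableOp {
    Vc vectorize(Vc inputVc);
}

// Shape-preserving operator (e.g. a filter): hands the same VC through.
final class FilterOp implements VectorizableOp {
    @Override
    public Vc vectorize(Vc inputVc) {
        return inputVc;
    }
}

// Shape-changing operator (e.g. a join): starts a new query region
// by giving its children a fresh VC built from its own output columns.
final class JoinOp implements VectorizableOp {
    private final List<String> outputColumns;

    JoinOp(List<String> outputColumns) {
        this.outputColumns = new ArrayList<>(outputColumns);
    }

    @Override
    public Vc vectorize(Vc inputVc) {
        // inputVc would resolve the join's input expressions;
        // the output side gets an independent name-to-index mapping.
        return new Vc(outputColumns);
    }
}
```

With this shape, the same internal name (say "_col0") can map to different indexes on either side of a join without the two mappings interfering.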

I'll try to come up with actual code over this weekend.

> column name to index mapping in VectorizationContext is broken
> --------------------------------------------------------------
>
>                 Key: HIVE-5817
>                 URL: https://issues.apache.org/jira/browse/HIVE-5817
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>            Reporter: Sergey Shelukhin
>            Assignee: Remus Rusanu
>            Priority: Critical
>         Attachments: HIVE-5817-uniquecols.broken.patch, 
> HIVE-5817.00-broken.patch
>
>
> Columns coming from different operators may have the same internal names 
> ("_colNN"). There exists a query in the form {{select b.cb, a.ca from a JOIN 
> b ON ... JOIN x ON ...;}}  (distilled from a more complex query), which runs 
> ok w/o vectorization. With vectorization, it will run ok for most ca, but for 
> some ca it will fail (or can probably return incorrect results). That is 
> because when building column-to-VRG-index map in VectorizationContext, 
> internal column name for ca that the first map join operator adds to the 
> mapping may be the same as internal name for cb that the 2nd one tries to 
> add. 2nd VMJ doesn't add it (see code in ctor), and when it's time for it to 
> output stuff, it retrieves wrong index from the map by name, and then wrong 
> vector from VRG.
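
To illustrate the failure mode described in the quoted report, here is a minimal sketch of a shared name-to-index map that skips names it has already seen. SharedColumnMap and its methods are hypothetical names, and the index values are made up; the sketch only assumes the "do not add if present" behavior the description attributes to the constructor, not the actual Hive code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for one column-name-to-index map shared by
// several operators in the same plan.
final class SharedColumnMap {
    private final Map<String, Integer> nameToIndex = new HashMap<>();

    // Assumed behavior: a name that is already mapped is silently skipped.
    void addIfAbsent(String internalName, int vrgIndex) {
        if (!nameToIndex.containsKey(internalName)) {
            nameToIndex.put(internalName, vrgIndex);
        }
    }

    int lookup(String internalName) {
        Integer idx = nameToIndex.get(internalName);
        if (idx == null) {
            throw new IllegalArgumentException("Unknown column: " + internalName);
        }
        return idx;
    }
}
```

If the first map join registers "_col0" at index 2 for ca and the second then tries to register "_col0" at index 5 for cb, the second registration is dropped, so a later lookup of "_col0" on behalf of cb returns 2 and reads the wrong vector.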



--
This message was sent by Atlassian JIRA
(v6.1#6144)
