[
https://issues.apache.org/jira/browse/HIVE-26447?focusedWorklogId=797491&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-797491
]
ASF GitHub Bot logged work on HIVE-26447:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 03/Aug/22 03:48
Start Date: 03/Aug/22 03:48
Worklog Time Spent: 10m
Work Description: yigress commented on code in PR #3492:
URL: https://github.com/apache/hive/pull/3492#discussion_r936209369
##########
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFMapIndexStringScalar.java:
##########
@@ -66,10 +66,13 @@ public int findScalarInMap(MapColumnVector mapColumnVector,
int mapBatchIndex) {
byte[][] keyVector = keyColVector.vector;
int[] keyStart = keyColVector.start;
int[] keyLength = keyColVector.length;
+ final boolean isRepeating = keyColVector.isRepeating;
for (int i = 0; i < count; i++) {
final int keyOffset = offset + i;
+ final int len = isRepeating? keyLength[i]: keyLength[keyOffset];
Review Comment:
I found that the key column is marked isRepeating only when there is a single
key (count=1), but I am not sure whether other ORC versions may produce multiple
keys. If a repeating key column has multiple keys, keyLength may have multiple
entries. Would it be better to handle the isRepeating and count=1 case separately?
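A minimal sketch of handling the repeating case separately, as the comment suggests. This is a simplified model, not the code in PR #3492: `RepeatingKeyLookup` and its nested `KeyColumn` are hypothetical stand-ins that only mirror the field names of Hive's BytesColumnVector (`vector`, `start`, `length`, `isRepeating`). The assumption is that when `isRepeating` is set, only physical slot 0 holds a valid key, so every logical entry must read slot 0 rather than `offset + i`.

```java
import java.util.Arrays;

// Hypothetical, simplified model; NOT Hive's VectorUDFMapIndexStringScalar.
public class RepeatingKeyLookup {

  /** Stand-in for a string key column; field names mirror BytesColumnVector. */
  public static final class KeyColumn {
    final byte[][] vector;
    final int[] start;
    final int[] length;
    final boolean isRepeating;

    public KeyColumn(byte[][] vector, int[] start, int[] length, boolean isRepeating) {
      this.vector = vector;
      this.start = start;
      this.length = length;
      this.isRepeating = isRepeating;
    }
  }

  /**
   * Returns the position (0..count-1) of the first key equal to scalar among the
   * map entries [offset, offset + count), or -1 if the key is absent.
   */
  public static int findScalarInMap(KeyColumn keys, int offset, int count, byte[] scalar) {
    if (keys.isRepeating) {
      // Repeating column: every entry shares the key stored in slot 0,
      // so one comparison decides the whole map.
      return count > 0 && keyEquals(keys, 0, scalar) ? 0 : -1;
    }
    // Non-repeating column: each entry has its own slot at offset + i.
    for (int i = 0; i < count; i++) {
      if (keyEquals(keys, offset + i, scalar)) {
        return i;
      }
    }
    return -1;
  }

  // Compares the bytes in slot `physical` with the scalar key.
  private static boolean keyEquals(KeyColumn keys, int physical, byte[] scalar) {
    int s = keys.start[physical];
    int len = keys.length[physical];
    return Arrays.equals(keys.vector[physical], s, s + len, scalar, 0, scalar.length);
  }
}
```

Branching once on `isRepeating` keeps the per-entry loop free of a conditional index, and it never touches slots other than 0 when the column repeats, which is exactly where stale `keyLength` values could appear.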
Issue Time Tracking
-------------------
Worklog Id: (was: 797491)
Time Spent: 0.5h (was: 20m)
> Vectorization: wrong results when filter on repeating map key orc table
> -----------------------------------------------------------------------
>
> Key: HIVE-26447
> URL: https://issues.apache.org/jira/browse/HIVE-26447
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 3.1.3, 4.0.0
> Reporter: Yi Zhang
> Assignee: Yi Zhang
> Priority: Major
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Example reproducible case:
>
> set hive.vectorized.execution.enabled=true;
> set hive.fetch.task.conversion=none;
> create temporary table foo (id int, x map<string,int>) stored as orc;
> insert into foo values (1, map('ABC', 9)), (2, map('ABC', 7)), (3, map('ABC', 8)), (4, map('ABC', 9));
> select id from foo where x['ABC']=9;
> This returns only 1, but the correct result is 1 and 4.
> In every VectorizedRowBatch, only the first row is checked.
> This appears to be a corner case in which the ORC table has a repeating
> string-type key for the map field in the MapColumnVector.
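The symptom can be reproduced in a simplified model of one batch from the example above. Everything here is hypothetical (`RepeatingKeyFilterDemo` and its arrays are not Hive code), and it assumes that a repeating key column leaves slots other than 0 stale (modeled as empty). Reading the key at `offset + i` then matches only the first row, while reading slot 0 when the column repeats yields the expected ids.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of one batch of `select id from foo where x['ABC'] = 9`.
public class RepeatingKeyFilterDemo {

  static final int[] IDS = {1, 2, 3, 4};
  // Row r owns map entries [MAP_OFFSET[r], MAP_OFFSET[r] + MAP_COUNT[r]).
  static final int[] MAP_OFFSET = {0, 1, 2, 3};
  static final int[] MAP_COUNT = {1, 1, 1, 1};
  // Key child column marked repeating: only slot 0 is valid (assumption:
  // the remaining slots are stale, modeled here as empty).
  static final byte[][] KEY_VECTOR = {"ABC".getBytes(StandardCharsets.UTF_8), null, null, null};
  static final int[] KEY_LENGTH = {3, 0, 0, 0};
  static final boolean KEY_IS_REPEATING = true;
  // Value child column, not repeating.
  static final long[] VALUES = {9, 7, 8, 9};

  /** Returns ids of rows whose map value under `key` equals `target`. */
  public static List<Integer> filter(byte[] key, long target, boolean honorRepeating) {
    List<Integer> selected = new ArrayList<>();
    for (int row = 0; row < IDS.length; row++) {
      for (int i = 0; i < MAP_COUNT[row]; i++) {
        int entry = MAP_OFFSET[row] + i;
        // Fixed path reads slot 0 for a repeating key column; buggy path reads `entry`.
        int keySlot = (honorRepeating && KEY_IS_REPEATING) ? 0 : entry;
        int len = KEY_LENGTH[keySlot];
        if (len == key.length && Arrays.equals(KEY_VECTOR[keySlot], 0, len, key, 0, len)) {
          if (VALUES[entry] == target) {
            selected.add(IDS[row]);
          }
          break; // stop at the first matching key in this map
        }
      }
    }
    return selected;
  }

  public static void main(String[] args) {
    byte[] abc = "ABC".getBytes(StandardCharsets.UTF_8);
    System.out.println("ignoring isRepeating: " + filter(abc, 9, false)); // [1]
    System.out.println("honoring isRepeating: " + filter(abc, 9, true));  // [1, 4]
  }
}
```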