[ https://issues.apache.org/jira/browse/HIVE-26840?focusedWorklogId=836011&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-836011 ]
ASF GitHub Bot logged work on HIVE-26840:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 28/Dec/22 20:46
Start Date: 28/Dec/22 20:46
Worklog Time Spent: 10m
Work Description: cnauroth commented on PR #3859:
URL: https://github.com/apache/hive/pull/3859#issuecomment-1366907555
Hello @amanraj2520. Referring to the options above:
1. I think the ideal path is to get the test fixed. See below for more
analysis from me and a proposed path forward.
2. I'd prefer not to revert to the earlier version, because we set a
goal for the 3.2 release to upgrade dependencies and clear out their CVEs as
much as possible. That said, I reviewed the CVEs affecting the older release, and
I don't think they have much practical impact on Hive, so I'm not opposed to this
as a fallback option if we get stuck.
3. I'd be concerned about a major version bump all the way to
Arrow 2.0.0. I don't know Arrow well enough to comment on the
backward compatibility of that upgrade.
The test is failing specifically on [serializing a row of nulls in all
columns](https://github.com/apache/hive/blob/branch-3/ql/src/test/org/apache/hadoop/hive/ql/io/arrow/TestArrowColumnarBatchSerDe.java#L374-L378).
I confirmed that this row is the trigger: with it commented out, the test
passes. I also confirmed that the failure occurs specifically while
serializing a struct column. (Null values for primitive types are fine.)
The problem appears to be that the serializer does not correctly track null
values within a null struct. We should be calling the Arrow `vector.setNull`,
but instead it ends up calling `vector.setSafe`. Another patch,
[HIVE-25243](https://issues.apache.org/jira/browse/HIVE-25243) / PR #2391,
fixed this on master. With both your patch and a slightly modified version of
HIVE-25243 applied, the test passed locally. The only thing I don't
understand is why this ever passed with the old version. My guess is that some
versions of Arrow + Netty are simply more tolerant of clients calling
`vector.setSafe` with null values.
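To make the setNull/setSafe distinction concrete, here is a toy sketch. It does not use Arrow at all: `ToyVector`, `ToyStructVector`, and `StructWriter` are hypothetical stand-ins whose method names only echo Arrow's, and this is not the actual HIVE-25243 change. It assumes the intended behavior is that when a whole struct value is null, the writer marks the struct slot and every child slot null instead of passing null field values to `setSafe`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical nullable column: one validity flag and one value per slot.
// Method names echo Arrow's setNull/setSafe, but this is NOT the Arrow API.
class ToyVector {
    final List<Boolean> valid = new ArrayList<>();
    final List<Object> values = new ArrayList<>();

    // Grow the backing lists so that `index` is addressable (the "safe" part).
    void ensure(int index) {
        while (valid.size() <= index) {
            valid.add(Boolean.FALSE);
            values.add(null);
        }
    }

    void setNull(int index) {
        ensure(index);
        valid.set(index, Boolean.FALSE);
        values.set(index, null);
    }

    void setSafe(int index, Object value) {
        ensure(index);
        valid.set(index, Boolean.TRUE);
        values.set(index, value);
    }

    boolean isNull(int index) {
        return index >= valid.size() || !valid.get(index);
    }
}

// Hypothetical struct column: its own validity flags plus one child per field.
class ToyStructVector extends ToyVector {
    final List<ToyVector> children;

    ToyStructVector(ToyVector... children) {
        this.children = Arrays.asList(children);
    }
}

class StructWriter {
    // Writes one row; `fields` is null when the whole struct value is null.
    static void writeStruct(ToyStructVector vector, int index, Object[] fields) {
        if (fields == null) {
            // Null struct: mark the struct slot and every child slot null
            // rather than calling setSafe with null field values.
            vector.setNull(index);
            for (ToyVector child : vector.children) {
                child.setNull(index);
            }
            return;
        }
        // Non-null struct: mark the struct slot defined (the stored value is unused).
        vector.setSafe(index, Boolean.TRUE);
        for (int i = 0; i < vector.children.size(); i++) {
            ToyVector child = vector.children.get(i);
            if (fields[i] == null) {
                child.setNull(index); // a single null field inside a non-null struct
            } else {
                child.setSafe(index, fields[i]);
            }
        }
    }
}
```

In this sketch, a row with nulls in every column, like the one in the failing test, takes the `fields == null` branch, so no child slot is ever written through `setSafe`.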
I propose that we first merge the current pull request, even with the
known test failure, since this patch already contains a lot of changes. Then I
can queue up a separate backport of HIVE-25243. This is non-binding, though, so
let's see if we can get a committer to confirm the plan.
Issue Time Tracking
-------------------
Worklog Id: (was: 836011)
Time Spent: 3h 40m (was: 3.5h)
> Backport of HIVE-23073 and HIVE-24138
> -------------------------------------
>
> Key: HIVE-26840
> URL: https://issues.apache.org/jira/browse/HIVE-26840
> Project: Hive
> Issue Type: Sub-task
> Reporter: Aman Raj
> Assignee: Aman Raj
> Priority: Critical
> Labels: pull-request-available
> Time Spent: 3h 40m
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)