cnauroth commented on PR #3859:
URL: https://github.com/apache/hive/pull/3859#issuecomment-1366907555

   Hello @amanraj2520 . Referring to the options above:
   
   1. I think the ideal path is to get the test fixed. See below for more 
analysis from me and a proposed path forward.
   2. I'd prefer not to revert back to the earlier version, because we set a 
goal on the 3.2 release to upgrade dependencies and clear out their CVEs as 
much as possible. That said, I reviewed CVEs on the older release, and I don't 
think they have much practical impact on Hive, so I'm not opposed to this as a 
fallback option if we get stuck.
   3. I'd be concerned about a significant major version bump all the way to 
Arrow 2.0.0. I don't know Arrow well enough to comment on 
backward-compatibility of that upgrade.
   
   The test is failing specifically on [serializing a row of nulls in all 
columns](https://github.com/apache/hive/blob/branch-3/ql/src/test/org/apache/hadoop/hive/ql/io/arrow/TestArrowColumnarBatchSerDe.java#L374-L378).
 I confirmed that it's this row specifically by commenting out the row and 
seeing the test pass. I also confirmed that it's specifically failing while 
serializing a struct column. (Null values for primitive types are fine.)
   
   The problem appears to be that the serializer does not correctly track null 
values within a null struct. We should be calling the Arrow `vector.setNull`, 
but instead, it ends up calling `vector.setSafe`. There is another patch, 
[HIVE-25243](https://issues.apache.org/jira/browse/HIVE-25243) / PR #2391 that 
fixed this on master. I tried applying both your patch and a slightly different 
version of HIVE-25243, and then the test passed locally. The only thing I don't 
understand is why this was ever passing with the old version. I guess there are 
some versions of Arrow + Netty that are just more tolerant of clients calling 
`vector.setSafe` with null values.
   
   I propose that first, we merge in the current pull request, even with a 
known test failure. There are already a lot of changes in this patch. Then, I 
can queue up a separate backport of HIVE-25243. This is non-binding though, so 
let's see if we can get confirmation on the plan from a committer.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to