cnauroth commented on PR #3859: URL: https://github.com/apache/hive/pull/3859#issuecomment-1366907555
Hello @amanraj2520 . Referring to the options above: 1. I think the ideal path is to get the test fixed. See below for more analysis from me and a proposed path forward. 2. I'd prefer not to revert back to the earlier version, because we set a goal on the 3.2 release to upgrade dependencies and clear out their CVEs as much as possible. That said, I reviewed CVEs on the older release, and I don't think they have much practical impact on Hive, so I'm not opposed to this as a fallback option if we get stuck. 3. I'd be concerned about a significant major version bump all the way to Arrow 2.0.0. I don't know Arrow well enough to comment on backward-compatibility of that upgrade. The test is failing specifically on [serializing a row of nulls in all columns](https://github.com/apache/hive/blob/branch-3/ql/src/test/org/apache/hadoop/hive/ql/io/arrow/TestArrowColumnarBatchSerDe.java#L374-L378). I confirmed that it's this row specifically by commenting out the row and seeing the test pass. I also confirmed that it's specifically failing while serializing a struct column. (Null values for primitive types are fine.) The problem appears to be that the serializer does not correctly track null values within a null struct. We should be calling the Arrow `vector.setNull`, but instead, it ends up calling `vector.setSafe`. There is another patch, [HIVE-25243](https://issues.apache.org/jira/browse/HIVE-25243) / PR #2391 that fixed this on master. I tried applying both your patch and a slightly different version of HIVE-25243, and then the test passed locally. The only thing I don't understand is why this was ever passing with the old version. I guess there are some versions of Arrow + Netty that are just more tolerant of clients calling `vector.setSafe` with null values. I propose that first, we merge in the current pull request, even with a known test failure. There are already a lot of changes in this patch. Then, I can queue up a separate backport of HIVE-25243. This is non-binding though, so let's see if we can get confirmation on the plan from a committer. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
