guiyanakuang opened a new issue, #1470: URL: https://github.com/apache/orc/issues/1470
The comments in `ColumnVector` state that the `isNull` array should be consulted only when `noNulls` is `false`. This contract makes reusing a `ColumnVector` cheaper: when `noNulls` goes from `true` to `true` or from `false` to `true` between batches, there is no need to touch `isNull`, which can improve performance.

Unfortunately, the Java implementation contains counterexamples, such as [BitFieldReader.java](https://github.com/apache/orc/blob/511c8c19497cb70499353a59b6484a0e6a82a539/java/core/src/java/org/apache/orc/impl/BitFieldReader.java#L90-L107), where `isNull` is read directly without checking `noNulls` first. To stay correct, Java resets both `noNulls` and `isNull` every time, as seen in [TreeReaderFactory.java](https://github.com/apache/orc/blob/511c8c19497cb70499353a59b6484a0e6a82a539/java/core/src/java/org/apache/orc/impl/TreeReaderFactory.java#L358-L391).

This issue comes out of the discussion on ORC-1408: should the ORC Java implementation fix all such counterexamples so that it adheres to the contract? I'm inclined to make the changes, both to be consistent with the C++ implementation and to avoid unnecessary writes to `isNull`, although I haven't measured how much performance this would gain.
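
To make the contract concrete, here is a minimal sketch of the difference between a reader that checks `noNulls` first and one that reads `isNull` unconditionally. It is not ORC code: `SimpleVector`, `countNonNull`, and `countNonNullUnchecked` are hypothetical names, and `SimpleVector` is a stand-in that assumes only the two null-tracking fields of `ColumnVector`.

```java
public class NoNullsContract {
    /** Hypothetical stand-in for ColumnVector, reduced to its null-tracking fields. */
    static final class SimpleVector {
        boolean noNulls = true;
        final boolean[] isNull;

        SimpleVector(int size) {
            isNull = new boolean[size];
        }
    }

    /** Follows the contract: isNull is consulted only when noNulls is false. */
    static int countNonNull(SimpleVector v, int rows) {
        if (v.noNulls) {
            return rows; // isNull may hold stale data, so it is not read
        }
        int count = 0;
        for (int i = 0; i < rows; i++) {
            if (!v.isNull[i]) {
                count++;
            }
        }
        return count;
    }

    /** Counterexample pattern: reads isNull without checking noNulls first. */
    static int countNonNullUnchecked(SimpleVector v, int rows) {
        int count = 0;
        for (int i = 0; i < rows; i++) {
            if (!v.isNull[i]) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        SimpleVector v = new SimpleVector(4);

        // First batch: row 1 is null.
        v.noNulls = false;
        v.isNull[1] = true;

        // Reused for a second batch with no nulls: only noNulls is flipped,
        // and the stale isNull[1] entry is left in place.
        v.noNulls = true;

        System.out.println(countNonNull(v, 4));          // prints 4
        System.out.println(countNonNullUnchecked(v, 4)); // prints 3 (stale isNull)
    }
}
```

With readers of the second kind in the code base, the producer has to clear `isNull` on every reuse to stay correct. If all readers followed the first pattern, flipping `noNulls` alone would be enough.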
