guiyanakuang opened a new issue, #1470:
URL: https://github.com/apache/orc/issues/1470

   The comments in `ColumnVector` state that the `isNull` array should be 
consulted only when `noNulls` is `false`. This contract makes reusing a 
`ColumnVector` cheaper: when `noNulls` goes from `true` to `true` or from 
`false` to `true` between batches, we don't need to touch `isNull` at all, 
which can help improve performance.
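
To make the contract concrete, here is a minimal sketch. `SimpleLongVector` is a hypothetical stand-in that only mirrors the `noNulls` and `isNull` fields of `ColumnVector`; it is not the real class, and the helper names are made up for illustration.

```java
public class NullContractSketch {
    /** Hypothetical stand-in for a column vector; only the fields needed here. */
    public static final class SimpleLongVector {
        public final long[] vector;
        public final boolean[] isNull;
        public boolean noNulls = true;

        public SimpleLongVector(int size) {
            vector = new long[size];
            isNull = new boolean[size];
        }
    }

    /** Contract-respecting check: isNull is consulted only when noNulls is false. */
    public static boolean isNullAt(SimpleLongVector v, int row) {
        return !v.noNulls && v.isNull[row];
    }

    /** Counts nulls in the first n rows without touching isNull when noNulls is true. */
    public static int countNulls(SimpleLongVector v, int n) {
        if (v.noNulls) {
            return 0; // fast path: isNull may hold stale values and is ignored
        }
        int count = 0;
        for (int i = 0; i < n; i++) {
            if (v.isNull[i]) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        SimpleLongVector v = new SimpleLongVector(4);

        // Batch 1: row 2 is null, so the writer clears noNulls and sets isNull[2].
        v.noNulls = false;
        v.isNull[2] = true;
        System.out.println(countNulls(v, 4)); // prints 1

        // Batch 2: the vector is reused and has no nulls. Only the flag is flipped;
        // isNull[2] is left stale, which is fine for readers that follow the contract.
        v.noNulls = true;
        System.out.println(countNulls(v, 4)); // prints 0
        System.out.println(isNullAt(v, 2));   // prints false
    }
}
```

The point of the sketch is that the `false` to `true` transition costs a single assignment, because no reader is allowed to look at the stale `isNull` entries.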
   
   Unfortunately, there are some counterexamples in the Java impl, such as in 
[BitFieldReader.java](https://github.com/apache/orc/blob/511c8c19497cb70499353a59b6484a0e6a82a539/java/core/src/java/org/apache/orc/impl/BitFieldReader.java#L90-L107),
 where we directly read `isNull` without checking `noNulls` first. To ensure 
correctness, Java resets `noNulls` and `isNull` every time, as seen in 
[TreeReaderFactory.java](https://github.com/apache/orc/blob/511c8c19497cb70499353a59b6484a0e6a82a539/java/core/src/java/org/apache/orc/impl/TreeReaderFactory.java#L358-L391).
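
The following is a simplified illustration of why a per-batch reset is needed once any reader skips the `noNulls` check. It is not the actual ORC code: `countNullsUnchecked` is a hypothetical reader written only to show the failure mode.

```java
import java.util.Arrays;

public class ResetPerBatchSketch {
    /** Hypothetical reader that reads isNull directly and never looks at noNulls. */
    public static int countNullsUnchecked(boolean[] isNull, int n) {
        int count = 0;
        for (int i = 0; i < n; i++) {
            if (isNull[i]) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        boolean[] isNull = new boolean[4];
        isNull[1] = true; // left over from an earlier batch that had a null in row 1

        // The next batch has no nulls. Without a reset, the unchecked reader
        // still sees the stale flag and reports a null that is not there.
        System.out.println(countNullsUnchecked(isNull, 4)); // prints 1

        // Resetting isNull for every batch restores correctness,
        // at the cost of an O(n) fill per batch.
        Arrays.fill(isNull, false);
        System.out.println(countNullsUnchecked(isNull, 4)); // prints 0
    }
}
```

If every reader checked `noNulls` first, the `Arrays.fill` step in this sketch would be unnecessary for batches without nulls.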
   
   This issue comes out of the discussion on ORC-1408 about whether the ORC 
Java implementation should fix all of these counterexamples so that it adheres 
to the contract. I'm inclined to make the changes, both to be consistent with 
the C++ implementation and to avoid unnecessary writes to `isNull`, although I 
haven't measured how much of a performance improvement this would bring.

