[GitHub] [arrow] wgtmac commented on pull request #17877: PARQUET-2225:[C++][Parquet] Allow reading dense with RecordReader

GitBox Wed, 11 Jan 2023 08:01:02 -0800


wgtmac commented on PR #17877:
URL: https://github.com/apache/arrow/pull/17877#issuecomment-1379024270


   > Adding more clarification here.
   > 
   > The change proposed here is about the vector of values that is returned.
   > Currently, we first compute the positions of the null values and then
   > build a vector that leaves an empty slot for each null value (reading
   > spaced). When reading dense, we do not make space for the null values.
   > 
   > For example: def_levels: [0, 1] values: [10]
   > 
   > Reading spaced: [null, 10] Reading dense: [10]
   > 
   > The change here is meaningful for nullable columns. The savings come when
   > we have null values. The issues are that 1) it is inefficient to compute
   > the exact positions of the nulls and move the values around to make space
   > for them, and 2) some readers do want to read dense, so they have to
   > remove the null slots again.
   
   Thanks for the explanation. I get your point.
   
   In that case, I assume the caller does not care about the null values and
   will not be able to align values across different columns by position
   (because the columns may have different null slots). Am I right?
   
   Would it be better to add a new function to support this case?
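
To make the alignment concern concrete, here is a minimal sketch in plain C++. It does not use the Parquet `RecordReader` API; `ReadSpaced` and `ReadDense` are hypothetical helpers written only for illustration, and the sketch assumes a flat nullable column where a definition level of 0 means null and 1 means present.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper (not part of the Parquet C++ API): expands the stored
// non-null values into one slot per definition level. Null rows become
// std::nullopt. Assumes a flat nullable column: level 0 = null, 1 = present.
std::vector<std::optional<int32_t>> ReadSpaced(
    const std::vector<int16_t>& def_levels,
    const std::vector<int32_t>& values) {
  std::vector<std::optional<int32_t>> out;
  out.reserve(def_levels.size());
  std::size_t next = 0;  // index of the next unread non-null value
  for (int16_t level : def_levels) {
    if (level == 1 && next < values.size()) {
      out.push_back(values[next++]);
    } else {
      out.push_back(std::nullopt);
    }
  }
  return out;
}

// Hypothetical helper: a dense read hands back the stored values as they
// are, with no slots reserved for nulls. The definition levels are ignored.
std::vector<int32_t> ReadDense(const std::vector<int16_t>& /*def_levels*/,
                               const std::vector<int32_t>& values) {
  return values;
}
```

With the example from the quote (def_levels `[0, 1]`, values `[10]`), the spaced result has two slots and the dense result has one. If a second column has def_levels `[1, 0]` and values `[20]`, its dense result is also a single value, but index 0 of the two dense vectors comes from different rows, so the caller would still need the definition levels to line the columns up.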


