yordan-pavlov commented on issue #1111:
URL: https://github.com/apache/arrow-rs/issues/1111#issuecomment-1003224774


   Here is what I've found so far:
   * There is a test for plain-encoded strings that works with null 
values and across pages: 
https://github.com/apache/arrow-rs/blob/master/parquet/src/arrow/arrow_array_reader.rs#L1493
   * `VariableLenPlainDecoder` is used in that test and works correctly. 
Although the decoder's `num_values` includes NULLs, it stops reading from the 
page at the right point because it checks that it doesn't read past the end of 
the values buffer (`while self.position < data_len`): 
https://github.com/apache/arrow-rs/blob/master/parquet/src/arrow/arrow_array_reader.rs#L919
   * What is missing is a test that exercises `VariableLenDictionaryDecoder`.
   * `VariableLenDictionaryDecoder` relies on `RleDecoder` not reading past 
the end of its buffer and returning 0 when no more values can be read: 
https://github.com/apache/arrow-rs/blob/master/parquet/src/arrow/arrow_array_reader.rs#L1069
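
To make the difference between the two stopping conditions concrete, here is a rough, simplified sketch. It is not the code in `arrow_array_reader.rs`: `read_plain_byte_arrays`, `IndexDecoder` and `read_dictionary_values` are hypothetical names made up for illustration, and `IndexDecoder` is only a stand-in for an index decoder that returns 0 once it has nothing left. The only format detail assumed is that PLAIN-encoded byte arrays are stored as a 4-byte little-endian length followed by the bytes.

```rust
/// Reads up to `num_values` PLAIN-encoded byte arrays (4-byte little-endian
/// length prefix, then the bytes). `num_values` may overcount because it
/// includes NULLs, so the loop is also bounded by the buffer length.
fn read_plain_byte_arrays(data: &[u8], num_values: usize) -> Vec<&[u8]> {
    let mut values = Vec::new();
    let mut position = 0;
    while position < data.len() && values.len() < num_values {
        // Stop on a truncated length prefix instead of reading out of bounds.
        if position + 4 > data.len() {
            break;
        }
        let len = u32::from_le_bytes([
            data[position],
            data[position + 1],
            data[position + 2],
            data[position + 3],
        ]) as usize;
        position += 4;
        // Stop on a truncated value as well.
        if position + len > data.len() {
            break;
        }
        values.push(&data[position..position + len]);
        position += len;
    }
    values
}

/// Hypothetical stand-in for an index decoder (not the parquet crate's
/// `RleDecoder`): hands out stored indices and returns 0 once exhausted.
struct IndexDecoder {
    indices: Vec<u32>,
    position: usize,
}

impl IndexDecoder {
    /// Copies as many remaining indices as fit into `out`; returns the count.
    fn get_batch(&mut self, out: &mut [u32]) -> usize {
        let remaining = self.indices.len() - self.position;
        let n = remaining.min(out.len());
        out[..n].copy_from_slice(&self.indices[self.position..self.position + n]);
        self.position += n;
        n
    }
}

/// Dictionary lookup that has no buffer check of its own: it stops early only
/// because the index decoder reports that 0 values were read.
fn read_dictionary_values<'a>(
    dict: &[&'a str],
    decoder: &mut IndexDecoder,
    num_values: usize,
) -> Vec<&'a str> {
    let mut values = Vec::new();
    let mut buf = [0u32; 2];
    while values.len() < num_values {
        let want = buf.len().min(num_values - values.len());
        let n = decoder.get_batch(&mut buf[..want]);
        if n == 0 {
            break; // decoder exhausted: the page held fewer values than num_values
        }
        values.extend(buf[..n].iter().map(|&i| dict[i as usize]));
    }
    values
}
```

In this sketch the plain reader is safe by construction, while the dictionary reader is only as correct as the decoder's "return 0 when empty" behaviour, which is the dependency the missing test would need to exercise.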
   
   It's getting pretty late now, but tomorrow I will try to write the missing 
test (one that doesn't rely on an external Parquet file) to reproduce the issue 
with `VariableLenDictionaryDecoder` / `RleDecoder`, and also think about a 
short-term fix.



