westonpace commented on a change in pull request #10729:
URL: https://github.com/apache/arrow/pull/10729#discussion_r681470482



##########
File path: cpp/src/parquet/column_writer.cc
##########
@@ -1490,7 +1490,8 @@ Status TypedColumnWriterImpl<DType>::WriteArrowDictionary(
     // TODO(wesm): If some dictionary values are unobserved, then the

Review comment:
       Sorry for the delay.  I've expanded the unit test to verify page-level 
statistics.  It appears that we recompute the null count in `WriteIndicesChunk` 
and do so based on `def_levels`, which appear to be calculated on a completely 
different code path (`MultipathLevelBuilderResult`).  So the concern about 
indices vs. dictionary does not apply and the null counts are correct.
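   
   To illustrate why the null count is independent of the dictionary, here is a minimal standalone sketch of deriving it from definition levels alone.  `CountNullsFromDefLevels` is a hypothetical helper, not the code in `column_writer.cc`, and it assumes a flat (non-nested) optional column where a slot is null exactly when its definition level is below the maximum:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper for illustration only; not part of the Parquet C++ API.
// Assumption: a flat (non-nested) optional column, so a slot is null
// exactly when its definition level is below the column's maximum level.
int64_t CountNullsFromDefLevels(const std::vector<int16_t>& def_levels,
                                int16_t max_def_level) {
  // Neither the dictionary nor the indices are consulted here: only the
  // definition levels determine which slots count as null.
  return std::count_if(
      def_levels.begin(), def_levels.end(),
      [max_def_level](int16_t level) { return level < max_def_level; });
}
```

   Since nothing in this computation looks at the dictionary values, a null or unobserved dictionary entry cannot change the result.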
   
   I did notice we don't encode page-level min/max stats at all.  I'm not sure 
if that is a bug or not (although, if so, I'd tackle that in a separate 
PR/JIRA).  So, if the unit test seems OK and the logic above seems valid, then I 
think this is good.



