westonpace commented on a change in pull request #10729:
URL: https://github.com/apache/arrow/pull/10729#discussion_r681470482
##########
File path: cpp/src/parquet/column_writer.cc
##########
@@ -1490,7 +1490,8 @@ Status TypedColumnWriterImpl<DType>::WriteArrowDictionary(
// TODO(wesm): If some dictionary values are unobserved, then the
Review comment:
Sorry for the delay. I've expanded the unit test to verify page-level
statistics. It appears that we recompute the null count in `WriteIndicesChunk`,
and we do so from `def_levels`, which appear to be calculated on a completely
different code path (`MultipathLevelBuilderResult`). So the concern about
indices vs. dictionary does not apply, and the null counts are correct.
I did notice that we don't encode page-level min/max stats at all. I'm not sure
whether that is a bug (if it is, I'd tackle it in a separate
PR/JIRA). So, if the unit test looks OK and the logic above seems valid, I
think this is good.
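
To illustrate why a null count derived from definition levels does not depend on the dictionary or the indices, here is a minimal sketch. It assumes a flat (non-nested) optional column, where a slot is null exactly when its definition level is below the maximum. `CountNullsFromDefLevels` is a hypothetical helper written for this example; it is not the Arrow implementation and does not reproduce what `WriteIndicesChunk` actually does.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of Arrow or Parquet.
// Assumption: a flat optional column, so a slot is null exactly when its
// definition level is strictly below max_def_level. Nested columns need
// more care and are not handled here.
int64_t CountNullsFromDefLevels(const std::vector<int16_t>& def_levels,
                                int16_t max_def_level) {
  // Only the definition levels are inspected; dictionary values and
  // indices never enter the computation.
  return std::count_if(
      def_levels.begin(), def_levels.end(),
      [max_def_level](int16_t level) { return level < max_def_level; });
}
```

For example, with definition levels `{1, 0, 1, 1, 0}` and a maximum definition level of 1, the helper counts two nulls regardless of which dictionary entries the three non-null slots point at.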