westonpace commented on issue #15042:
URL: https://github.com/apache/arrow/issues/15042#issuecomment-1369130318
It appears the null count is also wrong. At a glance, the dictionary
writing path (`WriteArrowDictionary` in `column_writer.cc`) looks something
like this:
```cpp
if (AlreadyHaveDictionaryForColumn()) {
  if (IsDictionaryChanged()) {
    // Fall back to plain encoding (or maybe unify the dictionaries)
  } else {
    // Write another batch of indices against the existing dictionary
  }
} else {
  // Calculate statistics / null count, set up the column, and store the
  // dictionary for future writes to this column
}
```
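So the behavior makes sense given the algorithm above: statistics are only
computed when the dictionary is first stored. We need to add "update null
count and potentially update min/max" to the middle branch, the one that
writes another indices batch against the existing dictionary.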