Hi,

I'm investigating https://issues.apache.org/jira/browse/ARROW-12513. While debugging, I found that when we create dictionary_ (https://github.com/apache/arrow/blob/master/cpp/src/arrow/array/array_dict.cc#L111) we lose the information about null_count: data_->null_count != 0, but data_->dictionary->null_count == 0. Later we return an array without correct statistics.

My question: is this the expected behaviour? Or do we need to return an array with statistics? Or should these statistics have been added to data_->dictionary somewhere else?
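To make the state concrete, here is a minimal toy model in plain C++. It is not Arrow code: ToyArrayData, CountNulls, DictionaryNullCount and MakeExample are hypothetical names used only for illustration, and the sketch assumes the nulls are recorded on the index side of the dictionary-encoded data rather than in the dictionary values.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for arrow::ArrayData; only the fields needed here.
struct ToyArrayData {
  std::vector<bool> validity;                // one flag per slot, false = null
  int64_t null_count = 0;                    // number of false flags in validity
  std::shared_ptr<ToyArrayData> dictionary;  // set only for dictionary-encoded data
};

// Count null slots from the validity flags.
int64_t CountNulls(const std::vector<bool>& validity) {
  int64_t n = 0;
  for (bool valid : validity) {
    if (!valid) ++n;
  }
  return n;
}

// Null count as seen by a consumer that looks only at the dictionary.
int64_t DictionaryNullCount(const ToyArrayData& data) {
  assert(data.dictionary != nullptr);
  return data.dictionary->null_count;
}

// Build dictionary-encoded data for the logical values ["a", null, "b", null].
ToyArrayData MakeExample() {
  auto dict = std::make_shared<ToyArrayData>();
  dict->validity = {true, true};  // dictionary values: "a", "b" (no nulls)
  dict->null_count = CountNulls(dict->validity);

  ToyArrayData indices;
  indices.validity = {true, false, true, false};  // indices: 0, null, 1, null
  indices.null_count = CountNulls(indices.validity);
  indices.dictionary = dict;
  return indices;
}
```

In this sketch the outer data has null_count == 2 while DictionaryNullCount returns 0, which mirrors the state described above: anything that computes statistics from the dictionary alone would report zero nulls.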
I wrote a more detailed explanation in the Jira issue.

--
Best regards,
Kirill Lykov