wgtmac commented on code in PR #35989:
URL: https://github.com/apache/arrow/pull/35989#discussion_r1231810987


##########
cpp/src/parquet/statistics.cc:
##########
@@ -472,34 +474,40 @@ class TypedStatisticsImpl : public TypedStatistics<DType> {
     comparator_ = std::static_pointer_cast<TypedComparator<DType>>(comp);
     TypedStatisticsImpl::Reset();
     has_null_count_ = true;
-    has_distinct_count_ = true;
+    has_distinct_count_ = false;

Review Comment:
   Correct me if I'm wrong, but `null_count_` can be safely accumulated and 
merged, so `has_null_count_` is set to true and `null_count_` is set to 0 
initially. While statistics are being built, `has_null_count_` should always 
remain true unless something goes wrong.
   
   In contrast, `distinct_count_` cannot be merged, and it is not actually set 
while statistics are being built. It therefore makes sense to initialize 
`has_distinct_count_` to false.
   
   I think @mapleFU means `distinct_count_` cannot be merged.
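
   To illustrate the difference, here is a minimal sketch. It assumes C++17, and 
`ChunkStats`, `Merge`, and `CountDistinct` are hypothetical names made up for 
this example, not the Parquet API. Null counts of two chunks simply add up, 
while the distinct count of the combined data cannot be derived from the two 
per-chunk counts alone, so the merged value is left unset:

```cpp
#include <cstdint>
#include <optional>
#include <set>
#include <vector>

// Hypothetical, simplified stand-in for per-chunk statistics (not the Parquet API).
struct ChunkStats {
  int64_t null_count = 0;                 // always known, starts at 0
  std::optional<int64_t> distinct_count;  // unset means "unknown"
};

// Merge two chunks' statistics.
ChunkStats Merge(const ChunkStats& a, const ChunkStats& b) {
  ChunkStats out;
  // Nulls in disjoint chunks never overlap, so the counts add.
  out.null_count = a.null_count + b.null_count;
  // The same value may appear in both chunks, so the merged distinct count
  // lies somewhere between max(a, b) and a + b. Without the values themselves
  // it cannot be computed, hence it stays unset.
  out.distinct_count = std::nullopt;
  return out;
}

// Exact distinct count; requires access to the values, not just the counts.
int64_t CountDistinct(const std::vector<int>& values) {
  return static_cast<int64_t>(
      std::set<int>(values.begin(), values.end()).size());
}
```

   For example, chunks holding `{1, 2, 2}` and `{2, 3}` each have 2 distinct 
values, but their union has 3, which is neither the sum nor the maximum of the 
two counts.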



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
