berkaysynnada commented on code in PR #7793:
URL: https://github.com/apache/arrow-datafusion/pull/7793#discussion_r1359639622


##########
datafusion/core/src/datasource/statistics.rs:
##########
@@ -33,101 +36,143 @@ pub async fn get_statistics_with_limit(
     limit: Option<usize>,
 ) -> Result<(Vec<PartitionedFile>, Statistics)> {
     let mut result_files = vec![];
-
-    let mut null_counts = vec![0; file_schema.fields().len()];
-    let mut has_statistics = false;
-    let (mut max_values, mut min_values) = create_max_min_accs(&file_schema);
-
-    let mut is_exact = true;
+    let mut null_counts: Option<Vec<Precision<usize>>> = None;
+    let mut max_values: Option<Vec<Precision<ScalarValue>>> = None;
+    let mut min_values: Option<Vec<Precision<ScalarValue>>> = None;
 
     // The number of rows and the total byte size can be calculated as long as
     // at least one file has them. If none of the files provide them, then they
     // will be omitted from the statistics. The missing values will be counted
     // as zero.
-    let mut num_rows = None;
-    let mut total_byte_size = None;
+    let mut num_rows: Option<Precision<usize>> = None;

Review Comment:
   Initially, I wrote this part the way you suggested. However, I ran into a
problem: I want to copy the statistics from the first file directly, without
altering their precision. If a subsequent file has absent or inexact values,
I convert the precision to inexact, and it can never become exact again. I
wrapped these values in `Option` so that the first file's statistics can be
copied unchanged inside the while loop.
   
   With the approach you suggested, even if every statistic read from the
files is exact, the accumulated statistics could never be exact, because they
would start out as absent. I couldn't come up with a smoother solution, but
you are right that this part of the code is somewhat complicated.
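
   To make the difference concrete, here is a minimal, self-contained sketch
of the two approaches. It is not the code in this PR: the `Precision` enum
below is a simplified stand-in for the real type, and `merge`,
`accumulate_with_option`, and `accumulate_from_absent` are hypothetical names
that only follow the rule described above (a missing value counts as zero and
makes the result inexact).

```rust
/// Simplified stand-in for the `Precision` type used in the PR.
/// Assumption: only these three states matter for the illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Precision {
    Exact(usize),
    Inexact(usize),
    Absent,
}

impl Precision {
    /// Returns the carried value, if any.
    fn value(self) -> Option<usize> {
        match self {
            Precision::Exact(v) | Precision::Inexact(v) => Some(v),
            Precision::Absent => None,
        }
    }

    /// Hypothetical merge rule: the sum is exact only if both sides are
    /// exact. An absent side counts as zero and makes the result inexact.
    fn merge(self, other: Precision) -> Precision {
        match (self, other) {
            (Precision::Exact(a), Precision::Exact(b)) => Precision::Exact(a + b),
            (Precision::Absent, Precision::Absent) => Precision::Absent,
            (a, b) => Precision::Inexact(a.value().unwrap_or(0) + b.value().unwrap_or(0)),
        }
    }
}

/// Option-wrapped accumulator: `None` means "no file seen yet", so the
/// first file's statistic is copied with its precision untouched.
fn accumulate_with_option(files: &[Precision]) -> Option<Precision> {
    let mut acc: Option<Precision> = None;
    for &stat in files {
        acc = Some(match acc {
            None => stat,
            Some(prev) => prev.merge(stat),
        });
    }
    acc
}

/// Accumulator seeded with `Absent`: the very first merge already
/// downgrades an exact statistic to inexact.
fn accumulate_from_absent(files: &[Precision]) -> Precision {
    files
        .iter()
        .fold(Precision::Absent, |acc, &stat| acc.merge(stat))
}
```

   With two files that both report exact row counts of 10 and 5, the
`Option`-wrapped version yields `Some(Exact(15))`, while the version seeded
with `Absent` yields `Inexact(15)`.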



