kumarUjjawal commented on code in PR #22900:
URL: https://github.com/apache/datafusion/pull/22900#discussion_r3407595454


##########
datafusion/datasource-csv/src/source.rs:
##########
@@ -389,6 +391,7 @@ impl FileOpener for CsvOpener {
             let result = store
                 .get_opts(&partitioned_file.object_meta.location, options)
                 .await?;
+            bytes_scanned.add((result.range.end - result.range.start) as usize);

Review Comment:
   For ranged CSV scans, `calculate_range` may issue extra object-store reads to 
   find newline boundaries before this point. In the `TerminateEarly` case it can 
   fetch bytes and still report 0. If the metric means “bytes fetched from object 
   store”, those boundary reads should be counted too.
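
   To illustrate the concern, here is a minimal std-only sketch, not the actual 
   DataFusion code. `BytesScanned`, `fetch`, `next_line_start`, and 
   `scan_partition` are hypothetical stand-ins for the metric, the object-store 
   read, and the range calculation. It assumes every read, including the 
   newline-boundary probes, goes through one helper that updates the counter.

```rust
use std::ops::Range;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for the `bytes_scanned` counter metric.
#[derive(Default)]
struct BytesScanned(AtomicUsize);

impl BytesScanned {
    fn add(&self, n: usize) {
        self.0.fetch_add(n, Ordering::Relaxed);
    }
    fn value(&self) -> usize {
        self.0.load(Ordering::Relaxed)
    }
}

/// Hypothetical stand-in for a ranged object-store read.
/// Every call is counted, whether it is a boundary probe or the main read.
fn fetch(data: &[u8], range: Range<usize>, metric: &BytesScanned) -> Vec<u8> {
    let end = range.end.min(data.len());
    let start = range.start.min(end);
    metric.add(end - start);
    data[start..end].to_vec()
}

/// Returns the offset just past the first newline at or after `from`,
/// probing `chunk` bytes at a time (assumes `chunk > 0`).
fn next_line_start(data: &[u8], from: usize, chunk: usize, metric: &BytesScanned) -> usize {
    let mut pos = from;
    while pos < data.len() {
        let buf = fetch(data, pos..pos + chunk, metric);
        if let Some(i) = buf.iter().position(|&b| b == b'\n') {
            return pos + i + 1;
        }
        pos += buf.len();
    }
    data.len()
}

/// Reads the lines that start inside `range`. Returns an empty buffer when no
/// line starts there (the early-termination case); the probes are still counted.
fn scan_partition(
    data: &[u8],
    range: Range<usize>,
    chunk: usize,
    metric: &BytesScanned,
) -> Vec<u8> {
    let start = if range.start == 0 {
        0
    } else {
        next_line_start(data, range.start - 1, chunk, metric)
    };
    let end = if range.end >= data.len() {
        data.len()
    } else {
        next_line_start(data, range.end.saturating_sub(1), chunk, metric)
    };
    if start >= end {
        return Vec::new();
    }
    fetch(data, start..end, metric)
}
```

   In this sketch a partition that terminates early still reports the bytes its 
   boundary probes fetched, because the counter is updated inside `fetch` rather 
   than only after the main read.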



##########
datafusion/datasource-csv/src/source.rs:
##########
@@ -364,6 +364,8 @@ impl FileOpener for CsvOpener {
 
         let baseline_metrics =
             BaselineMetrics::new(&self.config.metrics, self.partition_index);
+        let bytes_scanned = datafusion_physical_plan::metrics::MetricBuilder::new(&self.config.metrics)

Review Comment:
   EXPLAIN ANALYZE with LEVEL summary or METRICS 'bytes' can hide the new byte 
metric. Mark it using MetricType::Summary and MetricCategory::Bytes, and add a 
CSV explain test for bytes-only metrics.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

