alamb commented on code in PR #5020:
URL: https://github.com/apache/arrow-datafusion/pull/5020#discussion_r1083998194


##########
datafusion/core/src/physical_plan/file_format/file_stream.rs:
##########
@@ -268,7 +268,7 @@ impl<F: FileOpener> FileStream<F> {
                         if result.is_err() {
                             self.state = FileStreamState::Error
                         }
-
+                        self.file_stream_metrics.time_scanning.start();

Review Comment:
   The comment for `time_scanning` says:
   
   >     /// Time elapsed for file scanning + first record batch of decompression + decoding
   
   Prior to this PR, the timer counted only the time spent until data starts being produced; it did not account for the time spent while the data is being produced.
   
   If we want to change `time_scanning` to include the time while data is being produced, I think we should update the comment as well.
   
   What would you think about making 2 metrics:
   
   ```
       /// Time elapsed for file scanning + first record batch of decompression + decoding
       pub time_scanning_until_data: StartableTime,
   
       /// Total elapsed time for scanning + record batch decompression / decoding
       pub time_scanning_total: StartableTime,
   ```
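   
   To make the difference between the two timers concrete, here is a rough, self-contained sketch of how they might be started and stopped. It is only an illustration under assumptions: the `StartableTime` below is a simplified stand-in (the real one records into a metrics `Time`), and `FileStreamMetricsSketch` and `scan_file` are hypothetical names that are not part of this PR.
   
   ```rust
   use std::time::{Duration, Instant};
   
   /// Simplified stand-in for `StartableTime` (assumption: the real type
   /// records into a metrics `Time` rather than a plain `Duration`).
   #[derive(Debug, Default)]
   pub struct StartableTime {
       elapsed: Duration,
       start: Option<Instant>,
   }
   
   impl StartableTime {
       /// Start the timer; a second `start` while running is ignored.
       pub fn start(&mut self) {
           if self.start.is_none() {
               self.start = Some(Instant::now());
           }
       }
   
       /// Stop the timer and add the elapsed time; a no-op if not running.
       pub fn stop(&mut self) {
           if let Some(started) = self.start.take() {
               self.elapsed += started.elapsed();
           }
       }
   
       pub fn elapsed(&self) -> Duration {
           self.elapsed
       }
   }
   
   /// Hypothetical holder for the two proposed metrics.
   #[derive(Debug, Default)]
   pub struct FileStreamMetricsSketch {
       /// Time elapsed for file scanning + first record batch of decompression + decoding
       pub time_scanning_until_data: StartableTime,
       /// Total elapsed time for scanning + record batch decompression / decoding
       pub time_scanning_total: StartableTime,
   }
   
   /// Hypothetical driver: pretends to decode `batches` record batches.
   pub fn scan_file(batches: usize, metrics: &mut FileStreamMetricsSketch) {
       metrics.time_scanning_until_data.start();
       metrics.time_scanning_total.start();
       for i in 0..batches {
           // Stand-in for the decompression + decoding work of one batch.
           std::thread::sleep(Duration::from_millis(2));
           if i == 0 {
               // First batch produced: the "until data" timer stops here.
               metrics.time_scanning_until_data.stop();
           }
       }
       // Covers the empty-file case; a no-op if already stopped.
       metrics.time_scanning_until_data.stop();
       // The total timer keeps running until the scan is finished.
       metrics.time_scanning_total.stop();
   }
   ```
   
   With something like this, `time_scanning_until_data` keeps the meaning of the existing comment, while `time_scanning_total` captures what this PR starts measuring.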
   
   



