alamb commented on code in PR #5020:
URL: https://github.com/apache/arrow-datafusion/pull/5020#discussion_r1083998194
##########
datafusion/core/src/physical_plan/file_format/file_stream.rs:
##########
@@ -268,7 +268,7 @@ impl<F: FileOpener> FileStream<F> {
if result.is_err() {
self.state = FileStreamState::Error
}
-
+ self.file_stream_metrics.time_scanning.start();
Review Comment:
The doc comment for `time_scanning` says:
> /// Time elapsed for file scanning + first record batch of decompression + decoding
Prior to this PR, the timer counts only the time spent until the data starts being produced; it does not account for the time spent while the data is being produced.
If we want to change `time_scanning` to include the time while data is being
produced, I think we should update the comment as well.
What would you think about making two metrics:
```rust
/// Time elapsed for file scanning + first record batch of decompression + decoding
pub time_scanning_until_data: StartableTime,
/// Total elapsed time for scanning + record batch decompression / decoding
pub time_scanning_total: StartableTime,
```
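To make the difference between the two proposed metrics concrete, here is a minimal, self-contained sketch. It does not use DataFusion: `StartableTime` below is a hypothetical stand-in with only `start`/`stop`, and `scan`, `decode_batch`, and `value` are made-up names for illustration. It assumes `time_scanning_until_data` stops once the first batch is ready, while `time_scanning_total` pauses whenever a batch is handed to the consumer and resumes on the next poll.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for DataFusion's `StartableTime`: a timer that can be
/// started and stopped many times, summing the elapsed time of each interval.
#[derive(Default)]
struct StartableTime {
    total: Duration,
    start: Option<Instant>,
}

impl StartableTime {
    fn start(&mut self) {
        // Ignore a second `start` while the timer is already running.
        if self.start.is_none() {
            self.start = Some(Instant::now());
        }
    }

    fn stop(&mut self) {
        // `take` makes `stop` a no-op when the timer is not running.
        if let Some(start) = self.start.take() {
            self.total += start.elapsed();
        }
    }

    fn value(&self) -> Duration {
        self.total
    }
}

/// The two metrics proposed above (field names taken from the comment).
#[derive(Default)]
struct FileStreamMetrics {
    time_scanning_until_data: StartableTime,
    time_scanning_total: StartableTime,
}

/// Hypothetical scan loop; `decode_batch` stands in for decompressing and
/// decoding one record batch. Returns the number of rows produced.
fn scan<F>(metrics: &mut FileStreamMetrics, num_batches: usize, mut decode_batch: F) -> usize
where
    F: FnMut(usize) -> Vec<u32>,
{
    // Opening the file starts both timers; `total` starts first so that
    // its first interval always encloses the `until_data` interval.
    metrics.time_scanning_total.start();
    metrics.time_scanning_until_data.start();

    let mut rows = 0;
    for i in 0..num_batches {
        let batch = decode_batch(i);
        // A batch is ready: stop the "until data" timer (a no-op after the
        // first batch) and pause the total timer while the consumer runs.
        metrics.time_scanning_until_data.stop();
        metrics.time_scanning_total.stop();

        rows += batch.len(); // consumer work, counted by neither timer

        // Resume the total timer when the next batch is requested.
        metrics.time_scanning_total.start();
    }

    // End of stream: stop both timers (no-ops if already stopped).
    metrics.time_scanning_until_data.stop();
    metrics.time_scanning_total.stop();
    rows
}

fn main() {
    let mut metrics = FileStreamMetrics::default();
    let rows = scan(&mut metrics, 3, |_| {
        sleep(Duration::from_millis(5)); // simulated decode cost per batch
        vec![0; 100]
    });
    println!(
        "rows={rows} until_data={:?} total={:?}",
        metrics.time_scanning_until_data.value(),
        metrics.time_scanning_total.value()
    );
}
```

In this sketch, `time_scanning_total - time_scanning_until_data` roughly approximates the time spent producing every batch after the first, which is the part the current `time_scanning` metric does not capture.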