Dandandan commented on code in PR #20916:
URL: https://github.com/apache/datafusion/pull/20916#discussion_r2927684360


##########
datafusion/datasource/src/file_stream.rs:
##########
@@ -125,35 +120,10 @@ impl FileStream {
                 FileStreamState::Open { future } => match ready!(future.poll_unpin(cx)) {
                     Ok(reader) => {
                         self.file_stream_metrics.files_opened.add(1);
-                        // include time needed to start opening in `start_next_file`
                         self.file_stream_metrics.time_opening.stop();
-                        let next = {
-                            let scanning_total_metric = self
-                                .file_stream_metrics
-                                .time_scanning_total
-                                .metrics
-                                .clone();
-                            let _timer = scanning_total_metric.timer();
-                            self.start_next_file().transpose()

Review Comment:
   So this, @alamb, is mostly what I was talking about. It will read the footer
(which is what we want), but AFAIK it will also:
   * build the pruning predicate (I think this is suboptimal, because it happens too early)
   * prune row groups
   * optionally load the page index
   * return the stream (without driving **that** forward)
   
   We should be able to do this much better with the IO / CPU separation.
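
To make the idea concrete, here is a minimal sketch of what splitting "open" into an IO-only phase and a CPU-only phase could look like. Everything in it is hypothetical and for illustration only: `RowGroupStats`, `OpenedFile`, `ScanPlan`, `read_footer`, and `plan_scan` are made-up names, not DataFusion or parquet APIs, and the IO is simulated with an in-memory slice.

```rust
/// Hypothetical per-row-group statistics, standing in for footer metadata.
#[derive(Debug, Clone, PartialEq)]
struct RowGroupStats {
    min: i64,
    max: i64,
}

/// Result of the IO-only phase: the decoded footer and nothing else.
#[derive(Debug)]
struct OpenedFile {
    row_groups: Vec<RowGroupStats>,
}

/// Result of the CPU-only phase: indices of row groups that may match.
#[derive(Debug, PartialEq)]
struct ScanPlan {
    selected: Vec<usize>,
}

/// IO phase (simulated): fetch and decode the footer.
/// This is the only work worth starting early for the next file.
fn read_footer(raw: &[(i64, i64)]) -> OpenedFile {
    OpenedFile {
        row_groups: raw
            .iter()
            .map(|&(min, max)| RowGroupStats { min, max })
            .collect(),
    }
}

/// CPU phase: keep row groups whose [min, max] range overlaps [lo, hi].
/// Deferred until the file is actually about to be scanned.
fn plan_scan(file: &OpenedFile, lo: i64, hi: i64) -> ScanPlan {
    let selected = file
        .row_groups
        .iter()
        .enumerate()
        .filter(|(_, rg)| rg.max >= lo && rg.min <= hi)
        .map(|(i, _)| i)
        .collect();
    ScanPlan { selected }
}

fn main() {
    // Prefetch step: IO only, could overlap with scanning the current file.
    let next_file = read_footer(&[(0, 10), (11, 20), (21, 30)]);

    // Later, when the scan of this file begins: CPU-only planning.
    let plan = plan_scan(&next_file, 15, 25);
    println!("{:?}", plan);
}
```

The point of the split is that the prefetch step returns as soon as the footer is available, while predicate building and row-group pruning run only when the file is about to be scanned.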



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

