Dandandan commented on code in PR #20820:
URL: https://github.com/apache/datafusion/pull/20820#discussion_r2933618085


##########
datafusion/datasource/src/file_stream.rs:
##########
@@ -39,30 +39,91 @@ use datafusion_physical_plan::metrics::{
 use arrow::record_batch::RecordBatch;
 use datafusion_common::instant::Instant;
 
+use crate::morsel::{FileOpenerMorselizer, Morsel, MorselPlanner, Morselizer};
 use futures::future::BoxFuture;
 use futures::stream::BoxStream;
-use futures::{FutureExt as _, Stream, StreamExt as _, ready};
+use futures::{FutureExt, Stream, StreamExt as _};
+
+/// How many planners can be active (performing I/O or producing morsels) at once for a given `FileStream`?
+///
+/// This setting controls the potential number of concurrent IOs.
+///
+/// Setting this to 1 means that the `FileStream` will only have one active
+/// planner at a time, and will not start opening the next file until the
+/// current file is fully processed. Setting this to a higher number allows the
+/// `FileStream` to start opening the next file while still processing the
+/// current file, which can improve performance by overlapping IO and CPU work.
+/// However, setting this too high may lead to more memory buffering and
+/// resource contention if there are too many concurrent IOs.
+///
+/// TODO make this a config option
+const TARGET_CONCURRENT_PLANNERS: usize = 2;
+
+/// Keep at most this many morsels buffered before pausing additional planning.
+///
+/// The default is one morsel per available core. The intent is that once work
+/// stealing is added, each other core can find at least one morsel to steal
+/// without requiring the scan to eagerly buffer an unbounded amount of work.
+///
+/// TODO make this a config option
+fn max_buffered_morsels() -> usize {

Review Comment:
   Hmm, it makes things more complex though (we would also need to implement 
some work-stealing strategy), with little benefit: since the morsels themselves 
don't hold much data, there won't be a lot of contention or cross-communication 
anyway.
   
   Also, I think it might be beneficial in certain cases (e.g. topk pruning) to 
execute the morsels in a predefined global order instead of per-partition.
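
   To make the global-order idea concrete, here is a minimal sketch of a single shared queue that hands out morsels in one predefined order, regardless of which partition planned them. This is not DataFusion's API and not what the PR implements: `MorselDesc`, `GlobalMorselQueue`, and `drain_with_workers` are hypothetical names, plain threads stand in for partitions, and the `priority` field is an assumed ordering key (for topk it might come from file statistics).

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a morsel: a small descriptor of work,
/// not the data itself (which is why a shared lock should be cheap).
#[derive(Debug, Clone, PartialEq)]
pub struct MorselDesc {
    /// Assumed ordering key: position in the global order (lower runs first).
    pub priority: usize,
    /// Partition that planned this morsel (illustrative only).
    pub partition: usize,
}

/// One queue shared by all workers, instead of one queue per partition
/// plus a work-stealing protocol between them.
#[derive(Clone)]
pub struct GlobalMorselQueue {
    inner: Arc<Mutex<VecDeque<MorselDesc>>>,
}

impl GlobalMorselQueue {
    /// Sorts the morsels once so that every later `next` call follows
    /// the same global order.
    pub fn new(mut morsels: Vec<MorselDesc>) -> Self {
        morsels.sort_by_key(|m| m.priority);
        Self {
            inner: Arc::new(Mutex::new(VecDeque::from(morsels))),
        }
    }

    /// Takes the next morsel in global order, or `None` when drained.
    pub fn next(&self) -> Option<MorselDesc> {
        self.inner.lock().unwrap().pop_front()
    }
}

/// Spawns `workers` threads that pull from the shared queue until it is
/// empty, and returns what each worker took, in the order it took it.
pub fn drain_with_workers(
    queue: &GlobalMorselQueue,
    workers: usize,
) -> Vec<Vec<MorselDesc>> {
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let q = queue.clone();
            thread::spawn(move || {
                let mut taken = Vec::new();
                while let Some(m) = q.next() {
                    taken.push(m);
                }
                taken
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

   In this sketch an idle worker simply takes the next morsel from the shared queue, so no separate stealing step is needed, and the lock is held only long enough to pop a small descriptor. Whether that holds up under real scan workloads would need measuring.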



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

