alamb commented on code in PR #20820:
URL: https://github.com/apache/datafusion/pull/20820#discussion_r2932987687


##########
datafusion/datasource/src/file_stream.rs:
##########
@@ -39,30 +39,91 @@ use datafusion_physical_plan::metrics::{
 use arrow::record_batch::RecordBatch;
 use datafusion_common::instant::Instant;
 
+use crate::morsel::{FileOpenerMorselizer, Morsel, MorselPlanner, Morselizer};
 use futures::future::BoxFuture;
 use futures::stream::BoxStream;
-use futures::{FutureExt as _, Stream, StreamExt as _, ready};
+use futures::{FutureExt, Stream, StreamExt as _};
+
+/// How many planners can be active (performing I/O or producing morsels) at 
once for a given `FileStream`?
+///
+/// This setting controls the potential number of concurrent IOs.
+///
+/// Setting this to 1 means that the `FileStream` will only have one active
+/// planner at a time, and will not start opening the next file until the
+/// current file is fully processed. Setting this to a higher number allows the
+/// `FileStream` to start opening the next file while still processing the
+/// current file, which can improve performance by overlapping IO and CPU work.
+/// However, setting this too high may lead to more memory buffering and
+/// resource contention if there are too many concurrent IOs.
+///
+/// TODO make this a config option
+const TARGET_CONCURRENT_PLANNERS: usize = 2;
+
+/// Keep at most this many morsels buffered before pausing additional planning.
+///
+/// The default is one morsel per available core. The intent is that once work
+/// stealing is added, each other core can find at least one morsel to steal
+/// without requiring the scan to eagerly buffer an unbounded amount of work.
+///
+/// TODO make this a config option
+fn max_buffered_morsels() -> usize {

Review Comment:
   I was thinking the less cross-core communication the better (so better to 
have per-thread work).
   
   Though I do think the number of outstanding IOs makes more sense as some 
sort of global / per session setting... Though I am still thinking work 
stealing is a better strategy. I'll see what I can come up with



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to