Kontinuation commented on issue #15028:
URL: https://github.com/apache/datafusion/issues/15028#issuecomment-2761940822

   > Excellent analysis folks. Parquet row groups size makes a lot of sense 
since the rows are large. We can tune that way down since our use case isn't 
columnar. How do I get the good errors? I see that was merged last year, but I 
was using the latest release at time of repro
   
   I modified the code to use `TrackConsumersPool`. When a memory reservation 
fails, the error message will now include the amount of memory consumed by each 
of the top 10 consumers.
   
   ```diff
   diff --git a/src/main.rs b/src/main.rs
   index 4413046..71af6d6 100644
   --- a/src/main.rs
   +++ b/src/main.rs
   @@ -1,5 +1,6 @@
    use std::sync::Arc;
   -use datafusion::execution::memory_pool::FairSpillPool;
   +use std::num::NonZeroUsize;
   +use datafusion::execution::memory_pool::{FairSpillPool, TrackConsumersPool};
    use datafusion::execution::runtime_env::RuntimeEnvBuilder;
    use datafusion::prelude::SessionConfig;
    use datafusion::prelude::SessionContext;
   @@ -12,7 +13,11 @@ use datafusion::logical_expr::col;
    #[tokio::main]
    async fn main() -> anyhow::Result<()> {
        env_logger::init();
   -    let pool = Arc::new(FairSpillPool::new(100 * 1024 * 1024));
   +    let pool = Arc::new(
   +        TrackConsumersPool::new(
   +            FairSpillPool::new(100 * 1024 * 1024),
   +            NonZeroUsize::new(10).unwrap()
   +        ));
        let runtime_env = RuntimeEnvBuilder::new() // TODO: add disk
           .with_memory_pool(pool.clone()) // TODO: from config
           .build_arc()
   @@ -33,6 +38,7 @@ async fn main() -> anyhow::Result<()> {
        table_opts.global.dictionary_enabled = Some(false);
        table_opts.global.statistics_enabled = Some("none".to_string());
        table_opts.global.bloom_filter_on_write = false;
   +    // table_opts.global.max_row_group_size = 1000;
    
        table_opts.column_specific_options.insert(
            "id".to_string(),
   ```
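
To make the idea behind a consumer-tracking pool concrete, here is a minimal, std-only sketch. It is a toy model under stated assumptions, not DataFusion's implementation: `ToyTrackingPool`, its `try_grow` method, and the error text are hypothetical names invented for illustration. It only shows the general technique of recording per-consumer usage and listing the largest consumers when a reservation fails.

```rust
use std::collections::HashMap;

/// Toy memory pool that remembers how many bytes each named consumer holds.
/// Hypothetical type for illustration; NOT DataFusion's `TrackConsumersPool`.
pub struct ToyTrackingPool {
    limit: usize,
    used: usize,
    top_n: usize,
    per_consumer: HashMap<String, usize>,
}

impl ToyTrackingPool {
    pub fn new(limit: usize, top_n: usize) -> Self {
        Self { limit, used: 0, top_n, per_consumer: HashMap::new() }
    }

    /// Try to reserve `bytes` for `consumer`. On failure, the error string
    /// names the `top_n` largest consumers recorded so far.
    pub fn try_grow(&mut self, consumer: &str, bytes: usize) -> Result<(), String> {
        if self.used.saturating_add(bytes) > self.limit {
            return Err(format!(
                "failed to reserve {} bytes for {}; {} of {} bytes in use; top consumers: {}",
                bytes,
                consumer,
                self.used,
                self.limit,
                self.report_top()
            ));
        }
        self.used += bytes;
        *self.per_consumer.entry(consumer.to_string()).or_insert(0) += bytes;
        Ok(())
    }

    /// Format the largest consumers as `name=bytes`, biggest first.
    fn report_top(&self) -> String {
        let mut entries: Vec<(&String, &usize)> = self.per_consumer.iter().collect();
        // Sort by size descending; break ties by name so output is deterministic.
        entries.sort_by(|a, b| b.1.cmp(a.1).then(a.0.cmp(b.0)));
        entries
            .iter()
            .take(self.top_n)
            .map(|(name, bytes)| format!("{}={}", name, bytes))
            .collect::<Vec<_>>()
            .join(", ")
    }
}
```

With a 100-byte limit and `top_n = 2`, reserving 60 bytes for a consumer named `sort`, 30 for `writer`, and 5 for `scan`, and then requesting 20 more, yields an error that lists only `sort` and `writer`. That is the kind of information that helps pinpoint which operator is holding the memory.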


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

