Omega359 commented on PR #22680:
URL: https://github.com/apache/datafusion/pull/22680#issuecomment-4605776566

   > > Note that the IMDB_FILE_TYPE=csv will OOM on most systems because csv 
doesn't infer statistics and thus won't get scan predicates and dynamic filters 
pushed into DataSourceExec. This results in queries such as 16a joining 
large tables/intermediates before enough of the selective filters have reduced 
the data size to not OOM (tested on a 96GB system). Setting PARTITION=1 does 
not solve the issue.
   > 
   > I assume this was already the case? Thanks for investigating the root 
cause.
   
   Yes, though bench.sh only runs the parquet version, not the csv one.
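
As a rough illustration of the effect described in the quoted note, here is a toy sketch in plain Python. It is not DataFusion code: the lists, the `hash_join` helper, and the row counts are all hypothetical. It only shows why applying a selective filter before a join keeps the intermediate result small, while filtering after the join materializes the full intermediate first.

```python
# Toy model only: plain Python lists stand in for tables; this is not DataFusion code.

def hash_join(left, right, key):
    """Inner-join two lists of dicts on `key` and return the joined rows."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

def join_then_filter(left, right, key, pred):
    """Filter applied late: the whole join output exists before any row is dropped."""
    joined = hash_join(left, right, key)
    return len(joined), [r for r in joined if pred(r)]

def filter_then_join(left, right, key, pred):
    """Filter applied at the 'scan': only surviving rows ever reach the join."""
    kept = [r for r in left if pred(r)]
    joined = hash_join(kept, right, key)
    return len(joined), joined

# Hypothetical data: 1,000 titles with 5 cast rows per title.
titles = [{"id": i, "year": 1900 + i % 100} for i in range(1000)]
cast = [{"id": i, "person": p} for i in range(1000) for p in range(5)]

def is_1950(row):
    return row["year"] == 1950

late_peak, late_rows = join_then_filter(titles, cast, "id", is_1950)
early_peak, early_rows = filter_then_join(titles, cast, "id", is_1950)

print("intermediate rows, filter after join: ", late_peak)
print("intermediate rows, filter before join:", early_peak)
```

Both plans return the same rows; only the size of the intermediate join output differs, which is the quantity that matters for memory pressure in the scenario above.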


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

