alamb commented on issue #4028:
URL: https://github.com/apache/datafusion/issues/4028#issuecomment-2271869472

   ## Background Reading
   
   Here is a background article about Parquet predicate pushdown:  
https://www.influxdata.com/blog/querying-parquet-millisecond-latency/#heading5
   
   Specifically, the section on `Late materialization` describes what is done 
in the RowFilter code.
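
To make the idea concrete, here is a minimal sketch of late materialization in plain Rust, using only the standard library. It is not the parquet crate's RowFilter implementation: the `Column` type, its `decoded_count` counter, and `late_materialize` are hypothetical names invented for illustration. The point is only that the predicate is evaluated on the filter column first, and the other column is decoded just for the rows that passed.

```rust
/// Toy stand-in for an encoded column chunk (hypothetical, not a parquet type).
/// `decoded_count` tracks how many values were decoded, so the saving is visible.
struct Column {
    encoded: Vec<i64>,
    decoded_count: usize,
}

impl Column {
    fn new(encoded: Vec<i64>) -> Self {
        Column { encoded, decoded_count: 0 }
    }

    /// Decode every row of the column.
    fn decode_all(&mut self) -> Vec<i64> {
        self.decoded_count += self.encoded.len();
        self.encoded.clone()
    }

    /// Decode only the rows whose selection flag is `true`.
    fn decode_selected(&mut self, selection: &[bool]) -> Vec<i64> {
        let mut out = Vec::new();
        for (value, keep) in self.encoded.iter().zip(selection) {
            if *keep {
                self.decoded_count += 1;
                out.push(*value);
            }
        }
        out
    }
}

/// Evaluate `predicate` on the filter column first, then decode the
/// projected column only for the rows that passed.
fn late_materialize(
    filter_col: &mut Column,
    projected_col: &mut Column,
    predicate: impl Fn(i64) -> bool,
) -> Vec<i64> {
    // Step 1: decode the (usually small) filter column and build a row selection.
    let selection: Vec<bool> = filter_col
        .decode_all()
        .into_iter()
        .map(|v| predicate(v))
        .collect();
    // Step 2: decode the remaining column only where the selection is set.
    projected_col.decode_selected(&selection)
}
```

With a filter column `[1, 5, 9, 12, 3]`, a projected column `[10, 20, 30, 40, 50]`, and the predicate `v > 8`, the function returns `[30, 40]` and decodes only two of the five projected values.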
   
   ## Code links for Listing Table
   
   `ListingTable` supports filter pushdown: 
https://docs.rs/datafusion/latest/datafusion/datasource/listing/struct.ListingTable.html#impl-TableProvider-for-ListingTable
   
   ## Conditions:
   
   Here are the conditions @tustvold lists above, with some links explaining 
what they mean.
   
   > Parquet predicate pushdown is enabled
   
   This refers to the `datafusion.execution.parquet.pushdown_filters` 
configuration setting 
([docs here](https://datafusion.apache.org/user-guide/configs.html)).
   
   > The FileFormat is parquet
   
   This should be relatively straightforward to determine.
   
   
   > The predicate is fully pushed down by ParquetExec (not all predicates are 
supported)
   
   "Pushdown" refers to the following code in ParquetExec (there is some 
background information in the parquet [RowFilter 
API](https://docs.rs/parquet/latest/parquet/arrow/arrow_reader/struct.RowFilter.html)):
   
   
https://github.com/apache/datafusion/blob/16a3557325eb8f949f4a87ab90c0a0b174dc8d86/datafusion/core/src/datasource/physical_plan/parquet/row_filter.rs#L43-L70
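
One way to read the three conditions above is as a single conjunction. The sketch below is an illustration only, not DataFusion code: `FileFormat`, `Conjunct`, and both functions are hypothetical names, and it assumes the filter can be split into AND-ed parts, each of which the row filter code either supports or does not. The `supported` flag stands in for whatever check the linked code actually performs.

```rust
/// Hypothetical file formats, for illustration only.
#[derive(PartialEq)]
enum FileFormat {
    Parquet,
    Csv,
}

/// One AND-ed part of a filter, e.g. `a > 5` in `a > 5 AND b = 1`.
/// `supported` is a stand-in for the real "can this be pushed down?" check.
struct Conjunct {
    expr: &'static str,
    supported: bool,
}

/// Parts of the filter that could not be pushed down.
fn unsupported(conjuncts: &[Conjunct]) -> Vec<&'static str> {
    conjuncts
        .iter()
        .filter(|c| !c.supported)
        .map(|c| c.expr)
        .collect()
}

/// All three conditions must hold at once:
/// 1. the pushdown setting is enabled,
/// 2. the file format is parquet,
/// 3. every part of the predicate was pushed down.
fn all_conditions_hold(
    pushdown_filters: bool,
    format: &FileFormat,
    conjuncts: &[Conjunct],
) -> bool {
    pushdown_filters
        && *format == FileFormat::Parquet
        && unsupported(conjuncts).is_empty()
}
```

Under this reading, a single unsupported part of the predicate is enough for the third condition to fail, even when the setting is on and the file is parquet.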


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

