timsaucer commented on issue #1551: URL: https://github.com/apache/datafusion-python/issues/1551#issuecomment-4487543692
Ok, it looks like this is happening in the provider, not in the datafusion-python library as I had originally expected. What is happening is that we have a `Column` physical expression that is failing to simplify because `simplify_const_expr_immediate` cannot correctly downcast it to `Column` since it's foreign at that point. This was an unintended side effect of https://github.com/apache/datafusion/pull/18916

Here is what is happening:

- The table provider is getting a pushed-down filter expression (logical) and a `&dyn Session` that is the `datafusion-python` session context.
- The `ParquetSource` it is using under the hood takes a `PhysicalExpr` as its predicate. The `PhysicalExpr` is created by the `datafusion-python` session context during the call to `create_physical_expr`. This `PhysicalExpr` originates in the `datafusion-python` library, so it is foreign in terms of the user library. That is, from the user library's perspective it will be a `ForeignPhysicalExpr`.
- In the user library we are getting calls to `simplify`. It looks like this happens both in open and in the row group filter (which is where we're hitting it here).
- `simplify` checks whether the expression is a `Column` during `simplify_const_expr_immediate`. However, it cannot downcast to `Column` because we are in the user library, NOT the main datafusion-python library, when this simplify gets called.

Here is a workaround I have tested, using this code:

```
// Create the physical expression in the user library instead of through
// the foreign session.
let execution_props = ExecutionProps::new();
let predicate = predicate
    .map(|predicate| {
        datafusion::physical_expr::create_physical_expr(
            &predicate,
            &df_schema,
            &execution_props,
        )
    })
    .transpose()?
    // if there are no filters, use a literal true to have a predicate
    // that always evaluates to true we can pass to the index
    .unwrap_or_else(|| datafusion::physical_expr::expressions::lit(true));
```
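
To make the downcast failure described above concrete, here is a minimal, standard-library-only sketch. The trait and both structs are simplified stand-ins I made up for illustration, not the real DataFusion or FFI types, and `column_name` is a hypothetical helper. It only assumes that the foreign expression reaches the user library as a wrapper type around the original expression:

```rust
use std::any::Any;
use std::sync::Arc;

// Simplified stand-in for a physical expression trait; only the `as_any`
// hook used for downcasting is modeled here (assumption, not the real trait).
trait PhysicalExpr {
    fn as_any(&self) -> &dyn Any;
}

// Stand-in for a native `Column` expression.
struct Column {
    name: String,
}

impl PhysicalExpr for Column {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

// Stand-in for a wrapper around an expression created by another library.
// Its concrete type is the wrapper, not `Column`.
struct ForeignPhysicalExpr {
    inner: Arc<dyn PhysicalExpr>,
}

impl PhysicalExpr for ForeignPhysicalExpr {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

// Hypothetical helper mimicking a "is this a Column?" check via downcast.
fn column_name(expr: &dyn PhysicalExpr) -> Option<&str> {
    expr.as_any()
        .downcast_ref::<Column>()
        .map(|column| column.name.as_str())
}
```

Under these assumptions, `column_name` returns the name for a native `Column` but `None` for a `ForeignPhysicalExpr` that wraps the very same column, because the downcast only sees the wrapper's concrete type. That is the shape of the failure; the workaround avoids it by creating a native expression locally.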
