alamb opened a new issue, #18083:

URL: https://github.com/apache/datafusion/issues/18083
This is broken out from https://github.com/apache/datafusion/pull/17958. See the other related discussion on that PR.

Basically, the question is what to do with a "bad" Hive-partitioned value:

```
/table/year=2002/foo.parquet
/table/year=2003/bar.parquet
/table/baz.parquet # <-- what partition does this belong to?
/table/year=/baz.parquet # <-- likewise, what if the value is an empty string?
```

For the filtering case, it looks like we do not infer the filter for `<partition column> IS NULL`, and the column is defined as [not nullable when the listing table is built](https://github.com/apache/datafusion/blob/main/datafusion/core/src/datasource/listing/table.rs#L996). We should probably fix this to support null columns, but we'll need to introduce some configuration for users to specify a null fallback value other than the default `__HIVE_DEFAULT_PARTITION__`.

For the non-filtering case, this PR will indeed match null partition column values in the current implementation, but because we don't treat them specially, they would be returned as the literal text `"__HIVE_DEFAULT_PARTITION__"`, for example. If your partition column is then declared as an `Int32`, for example, the query will fail. I think implementing proper support for nulls will need more work outside of this PR.

Because we already define the column as non-nullable, what do you think about manually excluding `__HIVE_DEFAULT_PARTITION__` values in `parse_partitions_for_path` to prevent query errors like the one I describe above, until proper support for nulls is added? I can raise an issue and start working on it as well. This won't help people with custom null fallback values, but it would help in all default cases.

_Originally posted by @peasee in https://github.com/apache/datafusion/pull/17958#discussion_r2422243637_
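The special-casing of `__HIVE_DEFAULT_PARTITION__` discussed in this issue, combined with the configurable null fallback it mentions, can be sketched as a standalone helper. This is a minimal, hypothetical sketch: `parse_partition_values` and its `null_fallback` parameter are illustrative names, not DataFusion's `parse_partitions_for_path` or any existing configuration option. It assumes paths are relative to the table root with `/` separators and ignores URL-encoding of partition values. It maps both the default marker and an empty value (`year=`) to `None`, and it rejects a file whose directory layout does not match the partition columns (such as `/table/baz.parquet`). That rejection is one possible answer to the question raised above, not a decided one.

```rust
/// Default Hive marker for a NULL partition value, as named in the issue.
const HIVE_DEFAULT_PARTITION: &str = "__HIVE_DEFAULT_PARTITION__";

/// Hypothetical helper (NOT DataFusion's `parse_partitions_for_path`).
///
/// `relative_path` is a file path relative to the table root,
/// e.g. "year=2002/foo.parquet". Returns:
/// - `None` if the directories do not match `partition_cols`
///   (e.g. "baz.parquet" sitting directly under the table root);
/// - `Some(values)` otherwise, where each value is `None` when it is
///   empty or equal to `null_fallback`, and `Some(text)` otherwise.
fn parse_partition_values<'a>(
    relative_path: &'a str,
    partition_cols: &[&str],
    null_fallback: &str,
) -> Option<Vec<Option<&'a str>>> {
    let mut segments: Vec<&'a str> = relative_path.split('/').collect();
    segments.pop(); // the last segment is the file name, not a partition

    // A file must sit exactly one directory level per partition column.
    if segments.len() != partition_cols.len() {
        return None;
    }

    let mut values = Vec::with_capacity(partition_cols.len());
    for (segment, col) in segments.into_iter().zip(partition_cols) {
        // Each directory must look like `<col>=<value>` for the expected column.
        let (key, value) = segment.split_once('=')?;
        if key != *col {
            return None;
        }
        // Treat both the fallback marker and an empty value as NULL.
        if value.is_empty() || value == null_fallback {
            values.push(None);
        } else {
            values.push(Some(value));
        }
    }
    Some(values)
}

fn main() {
    let cols = ["year"];
    for path in [
        "year=2002/foo.parquet",
        "year=2003/bar.parquet",
        "baz.parquet",
        "year=/baz.parquet",
        "year=__HIVE_DEFAULT_PARTITION__/qux.parquet",
    ] {
        let parsed = parse_partition_values(path, &cols, HIVE_DEFAULT_PARTITION);
        println!("{path:<45} -> {parsed:?}");
    }
}
```

Returning `None` for a value only fits once the partition column is nullable. With the column non-nullable, as it is today, a caller would presumably have to skip files with a `None` value instead, which matches the interim exclusion proposed in the comment.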
