alamb opened a new issue, #18083:
URL: https://github.com/apache/datafusion/issues/18083

   This is broken out from https://github.com/apache/datafusion/pull/17958. See 
the related discussion on that PR.
   
   Basically, the question is what to do with a "bad" Hive-partitioned value:
   
   ```
   /table/year=2002/foo.parquet
   /table/year=2003/bar.parquet
   /table/baz.parquet  # <-- what partition does this belong to?
   /table/year=/baz.parquet  # <-- likewise, what if the value is an empty string?
   ```
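
To make the four cases concrete, here is a minimal sketch of reading the raw value for one partition column out of a path. `raw_partition_value` is a hypothetical helper written only for illustration; it is not DataFusion's `parse_partitions_for_path` and makes no claim about how that function behaves.

```rust
/// Hypothetical helper (an assumption for illustration, not DataFusion code).
/// Returns the raw text after `<column>=` in the first matching directory
/// segment of `path`, or `None` if no directory segment names that column.
fn raw_partition_value<'a>(path: &'a str, column: &str) -> Option<&'a str> {
    let mut segments: Vec<&str> = path.split('/').collect();
    // The last segment is the file name, not a partition directory.
    segments.pop();
    segments.into_iter().find_map(|segment| {
        // Segments without `=` (such as `table`) are skipped.
        let (key, value) = segment.split_once('=')?;
        (key == column).then_some(value)
    })
}
```

Under these assumptions, `/table/year=2002/foo.parquet` gives `Some("2002")`, `/table/baz.parquet` gives `None`, and `/table/year=/baz.parquet` gives `Some("")`. The last two are the cases this issue asks about.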
   
   
   
   
   For the filtering case, it looks like we do not infer the filter for 
`<partition column> IS NULL`, and the column is defined as [not nullable when 
the listing table is 
built](https://github.com/apache/datafusion/blob/main/datafusion/core/src/datasource/listing/table.rs#L996).
 We should probably fix this to support null columns, but we'll need to 
introduce some configuration for users to specify a null fallback value 
other than the default `__HIVE_DEFAULT_PARTITION__`.
   
   For the non-filtering case, this PR will indeed match null partition column 
values in the current implementation, but because we don't treat them 
specially, they would be returned as literal text, for example 
`"__HIVE_DEFAULT_PARTITION__"`. If your partition column is declared as 
`Int32`, for example, the query will fail, since that text is not a valid 
integer.
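
A rough illustration of that failure, using plain string parsing as a stand-in for the cast to the partition column type. `cast_partition_to_i32` is a hypothetical name, and the real cast in DataFusion goes through Arrow rather than `str::parse`, so treat this only as a sketch of why the sentinel text cannot become an `Int32`.

```rust
use std::num::ParseIntError;

/// Hypothetical stand-in (an assumption, not DataFusion code) for casting a
/// raw partition string to an `Int32` partition column.
fn cast_partition_to_i32(raw: &str) -> Result<i32, ParseIntError> {
    raw.parse::<i32>()
}
```

With this sketch, `cast_partition_to_i32("2002")` succeeds, while both `"__HIVE_DEFAULT_PARTITION__"` and the empty string return an error.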
   
   I think implementing proper support for nulls will need more work 
outside of this PR. Because we already define the column as non-nullable, what 
do you think about manually excluding `__HIVE_DEFAULT_PARTITION__` values in 
`parse_partitions_for_path` to prevent query errors like the one I describe 
above, until proper support for nulls is added? I can raise an issue and start 
working on it as well.
   
   This won't help people with custom null fallback values, but would help for 
all default cases.
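
The interim exclusion could look roughly like the sketch below. `keep_file` and the `HIVE_DEFAULT_PARTITION` constant are hypothetical names introduced here; where such a check would actually live inside `parse_partitions_for_path` is not specified in this discussion.

```rust
/// Hive's default name for a null partition value, as referenced above.
const HIVE_DEFAULT_PARTITION: &str = "__HIVE_DEFAULT_PARTITION__";

/// Hypothetical filter (an assumption, not DataFusion code): returns `false`
/// for any path containing a `key=value` segment whose value is the default
/// null sentinel, so such files are skipped instead of failing the query.
fn keep_file(path: &str) -> bool {
    !path
        .split('/')
        .filter_map(|segment| segment.split_once('='))
        .any(|(_key, value)| value == HIVE_DEFAULT_PARTITION)
}
```

As noted, a custom fallback value is not caught: `keep_file("/table/year=my_null/baz.parquet")` still returns `true`, whereas `keep_file("/table/year=__HIVE_DEFAULT_PARTITION__/baz.parquet")` returns `false`.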
   
   _Originally posted by @peasee in 
https://github.com/apache/datafusion/pull/17958#discussion_r2422243637_
               


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

