devinjdangelo commented on issue #9785: URL: https://github.com/apache/arrow-datafusion/issues/9785#issuecomment-2020596780
> I'm not sure what to do next regarding this. Do we just prevent the panic and leave it at that? do we perform schema validation if the table exists? We should prevent the panic, but I think it is reasonable behavior to fail the query and return an error. The examples in this issue are creating two tables with different schemas in the same external storage location and writing to both. This effectively corrupts both tables, since now there are two different table definitions mixed in the same physical location. After the inserts, the directory looks like: ``` /year=2024/month=03 /month=2024/year=03 ``` When the datafusion read path encounters this, the two options would be: 1. Silently ignore hive style paths which do not conform to the expected schema (so if the partitions are year/month, only scan within the outer year folder, ignoring the outer month folder) 2. Return an error that unexpected paths were encountered Depending on the context, I could see either behavior being desired. We could perhaps provide a configuration option that allows users to control how Datafusion will handle unexpected partition paths. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
