Re: [I] [BUG] Panic when querying table with wrong partition columns order [arrow-datafusion]

via GitHub Tue, 26 Mar 2024 07:32:32 -0700


devinjdangelo commented on issue #9785:
URL: 
https://github.com/apache/arrow-datafusion/issues/9785#issuecomment-2020596780


   > I'm not sure what to do next regarding this. Do we just prevent the panic 
and leave it at that? Do we perform schema validation if the table exists?
   
   We should prevent the panic, but I think it is reasonable behavior to fail 
the query and return an error. The examples in this issue are creating two 
tables with different schemas in the same external storage location and writing 
to both. This effectively corrupts both tables, since now there are two 
different table definitions mixed in the same physical location. After the 
inserts, the directory looks like:
   
   ```
   /year=2024/month=03
   /month=2024/year=03
   ```
   When the DataFusion read path encounters this, the two options would be:
   
   1. Silently ignore Hive-style paths that do not conform to the expected 
schema (so if the partitions are year/month, only scan within the outer year 
folder, ignoring the outer month folder)
   2. Return an error that unexpected paths were encountered
   
   Depending on the context, I could see either behavior being desired. We 
could perhaps provide a configuration option that allows users to control how 
DataFusion will handle unexpected partition paths.
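
   To make the two options concrete, here is a minimal sketch of how such a 
check could look. It is not DataFusion's actual API: the enum 
`UnexpectedPartitionPolicy` and the function `match_partition_path` are 
hypothetical names invented for illustration, and the sketch assumes that every 
path segment containing `=` is a partition directory.

```rust
/// Hypothetical policy for paths that do not match the table's partition
/// columns. This is an illustration, not an existing DataFusion option.
#[derive(Debug, Clone, Copy, PartialEq)]
enum UnexpectedPartitionPolicy {
    /// Option 1: silently skip non-conforming paths.
    Ignore,
    /// Option 2: fail the query with an error.
    Error,
}

/// Match a Hive-style path such as `/year=2024/month=03` against the expected
/// partition columns, in order.
///
/// Returns `Ok(Some(values))` when the path conforms, `Ok(None)` when it does
/// not and the policy is `Ignore`, and `Err(..)` when it does not and the
/// policy is `Error`.
fn match_partition_path(
    path: &str,
    partition_cols: &[&str],
    policy: UnexpectedPartitionPolicy,
) -> Result<Option<Vec<String>>, String> {
    // Assumption: only `key=value` segments are partition directories;
    // anything else (for example a file name) is ignored.
    let segments: Vec<&str> = path
        .trim_matches('/')
        .split('/')
        .filter(|s| s.contains('='))
        .collect();

    // The path conforms only if every segment's key equals the expected
    // column at the same position.
    let values: Option<Vec<String>> = if segments.len() == partition_cols.len() {
        segments
            .iter()
            .zip(partition_cols)
            .map(|(seg, col)| match seg.split_once('=') {
                Some((key, value)) if key == *col => Some(value.to_string()),
                _ => None,
            })
            .collect()
    } else {
        None
    };

    match (values, policy) {
        (Some(v), _) => Ok(Some(v)),
        (None, UnexpectedPartitionPolicy::Ignore) => Ok(None),
        (None, UnexpectedPartitionPolicy::Error) => Err(format!(
            "unexpected partition path '{path}', expected columns {partition_cols:?}"
        )),
    }
}
```

   With partition columns `["year", "month"]`, the path `/year=2024/month=03` 
yields the values `2024` and `03`, while `/month=2024/year=03` is either 
skipped (`Ignore`) or rejected with an error (`Error`). In both cases the 
mismatch is handled without a panic.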
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
