[GitHub] [orc] dongjoon-hyun commented on pull request #972: ORC-1059: Align findColumns behaviour between 1.6 and 1.7 release

GitBox Wed, 15 Dec 2021 22:46:51 -0800


dongjoon-hyun commented on pull request #972:
URL: https://github.com/apache/orc/pull/972#issuecomment-995488966



   Is it okay when we handle multiple ORC files under Hive schema evolution?
   > I was thinking that throwing an exception when a SArg column is not found 
   > is probably a better approach than just logging.
   
   In many cases, Hive partitions might have different schemas. The simplest 
case is new partitions that add columns. If a user runs a 
query over all partitions, the SArg can reference new columns that old 
partitions don't have.
   
   Apache Spark checks the physical schema when we open a file and tries to 
adjust for the missing columns. Given this PR's description, Apache Hive doesn't, 
right?
   
   For me, throwing an exception could be too intrusive and the AS-IS status 
(`-1`) would be enough.
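
   To make the partition case concrete, here is a minimal sketch in plain 
Java. It does not use the ORC API: the class `SargColumnLookup`, its 
`findColumns` method, and the column names are hypothetical stand-ins, and 
exact, case-sensitive name matching is an assumption. It only illustrates 
how one SArg can resolve fully against a new partition and partially against 
an old one, with `-1` marking the column the old file lacks.

```java
import java.util.Arrays;
import java.util.List;

public class SargColumnLookup {
    // Hypothetical stand-in, not the ORC implementation: map each SArg
    // column name to its index in one file's physical schema, using -1
    // for a column that the file does not contain.
    public static int[] findColumns(List<String> sargColumns, List<String> fileSchema) {
        int[] result = new int[sargColumns.size()];
        for (int i = 0; i < result.length; i++) {
            // List.indexOf returns -1 when the name is absent.
            result[i] = fileSchema.indexOf(sargColumns.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        // An old partition written before the "email" column was added.
        List<String> oldPartition = Arrays.asList("id", "name");
        // A new partition that carries the additional column.
        List<String> newPartition = Arrays.asList("id", "name", "email");
        // One query over all partitions, so the same SArg columns apply to both files.
        List<String> sargColumns = Arrays.asList("id", "email");

        System.out.println(Arrays.toString(findColumns(sargColumns, oldPartition))); // [0, -1]
        System.out.println(Arrays.toString(findColumns(sargColumns, newPartition))); // [0, 2]
    }
}
```

   With `-1`, the caller can skip the predicate for the old file and keep 
reading. If the lookup threw instead, the same query would fail on the first 
old partition.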


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

