Arup Malakar created HIVE-9960:
----------------------------------

             Summary: Hive not backward compatibilie while adding optional new 
field to struct in parquet files
                 Key: HIVE-9960
                 URL: https://issues.apache.org/jira/browse/HIVE-9960
             Project: Hive
          Issue Type: Bug
          Components: Serializers/Deserializers
            Reporter: Arup Malakar


I recently added an optional field to a struct, when I tried to query old data 
with the new hive table which has the new field as column it throws error. Any 
clue how I can make it backward compatible so that I am still able to query old 
data with the new table definition.
 
I am using hive-0.14.0 release with  HIVE-8909 patch applied.

Details:

New optional field in a struct
{code}
struct Event {
    1: optional Type type;
    2: optional map<string, string> values;
    3: optional i32 num = -1; // <--- New field
}
{code}

Main thrift definition
{code}
 10: optional list<Event> events;
{code}

Corresponding hive table definition
{code}
  events array< struct <type: string, values: map<string, string>, num: int>>)
{code}

Try to read something from the old data, using the new table definition
{{select events from table1 limit 1;}}

Failed with exception:
{code}
java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.ArrayIndexOutOfBoundsException: 2                                   

Error thrown:                                       

15/03/12 17:23:43 [main]: ERROR CliDriver: Failed with exception 
java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.ArrayIndexOutOfBoundsException: 2                               

java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.ArrayIndexOutOfBoundsException: 2                                     
                                                          

        at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:152)   
                                                                                
                                                 

        at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1621)        
                                                                                
                                                 

        at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:267)        
                                                                                
                                      

        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:199)  
                                                                                
                                                 

        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410) 
                                                                                
                                                 

        at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:783)          
                                                                                
                                      

        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)         
                                                                                
                                                 

        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)        
                                                                                
                                                 

        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)          
                                                                                
                                                 

        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)   
                                                                                
                                      

        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                                                                                
                                 

        at java.lang.reflect.Method.invoke(Method.java:597)                     
                                                                                
                                                 

        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)                  
                                                                                
                                                 

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.ArrayIndexOutOfBoundsException: 2                                     
                                                                    

        at 
org.apache.hadoop.hive.ql.exec.ListSinkOperator.processOp(ListSinkOperator.java:90)
                                                                                
                                   

        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)   
                                                                                
                                                 

        at 
org.apache.hadoop.hive.ql.exec.LimitOperator.processOp(LimitOperator.java:51)   
                                                                                
                                      

        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)   
                                                                                
                                                 

        at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) 
                                                                                
                                      

        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)   
                                                                                
                                                 

        at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
                                                                                
                                 

        at 
org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:571)    
                                                                                
                                      

        at 
org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:563)    
                                                                                
                                      

        at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:138)   
                                                                                
                                                 

        ... 12 more                                                             
                                                                                
                                                 

Caused by: java.lang.ArrayIndexOutOfBoundsException: 2                          
                                                                                
                                                 

        at 
org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector.getStructFieldData(ArrayWritableObjectInspector.java:140)
                                                                     

        at 
org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:353)   
                                                                                
                                      

        at 
org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:306)   
                                                                                
                                      

        at 
org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:197)     
                                                                                
                                      

        at 
org.apache.hadoop.hive.serde2.DelimitedJSONSerDe.serializeField(DelimitedJSONSerDe.java:60)
                                                                                
                           

        at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:422)
                                                                                
                              

        at 
org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:50)
                                                                                
                

        at 
org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:71)
                                                                                
                           

        at 
org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:40)
                                                                                
                           

        at 
org.apache.hadoop.hive.ql.exec.ListSinkOperator.processOp(ListSinkOperator.java:87)
                                                                                
                                   

        ... 21 more 
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to