[ 
https://issues.apache.org/jira/browse/HIVE-3253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-3253:
--------------------------------

    Attachment: HIVE-3253.2.patch

HIVE-3253.2.patch 
- It increases the number of control characters used by LazySimpleSerDe, 
avoiding the characters that are likely to be present in data. Using new 
control characters is not a backward-compatible change, so you need to set the 
serde property hive.serialization.extend.nesting.levels to enable it for a 
table that uses LazySimpleSerDe. If your input table has data that might 
contain these delimiter control characters, you should escape them and set the 
escape character using a serde property.

Example:
{code}
create table nestedcomplex (
simple_int int,
max_nested_array  
array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<int>>>>>>>>>>>>>>>>>>>>>>>)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (  'hive.serialization.extend.nesting.levels'='true'
)
;
{code}

- LazySimpleSerDe is used by FileSinkOperator, which is why that operator was 
limited by the number of nesting levels the serde supports. We should look at 
using LazyBinarySerDe here, as it would be more efficient and is not subject to 
this nesting level restriction.

- The LazySimpleSerDe used in FileSinkOperator has escaping enabled, so it is 
safe to extend the levels of nesting using the new serde property for that use 
case.

- The patch includes a fix that gives a better error message when the number of 
nesting levels exceeds the maximum supported (no longer an 
ArrayIndexOutOfBounds exception).
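
To illustrate the last point, here is a minimal sketch of a per-level 
separator lookup that fails with a descriptive message instead of an 
ArrayIndexOutOfBoundsException. This is not the code from the patch: the class 
name SeparatorTable, the method separatorFor, the placeholder separator values, 
and the message wording are all hypothetical. Only the idea of a fixed-size 
separators array indexed by nesting level comes from the issue description.

```java
// Hypothetical sketch; not the code from HIVE-3253.2.patch.
class SeparatorTable {
    private final byte[] separators;

    SeparatorTable(int maxLevels) {
        if (maxLevels < 1 || maxLevels > Byte.MAX_VALUE) {
            throw new IllegalArgumentException(
                "maxLevels must be between 1 and " + Byte.MAX_VALUE + ": " + maxLevels);
        }
        separators = new byte[maxLevels];
        // Placeholder values for illustration only: level i uses control byte i + 1.
        for (int i = 0; i < maxLevels; i++) {
            separators[i] = (byte) (i + 1);
        }
    }

    /** Returns the separator for a nesting level, or fails with a descriptive message. */
    byte separatorFor(int level) {
        // Check the bound explicitly so the caller sees what went wrong,
        // rather than a bare array index in an ArrayIndexOutOfBoundsException.
        if (level < 0 || level >= separators.length) {
            throw new IllegalArgumentException(
                "Nesting level " + level + " is not supported; at most "
                + separators.length + " levels of separators are available");
        }
        return separators[level];
    }

    public static void main(String[] args) {
        SeparatorTable table = new SeparatorTable(8);
        System.out.println(table.separatorFor(7));
        try {
            table.separatorFor(8); // one level deeper than the table supports
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The explicit check costs one comparison per lookup and lets the message name 
both the requested level and the supported maximum.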
                
> ArrayIndexOutOfBounds exception for deeply nested structs
> ---------------------------------------------------------
>
>                 Key: HIVE-3253
>                 URL: https://issues.apache.org/jira/browse/HIVE-3253
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 0.9.0, 0.10.0
>            Reporter: Swarnim Kulkarni
>            Assignee: Travis Crawford
>         Attachments: HIVE-3253.2.patch, HIVE-3253_moar_nesting.1.patch, 
> jsonout.hive
>
>
> It was observed that creating a table with deeply nested structs might 
> throw this exception:
> {code}
> java.lang.ArrayIndexOutOfBoundsException: 9
>         at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:281)
>         at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:263)
>         at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:276)
>         at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:263)
>         at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:276)
>         at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:263)
>         at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:276)
>         at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:263)
>         at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:276)
>         at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyStructInspector(LazyFactory.java:354)
> {code}
> The reason is that the separators array is currently hardcoded to size 8 in 
> LazySimpleSerDe.
> {code}
> // Read the separators: We use 8 levels of separators by default, but we
> // should change this when we allow users to specify more than 10 levels
> // of separators through DDL.
> serdeParams.separators = new byte[8];
> {code}
> If possible, we should increase this size or at least make it configurable to 
> properly handle deeply nested structs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
