[
https://issues.apache.org/jira/browse/HIVE-3253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Thejas M Nair updated HIVE-3253:
--------------------------------
Attachment: HIVE-3253.2.patch
HIVE-3253.2.patch
- It increases the number of control characters used by LazySimpleSerDe,
avoiding characters that are likely to be present in data. Using the new control
characters is not a backward-compatible change, so you need to set the serde
property hive.serialization.extend.nesting.levels to enable it for a table that
uses LazySimpleSerDe. If your input table has data that might contain these
delimiter control characters, you should escape them and set the escape
character using a serde property.
Example:
{code}
create table nestedcomplex (
simple_int int,
max_nested_array
array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<int>>>>>>>>>>>>>>>>>>>>>>>)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('hive.serialization.extend.nesting.levels'='true');
{code}
- LazySimpleSerDe is used by FileSinkOperator, which is why the operator was
limited by the number of nesting levels supported by the serde. We should look
at using LazyBinarySerDe here, as it would be more efficient and can go beyond
this nesting level restriction.
- The LazySimpleSerDe used in FileSinkOperator has escaping enabled, so it is
safe to extend the levels of nesting using the new serde property for that use
case.
- The patch includes a fix that gives a better error message when the nesting
depth exceeds the maximum supported levels (it is no longer an
ArrayIndexOutOfBounds exception).
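To illustrate the last point, here is a minimal sketch of a bounds check that reports a descriptive error instead of letting an ArrayIndexOutOfBoundsException escape. It is not the code from the patch: the class name SeparatorLookup, the method separatorFor, the placeholder separator bytes, and the message wording are all assumptions made for illustration.

```java
// Hypothetical sketch; not Hive's actual LazyFactory or LazySimpleSerDe code.
public class SeparatorLookup {
    // Eight separator levels, mirroring the hardcoded array size quoted in the issue.
    // The byte values are placeholders chosen only for this example.
    private static final byte[] SEPARATORS = {1, 2, 3, 4, 5, 6, 7, 8};

    /** Returns the separator for a nesting level, or fails with a descriptive message. */
    public static byte separatorFor(int level) {
        if (level < 0 || level >= SEPARATORS.length) {
            // Check the index explicitly so the caller sees which level was
            // requested and what range is supported.
            throw new IllegalArgumentException(
                "Nesting level " + level + " is outside the supported range 0.."
                + (SEPARATORS.length - 1));
        }
        return SEPARATORS[level];
    }

    public static void main(String[] args) {
        System.out.println(separatorFor(0));
        try {
            separatorFor(9); // same index as in the reported stack trace
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a check like this, a table definition that nests too deeply fails with a message naming the offending level rather than a bare array index.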
> ArrayIndexOutOfBounds exception for deeply nested structs
> ---------------------------------------------------------
>
> Key: HIVE-3253
> URL: https://issues.apache.org/jira/browse/HIVE-3253
> Project: Hive
> Issue Type: Bug
> Components: Serializers/Deserializers
> Affects Versions: 0.9.0, 0.10.0
> Reporter: Swarnim Kulkarni
> Assignee: Travis Crawford
> Attachments: HIVE-3253.2.patch, HIVE-3253_moar_nesting.1.patch,
> jsonout.hive
>
>
> Creating a table with deeply nested structs might throw this exception:
> {code}
> java.lang.ArrayIndexOutOfBoundsException: 9
>     at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:281)
>     at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:263)
>     at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:276)
>     at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:263)
>     at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:276)
>     at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:263)
>     at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:276)
>     at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:263)
>     at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:276)
>     at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyStructInspector(LazyFactory.java:354)
> {code}
> The reason is that the separators array is currently hardcoded to size 8 in
> LazySimpleSerDe:
> {code}
> // Read the separators: We use 8 levels of separators by default, but we
> // should change this when we allow users to specify more than 10 levels
> // of separators through DDL.
> serdeParams.separators = new byte[8];
> {code}
> If possible, we should increase this size or at least make it configurable to
> properly handle deeply nested structs.