[ https://issues.apache.org/jira/browse/HIVE-3253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thejas M Nair updated HIVE-3253:
--------------------------------
    Attachment: HIVE-3253.2.patch

HIVE-3253.2.patch
- It increases the number of control characters used by LazySimpleSerDe, avoiding characters that are likely to be present in data. Using the new control characters is not a backward-compatible change, so you need to set the serde property hive.serialization.extend.nesting.levels to enable it for a table that uses LazySimpleSerDe. If your input table has data that might contain these delimiter control characters, you should escape them and set the escape character using a serde property. Example:
{code}
create table nestedcomplex (
simple_int int,
max_nested_array array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<int>>>>>>>>>>>>>>>>>>>>>>>)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'hive.serialization.extend.nesting.levels'='true'
)
;
{code}
- LazySimpleSerDe is used by FileSinkOperator, which is why that operator was limited by the number of nesting levels the serde supports. We should look at using LazyBinarySerDe here, as it would be more efficient and can go beyond this nesting-level restriction.
- The LazySimpleSerDe used in FileSinkOperator has escaping enabled, so it is safe to extend the levels of nesting using the new serde property for that use case.
- The patch also includes a fix that gives a better error message when the nesting depth exceeds the maximum supported number of levels (it is no longer an ArrayIndexOutOfBoundsException).

> ArrayIndexOutOfBounds exception for deeply nested structs
> ---------------------------------------------------------
>
>                 Key: HIVE-3253
>                 URL: https://issues.apache.org/jira/browse/HIVE-3253
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 0.9.0, 0.10.0
>            Reporter: Swarnim Kulkarni
>            Assignee: Travis Crawford
>         Attachments: HIVE-3253.2.patch, HIVE-3253_moar_nesting.1.patch, jsonout.hive
>
>
> It was observed that creating a table with deeply nested structs might throw this exception:
> {code}
> java.lang.ArrayIndexOutOfBoundsException: 9
>         at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:281)
>         at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:263)
>         at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:276)
>         at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:263)
>         at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:276)
>         at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:263)
>         at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:276)
>         at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:263)
>         at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:276)
>         at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyStructInspector(LazyFactory.java:354)
> {code}
> The reason is that the separators array is currently hardcoded to be of size 8 in the LazySimpleSerde.
> {code}
> // Read the separators: We use 8 levels of separators by default, but we
> // should change this when we allow users to specify more than 10 levels
> // of separators through DDL.
> serdeParams.separators = new byte[8];
> {code}
> If possible, we should increase this size or at least make it configurable to properly handle deeply nested structs.
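
A minimal sketch of the failure mode described in the quoted issue, assuming a fixed-size separator array that is indexed by nesting level. The class and method names (SeparatorDepthSketch, defaultSeparators, separatorUnchecked, separatorChecked) are hypothetical, and the byte values are chosen only for illustration. This is not Hive's actual code; it only shows why a hard-coded new byte[8] can surface as an ArrayIndexOutOfBoundsException and how a bounds check could report a clearer error instead.

```java
/** Simplified illustration only; hypothetical names, not Hive's implementation. */
final class SeparatorDepthSketch {
    // Mirrors the hard-coded size in the snippet quoted in the issue.
    static final int DEFAULT_LEVELS = 8;

    /** Builds a separator array with one control byte per nesting level (illustrative values 1..8). */
    static byte[] defaultSeparators() {
        byte[] separators = new byte[DEFAULT_LEVELS];
        for (int i = 0; i < separators.length; i++) {
            separators[i] = (byte) (i + 1);
        }
        return separators;
    }

    /** Looks up a separator with no bounds check; a level past the array end throws ArrayIndexOutOfBoundsException. */
    static byte separatorUnchecked(byte[] separators, int level) {
        return separators[level];
    }

    /** Looks up a separator, but reports an unsupported nesting level with a descriptive message. */
    static byte separatorChecked(byte[] separators, int level) {
        if (level < 0 || level >= separators.length) {
            throw new IllegalArgumentException(
                "Nesting level " + level + " is not supported; only "
                + separators.length + " separator levels are available");
        }
        return separators[level];
    }

    public static void main(String[] args) {
        byte[] separators = defaultSeparators();
        System.out.println("separator byte at level 7: " + separatorChecked(separators, 7));

        try {
            separatorUnchecked(separators, 9);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unchecked lookup failed: " + e);
        }

        try {
            separatorChecked(separators, 9);
        } catch (IllegalArgumentException e) {
            System.out.println("checked lookup failed: " + e.getMessage());
        }
    }
}
```

Making the array length configurable, as the issue suggests, would raise the limit, while a check like the one in separatorChecked would keep the error readable once any limit is reached.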