[ https://issues.apache.org/jira/browse/HIVE-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13615020#comment-13615020 ]
Navis commented on HIVE-4223:
-----------------------------
Can I ask whether the query that caused the above exception uses a UDTF?
> LazySimpleSerDe will throw IndexOutOfBoundsException in nested structs of
> hive table
> ------------------------------------------------------------------------------------
>
> Key: HIVE-4223
> URL: https://issues.apache.org/jira/browse/HIVE-4223
> Project: Hive
> Issue Type: Bug
> Components: Serializers/Deserializers
> Affects Versions: 0.9.0
> Environment: Hive 0.9.0
> Reporter: Yong Zhang
>
> The LazySimpleSerDe throws an IndexOutOfBoundsException when a column is an
> array of structs and that struct itself contains an array of structs.
> I have a table with one column defined like this:
> columnA
>   array<
>     struct<
>       col1:primiType,
>       col2:primiType,
>       col3:primiType,
>       col4:primiType,
>       col5:primiType,
>       col6:primiType,
>       col7:primiType,
>       col8:array<
>         struct<
>           col1:primiType,
>           col2:primiType,
>           col3:primiType,
>           col4:primiType,
>           col5:primiType,
>           col6:primiType,
>           col7:primiType,
>           col8:primiType,
>           col9:primiType
>         >
>       >
>     >
>   >
> In this example, the outer struct has 8 columns (including the array), and
> the inner struct has 9 columns. As long as the outer struct has FEWER columns
> than the inner struct, I think we will get the following exception, shown in
> this stack trace, when LazySimpleSerDe tries to serialize a row:
> Caused by: java.lang.IndexOutOfBoundsException: Index: 8, Size: 8
> at java.util.ArrayList.RangeCheck(ArrayList.java:547)
> at java.util.ArrayList.get(ArrayList.java:322)
> at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:485)
> at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:443)
> at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:381)
> at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:365)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:568)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
> at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
> at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:132)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
> at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
> at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:531)
> ... 9 more
> I am not sure of the exact cause of this problem. I believe that
> the public static void serialize(ByteStream.Output out, Object
> obj, ObjectInspector objInspector, byte[] separators, int level, Text
> nullSequence, boolean escaped, byte escapeChar, boolean[] needsEscape) method
> invokes itself recursively when it encounters a nested structure. But for a
> nested struct, the list reference gets messed up, and size() returns the
> wrong value.
> In the example above, for these 2 lines:
> List<? extends StructField> fields = soi.getAllStructFieldRefs();
> list = soi.getStructFieldsDataAsList(obj);
> my StructObjectInspector (soi) returns the CORRECT data from both
> getAllStructFieldRefs() and getStructFieldsDataAsList(). For example,
> in one row, the outer 8-column struct has 2 elements in its
> inner array of structs, and each element has 9 columns (as there are 9
> columns in the inner struct). At runtime, after I added more logging to
> LazySimpleSerDe, I see the following behavior in the log:
> for 8 outer columns, loop
>   for 9 inner columns, loop to serialize
>   for 9 inner columns, loop to serialize
> The code breaks here: the outer loop tries to access a 9th element, which
> does not exist in the outer struct. As the stack trace shows, it tried to
> access index 8 of a list of size 8.
> I changed the following line of code, and that appears to fix the problem.
> I don't know whether it is the right fix, but it did resolve the problem
> for me on the Hive 0.9.0 code:
> 481c481,482
> < for (int i = 0; i < list.size(); i++) {
> ---
> > int listSize = list.size();
> > for (int i = 0; i < listSize; i++) {
> I believe the cause of this bug is that when the loop is written the current
> way,
> for (int i = 0; i < list.size(); i++)
> list.size() is invoked on every iteration. With a nested structure,
> list.size() returns a different result after the recursive call, and that
> causes the problem I am facing.
> Thanks
> Yong Zhang
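
The hypothesis in the report (size() changing underneath the outer loop during recursion) can be reproduced in isolation with a minimal Java sketch. Everything here is an assumption made for illustration: SharedListDemo, fieldsAsList, fieldRefs and walk are hypothetical names, not Hive code, and the single reused buffer is only one possible way the list could change size. The ticket does not confirm that this is what LazySimpleSerDe or its object inspectors actually do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: none of these names come from Hive. */
final class SharedListDemo {
    // Assumption: one buffer is reused for every struct, so each call overwrites it.
    private static final List<Object> SHARED = new ArrayList<>();

    /** Hypothetical stand-in for getStructFieldsDataAsList(obj). */
    static List<Object> fieldsAsList(Object[] struct) {
        SHARED.clear();
        SHARED.addAll(Arrays.asList(struct));
        return SHARED;
    }

    /** Hypothetical stand-in for getAllStructFieldRefs(): a fresh list per struct. */
    static List<String> fieldRefs(int count) {
        List<String> refs = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            refs.add("col" + (i + 1));
        }
        return refs;
    }

    /** Walks a struct, recursing into nested structs (modeled here as Object[]). */
    static void walk(Object[] struct, boolean cacheSize) {
        List<String> fields = fieldRefs(struct.length);
        List<Object> list = fieldsAsList(struct);
        int listSize = list.size(); // snapshot taken before any recursion
        for (int i = 0; i < (cacheSize ? listSize : list.size()); i++) {
            fields.get(i); // throws once i reaches fields.size()
            Object value = list.get(i);
            if (value instanceof Object[]) {
                walk((Object[]) value, cacheSize); // refills SHARED with the inner struct
            }
        }
    }

    /** Builds an 8-field outer struct whose last field is a 9-field inner struct. */
    static boolean throwsOutOfBounds(boolean cacheSize) {
        Object[] inner = new Object[9];
        Arrays.fill(inner, "v");
        Object[] outer = new Object[8];
        Arrays.fill(outer, "v");
        outer[7] = inner;
        try {
            walk(outer, cacheSize);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("re-reading size() throws: " + throwsOutOfBounds(false));
        System.out.println("cached size throws: " + throwsOutOfBounds(true));
    }
}
```

In this sketch, caching the size stops the loop at the outer struct's 8 fields, which matches the effect of the patch above. The shared buffer still holds the inner struct's values after the recursive call, though, so if the nested field were not the last one, the outer loop would go on to read inner data. Whether the same applies to the real Hive code depends on how the list is produced there.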