[ https://issues.apache.org/jira/browse/HIVE-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845932#comment-13845932 ]
Sushanth Sowmyan commented on HIVE-6016:
----------------------------------------

Thanks for the correction, Prashanth, I've edited the bug report to remove that case.

> Hadoop23Shims has a bug in listLocatedStatus impl.
> --------------------------------------------------
>
>                 Key: HIVE-6016
>                 URL: https://issues.apache.org/jira/browse/HIVE-6016
>             Project: Hive
>          Issue Type: Bug
>          Components: Shims
>    Affects Versions: 0.13.0
>            Reporter: Sushanth Sowmyan
>            Assignee: Prasanth J
>         Attachments: HIVE-6016.1.patch
>
>
> Prashant and I discovered that the implementation of the wrapping Iterator in listLocatedStatus at https://github.com/apache/hive/blob/2d2f89c21618341987c1257a88691981f1f606c7/shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L350-L393 is broken.
> Basically, if you had files (a,b,_s), with a filter that is supposed to filter out _s, we expect an output result of (a,b). Instead, we get (a,b,null), with hasNext looking at the next value to see if it's null, and using that to decide if it has any more entries, and thus, (a,b,_s) becomes (a,b).
> There's a boundary condition on the very first pick, which causes a (_s,a,b) to result in (_s,a,b), bypassing the filter, and thus, we wind up with a resultant unfiltered (_s,a,b) which orc breaks on.

--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
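
For illustration only, here is a minimal sketch of a filtering iterator wrapper that handles both cases from the report: a rejected entry at the end, (a,b,_s), and a rejected entry at the very start, (_s,a,b). This is not the code from HIVE-6016.1.patch. It assumes plain java.util.Iterator and java.util.function.Predicate as stand-ins for Hadoop's RemoteIterator and PathFilter, and the names FilteringIteratorSketch, FilteringIterator and filterHidden are hypothetical. The idea is that hasNext() keeps reading from the source until it has buffered an accepted element, so the filter applies to the first element as well, and a separate flag rather than a null check records whether an element is buffered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

public class FilteringIteratorSketch {

    /** Wraps an iterator and yields only the elements accepted by the filter. */
    static final class FilteringIterator<T> implements Iterator<T> {
        private final Iterator<T> source;
        private final Predicate<T> accept;
        private T buffered;          // element that already passed the filter
        private boolean hasBuffered; // true only while 'buffered' is valid

        FilteringIterator(Iterator<T> source, Predicate<T> accept) {
            this.source = source;
            this.accept = accept;
        }

        @Override
        public boolean hasNext() {
            // Skip rejected elements until one is accepted or the source is
            // exhausted. The same loop runs for the very first element, so
            // there is no special case that could bypass the filter.
            while (!hasBuffered && source.hasNext()) {
                T candidate = source.next();
                if (accept.test(candidate)) {
                    buffered = candidate;
                    hasBuffered = true;
                }
            }
            return hasBuffered;
        }

        @Override
        public T next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            T result = buffered;
            buffered = null;
            hasBuffered = false;
            return result;
        }
    }

    /** Hypothetical helper: drops names starting with '_' (assumed filter rule). */
    public static List<String> filterHidden(List<String> names) {
        Iterator<String> it =
            new FilteringIterator<>(names.iterator(), n -> !n.startsWith("_"));
        List<String> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(filterHidden(Arrays.asList("a", "b", "_s")));
        System.out.println(filterHidden(Arrays.asList("_s", "a", "b")));
    }
}
```

Calling hasNext() repeatedly does not consume elements in this sketch, because the lookahead is only refilled once next() has cleared the buffer.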