[ https://issues.apache.org/jira/browse/HIVE-7239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15397917#comment-15397917 ]
Illya Yalovyy commented on HIVE-7239:
-------------------------------------

The build page [1] shows 4 failed tests:
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_nullable_union
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_avro_non_nullable_union
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert

Three of them have been failing for a while, and TestCliDriver.testCliDriver_list_bucket_dml_13 is unrelated to this patch.
Please suggest the next step to get this patch accepted.

1. https://builds.apache.org/view/H-L/view/Hive/job/PreCommit-HIVE-MASTER-Build/674/#showFailuresLink

> Fix bug in HiveIndexedInputFormat implementation that causes incorrect query
> result when input backed by Sequence/RC files
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-7239
>                 URL: https://issues.apache.org/jira/browse/HIVE-7239
>             Project: Hive
>          Issue Type: Bug
>          Components: Indexing
>    Affects Versions: 2.1.0
>            Reporter: Sumit Kumar
>            Assignee: Illya Yalovyy
>         Attachments: HIVE-7239.2.patch, HIVE-7239.3.patch, HIVE-7239.4.patch, HIVE-7239.patch
>
>
> In the case of sequence files, it is crucial that splits are calculated around
> the boundaries enforced by the input sequence file. However, by default Hadoop
> creates input splits based on configuration parameters, which may not match
> the boundaries of the input sequence file. Hive provides HiveIndexedInputFormat,
> which adds logic to recalculate the boundaries of each split based on the
> sequence file's boundaries. However, we noticed "over" reporting in query
> results on data backed by sequence files.
> We have sample data on which we experimented and fixed this bug, and we have
> verified the fix by comparing the query output for input in sequence file
> format, RC file format, and regular format. However, we have not been able to
> find the right place to include this as a unit test that would run as part of
> the Hive tests. We tried writing a "clientpositive" test in the ql module, but
> the output is quite verbose and I couldn't interpret it well. Can someone
> please review this change and advise on how to write a test that will run as
> part of Hive testing?
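To illustrate why split boundaries matter for sequence files, here is a toy Java model. It is not Hive or Hadoop code and not the HIVE-7239 patch. It assumes a simplified reader rule: a split covering the byte range [start, end) reads exactly the blocks whose sync-marker offset lies in that range, which only approximates how a sequence file record reader behaves. The class name SyncSplitModel, the offsets, and the record counts are hypothetical. Under that rule, contiguous non-overlapping splits read every block exactly once even when a split ends mid-block, while overlapping split ranges read some blocks twice. This is one hypothetical way a query could "over" report; it is not a claim about the actual cause of this bug.

```java
/**
 * Toy model (hypothetical, not Hive/Hadoop code) of reading a
 * sequence-file-like input in byte-range splits. The file is a list of
 * blocks; each block starts at a sync-marker offset and holds some records.
 * Assumed rule: split [start, end) reads the blocks whose sync offset
 * falls in [start, end).
 */
final class SyncSplitModel {

    /** Counts the records one split reads under the assumed rule. */
    static int recordsInSplit(long[] syncOffsets, int[] recordsPerBlock,
                              long start, long end) {
        int total = 0;
        for (int i = 0; i < syncOffsets.length; i++) {
            // A block belongs to the split whose range contains its sync marker.
            if (syncOffsets[i] >= start && syncOffsets[i] < end) {
                total += recordsPerBlock[i];
            }
        }
        return total;
    }

    /** Sums the records read by all splits; each split is {start, end}. */
    static int recordsInAllSplits(long[] syncOffsets, int[] recordsPerBlock,
                                  long[][] splits) {
        int total = 0;
        for (long[] split : splits) {
            total += recordsInSplit(syncOffsets, recordsPerBlock, split[0], split[1]);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical file: sync markers at these byte offsets, 10 records per block.
        long[] sync = {0, 300, 650, 900};
        int[] recs = {10, 10, 10, 10};

        // Contiguous splits that ignore sync boundaries: each block is still read once.
        long[][] contiguous = {{0, 500}, {500, 1000}};
        // Overlapping splits: blocks at offsets 300 and 650 are read by both splits.
        long[][] overlapping = {{0, 700}, {250, 1000}};

        System.out.println(recordsInAllSplits(sync, recs, contiguous));
        System.out.println(recordsInAllSplits(sync, recs, overlapping));
    }
}
```

Running main prints 40 for the contiguous splits (the file's true record count) and 60 for the overlapping ones, because two blocks are counted twice.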