[ https://issues.apache.org/jira/browse/HIVE-7239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sumit Kumar updated HIVE-7239:
------------------------------
    Attachment: HIVE-7239.patch

Please review and recommend a way to test this patch as part of the Hive unit tests, CLI tests, or otherwise.

> Fix bug in HiveIndexedInputFormat implementation that causes incorrect query
> result when input backed by Sequence/RC files
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-7239
>                 URL: https://issues.apache.org/jira/browse/HIVE-7239
>             Project: Hive
>          Issue Type: Bug
>          Components: Indexing
>    Affects Versions: 0.13.1
>            Reporter: Sumit Kumar
>            Assignee: Sumit Kumar
>         Attachments: HIVE-7239.patch
>
>
> For sequence files, it is crucial that splits are calculated around the
> boundaries enforced by the input sequence file. By default, however, Hadoop
> creates input splits based on configuration parameters, and those splits may
> not match the boundaries of the input sequence file. Hive's
> HiveIndexedInputFormat adds extra logic that recalculates the boundaries of
> each split according to the sequence file's boundaries.
> Despite this, we noticed "over"-reporting in query results for data backed by
> sequence files. We experimented with sample data and fixed this bug, and we
> verified the fix by comparing query output for the same input stored as a
> sequence file, as an RC file, and in regular format. However, we have not
> been able to find the right place to include this as a unit test that would
> run as part of the Hive tests. We tried writing a "clientpositive" test in
> the ql module, but the output is quite verbose and I couldn't interpret it
> well. Can someone please review this change and advise on how to write a
> test that will run as part of Hive testing?

--
This message was sent by Atlassian JIRA
(v6.2#6252)
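
To make the "over"-reporting symptom described above concrete, here is a minimal toy sketch in Python. It is not Hive or Hadoop code and does not reproduce the actual defect or the attached patch. It assumes a hypothetical file whose blocks each begin at a sync marker, and a simplified ownership rule in which a split reads every block whose sync marker falls inside the split's byte range. The function names, offsets, and split ranges are all invented for illustration. Under these assumptions, splits that tile the file read each block once, while splits whose ranges overlap read some blocks twice.

```python
from bisect import bisect_left


def blocks_for_split(sync_offsets, start, end):
    """Return the sync offsets of the blocks owned by the split [start, end).

    Toy ownership rule (an assumption of this sketch): a block belongs to
    the split whose byte range contains the block's sync marker.
    sync_offsets must be sorted in ascending order.
    """
    lo = bisect_left(sync_offsets, start)
    hi = bisect_left(sync_offsets, end)
    return sync_offsets[lo:hi]


def count_reads(sync_offsets, splits):
    """Count how many times each block is read across all splits."""
    counts = {offset: 0 for offset in sync_offsets}
    for start, end in splits:
        for offset in blocks_for_split(sync_offsets, start, end):
            counts[offset] += 1
    return counts


# Hypothetical file: six blocks, one sync marker every 100 bytes.
sync_offsets = [0, 100, 200, 300, 400, 500]

# Splits cut purely by size (250 bytes) that tile the file without overlap.
clean_splits = [(0, 250), (250, 500), (500, 600)]

# Hypothetical faulty recalculation: later splits start 100 bytes earlier
# while the preceding splits keep their original end offsets.
overlapping_splits = [(0, 250), (150, 500), (400, 600)]

clean_counts = count_reads(sync_offsets, clean_splits)
overlap_counts = count_reads(sync_offsets, overlapping_splits)

print("clean:      ", clean_counts)
print("overlapping:", overlap_counts)
```

A regression test for this class of bug could follow the same idea at a higher level: run the same query over the same data stored in each format and check that the row counts match.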