data corruption with multi-table insert
---------------------------------------
                 Key: HIVE-1968
                 URL: https://issues.apache.org/jira/browse/HIVE-1968
             Project: Hive
          Issue Type: Bug
          Components: Query Processor
    Affects Versions: 0.7.0
            Reporter: Joydeep Sen Sarma


I had to run a conversion process to compute a checksum (sum(hash(all-columns))) of a table and convert the table to a different compression format. Trying to be clever, I did both in a single pass with something equivalent to:

    from (select col1, col2, hash(col1, col2) as val from table_to_be_converted) i
    insert overwrite table table_to_be_generated
      select i.col1, i.col2
    insert overwrite table table_to_be_converted_checksum
      select sum(hash(i.val));

The plan looked correct. However, the data produced was erroneous: the checksums and the data were both wrong (and consistent with each other). I know this because:

- the checksum computed by the above query didn't match the checksum of the input table when calculated separately;
- the checksum of the data output by this query (the first insert clause) didn't match the input table's checksum (neither the one computed by the query above nor the one computed separately).

Later on, I broke this query up into two independent ones, and the data and checksums were good (i.e. they all matched up). So it seems like some data corruption is happening in multi-table insert (MTI).
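The verification above relies on sum(hash(...)) being order-independent: the same rows should yield the same checksum no matter how they are scanned, split across tasks, or written out, so a mismatch points at altered or lost rows. A minimal Python sketch of that invariant follows, under stated assumptions: `row_hash` uses `zlib.crc32` as a hypothetical stand-in for Hive's `hash()` (it is not Hive's hash function), the nested `hash(i.val)` from the query is simplified to a single level of hashing, and `single_pass_convert` is a hypothetical in-memory model of one scan feeding both insert clauses, not Hive's actual execution path.

```python
import zlib
from typing import Iterable, List, Tuple

# Hypothetical row type standing in for (col1, col2).
Row = Tuple[str, str]


def row_hash(row: Row) -> int:
    """Deterministic stand-in for Hive's hash(col1, col2).

    NOT Hive's hash function: crc32 over a delimited encoding is used only
    so the sketch gives the same result on every Python run.
    """
    encoded = "\x01".join(row).encode("utf-8")
    return zlib.crc32(encoded)


def table_checksum(rows: Iterable[Row]) -> int:
    """Order-independent checksum, analogous to sum(hash(all-columns))."""
    return sum(row_hash(r) for r in rows)


def single_pass_convert(rows: Iterable[Row]) -> Tuple[List[Row], int]:
    """Model of the multi-table insert: one scan feeds both outputs."""
    copied: List[Row] = []
    checksum = 0
    for r in rows:
        copied.append(r)         # first insert clause: copy the row
        checksum += row_hash(r)  # second insert clause: accumulate the checksum
    return copied, checksum


if __name__ == "__main__":
    source = [("a", "1"), ("b", "2"), ("c", "3")]
    copied, mti_sum = single_pass_convert(source)
    # With no corruption, all three checksums agree.
    print(mti_sum == table_checksum(source) == table_checksum(copied))
    # Row order does not change the checksum.
    print(table_checksum(reversed(source)) == table_checksum(source))
```

In this model both checks print `True`; the report describes the real single-pass query producing values that failed the equivalent comparisons, while the split queries passed them.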