[ https://issues.apache.org/jira/browse/HIVE-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030360#comment-13030360 ]
Ning Zhang commented on HIVE-1968:
----------------------------------
@Joydeep, Yongqiang and I tried to reproduce the bug but couldn't. We
tried different query patterns (1 map-only job + 1 mapreduce job, and dynamic
partition inserts) on both small and large data sets. All of them worked as
expected, so without a concrete example it's very hard to say whether this is
a bug in multi-table inserts. Could you dig into your query log and find the
specific query?
> data corruption with multi-table insert
> ---------------------------------------
>
> Key: HIVE-1968
> URL: https://issues.apache.org/jira/browse/HIVE-1968
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.7.0
> Reporter: Joydeep Sen Sarma
>
> i had to run a conversion process to compute a checksum
> (sum(hash(all-columns))) of a table and convert it to a different compression
> format. trying to be clever - i did both of them in a single pass by doing
> something equivalent to:
> from (select col1, col2, hash(col1, col2) as val from table_to_be_converted) i
> insert overwrite table table_to_be_generated select i.col1, i.col2
> insert overwrite table table_to_be_converted_checksum select sum(hash(i.val));
> the plan looked correct. however - the data produced was erroneous - the
> checksums and the data were both wrong (and consistent with each other). i
> know this because:
> - the checksum computed by the above query didn't match the checksum on the
> input table when calculated separately
> - the checksum of the data output by this query (first insert clause) didn't
> match the input table's checksum (neither the one computed by the query
> above, nor the one computed separately)
> later on - i broke up this query into two independent ones - and the data and
> checksums were good (i.e. they all matched up). so it seems like there's some
> data corruption happening in MTI.
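For reference, the split-up form mentioned in the description might look roughly like the sketch below. It is not the reporter's actual query, which is not included in this issue. The table and column names are taken from the example above. It assumes that table_to_be_converted_checksum holds a single numeric column, and it uses sum(hash(col1, col2)) as the checksum, following the sum(hash(all-columns)) wording.

```sql
-- Sketch only: two independent queries instead of one multi-table insert.
-- Names come from the example in the description; the checksum table is
-- assumed to have a single numeric column.

-- 1. Convert the table on its own.
INSERT OVERWRITE TABLE table_to_be_generated
SELECT col1, col2
FROM table_to_be_converted;

-- 2. Compute the checksum of the input table in a separate query.
INSERT OVERWRITE TABLE table_to_be_converted_checksum
SELECT sum(hash(col1, col2))
FROM table_to_be_converted;

-- 3. Recompute the checksum over the converted output. If the conversion
--    preserved the data, this value should equal the one stored in step 2.
SELECT sum(hash(col1, col2))
FROM table_to_be_generated;
```

Comparing the step 3 value against both the stored checksum and the checksum written by the single-pass multi-table insert would show which of the two outputs diverges from the input.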
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira