eye created HIVE-5590:
-------------------------
Summary: SELECT returns duplicated records in Hive when a
.deflate file larger than 64MB is loaded into a Hive table
Key: HIVE-5590
URL: https://issues.apache.org/jira/browse/HIVE-5590
Project: Hive
Issue Type: Bug
Environment: cdh4
Reporter: eye
We occasionally have a compressed file larger than 160MB in .deflate format.
It was loaded into Hive through an external table, say table T_A.
When we run "select count(*) from T_A" we get about 70% more records than
when we check the file with "hadoop fs -text /xxxxx | wc -l".
Any clue about this?
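
For anyone who wants to reproduce the reference count without the Hadoop CLI, here is a minimal sketch of the same check as "hadoop fs -text ... | wc -l". It assumes the file has first been copied to local disk and that the .deflate file is a plain zlib stream; both are assumptions, and the function name count_lines is only illustrative.

```python
import zlib


def count_lines(path, chunk_size=1 << 20):
    """Count newline bytes in a zlib-compressed file, like `wc -l`.

    Assumption: the file is a single zlib stream (what we believe a
    .deflate file to be). The file is streamed in chunks so that a
    large input never has to fit in memory.
    """
    decompressor = zlib.decompressobj()
    lines = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            # Decompress this chunk and count the newlines it yields.
            lines += decompressor.decompress(chunk).count(b"\n")
    # Count newlines in any data still buffered in the decompressor.
    lines += decompressor.flush().count(b"\n")
    return lines
```

If this number matches the wc -l result but not the count(*) result, the extra records come from how the table is read rather than from the file itself.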
The large .deflate file was the result of imperfect processing. Once we fixed
that and produced files smaller than 64MB, the problem no longer appeared. But
since there is no guarantee that a larger file will not show up again, is there
any way to avoid this problem?
cheers!
eye
--
This message was sent by Atlassian JIRA
(v6.1#6144)