Yulei Yang created SQOOP-3263:
---------------------------------

Summary: Duplicate rows found when split-by column is of textual type due to a charset difference between Sqoop and Hadoop
Key: SQOOP-3263
URL: https://issues.apache.org/jira/browse/SQOOP-3263
Project: Sqoop
Issue Type: Bug
Affects Versions: 1.4.6
Reporter: Yulei Yang
This issue can be reproduced with any kind of RDBMS, because the root cause is not in the RDBMS.

Steps to reproduce:

1. Create a MySQL table:
   create table ora_test (id varchar(32) primary key not null);
2. Insert *4* rows:
   insert into ora_test values ('08125FC4C8FDA064E053C0A8028DA064');
   insert into ora_test values ('4FFE68419D3502E2E0537F000001F3E8');
   insert into ora_test values ('4FFF9CF5861E003EE0537F0000017FF7');
   insert into ora_test values ('56DAC2D0F14901B0E0537F000001D3FA');
3. Import the table into Hive with sqoop import -m 32 (-m 189 reproduces it as well).

Hive then contains *6* rows instead of 4.

-- 
This message was sent by Atlassian JIRA
(v6.4.14#64029)
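To illustrate why a textual split column is fragile, here is a simplified Java sketch built on two assumptions that are not confirmed by this report: (a) split boundaries are computed by treating the first few characters of each key as digits in base 65536 and interpolating between the minimum and maximum key, and (b) a boundary string is encoded with one charset on one side and decoded with another on the other side (UTF-8 and ISO-8859-1 are picked here purely as an example). The class `SplitCharsetDemo` and all of its methods are hypothetical; this is a model of the failure mode, not Sqoop's code. With the smallest and largest keys from step 2, the interpolated boundary already contains a non-ASCII character, and the mismatched round trip changes it, so two parties could disagree on where one split ends and the next begins.

```java
import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified model of text-based split points; NOT Sqoop's actual code.
public class SplitCharsetDemo {
    // Assumption: each UTF-16 char is one digit in base 65536,
    // and only the first MAX_CHARS characters of a key are considered.
    static final BigDecimal BASE = new BigDecimal(65536);
    static final int MAX_CHARS = 8;

    // Map a string to a fraction in [0, 1): "ab" -> a/65536 + b/65536^2.
    static BigDecimal toDecimal(String s) {
        BigDecimal result = BigDecimal.ZERO;
        BigDecimal place = BigDecimal.ONE;
        for (int i = 0; i < Math.min(s.length(), MAX_CHARS); i++) {
            place = place.divide(BASE); // exact: 65536 is a power of two, so the quotient terminates
            result = result.add(place.multiply(new BigDecimal((int) s.charAt(i))));
        }
        return result;
    }

    // Inverse mapping: peel off one base-65536 digit per output character.
    static String toText(BigDecimal d) {
        StringBuilder sb = new StringBuilder();
        BigDecimal cur = d;
        for (int i = 0; i < MAX_CHARS; i++) {
            cur = cur.multiply(BASE);
            int digit = cur.intValue(); // integer part is the next digit
            if (digit == 0) {
                break;
            }
            sb.append((char) digit);
            cur = cur.subtract(new BigDecimal(digit));
        }
        return sb.toString();
    }

    // Split boundary halfway between two keys.
    static String midpoint(String lo, String hi) {
        BigDecimal mid = toDecimal(lo).add(toDecimal(hi)).divide(new BigDecimal(2));
        return toText(mid);
    }

    // Assumption: one side encodes the boundary as UTF-8, the other decodes it as ISO-8859-1.
    static String mismatchedRoundTrip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String lo = "08125FC4C8FDA064E053C0A8028DA064"; // smallest id from step 2
        String hi = "56DAC2D0F14901B0E0537F000001D3FA"; // largest id from step 2
        String mid = midpoint(lo, hi);
        boolean nonAscii = mid.chars().anyMatch(c -> c > 127);
        System.out.println("split point has non-ASCII chars: " + nonAscii);       // prints true
        System.out.println("survives round trip unchanged: "
                + mid.equals(mismatchedRoundTrip(mid)));                            // prints false
    }
}
```

ASCII-only boundaries survive the same round trip unchanged, which is consistent with the report's observation that the problem depends on the split column being textual rather than on the RDBMS.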