Benyi Wang created SQOOP-1714:
---------------------------------
Summary: DateSplitter makes wrong splits
Key: SQOOP-1714
URL: https://issues.apache.org/jira/browse/SQOOP-1714
Project: Sqoop
Issue Type: Bug
Affects Versions: 1.4.4
Environment: CentOS 6.4 CDH-5.1.0
Reporter: Benyi Wang
If the split-by column is a Date type, Sqoop sends a query to read
min(Date) and max(Date), and those two values are passed to DateSplitter.
DateSplitter converts the values to long and splits the range using
num-mappers. But this method is wrong. If min(Date) and max(Date) are
2013-09-26 and 2013-09-28, how many days do we have? 3 days. But
java.sql.Date#getTime for 2013-09-28 returns the value for 2013-09-28
00:00:00, so maxVal - minVal covers only two days.
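The gap between the two counts can be checked with a small standalone sketch. The class and method names here (DateSpanSketch, daysByMillis, daysInclusive) are made up for illustration and are not part of Sqoop; the sketch also assumes the JVM's default time zone has no DST transition between the two dates.

```java
import java.sql.Date;
import java.time.temporal.ChronoUnit;
import java.util.concurrent.TimeUnit;

public class DateSpanSketch {
    // Span as seen through getTime(): both values are midnight timestamps,
    // so the final day contributes nothing to the difference.
    static long daysByMillis(Date min, Date max) {
        return TimeUnit.MILLISECONDS.toDays(max.getTime() - min.getTime());
    }

    // Number of calendar days in the closed range [min, max].
    static long daysInclusive(Date min, Date max) {
        return ChronoUnit.DAYS.between(min.toLocalDate(), max.toLocalDate()) + 1;
    }

    public static void main(String[] args) {
        Date min = Date.valueOf("2013-09-26");
        Date max = Date.valueOf("2013-09-28");
        System.out.println(daysByMillis(min, max));  // 2
        System.out.println(daysInclusive(min, max)); // 3
    }
}
```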
I encountered this issue when I tried to import a Teradata table. Given dates
between 2013-09-26 and 2013-09-28 and num-mappers=3, there are 3 tasks with
the following conditions:
# date >= 2013-09-26 and date < 2013-09-26
# date >= 2013-09-26 and date < 2013-09-27
# date >= 2013-09-27 and date <= 2013-09-28
The first one has nothing, and the last one has two days.
Because the difference between minVal and maxVal is two days
(2*24*3600*1000 ms), the split size is 2/3 of a day. When the first split
point (minVal plus 2/3 of a day) is converted back to a Date, it is still
2013-09-26; that is why the first partition is wrong.
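The arithmetic can be reproduced with a simplified model of the splitting described in this report. This is not the actual DateSplitter source: the class DateSplitSketch and the helper splitPoints are hypothetical, and the sketch assumes split points are spaced evenly in milliseconds and then rendered back as yyyy-MM-dd, in a time zone with no DST transition in the range.

```java
import java.sql.Date;
import java.util.ArrayList;
import java.util.List;

public class DateSplitSketch {
    // Hypothetical helper: evenly spaced split points over [min, max],
    // computed in milliseconds and truncated back to a date string.
    static List<String> splitPoints(Date min, Date max, int numSplits) {
        long minVal = min.getTime();
        long maxVal = max.getTime();
        double splitSize = (double) (maxVal - minVal) / numSplits;

        List<String> points = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) {
            long point = minVal + (long) (i * splitSize);
            // java.sql.Date#toString drops the time of day, so
            // 2013-09-26 16:00 collapses back to 2013-09-26.
            points.add(new Date(point).toString());
        }
        points.add(max.toString()); // upper bound of the last split
        return points;
    }

    public static void main(String[] args) {
        List<String> points = splitPoints(
            Date.valueOf("2013-09-26"), Date.valueOf("2013-09-28"), 3);
        // prints [2013-09-26, 2013-09-26, 2013-09-27, 2013-09-28]
        System.out.println(points);
    }
}
```

Each pair of consecutive points is one task's range, so the first pair (2013-09-26, 2013-09-26) is the empty partition and the last pair covers two days.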
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)