Benyi Wang created SQOOP-1714:
---------------------------------

             Summary: DateSplitter makes wrong splits
                 Key: SQOOP-1714
                 URL: https://issues.apache.org/jira/browse/SQOOP-1714
             Project: Sqoop
          Issue Type: Bug
    Affects Versions: 1.4.4
         Environment: CentOS 6.4 CDH-5.1.0
            Reporter: Benyi Wang


If the split-by column is a Date type, Sqoop sends a query to read 
min(Date) and max(Date), and passes those two values to DateSplitter. 
DateSplitter converts them to long values and splits the range into 
num-mappers pieces. But this method is wrong. If min(Date) and max(Date) are 
2013-09-26 and 2013-09-28, how many days do we have? 3 days. But 
java.sql.Date#getTime for 2013-09-28 returns the value for 2013-09-28 
00:00:00, so maxVal - minVal covers only two days. 
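
To make the arithmetic concrete, here is a minimal sketch that compares the 
millisecond span with the number of calendar days the range contains. The 
class and method names (DateSpan, spanMillis, inclusiveDays) are made up for 
illustration and are not part of Sqoop.

```java
import java.sql.Date;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, not Sqoop code: contrasts the getTime() span
// with the number of calendar days between two dates, inclusive.
class DateSpan {
    static final long MILLIS_PER_DAY = 24L * 3600L * 1000L;

    // The span that a long-based splitter works with.
    static long spanMillis(Date min, Date max) {
        return max.getTime() - min.getTime();
    }

    // The number of distinct days in [min, max], counting both ends.
    static long inclusiveDays(Date min, Date max) {
        return ChronoUnit.DAYS.between(min.toLocalDate(), max.toLocalDate()) + 1;
    }

    public static void main(String[] args) {
        Date min = Date.valueOf("2013-09-26");
        Date max = Date.valueOf("2013-09-28");
        double spanDays = spanMillis(min, max) / (double) MILLIS_PER_DAY;
        System.out.println("getTime() span in days: " + spanDays);
        System.out.println("calendar days covered:  " + inclusiveDays(min, max));
    }
}
```

The span is measured from midnight to midnight, so the whole of the last 
day falls outside it.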

I encountered this issue when I tried to import a Teradata table. Given dates 
between 2013-09-26 and 2013-09-28 and num-mappers=3, there are 3 tasks with 
the conditions:
# date >= 2013-09-26 and date < 2013-09-26
# date >= 2013-09-26 and date < 2013-09-27
# date >= 2013-09-27 and date <= 2013-09-28
The first one selects nothing, and the last one covers two days.

Because the difference between minVal and maxVal is two days 
(2*24*3600*1000 ms), the split size is 2/3 of a day. When minVal plus one 
split size is converted back to a Date, it is still 2013-09-26, which is why 
the first partition is empty.
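
The behaviour can be reproduced outside Sqoop with a simplified sketch of 
the splitting logic. This is not the actual DateSplitter source. It only 
assumes what is described above: evenly spaced long boundaries that are 
converted back to java.sql.Date. The names DateSplitSketch, boundaries and 
conditions, and the column name "date", are illustrative.

```java
import java.sql.Date;
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the splitting described above, not Sqoop's code.
class DateSplitSketch {
    // Returns numSplits + 1 boundaries, evenly spaced in milliseconds and
    // rendered as yyyy-mm-dd by java.sql.Date#toString.
    static List<String> boundaries(Date min, Date max, int numSplits) {
        long minVal = min.getTime();
        long maxVal = max.getTime();
        long splitSize = (maxVal - minVal) / numSplits; // 2/3 day for this case
        List<String> out = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) {
            // The time of day is dropped when the long is rendered as a date.
            out.add(new Date(minVal + i * splitSize).toString());
        }
        out.add(max.toString());
        return out;
    }

    // Builds one WHERE condition per split; only the last one is inclusive.
    static List<String> conditions(List<String> bounds) {
        List<String> out = new ArrayList<>();
        int last = bounds.size() - 2;
        for (int i = 0; i <= last; i++) {
            String op = (i == last) ? " <= " : " < ";
            out.add("date >= " + bounds.get(i) + " and date" + op + bounds.get(i + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> b = boundaries(
            Date.valueOf("2013-09-26"), Date.valueOf("2013-09-28"), 3);
        for (String c : conditions(b)) {
            System.out.println(c);
        }
    }
}
```

With these inputs the second boundary lands at 16:00 on 2013-09-26 and the 
third at 08:00 on 2013-09-27, so the sketch should yield the same three 
conditions as the import job, including the empty first one.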



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
