[jira] [Commented] (SQOOP-2986) Add validation check for --hive-import and --incremental lastmodified

Hudson (JIRA) Thu, 13 Oct 2016 07:26:47 -0700

    [ 
https://issues.apache.org/jira/browse/SQOOP-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15572070#comment-15572070
 ]


Hudson commented on SQOOP-2986:
-------------------------------

SUCCESS: Integrated in Jenkins build Sqoop-hadoop200 #1067 (See 
[https://builds.apache.org/job/Sqoop-hadoop200/1067/])
SQOOP-2986: Add validation check for --hive-import and --incremental (maugli: 
[https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=14754342d3a9bd6e146b9628b2e103ff30f310d8])
* (add) src/test/org/apache/sqoop/tool/ImportToolValidateOptionsTest.java
* (edit) src/java/org/apache/sqoop/tool/BaseSqoopTool.java


> Add validation check for --hive-import and --incremental lastmodified
> ---------------------------------------------------------------------
>
>                 Key: SQOOP-2986
>                 URL: https://issues.apache.org/jira/browse/SQOOP-2986
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.4.6
>            Reporter: Szabolcs Vasas
>            Assignee: Szabolcs Vasas
>             Fix For: 1.4.7
>
>         Attachments: SQOOP-2986.patch, SQOOP-2986.patch
>
>
> Sqoop import with --hive-import and --incremental lastmodified options is not 
> supported, however the application is able to run with these parameters but 
> it produces unexpected results, the output can contain duplicate rows.
> Steps to reproduce the issue:
> 1) Create the necessary table for example in MySQL:
> CREATE TABLE "Employees" (
>   "id" int(11) NOT NULL,
>   "name" varchar(45) DEFAULT NULL,
>   "salary" varchar(45) DEFAULT NULL,
>   "change_date" datetime DEFAULT NULL,
>   PRIMARY KEY ("id")
> ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (1,'employee1',1000,now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (2,'employee2','2000',now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (3,'employee3','3000',now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (4,'employee4','4000',now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (5,'employee5','5000',now());
> 2) Import the table to Hive
> sudo -u hdfs sqoop import --connect jdbc:mysql://servername:3306/sqoop 
> --username sqoop --password sqoop --table Employees --num-mappers 1 
> --hive-import --hive-table Employees 
> 3) Update some rows in MySQL:
> UPDATE Employees SET salary=1010, change_date=now() where id=1;
> UPDATE Employees SET salary=2010, change_date=now() where id=2;
> 4) Execute the incremental import command:
> sudo -u hdfs sqoop import --verbose --connect 
> jdbc:mysql://servername:3306/sqoop --username sqoop --password sqoop --table 
> Employees --incremental lastmodified --check-column change_date --merge-key 
> id --num-mappers 1 --hive-import --hive-table Employees --last-value 
> "last_timestamp"
> 5) As a result employees with ids 1 and 2 will not be updated but we will see 
> duplicate rows in the Hive table.
> The task is to introduce a fail-fast validation which will make the Sqoop 
> import fail if it was submitted with --hive-import and --incremental 
> lastmodified options.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (SQOOP-2986) Add validation check for --hive-import and --incremental lastmodified

Reply via email to