[ 
https://issues.apache.org/jira/browse/SQOOP-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szabolcs Vasas updated SQOOP-2986:
----------------------------------
    Description: 
Sqoop import with --hive-import and --incremental lastmodified options is not 
supported, however the application is able to run with these parameters but it 
produces unexpected results, the output can contain duplicate rows.

Steps to reproduce the issue:

1) Create the necessary table for example in MySQL:

CREATE TABLE "Employees" (
  "id" int(11) NOT NULL,
  "name" varchar(45) DEFAULT NULL,
  "salary" varchar(45) DEFAULT NULL,
  "change_date" datetime DEFAULT NULL,
  PRIMARY KEY ("id")
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
(1,'employee1',1000,now()); 
INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
(2,'employee2','2000',now()); 
INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
(3,'employee3','3000',now()); 
INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
(4,'employee4','4000',now()); 
INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
(5,'employee5','5000',now());

2) Import the table to Hive

sudo -u hdfs sqoop import --connect jdbc:mysql://servername:3306/sqoop 
--username sqoop --password sqoop --table Employees --num-mappers 1 
--hive-import --hive-table Employees 

3) Update some rows in MySQL:

UPDATE Employees SET salary=1010, change_date=now() where id=1;
UPDATE Employees SET salary=2010, change_date=now() where id=2;

4) Execute the incremental import command:

sudo -u hdfs sqoop import --verbose --connect 
jdbc:mysql://servername:3306/sqoop --username sqoop --password sqoop --table 
Employees --incremental lastmodified --check-column change_date --merge-key id 
--num-mappers 1 --hive-import --hive-table Employees --last-value 
"last_timestamp"

5) As a result employees with ids 1 and 2 will not be updated but we will see 
duplicate rows in the Hive table.


The task is to introduce a fail-fast validation which will make the Sqoop 
import fail if it was submitted with --hive-import and --incremental 
lastmodified options.

  was:
Sqoop import with --hive-import and --incremental lastmodified options is not 
supported, however the application is able to run with these parameters but it 
produces unexpected results, the output can contain duplicate rows.

Steps to reproduce the issue:

1) Create the necessary table for example in MySQL:

CREATE TABLE "employees" (
  "id" int(11) NOT NULL,
  "name" varchar(45) DEFAULT NULL,
  "salary" varchar(45) DEFAULT NULL,
  "change_date" datetime DEFAULT NULL,
  PRIMARY KEY ("id")
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
(1,'employee1',1000,now()); 
INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
(2,'employee2','2000',now()); 
INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
(3,'employee3','3000',now()); 
INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
(4,'employee4','4000',now()); 
INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
(5,'employee5','5000',now());

2) Import the table to Hive

sudo -u hdfs sqoop import --connect jdbc:mysql://servername:3306/sqoop 
--username sqoop --password sqoop --table Employees --num-mappers 1 
--hive-import --hive-table Employees 

3) Update some rows in MySQL:

UPDATE Employees SET salary=1010, change_date=now() where id=1;
UPDATE Employees SET salary=2010, change_date=now() where id=2;

4) Execute the incremental import command:

sudo -u hdfs sqoop import --verbose --connect 
jdbc:mysql://servername:3306/sqoop --username sqoop --password sqoop --table 
Employees --incremental lastmodified --check-column change_date --merge-key id 
--num-mappers 1 --hive-import --hive-table Employees --last-value 
"last_timestamp"

5) As a result employees with ids 1 and 2 will not be updated but we will see 
duplicate rows in the Hive table.


The task is to introduce a fail-fast validation which will make the Sqoop 
import fail if it was submitted with --hive-import and --incremental 
lastmodified options.


> Add validation check for --hive-import and --incremental lastmodified
> ---------------------------------------------------------------------
>
>                 Key: SQOOP-2986
>                 URL: https://issues.apache.org/jira/browse/SQOOP-2986
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.4.6
>            Reporter: Szabolcs Vasas
>            Assignee: Szabolcs Vasas
>             Fix For: 1.4.7
>
>
> Sqoop import with --hive-import and --incremental lastmodified options is not 
> supported, however the application is able to run with these parameters but 
> it produces unexpected results, the output can contain duplicate rows.
> Steps to reproduce the issue:
> 1) Create the necessary table for example in MySQL:
> CREATE TABLE "Employees" (
>   "id" int(11) NOT NULL,
>   "name" varchar(45) DEFAULT NULL,
>   "salary" varchar(45) DEFAULT NULL,
>   "change_date" datetime DEFAULT NULL,
>   PRIMARY KEY ("id")
> ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (1,'employee1',1000,now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (2,'employee2','2000',now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (3,'employee3','3000',now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (4,'employee4','4000',now()); 
> INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES 
> (5,'employee5','5000',now());
> 2) Import the table to Hive
> sudo -u hdfs sqoop import --connect jdbc:mysql://servername:3306/sqoop 
> --username sqoop --password sqoop --table Employees --num-mappers 1 
> --hive-import --hive-table Employees 
> 3) Update some rows in MySQL:
> UPDATE Employees SET salary=1010, change_date=now() where id=1;
> UPDATE Employees SET salary=2010, change_date=now() where id=2;
> 4) Execute the incremental import command:
> sudo -u hdfs sqoop import --verbose --connect 
> jdbc:mysql://servername:3306/sqoop --username sqoop --password sqoop --table 
> Employees --incremental lastmodified --check-column change_date --merge-key 
> id --num-mappers 1 --hive-import --hive-table Employees --last-value 
> "last_timestamp"
> 5) As a result employees with ids 1 and 2 will not be updated but we will see 
> duplicate rows in the Hive table.
> The task is to introduce a fail-fast validation which will make the Sqoop 
> import fail if it was submitted with --hive-import and --incremental 
> lastmodified options.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to