> On Nov. 4, 2015, 4:07 p.m., Jarek Cecho wrote:
> > test/src/main/java/org/apache/sqoop/test/testcases/ConnectorTestCase.java,
> > lines 285-286
> > <https://reviews.apache.org/r/39927/diff/1/?file=1115324#file1115324line285>
> >
> > I don't like the idea that we're delaying every single test case for
> > every single repository implementation by 3 seconds only because of one
> > repository implementation.
> >
> > What about fixing the root of the problem rather than just inserting a
> > sleep? Would it be fair to assume that the MySQL repository is not fully
> > initialized at the time we're submitting the job? If so, can't we simply
> > not run the test case until the Sqoop 2 server has fully started and
> > everything is initialized?
>
> Colin Ma wrote:
> Based on the current investigation, if ConnectorTestCase.executeJob is
> called continuously, there is a problem selecting the last submission for
> a job. Both submissions are updated from time to time, and it's hard to
> pick the last submission according to the update time.
> Currently, I put the sleep in the specific test case, so it won't
> impact other cases.
>
> Jarek Cecho wrote:
> What exact issues have you seen, Colin? Do you have exceptions?
>
> I would like to understand what's happening on the repository side
> because I'm concerned that we have bugs there. Users might be calling the
> submission job in a loop as well, and we can't insert a sleep for them, so
> we should make sure that our repository is resilient enough here.
>
> Colin Ma wrote:
> When ConnectorTestCase.executeJob() runs in the integration test, it
> calls SqoopClient.startJob(). The following is the related code in
> SqoopClient.startJob:
> ```java
> MSubmission submission =
>     resourceRequests.startJob(jobName).getSubmissions().get(0); // submission1
> while (submission.getStatus().isRunning()) {
>   ...........
>   submission = getJobStatus(jobName); // submission2
> }
> ```
>
> The problem is that on the 2nd call to ConnectorTestCase.executeJob(),
> **submission1 is different from submission2, which makes the test fail.**
>
> The following is the related db info in table SQ_SUBMISSION:
>
> | SQS_ID | SQS_JOB | SQS_STATUS | SQS_CREATION_DATE   | SQS_CREATION_USER | SQS_UPDATE_DATE     |
> | 3      | 12      | SUCCEEDED  | 2015-11-06 14:27:35 | colin             | 2015-11-06 14:27:36 | (created by the 1st call)
> | 4      | 12      | BOOTING    | 2015-11-06 14:27:36 | colin             | 2015-11-06 14:27:36 | (created by the 2nd call)
>
> For the 2nd call to ConnectorTestCase.executeJob(), submission1 and
> submission2 should both be **SQS_ID=4**, but submission2 is
> **SQS_ID=3 (created by the 1st call)**. submission2 comes from the SQL
> CommonRepositoryInsertUpdateDeleteSelectQuery.STMT_SELECT_SUBMISSIONS_FOR_JOB,
> and **SQS_UPDATE_DATE** plays a key role in getting the last submission.
> From the db info, the SQS_UPDATE_DATE values are the same.
> What I think is that the SQS_STATUS of **SQS_ID=3** is updated when it
> finishes, and **SQS_ID=4** is created within a very short interval. This
> causes the SQS_UPDATE_DATE values to be the same, and getJobStatus()
> doesn't return the expected submission.
>
> This problem doesn't happen in Derby or PostgreSQL; I don't know why they
> can pick the right submission with the same SQS_UPDATE_DATE....
>
> Jarek Cecho wrote:
> Excellent work on investigating the issue, Colin. I think that you've
> uncovered a real bug in the MySQL repository here.
>
> I've looked into it as well and I think that I understand the root
> problem. The field SQS_UPDATE_DATE is defined as TIMESTAMP in all three
> repository implementations. Whereas both Derby [1] and PostgreSQL [2]
> store timestamps including the fractional portion, MySQL before version
> 5.6.4 does not and stores only second precision [3].
>
> Hence the query
> CommonRepositoryInsertUpdateDeleteSelectQuery.STMT_SELECT_SUBMISSIONS_FOR_JOB
> will always select the last submission in the Derby and PostgreSQL case,
> even if two submissions happened in the same second, because the
> fractional portion will be different. In MySQL the two events indeed
> happened at the same second, hence the query is ambiguous.
>
> Thinking about possible solutions, would it make sense to order the query
> STMT_SELECT_SUBMISSIONS_FOR_JOB by both SQS_UPDATE_DATE and SQS_ID to get
> the latest entry, or would that impose additional problems?
>
> Links:
> 1: https://db.apache.org/derby/docs/10.4/ref/rrefsqlj27620.html
> 2: http://www.postgresql.org/docs/current/static/datatype-datetime.html
> 3: http://dev.mysql.com/doc/refman/5.6/en/fractional-seconds.html
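
To make the tie concrete, here is a minimal in-memory sketch of the ordering idea, not the actual repository code. The class SubmissionTieBreakSketch, its Submission holder, the latest() helper and the fractional-second values are all hypothetical, and descending order on both columns is an assumption about how "the latest entry" would be picked; the real statement lives in STMT_SELECT_SUBMISSIONS_FOR_JOB.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SubmissionTieBreakSketch {

  // Hypothetical stand-in for one row of SQ_SUBMISSION.
  static final class Submission {
    final long id;                  // SQS_ID
    final String status;            // SQS_STATUS
    final LocalDateTime updateDate; // SQS_UPDATE_DATE

    Submission(long id, String status, LocalDateTime updateDate) {
      this.id = id;
      this.status = status;
      this.updateDate = updateDate;
    }
  }

  // In-memory analogue of ordering by SQS_UPDATE_DATE and then SQS_ID,
  // both descending (assumed), and taking the first row.
  static Submission latest(List<Submission> rows) {
    List<Submission> sorted = new ArrayList<>(rows);
    sorted.sort(
        Comparator.comparing((Submission s) -> s.updateDate)
            .thenComparingLong(s -> s.id)
            .reversed());
    return sorted.get(0);
  }

  public static void main(String[] args) {
    // Invented fractional parts: both events fall within second 14:27:36.
    LocalDateTime finished = LocalDateTime.of(2015, 11, 6, 14, 27, 36, 100_000_000);
    LocalDateTime created = LocalDateTime.of(2015, 11, 6, 14, 27, 36, 900_000_000);

    // Dropping the fraction, as a second-precision column would, makes them equal.
    List<Submission> rows = Arrays.asList(
        new Submission(3, "SUCCEEDED", finished.truncatedTo(ChronoUnit.SECONDS)),
        new Submission(4, "BOOTING", created.truncatedTo(ChronoUnit.SECONDS)));

    // The SQS_ID tie-break decides between the two equal timestamps.
    System.out.println(latest(rows).id);
  }
}
```

With the two rows from the table above, the sketch prints 4, the submission created by the 2nd call. The tie-break relies on SQS_ID growing with each new submission, which matches the rows shown but is otherwise an assumption.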
Thanks for your help. I added SQS_ID to the "order by" clause, and the problem is gone. All integration tests for the MySQL repository pass now.


- Colin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39927/#review105075
-----------------------------------------------------------


On Nov. 9, 2015, 1:51 a.m., Colin Ma wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39927/
> -----------------------------------------------------------
>
> (Updated Nov. 9, 2015, 1:51 a.m.)
>
>
> Review request for Sqoop.
>
>
> Repository: sqoop-sqoop2
>
>
> Description
> -------
>
> There are 2 problems that should be fixed with the MySQL repository:
> 1. The repository version can't be detected correctly.
> 2. The integration test should pause for several seconds when executing
> the job.
>
>
> Diffs
> -----
>
>   common-test/src/main/java/org/apache/sqoop/common/test/db/MySQLProvider.java 268e475
>   common-test/src/main/java/org/apache/sqoop/common/test/repository/MysqlRepositoryProvider.java 229b339
>   repository/repository-common/src/main/java/org/apache/sqoop/repository/common/CommonRepositoryInsertUpdateDeleteSelectQuery.java 3a4e80a
>   repository/repository-mysql/src/main/java/org/apache/sqoop/repository/mysql/MySqlRepositoryHandler.java fd3a3f2
>
> Diff: https://reviews.apache.org/r/39927/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Colin Ma
>
>
