Jarek Jarcec Cecho created SQOOP-1273:
-----------------------------------------

             Summary: Multiple append jobs can easily end up sharing directories
                 Key: SQOOP-1273
                 URL: https://issues.apache.org/jira/browse/SQOOP-1273
             Project: Sqoop
          Issue Type: Bug
    Affects Versions: 1.4.4
            Reporter: Jarek Jarcec Cecho
            Assignee: Jarek Jarcec Cecho
             Fix For: 1.4.5


I've noticed at multiple user deployments that when running Sqoop in append 
mode ({{--append}}), two separate jobs can end up using the same temporary 
directory. This is a disaster, as those jobs will then start interfering with 
each other and can even cause data loss. Currently we are using the following 
code to generate the temporary directory 
([AppendUtils.java|https://github.com/apache/sqoop/blob/trunk/src/java/org/apache/sqoop/util/AppendUtils.java#L269]):

{code}
  public static Path getTempAppendDir(String tableName) {
    String timeId = DATE_FORM.format(new Date(System.currentTimeMillis()));
    String tempDir = TEMP_IMPORT_ROOT + Path.SEPARATOR + timeId + tableName;
    return new Path(tempDir);
  }
{code}

There are three different parts that we are currently using to generate the 
temporary directory:

* {{TEMP_IMPORT_ROOT}}: Constant. It can be changed by the user if needed, but 
as we do not have this documented, most users are using the default constant 
value.
* {{timeId}}: Current time with millisecond precision.
* {{tableName}}: Name of the transferred table, or {{null}} for a query 
({{--query}}) based import.

The problem mainly surfaces in {{--query}} based imports, where {{tableName}} 
is {{null}} and so two out of the three parts are constant. Only {{timeId}} 
varies, so two Sqoop jobs that happen to get started at the same time will 
generate the same temporary directory.
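
To make the collision concrete, here is a minimal standalone sketch. It is not the Sqoop code: it uses plain strings instead of Hadoop's {{Path}}, and the values of {{TEMP_IMPORT_ROOT}} and the date pattern are assumptions made for illustration. The {{withRandomSuffix}} method is a hypothetical mitigation (appending a random UUID), shown only as one possible direction and not as the agreed fix.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.UUID;

public class TempDirSketch {
  // Assumed stand-ins for the Sqoop constants; the real values may differ.
  static final String TEMP_IMPORT_ROOT = "_sqoop";
  static final String DATE_PATTERN = "ddHHmmssSSS";

  // Mirrors the scheme quoted above. The timestamp is passed in so that the
  // collision can be reproduced deterministically.
  public static String currentScheme(long millis, String tableName) {
    String timeId = new SimpleDateFormat(DATE_PATTERN).format(new Date(millis));
    // For --query imports tableName is null, so the literal "null" is appended.
    return TEMP_IMPORT_ROOT + "/" + timeId + tableName;
  }

  // Hypothetical mitigation: add a random component so that two jobs with the
  // same timestamp and table name still get different directories.
  public static String withRandomSuffix(long millis, String tableName) {
    String random = UUID.randomUUID().toString().replace("-", "");
    return currentScheme(millis, tableName) + "_" + random;
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();

    // Two query-based jobs that compute the directory in the same millisecond.
    String first = currentScheme(now, null);
    String second = currentScheme(now, null);
    System.out.println(first + " vs " + second);
    System.out.println("collision: " + first.equals(second));

    // The same two jobs with a random suffix added.
    String third = withRandomSuffix(now, null);
    String fourth = withRandomSuffix(now, null);
    System.out.println(third + " vs " + fourth);
    System.out.println("collision: " + third.equals(fourth));
  }
}
```

With the current scheme both jobs compute an identical path, because the timestamp is the only varying part. With a random suffix the paths differ even for identical timestamps, at the cost of directory names that are less readable.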



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
