[
https://issues.apache.org/jira/browse/SQOOP-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884731#comment-13884731
]
Venkat Ranganathan commented on SQOOP-1273:
-------------------------------------------
I thought I had submitted a patch for this, but maybe I missed it. Sorry about that.
The fix I made was to add the current process ID to the temporary directory name.
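A minimal sketch of that idea, not the actual patch. It assumes the temporary root is "_sqoop" and the date pattern is "ddHHmmssSSS" (both are assumptions, the real values live in AppendUtils), and it uses a plain String instead of Hadoop's Path so that it runs standalone. The class name and the jvmId() helper are hypothetical. The JVM name returned by RuntimeMXBean is usually "pid@hostname", but that format is implementation-specific.

```java
import java.lang.management.ManagementFactory;
import java.text.SimpleDateFormat;
import java.util.Date;

class TempAppendDirSketch {
    // Assumed values for illustration; the real constants may differ.
    static final String TEMP_IMPORT_ROOT = "_sqoop";
    static final String DATE_PATTERN = "ddHHmmssSSS";

    /** Identifier of the running JVM with path-unfriendly characters replaced by '_'. */
    static String jvmId() {
        String name = ManagementFactory.getRuntimeMXBean().getName();
        return name.replaceAll("[^A-Za-z0-9_.-]", "_");
    }

    /** Builds the directory name from the timestamp, the process identifier, and the table name. */
    static String getTempAppendDir(String tableName, Date now, String processId) {
        // SimpleDateFormat is not thread-safe, so a fresh instance is created per call.
        String timeId = new SimpleDateFormat(DATE_PATTERN).format(now);
        return TEMP_IMPORT_ROOT + "/" + timeId + "_" + processId + "_" + tableName;
    }

    public static void main(String[] args) {
        // tableName is null for a --query based import, as in the original code.
        System.out.println(getTempAppendDir(null, new Date(), jvmId()));
    }
}
```

With this, two jobs started in the same millisecond on the same host get different directories as long as they run in different processes. Jobs on different hosts could still share a bare process ID, which is one reason to keep the host part of the JVM name as well.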
> Multiple append jobs can easily end up sharing directories
> ----------------------------------------------------------
>
> Key: SQOOP-1273
> URL: https://issues.apache.org/jira/browse/SQOOP-1273
> Project: Sqoop
> Issue Type: Bug
> Affects Versions: 1.4.4
> Reporter: Jarek Jarcec Cecho
> Assignee: Jarek Jarcec Cecho
> Fix For: 1.4.5
>
>
> I've noticed at multiple user deployments that when running Sqoop in append
> mode ({{--append}}), two separate jobs can end up using the same temporary
> directory. This is a disaster, as those jobs will then start interfering with
> each other and may even cause data loss.
> Currently we are using the following code to generate the temporary directory
> ([AppendUtils.java|https://github.com/apache/sqoop/blob/trunk/src/java/org/apache/sqoop/util/AppendUtils.java#L269]):
> {code}
> public static Path getTempAppendDir(String tableName) {
>   String timeId = DATE_FORM.format(new Date(System.currentTimeMillis()));
>   String tempDir = TEMP_IMPORT_ROOT + Path.SEPARATOR + timeId + tableName;
>   return new Path(tempDir);
> }
> {code}
> There are three different parts that we are currently using to generate the
> temporary directory:
> * {{TEMP_IMPORT_ROOT}}: Constant. It can be changed by the user if needed,
> but as we do not have this documented, most users are using the default
> constant value.
> * {{timeId}}: Current time with millisecond precision.
> * {{tableName}}: Name of the transferred table, or {{null}} for a query
> ({{--query}}) based import.
> The problem mainly surfaces in {{--query}} based imports, where two of the
> three parts are constant, so two Sqoop jobs started within the same
> millisecond end up with the same directory.
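
To make the collision concrete, here is a small standalone sketch of the naming scheme quoted above. It is not the Sqoop code itself: the root "_sqoop" and the pattern "ddHHmmssSSS" are assumed values, the class name is hypothetical, the clock is passed in as a parameter so that the same-millisecond case can be reproduced, and a String stands in for Hadoop's Path.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

class AppendDirCollisionDemo {
    // Assumed values for illustration; the real constants may differ.
    static final String TEMP_IMPORT_ROOT = "_sqoop";
    static final String DATE_PATTERN = "ddHHmmssSSS";

    /** Same composition as the quoted method: root + separator + timeId + tableName. */
    static String tempAppendDir(String tableName, long nowMillis) {
        String timeId = new SimpleDateFormat(DATE_PATTERN).format(new Date(nowMillis));
        return TEMP_IMPORT_ROOT + "/" + timeId + tableName;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // Two --query imports: tableName is null for both, so only the timestamp varies.
        String jobA = tempAppendDir(null, now);
        String jobB = tempAppendDir(null, now);
        System.out.println(jobA + " / " + jobB + " collide: " + jobA.equals(jobB));
    }
}
```

For table based imports the table name usually keeps concurrent jobs apart, but two jobs importing the same table in the same millisecond would collide in the same way.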