[jira] [Commented] (HIVE-15093) S3-to-S3 Renames: Files should be moved individually rather than at a directory level

Hive QA (JIRA) Mon, 07 Nov 2016 21:20:07 -0800

    [ https://issues.apache.org/jira/browse/HIVE-15093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15646545#comment-15646545 ]


Hive QA commented on HIVE-15093:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12837895/HIVE-15093.9.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10628 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats] (batchId=145)
org.apache.hive.service.cli.TestEmbeddedThriftBinaryCLIService.testTaskStatus (batchId=207)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2017/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2017/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2017/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12837895 - PreCommit-HIVE-Build

> S3-to-S3 Renames: Files should be moved individually rather than at a 
> directory level
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-15093
>                 URL: https://issues.apache.org/jira/browse/HIVE-15093
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Hive
>    Affects Versions: 2.1.0
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>         Attachments: HIVE-15093.1.patch, HIVE-15093.2.patch, 
> HIVE-15093.3.patch, HIVE-15093.4.patch, HIVE-15093.5.patch, 
> HIVE-15093.6.patch, HIVE-15093.7.patch, HIVE-15093.8.patch, HIVE-15093.9.patch
>
>
> Hive's MoveTask uses the Hive.moveFile method to move data within a 
> distributed filesystem as well as within blobstore filesystems.
> If the move is done within the same filesystem:
> 1: If the source path is a subdirectory of the destination path, files are 
> moved one by one using a threadpool of workers.
> 2: If the source path is not a subdirectory of the destination path, a single 
> rename operation is used to move the entire directory.
> The second option may not work well on blobstores such as S3, where renames 
> are not metadata operations and require copying all the data. Client 
> connectors to blobstores may not rename directories efficiently. In the worst 
> case, the connector copies each file one by one, sequentially, rather than 
> using a threadpool of workers to copy the data (e.g. HADOOP-13600).
> Hive already has code to rename files using a threadpool of workers, but it 
> is only used in case 1.
> This JIRA aims to modify the code so that case 1 is triggered when copying 
> within a blobstore. The focus is on copies within a blobstore because 
> needToCopy will return true if the src and target filesystems are different, 
> in which case a different code path is triggered.
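
The two cases in the description can be sketched as follows. This is only an illustration under stated assumptions, not Hive's implementation: it uses the local filesystem as a stand-in for a blobstore, and the names move_files_individually, move_directory, is_blobstore and pool_size are hypothetical.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def move_files_individually(src_dir, dst_dir, pool_size=4):
    """Case 1: move each top-level entry of src_dir with a pool of workers."""
    os.makedirs(dst_dir, exist_ok=True)
    names = sorted(os.listdir(src_dir))
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = [
            pool.submit(shutil.move,
                        os.path.join(src_dir, name),
                        os.path.join(dst_dir, name))
            for name in names
        ]
        for future in futures:
            future.result()  # re-raise any failure from a worker
    return names


def move_directory(src_dir, dst_dir, is_blobstore, pool_size=4):
    """Pick per-file moves for a (hypothetical) blobstore, else one rename."""
    if is_blobstore:
        # Proposed behaviour: avoid a single directory-level rename.
        return move_files_individually(src_dir, dst_dir, pool_size)
    # Case 2: one rename of the whole directory (dst_dir must not exist yet).
    os.rename(src_dir, dst_dir)
    return sorted(os.listdir(dst_dir))
```

On a real blobstore each per-file move would still be a copy followed by a delete; the pool only lets those copies run concurrently instead of sequentially.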



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
