[jira] [Updated] (HIVE-15093) For S3-to-S3 renames, files should be moved individually rather than at a directory level

Sahil Takiar (JIRA) Tue, 01 Nov 2016 22:59:58 -0700

     [ 
https://issues.apache.org/jira/browse/HIVE-15093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Sahil Takiar updated HIVE-15093:
--------------------------------
    Attachment: HIVE-15093.6.patch

> For S3-to-S3 renames, files should be moved individually rather than at a 
> directory level
> -----------------------------------------------------------------------------------------
>
>                 Key: HIVE-15093
>                 URL: https://issues.apache.org/jira/browse/HIVE-15093
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Hive
>    Affects Versions: 2.1.0
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>         Attachments: HIVE-15093.1.patch, HIVE-15093.2.patch, 
> HIVE-15093.3.patch, HIVE-15093.4.patch, HIVE-15093.5.patch, HIVE-15093.6.patch
>
>
> Hive's MoveTask uses the Hive.moveFile method to move data within a 
> distributed filesystem as well as blobstore filesystems.
> If the move is done within the same filesystem:
> 1: If the source path is a subdirectory of the destination path, files will 
> be moved one by one using a threapool of workers
> 2: If the source path is not a subdirectory of the destination path, a single 
> rename operation is used to move the entire directory
> The second option may not work well on blobstores such as S3. Renames are not 
> metadata operations and require copying all the data. Client connectors to 
> blobstores may not efficiently rename directories. Worst case, the connector 
> will copy each file one by one, sequentially rather than using a threadpool 
> of workers to copy the data (e.g. HADOOP-13600).
> Hive already has code to rename files using a threadpool of workers, but this 
> only occurs in case number 1.
> This JIRA aims to modify the code so that case 1 is triggered when copying 
> within a blobstore. The focus is on copies within a blobstore because 
> needToCopy will return true if the src and target filesystems are different, 
> in which case a different code path is triggered.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-15093) For S3-to-S3 renames, files should be moved individually rather than at a directory level

Reply via email to