[
https://issues.apache.org/jira/browse/MAPREDUCE-6840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15943218#comment-15943218
]
Steve Loughran commented on MAPREDUCE-6840:
-------------------------------------------
Millis are a pretty human-unfriendly unit; the mechanism Configuration uses to
support ms, s, m, h and d suffixes is better. I think it should be possible to use
{{configuration.getTimeDuration()}} to parse the duration arg simply by
creating a no-default Configuration, setting the property, and then having it
parse the string. Ugly but effective.
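A minimal sketch of the suffix handling the comment describes, assuming the same five suffixes (ms, s, m, h, d) and that a bare number means milliseconds. It runs using only the JDK, so it reimplements the parsing instead of calling Hadoop. The class name `DurationParser` and the property key in the comment are hypothetical. The javadoc shows roughly how the comment's actual approach would look with hadoop-common on the classpath.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;

/**
 * Standalone sketch of suffix-based duration parsing in the spirit of
 * Hadoop's Configuration.getTimeDuration(). With hadoop-common available,
 * the approach from the comment would look roughly like:
 *
 *   Configuration conf = new Configuration(false);    // no default resources
 *   conf.set("distcp.cutoff.duration", arg);          // hypothetical key
 *   long ms = conf.getTimeDuration("distcp.cutoff.duration", 0L, TimeUnit.MILLISECONDS);
 */
public class DurationParser {

    // "ms" must be checked before "s" and "m", otherwise "500ms" would match "s".
    private static final String[] SUFFIXES = {"ms", "s", "m", "h", "d"};
    private static final TimeUnit[] UNITS = {
        TimeUnit.MILLISECONDS, TimeUnit.SECONDS, TimeUnit.MINUTES,
        TimeUnit.HOURS, TimeUnit.DAYS
    };

    /** Parses "500ms", "30s", "15m", "2h", "7d"; a bare number is taken as millis. */
    public static long parseToMillis(String raw) {
        String s = raw.trim().toLowerCase(Locale.ROOT);
        for (int i = 0; i < SUFFIXES.length; i++) {
            if (s.endsWith(SUFFIXES[i])) {
                String num = s.substring(0, s.length() - SUFFIXES[i].length()).trim();
                return UNITS[i].toMillis(Long.parseLong(num));
            }
        }
        // No recognised suffix: treat the value as milliseconds.
        return Long.parseLong(s);
    }

    public static void main(String[] args) {
        System.out.println(parseToMillis("500ms")); // 500
        System.out.println(parseToMillis("30s"));   // 30000
        System.out.println(parseToMillis("2h"));    // 7200000
        System.out.println(parseToMillis("1d"));    // 86400000
        System.out.println(parseToMillis("1500"));  // 1500
    }
}
```

Delegating to `Configuration` would also cover any extra suffixes Hadoop supports, which is the main argument for the "ugly but effective" route over a hand-rolled parser like this one.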
> Distcp to support cutoff time
> -----------------------------
>
> Key: MAPREDUCE-6840
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6840
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: distcp
> Affects Versions: 2.6.0
> Reporter: Zheng Shao
> Assignee: Zheng Shao
> Priority: Minor
> Attachments: MAPREDUCE-6840.1.patch
>
>
> To ensure consistency in the datasets on HDFS, some projects, such as file
> formats on Hive, perform HDFS operations in a particular order. For example, if a
> file format uses an index file, a new version of the index file is only
> written to HDFS after all files referenced by the index have been written to HDFS.
> When we run distcp, it's important to preserve that consistency, so that we
> don't break those file formats.
> A typical solution is to create an HDFS snapshot beforehand and only
> distcp the snapshot. That works well if the user has superuser
> privileges to make the directory snapshottable.
> If not, it would be beneficial to have a cutoff time for distcp, so that
> distcp only copies files modified on or before that cutoff time.
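A minimal sketch of the proposed selection rule only: keep files whose modification time is on or before the cutoff. `FileEntry` is a hypothetical stand-in for Hadoop's `FileStatus` (which exposes a modification time in epoch millis), and the paths and timestamps are made up for illustration. This is not the attached patch's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of cutoff filtering over a file listing (not the actual distcp code). */
public class CutoffFilter {

    /** Hypothetical minimal file record: path plus modification time in epoch millis. */
    static final class FileEntry {
        final String path;
        final long modificationTimeMillis;

        FileEntry(String path, long modificationTimeMillis) {
            this.path = path;
            this.modificationTimeMillis = modificationTimeMillis;
        }
    }

    /** Returns the entries modified on or before cutoffMillis, preserving input order. */
    static List<FileEntry> filterByCutoff(List<FileEntry> files, long cutoffMillis) {
        List<FileEntry> kept = new ArrayList<>();
        for (FileEntry f : files) {
            if (f.modificationTimeMillis <= cutoffMillis) {
                kept.add(f);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        long cutoff = 1_000L;
        List<FileEntry> listing = List.of(
            new FileEntry("/warehouse/t/part-0000", 900L),
            new FileEntry("/warehouse/t/part-0001", 1_000L),  // exactly at cutoff: kept
            new FileEntry("/warehouse/t/index", 1_200L));     // written after cutoff: skipped
        for (FileEntry f : filterByCutoff(listing, cutoff)) {
            System.out.println(f.path);
        }
    }
}
```

In this example the index written after the cutoff is left out, so a copy never picks up an index version newer than the cutoff.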
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)