Thanks. I believe the standard distcp traverses the directories to determine 
the change list in the map tasks (distcp2)? If it does that on the client, it 
may add startup overhead.

Anyway, do you think it is worthwhile to explore leveraging the HDFS snapshot 
feature in Hadoop 2.x?
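
To make the snapshot idea concrete, here is a minimal sketch in Python of 
diffing two snapshots to build the copy list. It does not call any HDFS or 
Falcon API: the two snapshot listings are plain dictionaries built by hand, 
and the names FileMeta, snapshot_diff and copy_list are hypothetical. In a 
real setup the listings would presumably come from the HDFS snapshot diff 
report instead of a directory walk.

```python
from typing import Dict, List, NamedTuple


class FileMeta(NamedTuple):
    """Hypothetical per-file metadata recorded in a snapshot listing."""
    size: int
    mtime: int


def snapshot_diff(previous: Dict[str, FileMeta],
                  current: Dict[str, FileMeta]) -> Dict[str, List[str]]:
    """Compare two snapshot listings and classify the paths that differ."""
    created = sorted(p for p in current if p not in previous)
    deleted = sorted(p for p in previous if p not in current)
    modified = sorted(
        p for p in current
        if p in previous and current[p] != previous[p]
    )
    return {"created": created, "deleted": deleted, "modified": modified}


def copy_list(diff: Dict[str, List[str]]) -> List[str]:
    """Paths that would need copying to the target: new and changed files."""
    return sorted(diff["created"] + diff["modified"])


# Hand-built example listings (assumed data, not read from a cluster).
previous_snapshot = {
    "/data/a.log": FileMeta(size=100, mtime=1),
    "/data/b.log": FileMeta(size=200, mtime=1),
    "/data/c.log": FileMeta(size=300, mtime=1),
}
current_snapshot = {
    "/data/a.log": FileMeta(size=100, mtime=1),   # unchanged
    "/data/b.log": FileMeta(size=250, mtime=2),   # modified
    "/data/d.log": FileMeta(size=50, mtime=2),    # created
}

diff = snapshot_diff(previous_snapshot, current_snapshot)
to_copy = copy_list(diff)
```

The point is that the change list falls out of comparing two listings, so 
nothing has to traverse the whole source tree at copy time.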


On Wednesday, June 11, 2014 7:01 PM, Srikanth Sundarrajan <[email protected]> 
wrote:
 


We use the standard distcp with minimal customizations. In sync / update mode, 
the standard distcp copies only files that have changed, and Falcon uses this 
mode for replication.

Regards
Srikanth Sundarrajan


----------------------------------------
> Date: Wed, 11 Jun 2014 17:57:19 -0700
> From: [email protected]
> Subject: Falcon distcp use HDFS snapshots?
> To: [email protected]
>
> Hi Folks,
>
> Does the distcp in Falcon use the HDFS snapshot feature (take a snapshot 
> before the copy and diff it with the previous snapshot to get the list of 
> files changed on the source)?
>
> HDFS snapshot sounds like a useful feature.
>
> Thanks
