I think this is already the case in 2.5. A number of improvements were made
to DistCp in 2.4 and 2.5 that Falcon does not take advantage of. We plan to
move to core DistCp in the near future.


On Wed, Jun 11, 2014 at 7:15 PM, Venkat R <[email protected]>
wrote:

> Thanks. I believe the standard distcp traverses the dirs to determine the
> change list in the map tasks (distcp2)? If it does it on the client, it may
> add startup overhead.
>
> Anyway, do you think it's worthwhile to explore leveraging the HDFS snapshot
> feature in Hadoop 2.x?
>
>
> On Wednesday, June 11, 2014 7:01 PM, Srikanth Sundarrajan <
> [email protected]> wrote:
>
>
>
> We use standard distcp with minimal customizations. The standard distcp
> copies only files that have changed when run in sync / update mode, and
> Falcon uses this mode for replication.
>
> Regards
> Srikanth Sundarrajan
>
>
> ----------------------------------------
> > Date: Wed, 11 Jun 2014 17:57:19 -0700
> > From: [email protected]
> > Subject: Falcon distcp use HDFS snapshots?
> > To: [email protected]
> >
> > Hi Folks,
> >
> > Does the distcp in Falcon use the HDFS snapshot feature (take a snapshot
> > before the copy and diff it with the previous snapshot to get the list of
> > files changed on the source)?
> >
> > HDFS snapshot sounds like a useful feature.
> >
> > Thanks
>



-- 
Regards,
Venkatesh

“Perfection (in design) is achieved not when there is nothing more to add,
but rather when there is nothing more to take away.”
- Antoine de Saint-Exupéry
