It would help to know your data ingest and processing patterns (and any applicable SLAs).
In most cases, you only need to move the raw ingested data; you can then derive the rest on the new cluster. Assuming you have some sort of date-based partitioning on the ingest, it's easy to define a cut-off point. Depending on your read SLAs, you could tee writes to both clusters for a period of time, or simply switch over to the new one once the majority of the data has been moved.

Finally, you would want to do a consistency check to make sure everything made it to the other side... maybe run a checksum on the derived data on both clusters and compare.

Something like that...

- P

On Fri, Aug 3, 2012 at 5:19 PM, Patai Sangbutsarakum <silvianhad...@gmail.com> wrote:

> thanks for response.
> Physical move is not a choice in this case. Purely looking for copying
> data and how to catch up with the update of a file while it is being
> migrated.
>
> On Fri, Aug 3, 2012 at 12:40 PM, Chen He <airb...@gmail.com> wrote:
> > sometimes, physically moving hard drives helps. :)
> > On Aug 3, 2012 1:50 PM, "Patai Sangbutsarakum" <silvianhad...@gmail.com>
> > wrote:
> >
> >> Hi Hadoopers,
> >>
> >> We have a plan to migrate Hadoop cluster to a different datacenter
> >> where we can triple the size of the cluster.
> >> Currently, our 0.20.2 cluster have around 1PB of data. We use only
> >> Java/Pig.
> >>
> >> I would like to get some input how we gonna handle with transferring
> >> 1PB of data to a new site, and also keep up with
> >> new files that thrown into cluster all the time.
> >>
> >> Happy friday !!
> >>
> >> P
> >>
>
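
P.S. To make the consistency check suggested in the reply at the top of this thread a bit more concrete, here is a rough sketch in Python. It assumes each cluster can produce a plain-text manifest with one `<checksum><TAB><path>` line per file; that manifest format and the names `parse_manifest` and `compare_manifests` are made up for illustration and are not part of Hadoop. How the manifests get generated (a Pig job, a small Java tool, etc.) is left open.

```python
from typing import Dict, List, NamedTuple


class Diff(NamedTuple):
    missing: List[str]     # present on the source cluster, absent on the destination
    extra: List[str]       # present on the destination, absent on the source
    mismatched: List[str]  # present on both, but with different checksums


def parse_manifest(text: str) -> Dict[str, str]:
    """Parse '<checksum><TAB><path>' lines into a {path: checksum} mapping.

    The manifest format is an assumption made for this sketch.
    """
    entries: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        checksum, path = line.split("\t", 1)
        entries[path] = checksum
    return entries


def compare_manifests(source: Dict[str, str], dest: Dict[str, str]) -> Diff:
    """Report paths that are missing, unexpected, or differ between clusters."""
    missing = sorted(p for p in source if p not in dest)
    extra = sorted(p for p in dest if p not in source)
    mismatched = sorted(
        p for p in source if p in dest and source[p] != dest[p]
    )
    return Diff(missing, extra, mismatched)
```

You would call `parse_manifest` on the manifest text from each cluster and pass both results to `compare_manifests`; anything listed under `missing` or `mismatched` is a candidate for re-copying. With date-based partitions, restricting the manifests to paths before the cut-off keeps files that are still being written out of the comparison.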