I'm embarking on an archiving project and wondered if anyone had any decent scripts, tools, etc. for syncing a lot of data between two HDFS instances. My production Hadoop cluster is in VA, where we store a lot of data, and we're bringing up our archive cluster here in CA, where we'll keep data older than 90 days (or however old we decide). Just wondered if anyone had a good pre-existing solution, or if I'll end up writing one myself (roughly along the lines of the sketch below).
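For context, if I do roll my own, I'm imagining a thin wrapper around Hadoop's built-in `hadoop distcp` that finds partitions older than the cutoff and copies them across. The sketch below is only that: a rough idea, not something I've run. The namenode hostnames, the `/data/logs` base path, and the YYYY-MM-DD partition layout are all placeholders, not our actual setup.

    #!/usr/bin/env python3
    """Rough sketch: find date-partitioned HDFS dirs older than 90 days on the
    production cluster and copy them to the archive cluster with distcp.
    Hostnames, paths, and the partition layout are placeholders."""

    import subprocess
    from datetime import datetime, timedelta

    SRC_NN = "hdfs://prod-nn.example.com:8020"      # production namenode (placeholder)
    DST_NN = "hdfs://archive-nn.example.com:8020"   # archive namenode (placeholder)
    BASE = "/data/logs"                             # assumed date-partitioned dataset
    CUTOFF = datetime.now() - timedelta(days=90)    # archive anything older than 90d

    def list_partitions(base):
        """List child paths of `base` by parsing `hadoop fs -ls` output."""
        out = subprocess.check_output(["hadoop", "fs", "-ls", base], text=True)
        # Skip the "Found N items" header; the last field of each row is the path.
        return [line.split()[-1] for line in out.splitlines()
                if line.startswith(("d", "-"))]

    def partition_date(path):
        """Parse the trailing YYYY-MM-DD component of a partition path, if any."""
        try:
            return datetime.strptime(path.rstrip("/").rsplit("/", 1)[-1], "%Y-%m-%d")
        except ValueError:
            return None

    def main():
        for path in list_partitions(BASE):
            day = partition_date(path)
            if day is None or day >= CUTOFF:
                continue
            # -update makes reruns idempotent; -p preserves perms/ownership/etc.
            subprocess.check_call(["hadoop", "distcp", "-update", "-p",
                                   SRC_NN + path, DST_NN + path])

    if __name__ == "__main__":
        main()

Something like that would run from cron on the production side, but if there's an existing tool that already handles retries, deletes on the source after a verified copy, and cross-version distcp quirks, I'd much rather use it.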
Thanks! -j