Hi Everyone,

How can I run md5sum/sha1sum directly against files on HDFS?


A pretty common thing I do when archiving files is to make an md5sum list

eg)  md5sum /archive/path/* > md5sum-list.txt

Then later, if I need to check that the files are OK (perhaps before a restore, 
or when I copy them somewhere else), I'll do
md5sum -c md5sum-list.txt


I'd be OK doing it one file at a time

java -jar <something> hdfs://some/path/in-hadoop/filename


I'm also OK doing it serially through a single node. I've been doing some 
googling and JIRA-ticket reading, such as 
https://issues.apache.org/jira/browse/HADOOP-3981, and for my use case a serial 
read is not a limitation.

What is a bit of a requirement is something I can run as a standard Linux 
command on local disk and do a 1:1 output comparison.  
eg) Check the HDFS md5sum of a file, copyToLocal, re-check the md5sum on local disk.
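To make it concrete, the kind of thing I'm imagining is streaming each file's raw bytes through the local md5sum, so the digests compare 1:1 with md5sum run on local disk. (From what I can tell, `hdfs dfs -checksum` is not this: it prints Hadoop's internal MD5-of-CRC value, which depends on block/chunk sizes and won't match a plain md5sum.) A rough sketch, assuming the `hdfs` CLI is on the PATH and the paths below are placeholders:

```shell
# Stream each HDFS file through the local md5sum so the digest is
# computed over the raw bytes, emitting the same "HASH  name" format
# that md5sum -c expects. Serial, single-node, as described above.
hdfs dfs -ls -C /archive/path | while read -r f; do
  printf '%s  %s\n' "$(hdfs dfs -cat "$f" | md5sum | cut -d' ' -f1)" "$(basename "$f")"
done > md5sum-list.txt

# Later, after copyToLocal into the current directory:
# md5sum -c md5sum-list.txt
```

Since `hdfs dfs -cat` just emits the file's bytes, piping it into md5sum should give the same hash that md5sum produces for the same file on local disk.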

Thanks,
Scott





