Hi everyone,

How can I run an md5sum/sha1sum directly against files on HDFS?
A pretty common thing I do when archiving files is make an md5sum list, eg)

    md5sum /archive/path/* > md5sum-list.txt

Then later, should I need to check that the files are OK (perhaps before a restore, or when I copy them somewhere else), I'll do:

    md5sum -c md5sum-list.txt

I'd be fine doing it one file at a time, eg)

    java -jar <something> hdfs://some/path/in-hadoop/filename

and I'm also fine doing it serially through a single node. I've done some googling and JIRA ticket reading, such as https://issues.apache.org/jira/browse/HADOOP-3981, and for my use case serial read is not a limitation.

What is a bit of a requirement is output I can compare 1:1 with what the standard Linux command produces on local disk, eg) check the md5sum of a file on HDFS, copyToLocal, then re-check the md5sum on local disk.

Thanks,
Scott
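P.S. For concreteness, here's a minimal sketch of the kind of tool I mean, reading an HDFS file serially through the Hadoop FileSystem API and printing md5sum-compatible output. The class name and jar are just placeholders I made up, not an existing tool:

    import java.io.InputStream;
    import java.security.MessageDigest;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Hypothetical tool: print an md5sum-compatible line for one HDFS file.
    // Usage: java -jar hdfs-md5.jar hdfs://namenode/some/path/in-hadoop/filename
    public class HdfsMd5 {
        public static void main(String[] args) throws Exception {
            Path path = new Path(args[0]);
            FileSystem fs = path.getFileSystem(new Configuration());

            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] buf = new byte[64 * 1024];

            // Stream the file contents through the digest, one buffer at a time.
            try (InputStream in = fs.open(path)) {
                int n;
                while ((n = in.read(buf)) != -1) {
                    md5.update(buf, 0, n);
                }
            }

            // Format as "<hex digest>  <path>", the same layout md5sum emits,
            // so the output can be diffed against a local md5sum list.
            StringBuilder hex = new StringBuilder();
            for (byte b : md5.digest()) {
                hex.append(String.format("%02x", b));
            }
            System.out.println(hex + "  " + path.toUri().getPath());
        }
    }

Running that per file and collecting the lines into a list would, in principle, give me something I can check with md5sum -c after a copyToLocal.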