I don't think there is a built-in command. I would just use the Java or Thrift API to read the file & calculate the hash (Thrift + Python/Ruby/etc.).
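For example, here is a minimal Java sketch that streams a file through the Hadoop FileSystem API and prints its MD5 in md5sum's output format. The class name HdfsMd5 and the buffer size are just placeholders of mine, and I haven't run this against any particular Hadoop version, so treat it as a starting point rather than the one true way:

    import java.security.MessageDigest;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsMd5 {
        public static void main(String[] args) throws Exception {
            // args[0] is an HDFS path, e.g. hdfs://namenode/some/path/filename
            Configuration conf = new Configuration();
            Path path = new Path(args[0]);
            FileSystem fs = path.getFileSystem(conf);

            // Stream the file contents through a standard MD5 digest.
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            FSDataInputStream in = fs.open(path);
            try {
                byte[] buf = new byte[64 * 1024];
                int n;
                while ((n = in.read(buf)) != -1) {
                    md5.update(buf, 0, n);
                }
            } finally {
                in.close();
            }

            // Print "<hash>  <name>" the way md5sum does, so the output
            // can be diffed against (or checked with) md5sum on local disk.
            StringBuilder hex = new StringBuilder();
            for (byte b : md5.digest()) {
                hex.append(String.format("%02x", b));
            }
            System.out.println(hex + "  " + args[0]);
        }
    }

Compile it against the Hadoop jars and run it with your cluster config on the classpath (e.g. "hadoop jar" or plain java with the right -cp). Since it just hashes the raw bytes read off HDFS, the result should match what md5sum reports for the same file after a copyToLocal.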
Take care,
-stu

-----Original Message-----
From: Scott Golby <sgo...@conductor.com>
Date: Wed, 2 Mar 2011 11:05:04
To: hdfs-user@hadoop.apache.org
Reply-To: hdfs-user@hadoop.apache.org
Subject: md5sum of files on HDFS ?

Hi Everyone,

How can I do an md5sum/sha1sum directly against files on HDFS?

A pretty common thing I do when archiving files is make an md5sum list,

eg) md5sum /archive/path/* > md5sum-list.txt

Then later, should I need to check that the files are OK, perhaps before a restore or when I copy them somewhere else, I'll do:

md5sum -c md5sum-list.txt

I'd be OK doing it one file at a time:

java -jar <something> hdfs://some/path/in-hadoop/filename

I'm also OK doing it serially through a single node. I've been doing some googling and JIRA ticket reading, such as https://issues.apache.org/jira/browse/HADOOP-3981, and for my use case a serial read is not a limitation.

What is a bit of a requirement is something I can run as a standard Linux command on local disk and do a 1:1 output comparison.

eg) Check the HDFS md5sum of a file, copyToLocal, re-check the md5sum on local disk.

Thanks,
Scott