I don't think there is a built-in command. I would just use the Java or Thrift 
API to read the file & calculate the hash (Thrift + Python/Ruby/etc.).
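
For what it's worth, here's a minimal sketch of the Java route. It assumes the 
stock org.apache.hadoop.fs.FileSystem API; the class name, buffer size, jar 
name, and paths below are just placeholders:

import java.math.BigInteger;
import java.security.MessageDigest;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsMd5 {
    public static void main(String[] args) throws Exception {
        // e.g. hdfs://namenode/archive/path/filename
        Path path = new Path(args[0]);
        FileSystem fs = FileSystem.get(path.toUri(), new Configuration());

        MessageDigest md5 = MessageDigest.getInstance("MD5");
        byte[] buf = new byte[65536];

        // Stream the file through the digest, like md5sum does on local disk
        FSDataInputStream in = fs.open(path);
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                md5.update(buf, 0, n);
            }
        } finally {
            in.close();
        }

        // Print "hash  filename", the same two-column format md5sum emits,
        // so the output can be compared 1:1 against a local run
        System.out.printf("%032x  %s%n", new BigInteger(1, md5.digest()), args[0]);
    }
}

Run it with the Hadoop jars on the classpath and you can do the round trip you 
describe:

hadoop jar hdfs-md5.jar HdfsMd5 hdfs://namenode/archive/path/filename
hadoop fs -copyToLocal hdfs://namenode/archive/path/filename filename
md5sum filename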

Take care,
 -stu
-----Original Message-----
From: Scott Golby <sgo...@conductor.com>
Date: Wed, 2 Mar 2011 11:05:04 
To: hdfs-user@hadoop.apache.org
Reply-To: hdfs-user@hadoop.apache.org
Subject: md5sum of files on HDFS ?

Hi Everyone,

How can I do an md5sum/sha1sum directly against files on HDFS?


A pretty common thing I do when archiving files is make an md5sum list

e.g.)  md5sum /archive/path/* > md5sum-list.txt

Then later, should I need to check that the files are OK, perhaps before a restore 
or when I copy them somewhere else, I'll do

md5sum -c md5sum-list.txt


I'd be OK doing it one file at a time:

java -jar <something> hdfs://some/path/in-hadoop/filename


I'm also OK doing it serially through a single node. I've been doing some 
googling and JIRA ticket reading, such as 
https://issues.apache.org/jira/browse/HADOOP-3981, and for my use case serial 
read is not a limitation.

What is a bit of a requirement is that the output matches what the standard 
Linux command produces on local disk, so I can do a 1:1 comparison.
e.g.) Check the HDFS md5sum of a file, copyToLocal, re-check the md5sum on local disk.

Thanks,
Scott