Neil Mitchell wrote:
> Hi
>
> The MD5SUM.EXE file I have chokes if you ask it to hash a file in
> another directory. It will hash from stdin, or from a file in the
> current directory, but point-blank refuses to hash anything else.
>
> Try http://www.cs.york.ac.uk/fp/yhc/dependencies/UnxUtils.zip - that
> has an MD5SUM program in it that seems to work fine on things in
> different directories. It also has many other great utilities in it.

Negative. It gives strange output if the pathname contains any backslashes. (Each backslash appears twice, and an additional backslash appears just before the hash value. Very odd...)

I spent a while playing with Google, and found many, many implementations of MD5. Every single one of them did *something* strange under certain conditions. Most frustrating! Well anyway, I eventually settled on a program MD5DEEP.EXE, which seems to work just about well enough to be useful.

> I'm trying to imagine what mistake the authors of your version of
> MD5SUM must have made to screw up files in different directories, but
> it eludes me...

It seems Unix tools are typically compiled for Windows with the aid of a Unix emulator. These often do all sorts of strange path munging to make Windows look like Unix. That's probably the source of the problem...



BTW, while I'm here... I sat down and wrote my own MD5 implementation. It's now 95% working. (The padding algorithm goes wrong for certain message lengths.) I doubt it'll ever be fast, but I wanted to see how hard it would be to implement. The hard part, ridiculously enough, wasn't MD5 itself. It's all the datatype conversions. Nowhere in the Haskell libraries can I find any of these functions:

 pack8into16 :: [Word8] -> Word16
 pack8into32 :: [Word8] -> Word32
 unpack16into8 :: Word16 -> [Word8]
 unpack32into8 :: Word32 -> [Word8]
 pack8into16s :: [Word8] -> [Word16]
 pack8into32s :: [Word8] -> [Word32]
 etc.

I had to write all these myself, by hand, and then check that I got everything the right way round and so forth. (And every now and then I find an edge case where these functions go wrong.) Of course, on top of that, MD5 uses something really stupid called "little endian integers". In other words, to interpret the data, you have to read it partially backwards, partially forwards. Really awkward to get right!
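For illustration, here is a minimal sketch of how conversions like these might be written, assuming little-endian byte order throughout (least significant byte first, which is what MD5 uses). The helpers packLE and chunksOf' are hypothetical names made up for this sketch, not library functions, and a short trailing chunk is simply treated as if its missing high bytes were zero.

```haskell
import Data.Bits (Bits, shiftL, shiftR, (.|.))
import Data.Word (Word8, Word16, Word32)

-- Hypothetical helper: combine bytes into one word, least significant
-- byte first. Bytes beyond the width of the result are shifted out and lost.
packLE :: (Bits a, Num a) => [Word8] -> a
packLE = foldr (\b acc -> (acc `shiftL` 8) .|. fromIntegral b) 0

pack8into16 :: [Word8] -> Word16
pack8into16 = packLE

pack8into32 :: [Word8] -> Word32
pack8into32 = packLE

-- fromIntegral truncates to the low 8 bits, so each shift picks one byte.
unpack16into8 :: Word16 -> [Word8]
unpack16into8 w = [fromIntegral (w `shiftR` s) | s <- [0, 8]]

unpack32into8 :: Word32 -> [Word8]
unpack32into8 w = [fromIntegral (w `shiftR` s) | s <- [0, 8, 16, 24]]

-- Hypothetical helper: split a list into pieces of length n
-- (the last piece may be shorter).
chunksOf' :: Int -> [a] -> [[a]]
chunksOf' _ [] = []
chunksOf' n xs = let (h, t) = splitAt n xs in h : chunksOf' n t

pack8into16s :: [Word8] -> [Word16]
pack8into16s = map pack8into16 . chunksOf' 2

pack8into32s :: [Word8] -> [Word32]
pack8into32s = map pack8into32 . chunksOf' 4
```

With these definitions, pack8into32 [0x78, 0x56, 0x34, 0x12] gives 0x12345678, and unpack32into8 0x12345678 gives the same four bytes back in the same order.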

But, after a few hours last night and a few more this morning, I was able to get the main program to work properly. If I can just straighten out the message padding code, I'll be all set... Then I can see about measuring just how slow it is. :-}
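For reference, the padding rule in RFC 1321 is: append a single 0x80 byte, then zero bytes until the length is 56 modulo 64, then the original length in bits as a 64-bit little-endian integer. A minimal sketch of that rule follows, assuming the whole message is held in memory as a list of bytes; md5Pad is a hypothetical name, and this is not the implementation described above.

```haskell
import Data.Bits (shiftR)
import Data.Word (Word8, Word64)

-- Hypothetical helper: pad a message to a multiple of 64 bytes
-- following the MD5 padding rule.
md5Pad :: [Word8] -> [Word8]
md5Pad msg = msg ++ [0x80] ++ replicate zeros 0 ++ lengthBytes
  where
    len = length msg
    -- Number of zero bytes so that (len + 1 + zeros) is 56 modulo 64.
    -- `mod` is non-negative here even when (55 - len) is negative.
    zeros = (55 - len) `mod` 64
    -- Original length in bits, emitted least significant byte first.
    bitLen = fromIntegral len * 8 :: Word64
    lengthBytes = [fromIntegral (bitLen `shiftR` s) | s <- [0, 8 .. 56]]
```

The awkward lengths are the ones from 56 to 63 modulo 64: there is no room left for the length field in the current block, so the padding spills over into an extra block. A 56-byte message, for example, pads out to 128 bytes rather than 64.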

Most amusing moment: Trying to run the GHC debugger, and then realising that you have to actually install the new version of GHC first...

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
