Re: std.hash design

Regan Heath Fri, 22 Jun 2012 10:13:18 -0700

On Fri, 22 Jun 2012 14:21:28 +0100, Johannes Pfau <[email protected]> wrote:

On Fri, 22 Jun 2012 12:03:27 +0100,
"Regan Heath" <[email protected]> wrote:


It might help (or it might not) to have a glance at the "design" of
the hashing routines in Tango:
http://www.dsource.org/projects/tango/docs/current/
(see tango.util.digest etc)

I contributed some of the initial code for these, though it has
since evolved a lot.  I started with structs, mirroring the phobos
MD5 code but used all sorts of unnecessary mixins to get the code
reuse I wanted.  The result was ugly :p

Later someone contacted me about it, and wanted a class based
approach so I did some refactoring and the result was much cleaner.
I'm not trying to say that a struct approach cannot be clean, just
that I did a bad job of it initially, and also that structs don't
lend themselves to the factory pattern, which is a nice way to use
hashing.
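
To illustrate the factory pattern mentioned above, here is a minimal sketch in Python, using the standard hashlib module as a stand-in for the real digests. The names Digest, create_digest and _REGISTRY are hypothetical and are not taken from Tango or from any std.hash proposal; the point is only that a caller picks a digest by name at run time and gets back an object with a common update/finish interface.

```python
import hashlib


class Digest:
    """Hypothetical common interface: feed data with update(), then finish()."""

    def __init__(self, impl):
        # impl is any hashlib object; it stands in for a concrete digest class.
        self._impl = impl

    def update(self, data):
        self._impl.update(data)
        return self  # allow chaining

    def finish(self):
        # Return the raw digest bytes.
        return self._impl.digest()


# Hypothetical registry mapping a name to a constructor.
_REGISTRY = {
    "sha1": hashlib.sha1,
    "sha256": hashlib.sha256,
}


def create_digest(name):
    """Factory: build a Digest from a name chosen at run time."""
    try:
        constructor = _REGISTRY[name.lower()]
    except KeyError:
        raise ValueError("unknown digest: " + name) from None
    return Digest(constructor())


if __name__ == "__main__":
    raw = create_digest("sha256").update(b"abc").finish()
    print(len(raw))
```

With value types and no shared base type, the factory has nothing common to return, which is why wrapper classes come up later in the thread.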


I had a short look at Piotr Szturmaj's sha implementations, and it
seems this kind of code would benefit a lot from inheritance. I
understand that it was probably impossible to do this in D1, but don't
you think 'alias this' could work in D2? This wouldn't solve the
problem with the factory pattern, but that can be solved by providing
wrapper classes.

My original code was D1 and I used structs and mixins.. so perhaps alias this will solve the code re-use problem. I haven't done enough D2 to be helpful here I'm afraid.

> toString doesn't make sense on a hash, as finish() has to be called
> before a string can be generated. So a helper function could be
> useful.

toString() could output the intermediate/internal state at the time
of the call, which if called after "finish" would be the hash
result.  I can't recall if this has any specific usefulness, tho I
have a nagging/niggling itch which says I did use this intermediate
result for something at some stage.

It might be useful to have toString on a hash so that we can pass a
completed hash object around and repeatedly obtain the string
representation vs obtaining it once on "finish" and passing the
string around.  However, that said, it's probably more secure to
destroy and scrub the memory used by the hash object ASAP and only
retain the resulting string or ubyte[] result.

I think I've talked myself round in a circle.. I think if we have a
way to obtain the current state as ubyte[] that would satisfy the
niggle I have. Having a separate routine for turning a ubyte[] into a
hex string is probably better than attaching toString to a hash
object.


We could also provide a finishString function or something like that.
But toString returning an intermediate state would be confusing.

Agreed. In fact I wouldn't bother with finishString either TBH, people can always pass the result of finish into the method which produces the hex string representation.

IIRC when I wrote my Tiger implementation it was fairly new, and I had a different method for formatting the hex string representation. Either they later changed the Tiger spec, or I was confused at the time, because I have this niggling memory that I later "discovered" it was the same all along, or something.

In any case, we can probably have one static toHexString method for all digests.
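
As a rough sketch of what a single shared hex formatter could look like, here is a Python version. The function name to_hex_string is hypothetical, and hashlib only stands in for whatever digest produced the bytes; the helper itself knows nothing about the algorithm.

```python
import hashlib


def to_hex_string(digest):
    """Format raw digest bytes as lowercase hex, two characters per byte."""
    return "".join("{:02x}".format(byte) for byte in digest)


if __name__ == "__main__":
    hasher = hashlib.sha256()
    hasher.update(b"abc")
    raw = hasher.digest()      # plays the role of finish()
    print(to_hex_string(raw))  # one formatter, whatever the digest
```

Since the helper takes only the bytes, the hash object can be discarded as soon as finish has returned.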

Tango doesn't seem to offer a way to peek at the current state. But if
it's really useful, it could be added.


Probably just cobwebs in my memory, ignore me :p

BTW: Do you know why digestSize is a function in tango? Are there
digests that produce variable length hashes?

Not to my knowledge.. perhaps there is a time/place where you want to know the size of the digest result before calculating the digest? Might be useful in generic code perhaps..
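
One guess at the kind of generic code I mean, sketched in Python: the output buffer is sized from the digest size before any data has been hashed. The function hash_all is hypothetical; digest_size is a real attribute of Python's hashlib objects and says nothing about how Tango exposes the same information.

```python
import hashlib


def hash_all(algorithm, chunks):
    """Hash each chunk separately into one preallocated buffer."""
    # The digest size is available before any data has been fed in.
    size = hashlib.new(algorithm).digest_size
    out = bytearray(size * len(chunks))
    for index, chunk in enumerate(chunks):
        start = index * size
        out[start:start + size] = hashlib.new(algorithm, chunk).digest()
    return out


if __name__ == "__main__":
    buffer = hash_all("sha256", [b"first", b"second"])
    print(len(buffer))
```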


R

--
Using Opera's revolutionary email client: http://www.opera.com/mail/
