[google-appengine] Re: Paging through large datasets - Article discussion

Nick Johnson (Google) Tue, 27 Oct 2009 02:54:46 -0700

Hi Martin,

On Tue, Oct 27, 2009 at 9:37 AM, Martin Trummer <[email protected]> wrote:


>
> well, I don't have the mathematical skills to prove you wrong:
> but according to several articles I've read, MD5 hashes are not
> collision resistant,
> and there are several ways to crack an MD5 hash (that are better than
> brute force)
>

Collision attacks on MD5 have been found, yes. But a collision attack
requires the attacker to choose both strings, currently requires them to be
at least 128 bytes long, and makes no guarantee about human readability. A
practical preimage attack, which would find a plaintext that hashes to a
given value, has not been found - and in any case, the plaintext it produced
would almost certainly not be the original input.
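
Separately from deliberate attacks, the chance of an accidental collision can be estimated with the birthday approximation. The sketch below assumes MD5 output behaves like uniformly random 128-bit values for ordinary (non-adversarial) input; the function name is hypothetical.

```python
import math

def collision_probability(n, bits=128):
    """Approximate chance that at least two of n uniformly random
    `bits`-bit values coincide (birthday approximation).

    Assumption: hash outputs are treated as uniformly random.
    """
    pairs = n * (n - 1) // 2   # number of distinct pairs among n values
    space = 2 ** bits          # number of possible hash values
    # 1 - exp(-pairs / space); expm1 keeps precision when the result is tiny
    return -math.expm1(-pairs / space)

# Example: one billion distinct email addresses hashed with MD5
p = collision_probability(10 ** 9)
print(p)
```

Even at a billion distinct inputs the estimate stays many orders of magnitude below anything observable in practice.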


> so:
>  * it's not purely theoretical
>  * the e-mail address could be restored from the hash
>

It's not possible now - and is never likely to be possible - to calculate a
preimage for MD5 and get back the original email address, using any method
short of brute force testing of every possible valid email address.
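
To make "brute force testing" concrete, here is a minimal sketch using Python's standard hashlib. The helper names and the example.com addresses are hypothetical. The only way to get from the hash back to an address is to hash candidate addresses and compare.

```python
import hashlib

def md5_hex(text):
    """Return the MD5 digest of text as a 32-character hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def brute_force_email(target_hash, candidates):
    """Try each candidate address; return the one whose MD5 matches, else None.

    This only succeeds if the real address is already in the candidate list.
    """
    for candidate in candidates:
        if md5_hex(candidate) == target_hash:
            return candidate
    return None

# Hypothetical example: the attacker must already guess the right address.
target = md5_hex("alice@example.com")
print(brute_force_email(target, ["bob@example.com", "alice@example.com"]))
print(brute_force_email(target, ["bob@example.com", "carol@example.com"]))
```

The cost is therefore driven by how guessable the address is, not by any weakness in MD5 itself.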


>
> when you say: "sufficiently random",
> then why don't we use a simple random number instead of a hash value
> of the (valuable) user-mail address?
> the random number would also be "sufficiently random" :)


> Why don't we simply use the unique user-id instead of the user's mail
> address for the sharded counter?
> That would be a perfect fit - it's unique and meaningless
>

You are welcome to use either of these techniques.
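
A minimal sketch of both alternatives, under stated assumptions: the `|`-separated key layout, the function names, and the sample values are hypothetical and are not taken from the article. The idea is only that any unique, meaningless token can take the place of the MD5 of the email address.

```python
import uuid

def paging_key_from_user_id(sort_value, user_id, counter):
    """Build a unique paging value from the sort field, an opaque user ID,
    and a per-user counter value (hypothetical layout)."""
    return "%s|%s|%d" % (sort_value, user_id, counter)

def random_token():
    """Return a random 32-character hex token (a version 4 UUID)."""
    return uuid.uuid4().hex

# Using the opaque user ID directly:
print(paging_key_from_user_id("2009-10-27T09:37:00", "12345", 7))

# Using a random token generated once and stored with the user:
print(random_token())
```

A random token has to be stored alongside the user so the same value is reused on every write; the user ID needs no extra storage.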

-Nick Johnson


>
> Anyway - I don't want to harp on about that.
> The article is great, but I think there should at least be a footnote that
> there might be better ways than using MD5 hashes of users' mail addresses
>
> On Oct 27, 12:44 am, "Nick Johnson (Google)" <[email protected]> wrote:
> > Hi Martin,
> >
> > MD5 hashes are sufficiently random that collisions are purely theoretical
> > and not of practical concern. Many systems, for example, address files by
> > MD5 or SHA1 hash.
> >
> > If you can provide an MD5 or SHA1 collision between two short,
> > human-readable strings, however, I will be happy to amend the article with
> > this caveat.
> >
> > Regards,
> >
> > Nick Johnson
> >
> > On Sun, Oct 25, 2009 at 5:40 PM, Martin Trummer <[email protected]> wrote:
> >
> > > in this article
> > > http://code.google.com/intl/de-DE/appengine/articles/paging.html
> > > the author points out the problems that arise when you use a field
> > > that may not be unique for paging.
> > > the solution is to use a sharded counter over the user to make the
> > > field unique.
> >
> > > Very fine until here.
> > > But then he suggests using an MD5 hash of the unique value
> > > instead of the real unique value.
> >
> > > This is obviously wrong:
> > > A hash function will, by definition, NOT retain the uniqueness of the
> > > source value!
> >
> > > Sure, the chance that 2 unique values result in the same hash value
> > > is (and should by definition be) very low:
> > > but we are not satisfied with a "solution" that works most of the
> > > time, are we?
> >
> > --
> > Nick Johnson, Developer Programs Engineer, App Engine
> > Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number:
> > 368047
> >
>


-- 
Nick Johnson, Developer Programs Engineer, App Engine
Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number:
368047

