Just don't forget that a lookup in a RadixTree is O(L) in the length
of the string, while a lookup in a Set (a HashSet) is O(1) on average
(worse the more collisions you have), since a String's hashCode is
cached in an instance field once it has been computed. But since the
hash is calculated lazily, for a brand-new String the lookup on a Set
is also O(L) in the length of the string you're looking up.
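
To make the comparison concrete, here is a rough sketch of both
duplicate checks in Java. The class and method names (UniqueCheck,
hasDuplicatesHashSet, hasDuplicatesTrie) are made up for illustration,
and the trie is a plain, uncompressed one used as a stand-in for a
radix tree; it is not the API of the radixtree project mentioned in the
quoted message.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class UniqueCheck {

    // HashSet approach: add() returns false if an equal element is already there.
    // Each add() needs the string's hash (O(L) the first time, cached afterwards)
    // and, when a candidate with the same hash exists, an equals() comparison.
    static boolean hasDuplicatesHashSet(List<String> strings) {
        Set<String> seen = new HashSet<>();
        for (String s : strings) {
            if (!seen.add(s)) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical, simplified trie node: one child per character and no edge
    // compression, so it uses more memory than a real radix tree would.
    static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal; // true if some inserted string ends at this node
    }

    // Trie approach: walking or creating the path costs O(L) per string.
    // A duplicate is found when the final node is already marked terminal.
    static boolean hasDuplicatesTrie(List<String> strings) {
        Node root = new Node();
        for (String s : strings) {
            Node node = root;
            for (int i = 0; i < s.length(); i++) {
                node = node.children.computeIfAbsent(s.charAt(i), c -> new Node());
            }
            if (node.terminal) {
                return true;
            }
            node.terminal = true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> unique = Arrays.asList("alpha", "beta", "gamma");
        List<String> repeated = Arrays.asList("alpha", "beta", "alpha");
        System.out.println(hasDuplicatesHashSet(unique));
        System.out.println(hasDuplicatesHashSet(repeated));
        System.out.println(hasDuplicatesTrie(unique));
        System.out.println(hasDuplicatesTrie(repeated));
    }
}
```

Either way every string has to be read once in full, so for a one-off
uniqueness check over freshly created strings the two approaches differ
mostly in constant factors and memory use, not in the O(L) per string.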

On Fri, Sep 4, 2009 at 8:59 AM, andreasp7n<andr...@petersson.at> wrote:
>
> On 3 Sep., 17:14, Barney <barney.h...@gmail.com> wrote:
>> Is it realistic to use a HashSet to check whether a large amount of
>> string data (2,000,000 strings of length 20) consists only of unique
>> entries?
>
> I needed something like this recently; I used a radix tree data
> structure to store all the strings. It is quite space-saving: I stored
> 3M customer names and addresses in memory with no problem memory-wise.
> There is a practical implementation over at
> http://code.google.com/p/radixtree/
>
> While building up the radix tree you can easily check whether you
> have any duplicates.

-- 
http://mapsdev.blogspot.com/
Marcelo Takeshi Fukushima
