Re: How to get the top K records from a huge dataset?

ila Wed, 10 Sep 2008 14:44:02 -0700

I think u can go ahead with Hash table concept.Analyse the number(n)of
ips u can accomodate in 1GB memory, then make  hash function so that
it can acomodate n entries in hash buckets, if collison occurs , first
match whether the ip u picked is same as value on the hash bucket,if
same reject the ip and pick next ip from the 1G> memory comp, if
different just put it on next free bucket. You can follow this for all
picked ips.Keep the track when ur bucket gets filled.Stop at that
point, u will have n uniquie latest ips.


On Sep 4, 2:12 pm, "Huabin Zheng" <[EMAIL PROTECTED]> wrote:
> Hi all,
>     I am encountered with a problem, it looks like this:
>
>     There is a log file which records all the IPs that visited a certain web
> site. The log file may be several G bytes, but the computer used to analyze
> it has limited memory, about 1G bytes. I am asked to figure out the Top K
>  IPs which visited the web site most most frequently.
> is hash table competent to solve it?
>
> Any other suggestions? Or are there classic algorithms existed to cope with
> it?
>
> thanks
>
> Regards,
> Huabin
>
> --
> Huabin Zheng
> Sensor Networks and Application Research Center, GUCAS

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"google-codejam" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/google-code?hl=en
-~----------~----~----~----~------~----~------~--~---

Re: How to get the top K records from a huge dataset?

Reply via email to