I think u can go ahead with Hash table concept.Analyse the number(n)of ips u can accomodate in 1GB memory, then make hash function so that it can acomodate n entries in hash buckets, if collison occurs , first match whether the ip u picked is same as value on the hash bucket,if same reject the ip and pick next ip from the 1G> memory comp, if different just put it on next free bucket. You can follow this for all picked ips.Keep the track when ur bucket gets filled.Stop at that point, u will have n uniquie latest ips.
On Sep 4, 2:12 pm, "Huabin Zheng" <[EMAIL PROTECTED]> wrote: > Hi all, > I am encountered with a problem, it looks like this: > > There is a log file which records all the IPs that visited a certain web > site. The log file may be several G bytes, but the computer used to analyze > it has limited memory, about 1G bytes. I am asked to figure out the Top K > IPs which visited the web site most most frequently. > is hash table competent to solve it? > > Any other suggestions? Or are there classic algorithms existed to cope with > it? > > thanks > > Regards, > Huabin > > -- > Huabin Zheng > Sensor Networks and Application Research Center, GUCAS --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "google-codejam" group. To post to this group, send email to [email protected] To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/google-code?hl=en -~----------~----~----~----~------~----~------~--~---
