I guess you could do it in several passes over the log, one for each part of the IP. This works really well for finding the single most frequent IP. For the top k you'll have to do some more bookkeeping, but you should still be able to stay within the memory limit.
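Here is a rough sketch of one way to read the multi-pass idea, not necessarily the exact scheme meant above. Greedily narrowing down to the busiest octet is not guaranteed to land on the busiest address, so this version instead splits the address space by its leading bits and does one full pass per bucket, counting exactly only the addresses that fall in that bucket. A min-heap of at most k entries is the only state carried between passes. The function name `top_k_ips`, the `prefix_bits` parameter, and the assumption that the log holds one IPv4 address per line are all mine, not from the original question.

```python
import heapq
from collections import Counter
from ipaddress import IPv4Address


def top_k_ips(path, k, prefix_bits=8):
    """Return the k most frequent IPv4 addresses as (ip, count) pairs.

    Assumes one IPv4 address per line. Makes 2**prefix_bits passes over
    the file; each pass counts only addresses whose leading prefix_bits
    bits equal the current bucket number.
    """
    shift = 32 - prefix_bits
    heap = []  # min-heap of (count, ip_as_int), never more than k entries

    for bucket in range(1 << prefix_bits):
        counts = Counter()
        with open(path) as log:
            for line in log:
                line = line.strip()
                if not line:
                    continue
                ip = int(IPv4Address(line))
                if ip >> shift == bucket:
                    counts[ip] += 1

        # Fold this bucket's exact counts into the running top k.
        for ip, n in counts.items():
            if len(heap) < k:
                heapq.heappush(heap, (n, ip))
            elif (n, ip) > heap[0]:
                heapq.heapreplace(heap, (n, ip))

    # Highest count first; ties are ordered by numeric address, descending.
    return [(str(IPv4Address(ip)), n) for n, ip in sorted(heap, reverse=True)]
```

Called as `top_k_ips("access.log", 10)` (a made-up file name), this makes 256 passes. The trade-off is passes against memory: a larger `prefix_bits` means fewer distinct addresses per pass but more passes over the file, while `prefix_bits=0` collapses to a single pass with one big table.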
In fact, to make the bookkeeping easier, it may be better to look at fewer bits of the IP per pass... but I'll let someone smarter than me comment on this first before elaborating.

On Sep 4, 3:12 am, "Huabin Zheng" <[EMAIL PROTECTED]> wrote:
> Hi all,
> I have run into a problem that looks like this:
>
> There is a log file which records all the IPs that visited a certain web
> site. The log file may be several gigabytes, but the computer used to analyze
> it has limited memory, about 1 GB. I am asked to figure out the top K
> IPs which visited the web site most frequently.
> Is a hash table adequate to solve it?
>
> Any other suggestions? Or are there classic algorithms for coping with
> it?
>
> thanks
>
> Regards,
> Huabin
>
> --
> Huabin Zheng
> Sensor Networks and Application Research Center, GUCAS
