If there are really very many distinct IPs, you can do it in 2 steps:
- external sort your log file, and
- rank each IP sequentially on the sorted file (identical IPs are now adjacent, so a single pass gives each IP's count).
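
A minimal sketch of the two steps in Python, assuming the log has already been reduced to one IP per line. The names `sorted_runs`, `top_k_ips` and `chunk_lines` are made up for illustration, and `chunk_lines` would need tuning so that one chunk fits comfortably in memory. It sorts the file in chunks, merges the sorted chunks lazily, counts each run of identical IPs, and keeps only the K largest counts.

```python
import heapq
import itertools
import os
import tempfile


def sorted_runs(log_path, chunk_lines):
    """Step 1a: split the log into sorted temporary files ("runs")."""
    run_paths = []
    with open(log_path) as log:
        while True:
            raw = list(itertools.islice(log, chunk_lines))
            if not raw:
                break  # end of the log file
            # Normalize every line to "ip\n" and sort this chunk in memory.
            ips = sorted(line.strip() + "\n" for line in raw if line.strip())
            if not ips:
                continue  # chunk held only blank lines
            fd, path = tempfile.mkstemp(suffix=".run", text=True)
            with os.fdopen(fd, "w") as run:
                run.writelines(ips)
            run_paths.append(path)
    return run_paths


def top_k_ips(log_path, k, chunk_lines=1_000_000):
    """Return the k most frequent IPs as (count, ip) pairs, largest first."""
    run_paths = sorted_runs(log_path, chunk_lines)
    run_files = [open(path) for path in run_paths]
    try:
        # Step 1b: merge the sorted runs lazily, one line per run in memory.
        merged = heapq.merge(*run_files)
        # Step 2: identical IPs are adjacent, so count each group in one pass.
        counts = (
            (sum(1 for _ in group), ip.strip())
            for ip, group in itertools.groupby(merged)
        )
        # Keep only the k largest counts; memory use is O(k).
        return heapq.nlargest(k, counts)
    finally:
        for run_file in run_files:
            run_file.close()
        for path in run_paths:
            os.remove(path)
```

Calling `top_k_ips("access.log", 10)` (the file name is hypothetical) would return the ten most frequent IPs. Note that this sketch opens all runs at once, so with a very large number of runs you would have to merge in several passes to stay under the open-file limit.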

On Thu, Sep 4, 2008 at 2:12 AM, Huabin Zheng <[EMAIL PROTECTED]> wrote:
> Hi all,
> I have encountered a problem; it looks like this:
>
> There is a log file which records all the IPs that visited a certain
> web site. The log file may be several GB, but the computer used
> to analyze it has limited memory, about 1 GB. I am asked to figure out
> the top K IPs which visited the web site most frequently.
> Is a hash table adequate to solve it?
>
> Any other suggestions? Or are there classic algorithms to cope with
> it?
>
> Thanks
>
> Regards,
> Huabin
>
> --
> Huabin Zheng
> Sensor Networks and Application Research Center, GUCAS
