I think this one has a solution (or something close to it) in the data mining technique called "frequent set mining".
This paper (a book chapter, actually) explains the recent algorithms for it:
http://www.adrem.ua.ac.be/bibrem/pubs/fimchap.pdf

For your case, I think there should be a ready-made divide-and-conquer algorithm.

-Kurniady

On Thu, Sep 4, 2008 at 4:12 PM, Huabin Zheng <[EMAIL PROTECTED]> wrote:
> Hi all,
> I have run into a problem that looks like this:
> There is a log file which records all the IPs that visited a certain web
> site. The log file may be several gigabytes, but the computer used to analyze
> it has limited memory, about 1 GB. I am asked to find the top K
> IPs that visited the web site most frequently.
> Is a hash table adequate to solve it?
> Any other suggestions? Or are there classic algorithms for coping with
> it?
> Thanks
> Regards,
> Huabin
> --
> Huabin Zheng
> Sensor Networks and Application Research Center, GUCAS
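To make the divide-and-conquer suggestion above a bit more concrete, here is a rough Python sketch of one common way to do it. It is not taken from the linked chapter. It assumes the log has exactly one IP per line (a real access log would need parsing first), and the function name top_k_ips and the num_buckets parameter are made up for illustration. The idea is to split the log into bucket files by a hash of the IP, so that all occurrences of an IP land in the same bucket, then count one bucket at a time in a hash table and keep only the best K candidates.

```python
import heapq
import os
import tempfile
import zlib
from collections import Counter


def top_k_ips(log_path, k, num_buckets=64):
    """Return the k most frequent IPs as (ip, count) pairs, most frequent first.

    Assumes one IP per line; num_buckets is a tuning knob, not a fixed rule.
    """
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"bucket_{i}.txt") for i in range(num_buckets)]
        buckets = [open(p, "w") for p in paths]
        try:
            # Pass 1 (divide): route each IP to a bucket chosen by a stable
            # hash, so every occurrence of an IP ends up in the same file.
            with open(log_path) as log:
                for line in log:
                    ip = line.strip()
                    if ip:
                        idx = zlib.crc32(ip.encode()) % num_buckets
                        buckets[idx].write(ip + "\n")
        finally:
            for f in buckets:
                f.close()

        # Pass 2 (conquer): count one bucket at a time in a hash table and
        # merge its entries into a running list of at most k candidates.
        best = []
        for p in paths:
            counts = Counter()
            with open(p) as f:
                for line in f:
                    counts[line.strip()] += 1
            candidates = best + [(c, ip) for ip, c in counts.items()]
            best = heapq.nlargest(k, candidates)

    return [(ip, c) for c, ip in best]
```

Only one bucket's hash table is in memory at a time, so memory use depends on the number of distinct IPs per bucket rather than on the size of the log. If the number of distinct IPs is small enough, a single hash table over the whole file may already fit and the first pass can be skipped.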
