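Yes, it's possible. First, split the log file into chunks whose size is proportional to your memory size. Once it is split, count the number of times each IP address appears in each chunk, similar to the external sort principle. After that, retrieve the frequently visited IPs from each chunk and store them in a separate log file. This way you get, for each chunk, the number of visits per IP address; from those you take the most visited and sort again to find the most frequently visited IP addresses overall. One caveat: the same IP can show up in more than one chunk, so its per-chunk counts have to be added together before the final sort. If you split by a hash of the IP rather than by position in the file, every IP lands in exactly one chunk and that extra step goes away.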
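
The split-and-count idea could be sketched roughly like this in Python. It is only an illustration under a few assumptions of mine, not a tested solution for the real log: the file is assumed to hold one IP per line, the split is done by hashing the IP (so all occurrences of one IP fall in the same chunk), and the names `top_k_ips` and `num_buckets` are made up for the example.

```python
import heapq
import os
import tempfile
import zlib
from collections import Counter


def top_k_ips(log_path, k, num_buckets=64):
    """Return the k most frequent IPs as (ip, count) pairs, highest count first.

    Assumes one IP address per line in log_path (an assumption for this sketch).
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        bucket_paths = [
            os.path.join(tmp_dir, f"bucket_{i}.txt") for i in range(num_buckets)
        ]
        buckets = [open(path, "w") for path in bucket_paths]
        try:
            # Pass 1: stream the log and route each IP to a bucket file by hash,
            # so every occurrence of a given IP ends up in the same bucket.
            with open(log_path) as log:
                for line in log:
                    ip = line.strip()
                    if ip:
                        index = zlib.crc32(ip.encode()) % num_buckets
                        buckets[index].write(ip + "\n")
        finally:
            for bucket in buckets:
                bucket.close()

        # Pass 2: count one bucket at a time. Only the distinct IPs of the
        # current bucket are held in memory.
        candidates = []
        for path in bucket_paths:
            counts = Counter()
            with open(path) as bucket:
                for line in bucket:
                    counts[line.rstrip("\n")] += 1
            # Counts are complete within a bucket, so its top k are the only
            # entries that can still make the overall top k.
            candidates.extend(counts.most_common(k))

        # Final step: pick the overall top k among the per-bucket winners.
        return heapq.nlargest(k, candidates, key=lambda pair: pair[1])
```

Calling `top_k_ips("access.log", 10)` (the file name is hypothetical) would give the ten most frequent IPs. `num_buckets` should be chosen large enough that the distinct IPs of any single bucket fit in memory; with a real log format you would also have to extract the IP field from each line first.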

On Thu, Sep 4, 2008 at 2:42 PM, Huabin Zheng <[EMAIL PROTECTED]> wrote:
> Hi all,
> I am encountered with a problem, it looks like this:
>
> There is a log file which records all the IPs that visited a certain
> web site. The log file may be several G bytes, but the computer used
> to analyze it has limited memory, about 1G bytes. I am asked to figure out
> the Top K IPs which visited the web site most frequently.
> is hash table competent to solve it?
>
> Any other suggestions? Or are there classic algorithms existed to cope with
> it?
>
> thanks
>
> Regards,
> Huabin
>
> --
> Huabin Zheng
> Sensor Networks and Application Research Center, GUCAS
