Re: How to get the top K records from a huge dataset?

Andrian Kurniady Wed, 10 Sep 2008 14:44:14 -0700

I think there is a solution to this (or something close to one) in the
data mining area known as "frequent set mining".


This paper (actually a book chapter) explains the recent algorithms in
that area:
http://www.adrem.ua.ac.be/bibrem/pubs/fimchap.pdf

For your case, I think there should be a ready-made divide-and-conquer
algorithm.
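
One possible divide-and-conquer approach, as a rough sketch rather than
a tested solution: hash each IP into one of several bucket files so that
all occurrences of an IP land in the same bucket, then count one bucket
at a time in memory and keep only the best K seen so far in a small
heap. The function name top_k_ips, the num_buckets parameter, and the
one-IP-per-line log format are all assumptions made for illustration.

```python
import heapq
import os
import tempfile
import zlib
from collections import Counter


def top_k_ips(log_path, k, num_buckets=64):
    """Return the k most frequent IPs as (count, ip) pairs, highest first.

    Assumes (hypothetically) that the log has one IP address per line.
    """
    if k <= 0:
        return []
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"bucket_{i}.txt") for i in range(num_buckets)]
        buckets = [open(p, "w") for p in paths]
        try:
            # Divide: the same IP always hashes to the same bucket file,
            # so each bucket can later be counted independently.
            with open(log_path) as log:
                for line in log:
                    ip = line.strip()
                    if ip:
                        idx = zlib.crc32(ip.encode()) % num_buckets
                        buckets[idx].write(ip + "\n")
        finally:
            for f in buckets:
                f.close()

        # Conquer: count one bucket at a time; only that bucket's
        # distinct IPs are held in memory at once.
        best = []  # min-heap of (count, ip), never larger than k
        for p in paths:
            with open(p) as f:
                counts = Counter(line.strip() for line in f)
            for ip, n in counts.items():
                if len(best) < k:
                    heapq.heappush(best, (n, ip))
                elif (n, ip) > best[0]:
                    heapq.heapreplace(best, (n, ip))
        return sorted(best, reverse=True)
```

If a single bucket still does not fit in memory, num_buckets can be
raised so that each bucket holds fewer distinct IPs.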

-Kurniady

On Thu, Sep 4, 2008 at 4:12 PM, Huabin Zheng <[EMAIL PROTECTED]> wrote:
> Hi all,
>     I have run into the following problem.
>     There is a log file that records every IP that visited a certain web
> site. The log file may be several GB, but the computer used to analyze
> it has limited memory, about 1 GB. I am asked to find the top K
> IPs that visited the web site most frequently.
> Is a hash table adequate for this?
> Any other suggestions? Or are there classic algorithms for this kind of
> problem?
> Thanks,
> Regards,
> Huabin
> --
> Huabin Zheng
> Sensor Networks and Application Research Center, GUCAS

