Re: How to get the top K records from a huge dataset?

Huabin Zheng Wed, 17 Sep 2008 23:21:04 -0700

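Hi all,
    Thanks for all your suggestions. I have found the classic solution to
top-K problems.
    In brief, these problems can be solved with a hash table plus a heap:
the hash table counts the occurrences of each item, and a min-heap of size
K keeps the K most frequent items seen so far.
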
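
For reference, here is a minimal sketch of the hash table + heap idea in Python. The function name top_k_ips and the input format (an in-memory iterable of IP strings) are illustrative assumptions, not something from this thread. Each IP is counted in a dict. A min-heap then holds at most K (count, ip) pairs, so its root is always the smallest count still kept and is evicted when a larger count arrives.

```python
import heapq
from typing import Dict, Iterable, List, Tuple


def top_k_ips(ips: Iterable[str], k: int) -> List[Tuple[str, int]]:
    """Hypothetical helper: return the k most frequent IPs as (ip, count)."""
    if k <= 0:
        return []

    # Step 1: hash table mapping each IP to its number of visits.
    counts: Dict[str, int] = {}
    for ip in ips:
        counts[ip] = counts.get(ip, 0) + 1

    # Step 2: min-heap of at most k (count, ip) pairs; heap[0] is the
    # smallest count currently kept, so a larger count replaces it.
    heap: List[Tuple[int, str]] = []
    for ip, c in counts.items():
        if len(heap) < k:
            heapq.heappush(heap, (c, ip))
        elif c > heap[0][0]:
            heapq.heapreplace(heap, (c, ip))

    # Order the survivors from most to least frequent.
    return [(ip, c) for c, ip in sorted(heap, reverse=True)]


if __name__ == "__main__":
    log = ["1.1.1.1", "2.2.2.2", "1.1.1.1", "3.3.3.3", "2.2.2.2", "1.1.1.1"]
    print(top_k_ips(log, 2))  # [('1.1.1.1', 3), ('2.2.2.2', 2)]
```

With D distinct IPs, selecting from the counts costs O(D log K) time and O(K) extra space. The dict still needs room for all D distinct IPs, so this sketch assumes they fit in memory.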
On Fri, Sep 5, 2008 at 1:13 AM, Andrian Kurniady <[EMAIL PROTECTED]> wrote:


>
> I think this one has a solution (or something close to it) among the
> data mining methods called "frequent set mining".
>
> This paper (actually a book chapter) explains the recent algorithms
> for it:
> http://www.adrem.ua.ac.be/bibrem/pubs/fimchap.pdf
>
> I think for your case there should be a ready-made divide-and-conquer
> algorithm.
>
> -Kurniady
>
> On Thu, Sep 4, 2008 at 4:12 PM, Huabin Zheng <[EMAIL PROTECTED]>
> wrote:
> > Hi all,
> >     I have encountered a problem that looks like this:
> >     There is a log file that records all the IPs that visited a certain
> > web site. The log file may be several gigabytes, but the computer used
> > to analyze it has limited memory, about 1 GB. I am asked to find the
> > top K IPs that visited the web site most frequently.
> > Is a hash table adequate to solve it?
> > Any other suggestions? Or are there classic algorithms for it?
> > Thanks.
> > Regards,
> > Huabin
> > --
> > Huabin Zheng
> > Sensor Networks and Application Research Center, GUCAS
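
One way to respect the 1 GB limit from the original question, in the spirit of the divide-and-conquer suggestion, is to hash-partition the log before counting. A dense table of 4-byte counters for every possible IPv4 address would already need 2^32 × 4 bytes = 16 GiB. Instead, each IP is written to one of several bucket files chosen by a hash of the IP. Every occurrence of a given IP lands in the same bucket, so each bucket can be counted on its own and its counts fed into a single size-K min-heap. This is a hypothetical sketch: it assumes a log with one IP per line, and top_k_from_file, num_buckets and the crc32 bucketing are illustrative choices, not anything specified in this thread.

```python
import heapq
import os
import tempfile
import zlib
from typing import Dict, List, Tuple


def top_k_from_file(log_path: str, k: int, num_buckets: int = 64) -> List[Tuple[str, int]]:
    """Hypothetical helper: top-k IPs from a log assumed to hold one IP per line."""
    if k <= 0:
        return []
    with tempfile.TemporaryDirectory() as tmp:
        # Pass 1: route each IP to a bucket file by a stable hash (crc32),
        # so all occurrences of one IP end up in the same, smaller file.
        paths = [os.path.join(tmp, f"bucket_{i}.txt") for i in range(num_buckets)]
        outs = [open(p, "w") for p in paths]
        try:
            with open(log_path) as log:
                for line in log:
                    ip = line.strip()
                    if ip:
                        outs[zlib.crc32(ip.encode()) % num_buckets].write(ip + "\n")
        finally:
            for f in outs:
                f.close()

        # Pass 2: count one bucket at a time (only that bucket's hash table
        # is in memory) and merge its counts into a global size-k min-heap.
        heap: List[Tuple[int, str]] = []
        for p in paths:
            counts: Dict[str, int] = {}
            with open(p) as f:
                for line in f:
                    ip = line.rstrip("\n")
                    counts[ip] = counts.get(ip, 0) + 1
            for ip, c in counts.items():
                if len(heap) < k:
                    heapq.heappush(heap, (c, ip))
                elif c > heap[0][0]:
                    heapq.heapreplace(heap, (c, ip))

    # Most frequent first.
    return [(ip, c) for c, ip in sorted(heap, reverse=True)]
```

num_buckets should be large enough that the distinct IPs of any one bucket fit in memory. A very skewed log might still produce an oversized bucket, which would then need to be split again with a different hash.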


-- 
Huabin Zheng
Sensor Networks and Application Research Center, GUCAS

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"google-codejam" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/google-code?hl=en
-~----------~----~----~----~------~----~------~--~---
