We will benchmark the SpaceSaving algorithm first and update our findings.

On Mon, Aug 17, 2015 at 5:43 PM, Li Yang <[email protected]> wrote:

> TopN is a common analysis requirement. Kylin does support 'order by' and
> 'limit' currently, but it is online calculation. In order to greatly speed
> up TopN query, we can pre-calculate TopN and store it like a metrics.
>
> I'm selecting algorithm at the moment. Don't require exact TopN result.
> Approximate algorithms with expected error are a lot more space saving and
> faster. My options are the below.
>
> 1. SpaceSaving [1] and its parallel version [2]
> 2. CountSketch [3]
>
> Seems SpaceSaving has both smaller footprint and is faster to calculate
> according to [1]. Does any one has experience could confirm? Or any
> suggestion on alternative algorithms?
>
>
> Cheers
> Yang
>
> [1] https://icmi.cs.ucsb.edu/research/tech_reports/reports/2005-23.pdf
> [2] http://arxiv.org/pdf/1401.0702.pdf
> [3]
> http://www.cs.princeton.edu/courses/archive/spring04/cos598B/bib/CharikarCF.pdf
>
>

Reply via email to