Very interesting. I had saved your email from three years ago in hopes of
an elegant answer. Thanks for sharing!

Jim

On Tue, Mar 31, 2015 at 8:16 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

> People keep asking me whether we ever found a solution (even though this
> thread is 3+ years old), so I will update it with our findings.
>
> We finally achieved this in our big data and reporting stacks by storing
> blobs that hold serialized HLL (HyperLogLog) structures. HLL is an
> algorithm used by Google, Twitter and many others to solve count-distinct
> problems. Structures built with this algorithm can be "summed" (merged),
> and the result gives a good approximation of the UV count.
>
> The precision you get depends on the size of the structure you choose, so
> it is predictable. You can reach a fairly acceptable approximation with
> small data structures.
>
> So we basically store one HLL per hour and just "sum" the HLLs for all the
> hours in the requested range (you can do it at day level or any other
> level, depending on your needs).
>
> Hope this helps some of you. We finally had this (good) idea after more
> than 3 years. We had actually been using HLL for a long time, but storing
> HLL structures instead of counts is what allows us to query custom ranges
> (at the price of more intelligence in the reporting stack, which must read
> and smartly sum the HLLs stored as blobs). We have been happy with it ever
> since.
>
> C*heers,
>
> Alain
>
> 2012-01-19 22:21 GMT+01:00 Milind Parikh <milindpar...@gmail.com>:
>
>> You might want to look at the code at countandra.org, regardless of
>> whether you use it. It uses a model of dynamic composite keys (although
>> static composite keys would have worked as well). For the actual query,
>> only one row is hit. This of course only works because the data model is
>> tuned for the query.
>>
>> Regards
>> Milind
>>
>> /***********************
>> sent from my android...please pardon occasional typos as I respond @ the
>> speed of thought
>> ************************/
>>
>> On Jan 19, 2012 1:31 AM, "Alain RODRIGUEZ" <arodr...@gmail.com> wrote:
>>
>> Hi, thanks for your answer, but I don't want to add another layer on top
>> of Cassandra. I have also built my whole application without Countandra
>> and I would like to continue this way.
>>
>> Furthermore there is a Cassandra modeling problem that I would like to
>> solve, and not just hide.
>>
>> Alain
>>
>>
>>
>> 2012/1/18 Lucas de Souza Santos <lucas...@gmail.com>
>> >
>> > Why not http://www.countandra.org/
>> >
>> >
>> > ...
>>
>>
>
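For anyone who wants to see the idea from Alain's message in code (one HLL blob per hour, "summed" over the requested range), here is a minimal sketch in Python. It is not Alain's implementation: the HyperLogLog class, the SHA-1 hash, the one-byte blob header, the hour keys and the visitor ids are all assumptions made for illustration, and I am assuming "UV" means unique visitors. The "sum" of two HLLs is the register-wise maximum, and the relative error is usually quoted as about 1.04/sqrt(m) for m registers, so roughly 1.6% for the 4096 registers used below.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog sketch, for illustration only (hypothetical class)."""

    def __init__(self, p=12):
        if not 7 <= p <= 16:
            raise ValueError("p must be between 7 and 16")
        self.p = p                          # number of index bits
        self.m = 1 << p                     # number of registers
        self.registers = bytearray(self.m)

    def add(self, item):
        # 64-bit hash of the item; SHA-1 is an arbitrary choice here.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                      # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)         # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # leading zeros + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        """Return the "sum" of two sketches: the register-wise maximum."""
        if self.p != other.p:
            raise ValueError("cannot merge sketches of different sizes")
        merged = HyperLogLog(self.p)
        merged.registers = bytearray(
            max(a, b) for a, b in zip(self.registers, other.registers)
        )
        return merged

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)         # bias constant for m >= 128
        estimate = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:        # small-range correction
            estimate = self.m * math.log(self.m / zeros)
        return round(estimate)

    def to_blob(self):
        # Assumed layout: one byte for p, then one byte per register.
        return bytes([self.p]) + bytes(self.registers)

    @classmethod
    def from_blob(cls, blob):
        sketch = cls(blob[0])
        sketch.registers = bytearray(blob[1:])
        return sketch


# Hypothetical data: visitor ids seen in two consecutive hours, with overlap.
hourly_visitors = {
    "2015-03-31T08": [f"user-{i}" for i in range(0, 6000)],
    "2015-03-31T09": [f"user-{i}" for i in range(4000, 10000)],
}

# One blob per hour, as it might be written to a blob column.
hourly_blobs = {}
for hour, visitors in hourly_visitors.items():
    sketch = HyperLogLog()
    for visitor in visitors:
        sketch.add(visitor)
    hourly_blobs[hour] = sketch.to_blob()


def count_range(blobs):
    """Merge the stored blobs of a non-empty range and estimate the distinct count."""
    sketches = [HyperLogLog.from_blob(b) for b in blobs]
    total = sketches[0]
    for s in sketches[1:]:
        total = total.merge(s)
    return total.count()


# The true number of distinct visitors over both hours is 10000.
estimate = count_range(hourly_blobs.values())
```

Adding the two hourly counts would count the 2000 overlapping visitors twice, while merging the sketches gives the same registers as a single sketch fed every visitor, which is why storing the structures rather than the counts makes arbitrary ranges possible. Each blob in this layout is 4097 bytes regardless of how many visitors the hour saw.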
