The problem is one of state: AVG can be computed with constant state (a
running sum and a count), even across multiple parts, while exact
percentiles, such as the median, require state proportional to the size
of the input and must be implemented carefully if results from several
parts are to be combined. Approximate, probabilistic solutions, as are
often used in streaming systems, are likely a better fit.
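
To make the state argument concrete, here is a rough sketch in Python
(not FastBit code; the class names AvgState and ExactPercentileState are
made up for illustration, and the nearest-rank percentile definition is
an assumption, since other definitions interpolate). AVG merges two
parts by adding two numbers, whereas an exact percentile has to carry
every value of every part until the final answer is computed.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class AvgState:
    """Constant-size partial state for AVG: a running sum and a count."""
    total: float = 0.0
    count: int = 0

    def add(self, x: float) -> None:
        self.total += x
        self.count += 1

    def merge(self, other: "AvgState") -> "AvgState":
        # Combining two parts only needs two additions.
        return AvgState(self.total + other.total, self.count + other.count)

    def result(self) -> float:
        if self.count == 0:
            raise ValueError("AVG of an empty input is undefined")
        return self.total / self.count


@dataclass
class ExactPercentileState:
    """Partial state for an exact percentile: every value seen so far."""
    values: List[float] = field(default_factory=list)

    def add(self, x: float) -> None:
        self.values.append(x)

    def merge(self, other: "ExactPercentileState") -> "ExactPercentileState":
        # The merged state is as large as both inputs together.
        return ExactPercentileState(self.values + other.values)

    def result(self, p: float) -> float:
        # Nearest-rank definition (assumed here): the smallest value such
        # that at least p percent of the data is less than or equal to it.
        if not self.values:
            raise ValueError("percentile of an empty input is undefined")
        ordered = sorted(self.values)
        rank = max(1, math.ceil(p / 100.0 * len(ordered)))
        return ordered[rank - 1]

    def median(self) -> float:
        # MEDIAN expressed as the 50th percentile.
        return self.result(50)


def build(state_type, part):
    """Fold one part (a list of values) into a fresh partial state."""
    state = state_type()
    for x in part:
        state.add(x)
    return state
```

Note that taking the median of each part and then the median of those
medians is not a valid shortcut: with the parts [1, 2, 3], [10, 20] and
[4], the per-part medians under this definition are 2, 10 and 4, whose
median is 4, while the median of all six values is 3.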


b

On Tue, Oct 23, 2012 at 11:46 AM, Michael Beauregard
<[email protected]> wrote:
> Is there a big performance difference between AVG and MEDIAN? I can
> certainly see that mathematically MEDIAN would be much slower,
> considering that it has to sort. I tested out some queries, though,
> and didn't notice much difference; maybe my dataset was too small to
> notice.
>
> I imagine that PERCENTILE would have nearly the same performance cost
> as MEDIAN...is that true?
>
> Michael
>
> On Tue, Oct 23, 2012 at 11:39 AM, K. John Wu <[email protected]> wrote:
>> Hi, Michael,
>>
>> This sounds like an interesting new aggregation function.  My time is
>> fully committed for the next few months, so I might not be able to
>> implement this feature.  If you can code up something, we'd be happy
>> to put it in FastBit.
>>
>> By the way, this aggregation, like median, cannot be processed part
>> by part.  It would therefore be an expensive operator.
>>
>> John
>>
>>
>> On 10/23/12 10:47 AM, Michael Beauregard wrote:
>>> Hey John,
>>>
>>> I looked through the source code and didn't find support for a
>>> PERCENTILE (or similar) aggregation function, but it appears that it
>>> conceptually wouldn't be too hard to build one as a generalized form
>>> of the existing MEDIAN aggregation. To validate my thinking with you, I
>>> imagine that PERCENTILE would take one additional argument indicating
>>> what percentile to return. Unless I'm mistaken, it seems that MEDIAN
>>> could then be implemented as PERCENTILE(50, <columns>) internally.
>>>
>>> What are your thoughts on this feature?
>>>
>>> Michael
>>> _______________________________________________
>>> FastBit-users mailing list
>>> [email protected]
>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
>>>