The computation of AVG can be done in one pass through whatever data
records are selected, and it needs as few data records in memory as
possible (in principle only a running count and a running sum).  In
contrast, MEDIAN requires storing all the selected data records and
then sorting the values.  The sorting process can be thought of as
going through the data in log(N) passes, so its cost grows roughly as
N log(N) instead of N.  The difference becomes larger as the data set
grows, so a small test set may not show it.
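
To make the contrast concrete, here is a minimal Python sketch.  It is
not FastBit code, and all function names are hypothetical.  AVG keeps
only a count and a sum, and per-partition (count, sum) pairs can be
merged exactly, so it can be processed in parts.  PERCENTILE must hold
and sort every selected value.  It assumes the nearest-rank definition,
under which MEDIAN is PERCENTILE(50) and an even-sized selection yields
the lower of the two middle values; FastBit's MEDIAN may define the
even case differently.

```python
import math
from typing import Iterable, List, Tuple


def streaming_avg(values: Iterable[float]) -> float:
    """One pass, O(1) memory: only a running count and sum are kept."""
    count = 0
    total = 0.0
    for v in values:
        count += 1
        total += v
    if count == 0:
        raise ValueError("AVG of an empty selection is undefined")
    return total / count


def avg_from_parts(parts: Iterable[Tuple[int, float]]) -> float:
    """Merge per-partition (count, sum) pairs; exact because AVG decomposes."""
    parts = list(parts)
    count = sum(c for c, _ in parts)
    total = sum(s for _, s in parts)
    return total / count


def percentile(values: Iterable[float], p: float) -> float:
    """Nearest-rank percentile: every selected value is held and sorted."""
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    data: List[float] = sorted(values)  # O(N) memory, O(N log N) time
    if not data:
        raise ValueError("PERCENTILE of an empty selection is undefined")
    rank = math.ceil(p * len(data) / 100)  # 1-based nearest rank
    return data[rank - 1]


def median(values: Iterable[float]) -> float:
    """MEDIAN expressed as the 50th percentile (nearest-rank)."""
    return percentile(values, 50)
```

For example, median([5, 1, 3]) returns 3 and percentile(range(1, 11), 90)
returns 9.  Partial medians from separate partitions cannot be merged
the way the (count, sum) pairs can, which is why MEDIAN and PERCENTILE
are the expensive operators here.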

John


On 10/23/12 11:46 AM, Michael Beauregard wrote:
> Is there a big performance difference between AVG and MEDIAN? I can
> certainly see that mathematically MEDIAN would be much slower
> considering that it has to sort, but when I tested some queries I
> didn't notice much difference. Maybe my dataset was too small to
> notice.
> 
> I imagine that PERCENTILE would have nearly the same performance cost
> as MEDIAN. Is that true?
> 
> Michael
> 
> On Tue, Oct 23, 2012 at 11:39 AM, K. John Wu <[email protected]> wrote:
>> Hi, Michael,
>>
>> This sounds like an interesting new aggregation function.  My time is
>> fully committed for the next few months, so I might not be able to
>> implement this feature.  If you can code up something, we'd be happy
>> to put it in FastBit.
>>
>> By the way, like MEDIAN, this aggregation could not be processed in
>> parts.  Therefore it would be an expensive operator.
>>
>> John
>>
>>
>> On 10/23/12 10:47 AM, Michael Beauregard wrote:
>>> Hey John,
>>>
>>> I looked through the source code and didn't find support for a
>>> PERCENTILE (or similar) aggregation function, but it appears that it
>>> conceptually wouldn't be too hard to build one as a generalized form of
>>> the existing MEDIAN aggregation. To validate my thinking with you, I
>>> imagine that PERCENTILE would take one additional argument indicating
>>> what percentile to return. Unless I'm mistaken, it seems that MEDIAN
>>> could then be implemented as PERCENTILE(50, <columns>) internally.
>>>
>>> What are your thoughts on this feature?
>>>
>>> Michael
>>> _______________________________________________
>>> FastBit-users mailing list
>>> [email protected]
>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
>>>