[ 
https://issues.apache.org/jira/browse/JENA-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808682#comment-16808682
 ] 

Andy Seaborne commented on JENA-1693:
-------------------------------------

[~ajs6f] A good place to look would be an existing aggregator such as COUNT. 
If it is a keyword in the language, it's not a custom extension.

There is support for custom aggregators, see {{AggCustom}}; IIRC it needs Java 
code to register it. It should work but I don't think it gets used.

As a statistic, MEDIAN is not like AVG and STDDEV, which are streaming 
accumulations. I don't know of an algorithm for MEDIAN that avoids a sort. 
Maybe there is one that is not a full sort and retains only partial 
intermediates, but retaining some values while waiting for all values 
seems to be necessary. So this aggregator will take memory to evaluate.
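
To make the difference concrete, here is a rough sketch in plain Java. It does 
not use Jena's aggregator API; the class names {{AvgAccumulator}} and 
{{MedianAccumulator}} are made up for illustration, and taking the mean of the 
two middle values for an even count is an assumed convention, not something 
decided in this issue.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Streaming accumulation: AVG needs only a running sum and a count,
// so memory use is constant regardless of group size.
final class AvgAccumulator {
    private double sum = 0.0;
    private long count = 0;

    void accumulate(double value) {
        sum += value;
        count++;
    }

    // NaN for an empty group is an arbitrary choice for this sketch.
    double result() {
        return count == 0 ? Double.NaN : sum / count;
    }
}

// Buffering accumulation: MEDIAN keeps every value until the group is
// complete, so memory use grows with the number of values in the group.
final class MedianAccumulator {
    private final List<Double> values = new ArrayList<>();

    void accumulate(double value) {
        values.add(value);
    }

    double result() {
        if (values.isEmpty()) {
            return Double.NaN; // same arbitrary empty-group convention as above
        }
        // Sort a copy so that further accumulate() calls remain valid.
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int n = sorted.size();
        int mid = n / 2;
        if (n % 2 == 1) {
            return sorted.get(mid);
        }
        // Even count: assumed convention, mean of the two middle values.
        return (sorted.get(mid - 1) + sorted.get(mid)) / 2.0;
    }
}
```

The AVG version holds two numbers however large the group is; the MEDIAN 
version holds the whole group and sorts it at the end.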

Investigation needed!



> Add Aggregate Function MEDIAN To SPARQL ARQ Syntax
> --------------------------------------------------
>
>                 Key: JENA-1693
>                 URL: https://issues.apache.org/jira/browse/JENA-1693
>             Project: Apache Jena
>          Issue Type: New Feature
>         Environment: general 
>  
>            Reporter: Marco Neumann
>            Priority: Minor
>              Labels: features
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> As briefly mentioned to Andy Seaborne I'd like to see the aggregate function 
> MEDIAN in the ARQ SPARQL syntax. 
> "Median is the value that separates lower half from the higher half when the 
> values are ordered in ascending or descending order. It is the middle value 
> in a given dataset. Medians are helpful in understanding the distribution of 
> data. This can be done by comparing mean and median values. By observing the 
> difference between these values we can understand whether the data is left 
> skewed or right skewed. The formula for median is: Median = ((n + 1)/2) th 
> number in the series where the numbers are ordered. Here, n denotes the 
> number of values for the given variable."
> DIVYA SPANDANA MARNEN, SPARQL-R: EXTENDED SPARQL FOR STATISTICAL COMPUTATIONS.
>  
> example
>  
> SELECT (agg:median(?age) AS ?median)
> WHERE
> { ?x ex:age ?age }
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
