[ 
https://issues.apache.org/jira/browse/CASSANDRA-8826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367340#comment-14367340
 ] 

Benedict commented on CASSANDRA-8826:
-------------------------------------

I don't think we _fundamentally_ disagree. I guess I should outline what I am 
thinking of.

Initially, for single partition queries, but expanding to multiple partition 
queries, I would like our abstraction for aggregations to support partial 
results (continuations, effectively) that can be shipped around along with 
digests, and composed on the coordinator (or repaired). A different result 
would be returned for the repaired and the unrepaired portions from each owner, 
and combined on the coordinator. This permits us to answer these queries 
quickly in the common case where there is agreement, permits quick repair, and 
allows us to expand support to aggregations over multiple partitions without 
tremendous difficulty, by resolving each partition independently into its own 
partial computation, which is then combined with each of the other partial 
computations.
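
To make that concrete, here is a rough sketch of the shape I have in mind, 
written in Python purely for brevity. Every name in it (PartialAvg, 
ReplicaReply, resolve_partition, aggregate) is hypothetical and nothing here 
exists in the codebase. It assumes an AVG over integer values, that each owner 
ships a partial for its repaired and unrepaired portions plus a digest, and 
that the digest is a plain hash of the rows. A mismatch simply raises where a 
real implementation would repair.

```python
import hashlib
from dataclasses import dataclass
from functools import reduce
from typing import List, Sequence


@dataclass(frozen=True)
class PartialAvg:
    """Partial state for AVG: small to ship, and mergeable in any order."""
    total: int = 0
    count: int = 0

    @staticmethod
    def of(values: Sequence[int]) -> "PartialAvg":
        return PartialAvg(sum(values), len(values))

    def merge(self, other: "PartialAvg") -> "PartialAvg":
        return PartialAvg(self.total + other.total, self.count + other.count)

    def finish(self) -> float:
        if self.count == 0:
            raise ValueError("AVG over zero rows")
        return self.total / self.count


@dataclass(frozen=True)
class ReplicaReply:
    """What one owner sends back for one partition (hypothetical layout)."""
    repaired: PartialAvg
    unrepaired: PartialAvg
    digest: str


class DigestMismatch(Exception):
    """Owners disagree; a real coordinator would repair instead of failing."""


def replica_reply(repaired_rows: Sequence[int],
                  unrepaired_rows: Sequence[int]) -> ReplicaReply:
    # Assumption: the digest is a hash over the rows, independent of row order.
    payload = repr((sorted(repaired_rows), sorted(unrepaired_rows))).encode()
    return ReplicaReply(PartialAvg.of(repaired_rows),
                        PartialAvg.of(unrepaired_rows),
                        hashlib.sha256(payload).hexdigest())


def resolve_partition(replies: List[ReplicaReply]) -> PartialAvg:
    """Fast path: if all owners agree, one owner's partials are enough."""
    if not replies:
        raise ValueError("no replies for partition")
    if len({r.digest for r in replies}) != 1:
        raise DigestMismatch("owners disagree; repair needed")
    first = replies[0]
    return first.repaired.merge(first.unrepaired)


def aggregate(partitions: List[List[ReplicaReply]]) -> float:
    """Resolve each partition independently, then combine the partials."""
    partials = [resolve_partition(replies) for replies in partitions]
    return reduce(PartialAvg.merge, partials, PartialAvg()).finish()
```

The point of the sketch is that the coordinator only ever handles a 
(total, count) pair per partition, never the rows themselves, and that going 
from one partition to many is just more calls to merge.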

I don't pretend this is _simple_, but neither do I think it is prohibitively 
complex or out of scope. It seems a good solution to all of the above 
problems, and permits us to easily push the construction of each _partial_ 
computation much lower into the stack when we have the time, so that this (the 
main body of work) can be done much more efficiently, and with network traffic 
proportional to the size of the result, not the domain.

The same abstraction can be used to implement sampled or exact, single- or 
multi-partition aggregations. Most crucially, it supports them with repaired 
data, which we cannot do with any of our map/reduce connectors, and supports 
them in "realtime".
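
As a rough illustration of "same abstraction, sampled or exact", here is a 
sketch in Python. It assumes a COUNT aggregate and a deliberately naive 
sampling scheme (inspect every k-th row and scale up at the end). CountPartial, 
its stride parameter, and count_rows are hypothetical names chosen for the 
example, not anything that exists today.

```python
from dataclasses import dataclass
from functools import reduce
from typing import List, Sequence


@dataclass(frozen=True)
class CountPartial:
    """Partial COUNT. stride=1 is exact; stride=k inspects every k-th row."""
    seen: int = 0
    stride: int = 1

    @staticmethod
    def of(rows: Sequence[object], stride: int = 1) -> "CountPartial":
        # Naive deterministic sampling: rows 0, k, 2k, ... are counted.
        return CountPartial(len(rows[::stride]), stride)

    def merge(self, other: "CountPartial") -> "CountPartial":
        if self.stride != other.stride:
            raise ValueError("cannot merge partials with different strides")
        return CountPartial(self.seen + other.seen, self.stride)

    def finish(self) -> int:
        # Scale the sampled count back up; exact when stride == 1.
        return self.seen * self.stride


def count_rows(partitions: List[Sequence[object]], stride: int = 1) -> int:
    """One partial per partition, merged, for one or many partitions."""
    partials = [CountPartial.of(rows, stride) for rows in partitions]
    return reduce(CountPartial.merge, partials, CountPartial(0, stride)).finish()
```

Calling count_rows with stride=1 gives the exact answer and with a larger 
stride gives an estimate, while the merge step on the coordinator is identical 
in both cases and for any number of partitions.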



> Distributed aggregates
> ----------------------
>
>                 Key: CASSANDRA-8826
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8826
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Robert Stupp
>            Priority: Minor
>
> Aggregations have been implemented in CASSANDRA-4914.
> All calculation is performed on the coordinator. This means that all data is 
> pulled by the coordinator and processed there.
> This ticket is about distributing aggregates to make them more efficient. 
> Some related tickets (esp. CASSANDRA-8099) are currently in 
> progress - we should wait for them to land before talking about 
> implementation.
> Another playground (not covered by this ticket) that might be related is 
> _distributed filtering_.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
