[
https://issues.apache.org/jira/browse/CASSANDRA-8826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367340#comment-14367340
]
Benedict commented on CASSANDRA-8826:
-------------------------------------
I don't think we _fundamentally_ disagree. I guess I should outline what I am
thinking of.
Initially, for single partition queries, but expanding to multiple partition
queries, I would like our abstraction for aggregations to support partial
results (continuations, effectively) that can be shipped around along with
digests, and composed on the coordinator (or repaired). A different result
would be returned for the repaired and the unrepaired portions from each owner,
and combined on the coordinator. This permits us to answer these queries
quickly in the common case where there is agreement, permits quick repair, and
allows us to expand support to aggregations over multiple partitions without
really tremendous difficulty, by resolving each partition independently into its
own partial computation; these are then combined with each of the other partial
computations.
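To make the idea of composable partial results concrete, here is a minimal sketch in Java, assuming an AVG aggregate. Every name in it (PartialAvg, ofPartition, merge, combine) is hypothetical and exists nowhere in Cassandra; it only illustrates a partial computation that is built per partition and merged on the coordinator, and it leaves out digests and repair entirely.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only: none of these names are part of Cassandra.
// A partial AVG carries (sum, count) rather than the average itself,
// so that partials from different partitions or owners can be merged.
final class PartialAvg {
    final long sum;
    final long count;

    PartialAvg(long sum, long count) {
        this.sum = sum;
        this.count = count;
    }

    // Identity element: the partial result for an empty set of rows.
    static PartialAvg empty() {
        return new PartialAvg(0, 0);
    }

    // What an owner might compute locally for a single partition.
    static PartialAvg ofPartition(long... values) {
        long s = 0;
        for (long v : values) {
            s += v;
        }
        return new PartialAvg(s, values.length);
    }

    // Associative and commutative, so partials can be combined in any order.
    PartialAvg merge(PartialAvg other) {
        return new PartialAvg(sum + other.sum, count + other.count);
    }

    // Coordinator side: fold all received partials into one.
    static PartialAvg combine(List<PartialAvg> partials) {
        PartialAvg acc = empty();
        for (PartialAvg p : partials) {
            acc = acc.merge(p);
        }
        return acc;
    }

    // Only the final step turns the partial into a user-visible value.
    double result() {
        return count == 0 ? Double.NaN : (double) sum / count;
    }

    public static void main(String[] args) {
        PartialAvg p1 = ofPartition(1, 2, 3);
        PartialAvg p2 = ofPartition(10, 20);
        System.out.println(combine(Arrays.asList(p1, p2)).result());
    }
}
```

The partial carries the sum and the count rather than an average because averaging per-partition averages gives the wrong answer when partitions differ in size. Only the two numbers per partition would cross the network, not the rows. The same merge step could presumably also combine the repaired and unrepaired portions from one owner, though that is an assumption of this sketch.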
I don't pretend this is _simple_, but nor do I think it is prohibitively
complex or out of scope. It seems a good solution to all of the above
problems, and permits us to easily push the construction of each _partial_
computation much lower into the stack when we have the time, so that this (the
main body of work) can be done much more efficiently, and with network traffic
proportional to the size of the result, not the domain.
The same abstraction can be used to implement sampled or exact, single or multi
partition aggregations. Most crucially, it supports them with repaired data,
which we cannot do with any of our map/reduce connectors, and supports them
in "realtime".
> Distributed aggregates
> ----------------------
>
> Key: CASSANDRA-8826
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8826
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Robert Stupp
> Priority: Minor
>
> Aggregations have been implemented in CASSANDRA-4914.
> All calculation is performed on the coordinator. This means that all data is
> pulled by the coordinator and processed there.
> This ticket is about distributing aggregates to make them more efficient.
> Some related tickets (esp. CASSANDRA-8099) are currently in
> progress - we should wait for them to land before talking about
> implementation.
> Another area (not covered by this ticket) that might be related is
> _distributed filtering_.