[
https://issues.apache.org/jira/browse/CASSANDRA-8826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15609985#comment-15609985
]
Luke Brown commented on CASSANDRA-8826:
---------------------------------------
For CL>1, wouldn't read repair require shipping around the underlying data,
which this feature is intended to avoid doing? Would it still be worthwhile? If
it's important to the client that only aggregated results are sent between
nodes, I'm thinking that would rule out reconciliation for most aggregation
functions.
That's because reconciling would make queries unpredictably produce network
traffic comparable to the current method of aggregating on the coordinator,
right? When that happens, the trade-off might even be a net performance loss,
given that the queried nodes would all be running the aggregation functions
too, rather than just the coordinator.
If that's true, the most the coordinator should do for CL>1 distributed
aggregates is compare the replica results; any difference should simply fail
the query, with no attempt to reconcile the underlying data (no foreground or
background repairs). For some applications, that fail-fast alternative could
be an improvement over CL.ONE with a token-aware client, since the coordinator
would still choose the best replicas to query, and the coordinator is a better
place than the client/driver to compare the responses from multiple nodes.
But given that this special case would need its own additional implementation
for aggregates, would it still be considered a worthwhile feature?
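To make the fail-fast idea concrete, here is a minimal sketch of the
comparison step only. It is not Cassandra code: the class, method, and
exception names are hypothetical, and it assumes each replica has already
returned its fully aggregated value to the coordinator.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class FailFastAggregate {

    /** Hypothetical exception: replicas returned different aggregate values. */
    public static class ReplicaMismatchException extends RuntimeException {
        public ReplicaMismatchException(String message) {
            super(message);
        }
    }

    /**
     * Returns the aggregate value if every replica agrees on it.
     * Any difference fails the query; no reconciliation or repair is attempted,
     * so no underlying rows are ever requested from the replicas.
     */
    public static <T> T agree(List<T> replicaResults) {
        if (replicaResults.isEmpty()) {
            throw new IllegalArgumentException("no replica responses");
        }
        T first = replicaResults.get(0);
        for (T other : replicaResults.subList(1, replicaResults.size())) {
            if (!Objects.equals(first, other)) {
                throw new ReplicaMismatchException(
                        "replica aggregates differ: " + first + " vs " + other);
            }
        }
        return first;
    }

    public static void main(String[] args) {
        // Two replicas report the same COUNT(*): the query succeeds.
        System.out.println(agree(Arrays.asList(42L, 42L)));

        // Replicas disagree: the query fails instead of triggering a repair.
        try {
            agree(Arrays.asList(42L, 41L));
        } catch (ReplicaMismatchException e) {
            System.out.println("query failed: " + e.getMessage());
        }
    }
}
```

The point of the sketch is that only aggregated values cross the network;
whether failing the query is acceptable is left to the application.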
> Distributed aggregates
> ----------------------
>
> Key: CASSANDRA-8826
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8826
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Robert Stupp
> Priority: Minor
>
> Aggregations have been implemented in CASSANDRA-4914.
> All calculation is performed on the coordinator. This means that all data is
> pulled by the coordinator and processed there.
> This ticket is about distributing aggregates to make them more efficient.
> Some related tickets (esp. CASSANDRA-8099) are currently in
> progress - we should wait for them to land before talking about
> implementation.
> Another playground (not covered by this ticket) that might be related is
> _distributed filtering_.