[jira] [Commented] (CASSANDRA-8826) Distributed aggregates

Luke Brown (JIRA) Wed, 26 Oct 2016 16:08:23 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-8826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15609985#comment-15609985
 ]


Luke Brown commented on CASSANDRA-8826:
---------------------------------------

For CL>1, wouldn't read repair require shipping the underlying data between
nodes, which is exactly what this feature is meant to avoid? Would it still be
worthwhile? If it's important to the client that only aggregated results are
sent between nodes, I think that rules out reconciliation for most aggregation
functions.
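
To illustrate why per-replica aggregates can't be reconciled the way rows can,
here is a minimal Python sketch. It is not Cassandra code: the replica contents,
the (value, timestamp) cell layout, and the helper names reconcile_rows and
local_sum are made up for illustration, and last-write-wins per key stands in
for real cell reconciliation.

```python
def reconcile_rows(*replicas):
    """Merge replica row sets, keeping the newest (value, timestamp) per key."""
    merged = {}
    for rows in replicas:
        for key, (value, ts) in rows.items():
            if key not in merged or ts > merged[key][1]:
                merged[key] = (value, ts)
    return merged


def local_sum(rows):
    """Aggregate computed from one row set only (what a replica would return)."""
    return sum(value for value, _ in rows.values())


# Hypothetical divergent replicas: A has a newer k1, B has a k3 that A lacks.
replica_a = {"k1": (10, 2), "k2": (5, 1)}
replica_b = {"k1": (7, 1), "k2": (5, 1), "k3": (4, 3)}

sum_a = local_sum(replica_a)
sum_b = local_sum(replica_b)

# The reconciled answer needs the rows themselves, not just the two sums.
true_sum = local_sum(reconcile_rows(replica_a, replica_b))
```

In this made-up case the reconciled sum matches neither replica's local sum, so
a coordinator holding only the two sums has nothing to reconcile with.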

Otherwise, queries would unpredictably produce network traffic comparable to
the current method of aggregating in the coordinator, right? When that happens,
the result might even be a net performance loss, since all the queried nodes
would be running the aggregation functions too, rather than just the
coordinator.

If that's true, the most the coordinator should do for distributed aggregates
at CL>1 is compare the replica results and fail the query on any difference,
without attempting to reconcile the underlying data (no foreground or
background repairs). For some applications, that fail-fast alternative could be
an improvement over CL.ONE with a token-aware client: the coordinator would
still choose the best >1 nodes to try, and it is a better place than the
client/driver to compare the responses from multiple nodes.
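
A rough Python sketch of that fail-fast comparison, under stated assumptions: it
is not an existing Cassandra API, and ReplicaMismatch and fail_fast_aggregate
are hypothetical names. It assumes each queried replica has already returned
its aggregated value and that plain equality is the right comparison.

```python
class ReplicaMismatch(Exception):
    """Raised when replica aggregates disagree (hypothetical error type)."""


def fail_fast_aggregate(replica_results):
    """Return the agreed aggregate, or fail without attempting any repair."""
    if not replica_results:
        raise ValueError("no replica results to compare")
    first = replica_results[0]
    for other in replica_results[1:]:
        if other != first:
            # No reconciliation: the underlying rows are never fetched.
            raise ReplicaMismatch(f"replica results differ: {first!r} vs {other!r}")
    return first


agreed = fail_fast_aggregate([42, 42])
```

Exact equality may be too strict for floating-point aggregates such as avg; the
sketch does not try to address that.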

But given that this special case would need its own additional implementation 
for aggregates, would it still be considered a worthwhile feature?

> Distributed aggregates
> ----------------------
>
>                 Key: CASSANDRA-8826
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8826
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Robert Stupp
>            Priority: Minor
>
> Aggregations have been implemented in CASSANDRA-4914.
> All calculation is performed on the coordinator. This means that all data is
> pulled by the coordinator and processed there.
> This ticket is about distributing aggregates to make them more efficient.
> Some related tickets (esp. CASSANDRA-8099) are currently in
> progress - we should wait for them to land before talking about
> implementation.
> Another playground (not covered by this ticket) that might be related is
> _distributed filtering_.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
