[jira] [Commented] (CASSANDRA-8826) Distributed aggregates

Sylvain Lebresne (JIRA) Wed, 18 Mar 2015 08:17:17 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-8826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367281#comment-14367281
 ]


Sylvain Lebresne commented on CASSANDRA-8826:
---------------------------------------------

Maybe I can be a bit more precise, because I'm not even sure we fundamentally 
disagree. If you're talking about optimizing aggregates over a single 
partition, even a reasonably large one, then I'm fine with that in principle.  
But to me, "distributed aggregates" refers to distributing aggregates over a 
large quantity of data across many nodes _à la_ map-reduce. That's not 
particularly real time in my book btw, and I maintain that imo that's exactly 
what Spark/Hadoop are about and there is no point in reinventing that wheel.
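
To make the map-reduce shape concrete, here is a minimal Python sketch of an 
average computed from per-node partial results. It is only an illustration and 
not Cassandra code: `partial_avg`, `merge_partials` and the `node_rows` data 
are hypothetical names and values made up for the example. Each node reduces 
its local rows to a (sum, count) pair, and only those pairs are merged.

```python
from typing import Iterable, List, Tuple

# A partial aggregate for avg: (sum of values, number of values).
Partial = Tuple[float, int]


def partial_avg(rows: Iterable[float]) -> Partial:
    """'Map' step: reduce one node's local rows to a (sum, count) pair."""
    total = 0.0
    count = 0
    for value in rows:
        total += value
        count += 1
    return total, count


def merge_partials(partials: Iterable[Partial]) -> float:
    """'Reduce' step: combine the per-node pairs into the final average."""
    total = 0.0
    count = 0
    for node_sum, node_count in partials:
        total += node_sum
        count += node_count
    if count == 0:
        raise ValueError("no rows to aggregate")
    return total / count


# Hypothetical data: the rows held by three different nodes.
node_rows: List[List[float]] = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]

partials = [partial_avg(rows) for rows in node_rows]
result = merge_partials(partials)
```

Note that averaging the per-node averages would be wrong here, which is why 
the sketch ships (sum, count) pairs rather than finished values.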

Now, if we are talking about single partition aggregates, then the only 
relation with this ticket I can see is to push the aggregate to the replicas to 
save cross-node traffic. We know it's not that easy for CL > CL.ONE, and 
for CL.ONE, I think it's fine to assume that clients do token-aware routing, at 
which point we already do not transfer data over the wire (and CASSANDRA-7168 
will indeed help improve higher CLs quite a bit, even without any change to the 
current implementation). And I'm just not sure it's worth putting too much 
effort short term into optimizing the "CL.ONE but no token-aware routing" case.
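
The CL.ONE argument can be sketched with a toy model in Python. This is an 
assumption-laden illustration, not how Cassandra is implemented: 
`rows_over_wire`, the node names and the row count are hypothetical, and the 
model ignores read repair and anything beyond a single CL.ONE read whose 
aggregate is computed on the coordinator.

```python
from typing import Set


def rows_over_wire(coordinator: str, replicas: Set[str], partition_rows: int) -> int:
    """Rows shipped between nodes for one CL.ONE read aggregated on the coordinator.

    Toy model: if the coordinator owns a replica of the partition it reads
    locally; otherwise it must fetch every row from one remote replica.
    """
    if coordinator in replicas:
        return 0
    return partition_rows


# Hypothetical cluster: the partition is replicated on n1, n2 and n3.
replicas = {"n1", "n2", "n3"}

# Token-aware client: the request goes straight to a replica.
token_aware = rows_over_wire("n2", replicas, 10_000)

# Non-token-aware client: the request happens to land on a non-replica node.
not_token_aware = rows_over_wire("n5", replicas, 10_000)
```

In this model, pushing the aggregate down to the replica only saves anything 
in the second case, which is the "CL.ONE but no token-aware routing" case.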


> Distributed aggregates
> ----------------------
>
>                 Key: CASSANDRA-8826
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8826
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Robert Stupp
>            Priority: Minor
>
> Aggregations have been implemented in CASSANDRA-4914.
> All calculation is performed on the coordinator. This means that all data is 
> pulled by the coordinator and processed there.
> This ticket is about distributing aggregates to make them more efficient. 
> Some related tickets (esp. CASSANDRA-8099) are currently in 
> progress - we should wait for them to land before talking about 
> implementation.
> Another playground (not covered by this ticket) that might be related is 
> _distributed filtering_.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
