liucao-dd commented on PR #93:
URL: 
https://github.com/apache/cassandra-analytics/pull/93#issuecomment-3091351749

   > > Just following up on this one @anderoo, does this patch alone work for you? I would have thought there would be changes required to the `CassandraRing` and `TokenPartitioner` code, that is what is used to divide the token range into Spark tasks.
   > 
   > @jberragan From local testing this patch worked, we've had to make other adjustments but nothing else related to the token ranges. For instance, if I log the ringEntries against a cluster with 3 nodes with 4 num_tokens each, I get the following:
   > 
   > ```
   > 24/11/13 20:46:01 WARN CassandraDataLayer: Ring entry recorded: address=address-01, owns=100.00%, token=4825712696791974336, hostId=01
   > 24/11/13 20:46:01 WARN CassandraDataLayer: Ring entry recorded: address=address-02, owns=100.00%, token=5421015065752373586, hostId=02
   > 24/11/13 20:46:01 WARN CassandraDataLayer: Ring entry recorded: address=address-01, owns=100.00%, token=5901468578017553875, hostId=01
   > 24/11/13 20:46:01 WARN CassandraDataLayer: Ring entry recorded: address=address-03, owns=100.00%, token=6332794502804570973, hostId=03
   > 24/11/13 20:46:01 WARN CassandraDataLayer: Ring entry recorded: address=address-01, owns=100.00%, token=6637384270728447516, hostId=01
   > 24/11/13 20:46:01 WARN CassandraDataLayer: Ring entry recorded: address=address-03, owns=100.00%, token=7047089967202231966, hostId=03
   > 24/11/13 20:46:01 WARN CassandraDataLayer: Ring entry recorded: address=address-02, owns=100.00%, token=7298079395265359615, hostId=02
   > 24/11/13 20:46:01 WARN CassandraDataLayer: Ring entry recorded: address=address-01, owns=100.00%, token=7716757078176049174, hostId=01
   > 24/11/13 20:46:01 WARN CassandraDataLayer: Ring entry recorded: address=address-03, owns=100.00%, token=8164328418073929560, hostId=03
   > 24/11/13 20:46:01 WARN CassandraDataLayer: Ring entry recorded: address=address-02, owns=100.00%, token=8467034318444373994, hostId=02
   > 24/11/13 20:46:01 WARN CassandraDataLayer: Ring entry recorded: address=address-01, owns=100.00%, token=8960277083856641146, hostId=01
   > ...
   > ```
   > 
   > This is because the sidecar endpoint returns an entry per token range rather than per host.
   
   Revisiting this after some integration testing on my end, @anderoo @jberragan:
   I think the logic in RangeUtils may not always hold for a virtual node (vnode) setup:
https://github.com/apache/cassandra-analytics/blob/6e1d5257a8d6c910a42751475612145533ae3b1d/cassandra-analytics-common/src/main/java/org/apache/cassandra/spark/utils/RangeUtils.java#L158
   
   That logic follows the model described in
https://cassandra.apache.org/doc/latest/cassandra/architecture/dynamo.html#consistent-hashing-using-a-token-ring
where each instance holds one contiguous range on the consistent hash ring.
   
   However, with vnodes each instance owns several tokens, and when we sort the List<instance> by token, the token allocation algorithm is not guaranteed to produce a placement that matches this expectation. Therefore, I think it is safer to look up which physical node owns each token using the information in List<instance>, and to construct the subRanges accordingly.
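
   To illustrate the lookup idea, here is a minimal sketch in Java. It is not the code in RangeUtils: `RingEntry`, `TokenRange`, and `rangesByHost` are hypothetical names made up for this example, and it only derives primary ownership (each token owns the range from the previous token, exclusive, up to itself, inclusive), ignoring replication. The tokens in `main` are the first four from the log quoted above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TokenOwnershipSketch
{
    // Hypothetical stand-in for one ring entry: a single token owned by a single host.
    static final class RingEntry
    {
        final String hostId;
        final long token;

        RingEntry(String hostId, long token)
        {
            this.hostId = hostId;
            this.token = token;
        }
    }

    // Hypothetical (start, end] range. start >= end means the range wraps around the ring.
    static final class TokenRange
    {
        final long startExclusive;
        final long endInclusive;

        TokenRange(long startExclusive, long endInclusive)
        {
            this.startExclusive = startExclusive;
            this.endInclusive = endInclusive;
        }

        @Override
        public String toString()
        {
            return "(" + startExclusive + ", " + endInclusive + "]";
        }
    }

    // Assign each range to the host recorded on the entry that ends it,
    // instead of assuming every host holds one contiguous slice of the ring.
    static Map<String, List<TokenRange>> rangesByHost(List<RingEntry> entries)
    {
        List<RingEntry> sorted = new ArrayList<>(entries);
        sorted.sort(Comparator.comparingLong((RingEntry e) -> e.token));

        Map<String, List<TokenRange>> result = new LinkedHashMap<>();
        int n = sorted.size();
        for (int i = 0; i < n; i++)
        {
            RingEntry current = sorted.get(i);
            // The lowest token's predecessor is the highest token (wrap-around).
            RingEntry previous = sorted.get((i + n - 1) % n);
            result.computeIfAbsent(current.hostId, k -> new ArrayList<>())
                  .add(new TokenRange(previous.token, current.token));
        }
        return result;
    }

    public static void main(String[] args)
    {
        List<RingEntry> ring = Arrays.asList(
            new RingEntry("01", 4825712696791974336L),
            new RingEntry("02", 5421015065752373586L),
            new RingEntry("01", 5901468578017553875L),
            new RingEntry("03", 6332794502804570973L));
        rangesByHost(ring).forEach((host, ranges) -> System.out.println(host + " -> " + ranges));
    }
}
```

   With these four entries, host 01 ends up with two ranges that are not adjacent on the ring, which is the case a one-contiguous-range-per-instance assumption would miss.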


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

