liucao-dd commented on PR #93: URL: https://github.com/apache/cassandra-analytics/pull/93#issuecomment-3091351749
> > Just following up on this one @anderoo, does this patch alone work for you? I would have thought there would be changes required to the `CassandraRing` and `TokenPartitioner` code, that is what is used to divide the token range into Spark tasks.
>
> @jberragan From local testing this patch worked, we've had to make other adjustments but nothing else related to the token ranges. For instance, if I log the ringEntries against a cluster with 3 nodes with 4 num_tokens each, I get the following:
>
> ```
> 24/11/13 20:46:01 WARN CassandraDataLayer: Ring entry recorded: address=address-01, owns=100.00%, token=4825712696791974336, hostId=01
> 24/11/13 20:46:01 WARN CassandraDataLayer: Ring entry recorded: address=address-02, owns=100.00%, token=5421015065752373586, hostId=02
> 24/11/13 20:46:01 WARN CassandraDataLayer: Ring entry recorded: address=address-01, owns=100.00%, token=5901468578017553875, hostId=01
> 24/11/13 20:46:01 WARN CassandraDataLayer: Ring entry recorded: address=address-03, owns=100.00%, token=6332794502804570973, hostId=03
> 24/11/13 20:46:01 WARN CassandraDataLayer: Ring entry recorded: address=address-01, owns=100.00%, token=6637384270728447516, hostId=01
> 24/11/13 20:46:01 WARN CassandraDataLayer: Ring entry recorded: address=address-03, owns=100.00%, token=7047089967202231966, hostId=03
> 24/11/13 20:46:01 WARN CassandraDataLayer: Ring entry recorded: address=address-02, owns=100.00%, token=7298079395265359615, hostId=02
> 24/11/13 20:46:01 WARN CassandraDataLayer: Ring entry recorded: address=address-01, owns=100.00%, token=7716757078176049174, hostId=01
> 24/11/13 20:46:01 WARN CassandraDataLayer: Ring entry recorded: address=address-03, owns=100.00%, token=8164328418073929560, hostId=03
> 24/11/13 20:46:01 WARN CassandraDataLayer: Ring entry recorded: address=address-02, owns=100.00%, token=8467034318444373994, hostId=02
> 24/11/13 20:46:01 WARN CassandraDataLayer: Ring entry recorded: address=address-01, owns=100.00%, token=8960277083856641146, hostId=01
> ...
> ```
>
> This is because the sidecar endpoint returns an entry per token range rather than per host.

Revisiting this after some integration testing on my end @anderoo @jberragan: I think this logic in `RangeUtils` may not always hold for a virtual node setup:
https://github.com/apache/cassandra-analytics/blob/6e1d5257a8d6c910a42751475612145533ae3b1d/cassandra-analytics-common/src/main/java/org/apache/cassandra/spark/utils/RangeUtils.java#L158

This logic is based on https://cassandra.apache.org/doc/latest/cassandra/architecture/dynamo.html#consistent-hashing-using-a-token-ring where each instance holds a continuous range on the consistent hash ring. However, when we sort the `List<instance>` by token, the token allocation algorithm is not guaranteed to produce a placement that matches this expectation. Therefore, I think it is safer to look up which token is owned by which physical node using the information in `List<instance>`, and construct `subRanges` accordingly.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
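
To make the suggested lookup concrete, here is a minimal sketch of grouping token ranges by the physical node that owns them. It is not the existing `RangeUtils` code: `VnodeRanges`, `RingEntry`, `TokenRange`, and `rangesByHost` are hypothetical names made up for illustration. It assumes each ring entry carries one token plus a host ID, and that a token owns the open-closed range from the previous token on the sorted ring up to itself, with the lowest token taking the wrap-around range. The sample ring reuses only the first four tokens from the log above, so it is not the real 12-token ring.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class VnodeRanges {
    /** Hypothetical stand-in for one ring entry: one per token, not one per host. */
    static final class RingEntry {
        final String hostId;
        final BigInteger token;

        RingEntry(String hostId, long token) {
            this.hostId = hostId;
            this.token = BigInteger.valueOf(token);
        }
    }

    /** Open-closed range (start, end]; start >= end means it wraps around the ring. */
    static final class TokenRange {
        final BigInteger start;
        final BigInteger end;

        TokenRange(BigInteger start, BigInteger end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "(" + start + ", " + end + "]";
        }
    }

    /** Groups ranges by owning host instead of assuming one contiguous range per host. */
    static Map<String, List<TokenRange>> rangesByHost(List<RingEntry> entries) {
        List<RingEntry> sorted = new ArrayList<>(entries);
        sorted.sort(Comparator.comparing((RingEntry e) -> e.token));

        Map<String, List<TokenRange>> result = new TreeMap<>();
        if (sorted.isEmpty()) {
            return result;
        }
        // The lowest token owns the wrap-around range that starts at the highest token.
        BigInteger previous = sorted.get(sorted.size() - 1).token;
        for (RingEntry entry : sorted) {
            result.computeIfAbsent(entry.hostId, k -> new ArrayList<>())
                  .add(new TokenRange(previous, entry.token));
            previous = entry.token;
        }
        return result;
    }

    public static void main(String[] args) {
        // First four tokens from the log above, treated as a complete ring for illustration only.
        List<RingEntry> ring = Arrays.asList(
            new RingEntry("01", 4825712696791974336L),
            new RingEntry("02", 5421015065752373586L),
            new RingEntry("01", 5901468578017553875L),
            new RingEntry("03", 6332794502804570973L));
        rangesByHost(ring).forEach((host, ranges) -> System.out.println(host + " -> " + ranges));
    }
}
```

With this sample, host `01` ends up with two non-adjacent ranges (one of them the wrap-around range), which is the case where assuming a single continuous range per instance stops holding.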