Re: How does cassandra achieve Linearizability?

2017-02-09 Thread Jon Haddad
LWT != Last Write Wins.  They are totally different.  

LWTs give you (assuming you also read at SERIAL) “atomic consistency”,
meaning you are able to perform operations atomically and in isolation.
That’s the safety blanket everyone wants, but it is extremely expensive,
especially in Cassandra. The “lightweight” part, btw, may be a little
optimistic, especially if a key is under contention. With regard to the
“last write” part you’re asking about: with LWT, Cassandra provides the
timestamp and manages it as part of the ballot, and it is always increasing.
See org.apache.cassandra.service.ClientState#getTimestampForPaxos. From the
code:

 * Returns a timestamp suitable for paxos given the timestamp of the last
 * known commit (or in progress update).
 * Paxos ensures that the timestamp it uses for commits respects the serial
 * order of those commits. It does so by having each replica reject any
 * proposal whose timestamp is not strictly greater than the last proposal
 * it accepted. So in practice, which timestamp we use for a given proposal
 * doesn't affect correctness but it does affect the chance of making
 * progress (if we pick a timestamp lower than what has been proposed
 * before, our new proposal will just get rejected).

Effectively, Paxos removes the ability to use custom timestamps and addresses
clock variance by rejecting ballots with timestamps less than what was last
seen. You can learn more by reading through the other comments and code in
that file.
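In plain Java, the scheme that comment describes can be sketched as follows (a simplified illustration only, not Cassandra's actual implementation; the class and method names here are made up):

```java
import java.util.concurrent.atomic.AtomicLong;

// Simplified sketch of the idea behind getTimestampForPaxos: every proposal
// timestamp must be strictly greater than the last one observed, so a
// backwards-drifting wall clock can never reorder Paxos commits.
class PaxosTimestampSketch {
    private final AtomicLong lastSeenMicros = new AtomicLong(0);

    // minTimestampMicros: timestamp of the last known commit (or in-progress
    // update). Returns max(wall clock, (max of last seen and min) + 1).
    long next(long minTimestampMicros) {
        while (true) {
            long last = lastSeenMicros.get();
            long wallClockMicros = System.currentTimeMillis() * 1000;
            long candidate = Math.max(wallClockMicros,
                    Math.max(last, minTimestampMicros) + 1);
            if (lastSeenMicros.compareAndSet(last, candidate)) {
                return candidate;
            }
        }
    }
}
```

Because the result is strictly greater than anything previously seen, a replica comparing ballot timestamps will never accept a proposal that would reorder commits, even if a node's clock drifts backwards.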

Last write wins is a free-for-all that guarantees you *nothing* except that
the timestamp is used as a tiebreaker. Here we acknowledge things like the
speed of light as being a real problem that isn’t going away anytime soon.
This problem is sometimes addressed with event sourcing rather than mutating
in place.
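By contrast with the Paxos path, last-write-wins reconciliation of two versions of the same cell can be sketched like this (a simplification; real Cassandra cells also carry tombstone and TTL rules that this ignores, and the equal-timestamp tiebreak by value comparison is stated here as I understand it, so verify before relying on it):

```java
// Sketch of last-write-wins cell reconciliation: the higher timestamp wins;
// on an exact timestamp tie, the lexically greater value is kept as an
// arbitrary-but-deterministic tiebreaker. No ordering guarantee beyond that.
class LwwCell {
    final long timestampMicros;
    final String value;

    LwwCell(long timestampMicros, String value) {
        this.timestampMicros = timestampMicros;
        this.value = value;
    }

    static LwwCell reconcile(LwwCell a, LwwCell b) {
        if (a.timestampMicros != b.timestampMicros) {
            return a.timestampMicros > b.timestampMicros ? a : b;
        }
        return a.value.compareTo(b.value) >= 0 ? a : b;
    }
}
```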

Hope this helps.

Jon


> On Feb 9, 2017, at 5:21 PM, Kant Kodali wrote:
> 
> @Justin I read this article:
> http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0.
> And it clearly says Linearizable consistency can be achieved with LWTs. So
> should I assume the Linearizability in the context of the above article is
> possible with LWTs and synchronization of clocks through ntpd? Because LWTs
> also follow Last Write Wins, isn't it? Another question: do most production
> clusters set up ntpd? If so, how long does it take to sync? Any idea?

Re: How does cassandra achieve Linearizability?

2017-02-09 Thread Michael Shuler
On 02/09/2017 07:21 PM, Kant Kodali wrote:
> @Justin I read this article:
> http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0.
> And it clearly says Linearizable consistency can be achieved with LWTs. So
> should I assume the Linearizability in the context of the above article is
> possible with LWTs and synchronization of clocks through ntpd? Because LWTs
> also follow Last Write Wins, isn't it? Another question: do most production
> clusters set up ntpd? If so, how long does it take to sync? Any idea?

I'll let the others talk more intimately about LWT, but as for NTP: client
machines take some time to settle time adjustments incrementally to catch up
with the upstream servers - ntpd slews the clock rather than stepping it.

> @Michael Shuler Are you referring to something like TrueTime, as in
> https://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf?
> I have actually never heard of setting up GPS modules or how they can be
> helpful. Let me research that, but good point.

Nah, I'm talking about something much simpler. For instance, you could do
this with a Raspberry Pi:
http://www.satsignal.eu/ntp/Raspberry-Pi-NTP.html

-- 
Michael

Re: How does cassandra achieve Linearizability?

2017-02-09 Thread Kant Kodali
@Justin I read this article:
http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0.
And it clearly says Linearizable consistency can be achieved with LWTs. So
should I assume the Linearizability in the context of the above article is
possible with LWTs and synchronization of clocks through ntpd? Because LWTs
also follow Last Write Wins, isn't it? Another question: do most production
clusters set up ntpd? If so, how long does it take to sync? Any idea?

@Michael Shuler Are you referring to something like TrueTime, as in
https://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf?
I have actually never heard of setting up GPS modules or how they can be
helpful. Let me research that, but good point.



Re: How does cassandra achieve Linearizability?

2017-02-09 Thread Michael Shuler
If you require the best precision you can get, setting up a pair of
stratum 1 ntpd masters with GPS modules in each data center location is
not terribly complex. You get low latency and jitter on servers you
manage. 140ms is a long way away network-wise, and I would suggest that
was a poor choice of upstream (probably stratum 2 or 3) source.

As Jonathan mentioned, there's no guarantee from Cassandra, but if you
need as close as you can get, you'll probably need to do it yourself.

(I run several stratum 2 ntpd servers for pool.ntp.org)

-- 
Kind regards,
Michael




Re: How does cassandra achieve Linearizability?

2017-02-09 Thread Justin Cameron
I think the answer to that question will depend on your specific use case
and requirements.

If you're only doing a small number of updates but need to be sure they are
applied in order you may be able to use lightweight transactions (keep in
mind there's a performance hit here, so it's not an answer for high-volume
mutations).
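For illustration, lightweight transactions are expressed in CQL with IF conditions (a sketch; the accounts table and its columns are hypothetical):

```cql
-- Insert only if the row doesn't already exist (runs a Paxos round):
INSERT INTO accounts (id, owner, balance)
VALUES (1, 'kant', 100)
IF NOT EXISTS;

-- Conditional update: applied only when the current balance matches.
UPDATE accounts SET balance = 90
WHERE id = 1
IF balance = 100;
```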

For high-volume updates you could look at using an append-only time-series
style data model, using a default TTL to drop old data.
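Such an append-only, auto-expiring time-series table might look like this (a sketch with a hypothetical schema):

```cql
-- Each reading is a new row; nothing is updated in place.
-- default_time_to_live (in seconds) makes old rows expire automatically.
CREATE TABLE sensor_readings (
    sensor_id uuid,
    reading_time timestamp,
    value double,
    PRIMARY KEY (sensor_id, reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC)
  AND default_time_to_live = 2592000;  -- 30 days
```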

If your data isn't time-series in nature and has a high volume of updates,
then you really just need to make sure the clocks on your clients or your
Cassandra nodes (preferably both) are kept in sync.

Justin

--

Justin Cameron

Senior Software Engineer | Instaclustr




This email has been sent on behalf of Instaclustr Pty Ltd (Australia) and
Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy
or disclose its content, but please reply to this email immediately and
highlight the error to the sender and then immediately delete the message.


Re: How does cassandra achieve Linearizability?

2017-02-09 Thread Jonathan Haddad
It doesn't, nor does it claim to.

On Thu, Feb 9, 2017 at 4:09 PM Kant Kodali wrote:

> How does Cassandra achieve Linearizability with “Last write wins”
> (conflict resolution methods based on time-of-day clocks)?
>
> Relying on synchronized clocks is almost certainly non-linearizable,
> because clock timestamps cannot be guaranteed to be consistent with
> actual event ordering due to clock skew. Isn't it?
>
> Thanks!
>


Re: How does cassandra achieve Linearizability?

2017-02-09 Thread Kant Kodali
Hi Justin,

There are a bunch of issues w.r.t. synchronization of clocks when we used
ntpd. Also, the time it took to sync the clocks was approx 140ms (don't
quote me on that though, because it was reported by our devops :)

We have multiple clients (for example, a bunch of microservices reading
from Cassandra). I am not sure how one can achieve Linearizability by
setting timestamps on the clients, since there is no total ordering
across multiple clients.

Thanks!




Re: How does cassandra achieve Linearizability?

2017-02-09 Thread Justin Cameron
Hi Kant,

Clock synchronization is important - you should ensure that ntpd is
properly configured on all nodes. If your particular use case is especially
sensitive to out-of-order mutations it is possible to set timestamps on the
client side using the drivers.
https://docs.datastax.com/en/developer/java-driver/3.1/manual/query_timestamps/
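Besides the driver API linked above, a client-chosen write timestamp can also be written inline in CQL (a sketch; the table, values, and the example timestamp are hypothetical):

```cql
-- The timestamp is microseconds since the epoch; an arbitrary example value.
UPDATE user_profiles
USING TIMESTAMP 1486682400000000
SET email = 'kant@example.com'
WHERE user_id = 42;

-- Inspect the timestamp Cassandra stored for the cell:
SELECT email, WRITETIME(email) FROM user_profiles WHERE user_id = 42;
```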

We use our own NTP cluster to reduce clock drift as much as possible, but
public NTP servers are good enough for most uses.
https://www.instaclustr.com/blog/2015/11/05/apache-cassandra-synchronization/

Cheers,
Justin

On Thu, 9 Feb 2017 at 16:09 Kant Kodali wrote:

> How does Cassandra achieve Linearizability with “Last write wins”
> (conflict resolution methods based on time-of-day clocks) ?
>
> Relying on synchronized clocks is almost certainly non-linearizable,
> because clock timestamps cannot be guaranteed to be consistent with actual
> event ordering due to clock skew. Isn't it?
>
> Thanks!
>
-- 

Justin Cameron

Senior Software Engineer | Instaclustr




This email has been sent on behalf of Instaclustr Pty Ltd (Australia) and
Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy
or disclose its content, but please reply to this email immediately and
highlight the error to the sender and then immediately delete the message.


How does cassandra achieve Linearizability?

2017-02-09 Thread Kant Kodali
How does Cassandra achieve Linearizability with “Last write wins” (conflict
resolution methods based on time-of-day clocks)?

Relying on synchronized clocks is almost certainly non-linearizable, because
clock timestamps cannot be guaranteed to be consistent with actual event
ordering due to clock skew. Isn't it?

Thanks!


If reading from materialized view with a consistency level of quorum am I guaranteed to have the most recent view?

2017-02-09 Thread Kant Kodali
If reading from a materialized view with a consistency level of QUORUM, am I
guaranteed to have the most recent view? In other words, is the w + r > n
contract maintained for MVs as well, for both reads and writes?

Thanks!


Re: Composite partition key token

2017-02-09 Thread Branislav Janosik -T (bjanosik - AAP3 INC at Cisco)
Works great, thank you!


Re: Authentication with Java driver

2017-02-09 Thread Ben Bromhead
If the processes are launched separately, or you fork before setting up the
cluster object, they won't share credentials.

On Wed, Feb 8, 2017, 02:33 Yuji Ito wrote:

> Thanks Ben,
>
> Do you mean lots of instances of the process or lots of instances of the
> cluster/session object?
>
>
> Lots of instances of the process are generated.
> I wanted to confirm that `other` doesn't authenticate.
>
> If I want to avoid that, my application has to create new cluster/session
> objects per instance.
> But it is inefficient and uncommon.
> So, we aren't sure that the application works when a lot of
> cluster/session objects are created.
> Is it correct?
>
> Thank you,
> Yuji
>
>
>
> On Wed, Feb 8, 2017 at 12:01 PM, Ben Bromhead  wrote:
>
> On Tue, 7 Feb 2017 at 17:52 Yuji Ito  wrote:
>
> Thanks Andrew, Ben,
>
> My application creates a lot of instances connecting to Cassandra with
> basically the same set of credentials.
>
> Do you mean lots of instances of the process or lots of instances of the
> cluster/session object?
>
>
> After an instance connects to Cassandra with the credentials, can any
> instance connect to Cassandra without credentials?
>
> As long as you don't share the session or cluster objects. Each new
> cluster/session will need to reauthenticate.
>
>
> == example ==
> A first = new A("database", "user", "password");  // proper credentials
> r = first.get();
> ...
> A other = new A("database", "user", "pass"); // wrong password
> r = other.get();
> == example ==
>
> I want to refuse the `other` instance with improper credentials.
>
>
> This looks like you are creating new cluster/session objects (filling in
> the blanks for your pseudocode here). So "other" will not authenticate to
> Cassandra.
>
> This brings up a wider point: why are you doing this? Generally most
> applications create a single long-lived session object that lasts the
> life of the application process.
>
> I would not rely on Cassandra auth to authenticate downstream actors, not
> because it's bad, but because it's generally inefficient to create lots of
> session objects. The session object maintains a connection pool, pipelines
> requests, is thread safe, and is generally pretty solid.
>
>
>
>
> Yuji
>
>
> On Wed, Feb 8, 2017 at 4:11 AM, Ben Bromhead  wrote:
>
> What are you specifically trying to achieve? Are you trying to
> authenticate multiple Cassandra users from a single application instance?
> Or will your have lot's of application instances connecting to Cassandra
> using the same set of credentials? Or a combination of both? Multiple
> application instances with different credentials?
>
> On Tue, 7 Feb 2017 at 06:19 Andrew Tolbert 
> wrote:
>
> Hello,
>
> The API seems kind of incorrect, because credentials should usually be
> set on a session, but they are actually set on a cluster.
>
>
> With the datastax driver, Session is what manages connection pools to
> each node.  Cluster manages configuration and a separate connection
> ('control connection') to subscribe to state changes (schema changes, node
> topology changes, node up/down events).
>
>
> So, if there are 1000 clients, then with this API it has to create
> 1000 cluster instances ?
>
>
> I'm unsure how common it is for per-user authentication to be done when
> connecting to the database.  I think an application would normally
> authenticate with one set of credentials instead of multiple.  The protocol
> Cassandra uses does authentication at the connection level instead of at
> the request level, so that is currently a limitation to support something
> like reusing Sessions for authenticating multiple users.
>
> Thanks,
> Andy
>
>
> On Tue, Feb 7, 2017 at 7:19 AM Hiroyuki Yamada  wrote:
>
> Hi,
>
> The API seems kind of incorrect, because credentials should usually be
> set on a session, but they are actually set on a cluster.
>
> So, if there are 1000 clients, then with this API it has to create
> 1000 cluster instances ?
> 1000 clients seems usual if there are many nodes (say 20) and each
> node has some concurrency (say 50),
> but 1000 cluster instances seems too many.
>
> Is this an expected way to do this ? or
> Is there any way to authenticate per session ?
>
> Thanks,
> Hiro
>
> On Tue, Feb 7, 2017 at 11:38 AM, Yuji Ito  wrote:
> > Hi all,
> >
> > I want to know how to authenticate Cassandra users for multiple instances
> > with Java driver.
> > For instance, each thread creates a instance to access Cassandra with
> > authentication.
> >
> > As the implementation example, only the first constructor builds a
> cluster
> > and a session.
> > Other constructors use them.
> > This example is implemented according to the datastax document:
> "Basically
> > you will want to share the same cluster and session instances across your
> > application".
> >
> 

Re: Composite partition key token

2017-02-09 Thread Edward Capriolo
This could help:
https://github.com/edwardcapriolo/simple-cassandra-tools/blob/master/src/main/java/io/teknek/cassandra/simple/CompositeTool.java
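For reference, the encoding such helpers produce can also be sketched by hand. This is an illustration of Cassandra's CompositeType layout as I understand it (each component serialized as a 2-byte big-endian length, the component bytes, then a 0x00 end-of-component byte); verify it against your Cassandra version before relying on it:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Builds the ByteBuffer for a composite partition key: for each component,
// a 2-byte length, the component bytes, then a 0x00 end-of-component byte.
class CompositeKeySketch {
    static ByteBuffer compose(ByteBuffer... components) {
        int size = 0;
        for (ByteBuffer c : components) {
            size += 2 + c.remaining() + 1;
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        for (ByteBuffer c : components) {
            out.putShort((short) c.remaining()); // component length
            out.put(c.duplicate());              // component bytes
            out.put((byte) 0);                   // end-of-component marker
        }
        out.flip();
        return out;
    }
}
```

The resulting buffer could then be passed to cluster.getMetadata().newToken(...), though taking the routing key from the BoundStatement as suggested in this thread avoids hand-rolling the encoding.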


Re: Composite partition key token

2017-02-09 Thread Michael Burman

Hi,

How about taking it from the BoundStatement directly?

ByteBuffer routingKey =
    b.getRoutingKey(ProtocolVersion.NEWEST_SUPPORTED, codecRegistry);
Token token = metadata.newToken(routingKey);

In this case b is the BoundStatement. Replace codecRegistry and
ProtocolVersion with what you have; codecRegistry can be obtained, for
example, via
session.getCluster().getConfiguration().getCodecRegistry();

  - Micke


On 02/08/2017 08:58 PM, Branislav Janosik -T (bjanosik - AAP3 INC at
Cisco) wrote:

> Hi,
>
> I would like to ask how to calculate the token for a composite partition
> key using the Java API?
>
> For a partition key made of one column I use
> cluster.getMetadata().newToken(newBuffer);
>
> But what if my key looks like this: PRIMARY KEY
> ((parentResourceId, timeRT), childName)?
>
> I read that “:” is a separator, but it doesn’t seem to be the case.
>
> How can I create a ByteBuffer with multiple values so that the token
> would actually be correct?
>
> Thank you,
>
> Branislav





Re: DELETE/SELECT with multi-column PK and IN

2017-02-09 Thread Benjamin Roth
Ok now I REALLY got it :)
Thanks Sylvain!
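For reference, Sylvain's recommendation in the quoted exchange below can be written out as a CQL sketch (using the thread's ks.cf and pk1):

```cql
-- One multi-partition statement:
DELETE FROM ks.cf WHERE (pk1) IN ((1), (2), (3));

-- ...is better issued as individual, token-routable statements in a
-- client-side loop, so each goes straight to the right replica:
DELETE FROM ks.cf WHERE pk1 = 1;
DELETE FROM ks.cf WHERE pk1 = 2;
DELETE FROM ks.cf WHERE pk1 = 3;

-- Only when atomicity across the statements matters, use a logged batch:
BEGIN BATCH
    DELETE FROM ks.cf WHERE pk1 = 1;
    DELETE FROM ks.cf WHERE pk1 = 2;
    DELETE FROM ks.cf WHERE pk1 = 3;
APPLY BATCH;
```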

2017-02-09 11:42 GMT+01:00 Sylvain Lebresne:

> On Thu, Feb 9, 2017 at 10:52 AM, Benjamin Roth 
> wrote:
>
>> Ok got it.
>>
>> But it's interesting that this is supported:
>> DELETE/SELECT FROM ks.cf WHERE (pk1) IN ((1), (2), (3));
>>
>> This is technically mostly the same (Token awareness,
>> coordination/routing, read performance, ...), right?
>>
>
> It is. That's what I meant by "there is something to be said for the
> consistency of the CQL language in general". In other words, look for no
> externally logical reason for this being unsupported, it's unsupported
> simply due to how the CQL code evolved. But as I said, we didn't fix that
> inconsistency because we're all busy and it's not really that important in
> practice. The project of course welcome any contributions though :)
>
>
>>
>> 2017-02-09 10:43 GMT+01:00 Sylvain Lebresne :
>>
>>> This is a statement on multiple partitions and there is really no
>>> optimization the code internally does on that. In fact, I strongly advise
>>> you to not use a batch but rather simply do a for loop client side and send
>>> statement individually. That way, your driver will be able to use proper
>>> token-awareness for each request (while if you send a batch, one
>>> coordinator will be picked up and will have to forward most statement,
>>> doing more network hops at the end of the day). The only case where using a
>>> batch is indeed legit is if you care about all the statement being atomic,
>>> but in that case it's a logged batch you want.
>>>
>>> That's btw more or less why we never bothered implementing that: it's
>>> totally doable technically, but it's not really such a good idea
>>> performance wise in practice most of the time, and you can easily work it
>>> around with a batch if you need atomicity.
>>>
>>> Which is not saying it will never be and shouldn't be supported btw,
>>> there is something to be said for the consistency of the CQL language in
>>> general. But it's why no-one took time to do it so far.
>>>
>>> On Thu, Feb 9, 2017 at 10:36 AM, Benjamin Roth 
>>> wrote:
>>>
 Yes, thats the workaround - I'll try that.

 Would you agree it would be better for internal optimizations to
 process this within a single statement?

 2017-02-09 10:32 GMT+01:00 Ben Slater :

> Yep, that makes it clear. I think an unlogged batch of prepared
> statements with one statement per PK tuple would be roughly equivalent? 
> And
> probably no more complex to generate in the client?
>
> On Thu, 9 Feb 2017 at 20:22 Benjamin Roth 
> wrote:
>
>> Maybe that makes it clear:
>>
>> DELETE FROM ks.cf WHERE (partitionkey1, partitionkey2) IN ((1, 2),
>> (1, 3), (2, 3), (3, 4));
>>
>> If want to delete or select a bunch of records identified by their
>> multi-partitionkey tuples.
>>
>> 2017-02-09 10:18 GMT+01:00 Ben Slater :
>>
>> Are you looking this to be equivalent to (PK1=1 AND PK2=2) or are you
>> looking for (PK1 IN (1,2) AND PK2 IN (1,2)) or something else?
>>
>> Cheers
>> Ben
>>
>> On Thu, 9 Feb 2017 at 20:09 Benjamin Roth 
>> wrote:
>>
>> Hi Guys,
>>
>> CQL says this is not allowed:
>>
>> DELETE FROM ks.cf WHERE (pk1, pk2) IN ((1, 2));
>>
>> 1. Is there a reason for it? There shouldn't be a performance
>> penalty, it is a PK lookup, the same thing works with a single pk column
>> 2. Is there a known workaround for it?
>>
>> It would be much of a help to have it for daily business, IMHO it's a
>> waste of resources to run multiple queries just to fetch a bunch of 
>> records
>> by a PK.
>>
>> Thanks in advance for any reply
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 <+49%207161%203048806> · Fax +49 7161
>> 304880-1 <+49%207161%203048801>
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>> --
>> 
>> Ben Slater
>> Chief Product Officer
>> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
>> +61 437 929 798 <+61%20437%20929%20798>
>>
>>
>>
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 <+49%207161%203048806> · Fax +49 7161
>> 304880-1 <+49%207161%203048801>
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
> --
> 
> Ben Slater
> Chief Product Officer
> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
> +61 437 929 798 

Re: DELETE/SELECT with multi-column PK and IN

2017-02-09 Thread Benjamin Roth
This doesn't really belong to this topic, but I also experienced what Ben
says. I was migrating (and still am) tons of data from MySQL to C*. I
measured several approaches (async parallel, prepared statements, sync with
unlogged batches) and it turned out that batches were really fast and
produced fewer problems with cluster overloading with MVs.


Re: DELETE/SELECT with multi-column PK and IN

2017-02-09 Thread Sylvain Lebresne
On Thu, Feb 9, 2017 at 10:52 AM, Benjamin Roth 
wrote:

> Ok got it.
>
> But it's interesting that this is supported:
> DELETE/SELECT FROM ks.cf WHERE (pk1) IN ((1), (2), (3));
>
> This is technically mostly the same (Token awareness,
> coordination/routing, read performance, ...), right?
>

It is. That's what I meant by "there is something to be said for the
consistency of the CQL language in general". In other words, there is no
deeper logical reason for this being unsupported; it's unsupported simply
due to how the CQL code evolved. But as I said, we didn't fix that
inconsistency because we're all busy and it's not really that important in
practice. The project of course welcomes any contributions though :)



Re: DELETE/SELECT with multi-column PK and IN

2017-02-09 Thread Ben Slater
That’s a very good point from Sylvain that I forgot/missed. That said,
we’ve seen plenty of scenarios where overall system throughput is improved
through unlogged batches. One of my colleagues did quite a bit of
benchmarking on this topic for his talk at last year’s C* summit:
http://www.slideshare.net/DataStax/microbatching-highperformance-writes-adam-zegelin-instaclustr-cassandra-summit-2016


Re: DELETE/SELECT with multi-column PK and IN

2017-02-09 Thread Benjamin Roth
Ok got it.

But it's interesting that this is supported:
DELETE/SELECT FROM ks.cf WHERE (pk1) IN ((1), (2), (3));

This is technically mostly the same (Token awareness, coordination/routing,
read performance, ...), right?



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: DELETE/SELECT with multi-column PK and IN

2017-02-09 Thread Sylvain Lebresne
This is a statement on multiple partitions, and there is really no
optimization the code does internally for that. In fact, I strongly advise
you not to use a batch but rather simply do a for loop client side and send
the statements individually. That way, your driver will be able to use proper
token awareness for each request (whereas if you send a batch, one
coordinator will be picked and will have to forward most of the statements,
doing more network hops at the end of the day). The only case where using a
batch is legit is if you care about all the statements being applied
atomically, but in that case it's a logged batch you want.

That's btw more or less why we never bothered implementing this: it's
totally doable technically, but it's not really such a good idea
performance-wise in practice most of the time, and you can easily work
around it with a batch if you need atomicity.

Which is not to say it will never be or shouldn't be supported, btw; there
is something to be said for the consistency of the CQL language in general.
But it's why no one has taken the time to do it so far.
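To make the client-side loop concrete, here is a minimal sketch of the expansion: one single-partition statement per PK tuple instead of a multi-column IN. In a real client you would prepare "DELETE FROM ks.cf WHERE pk1 = ? AND pk2 = ?" once and execute the bound statements (ideally asynchronously) so the driver can route each one token-aware; the string-building helper below (`expand`, an illustrative name, with the table and column names from Benjamin's example) just shows the shape of the expansion:

```java
import java.util.ArrayList;
import java.util.List;

public class TupleExpand {
    // Expand DELETE ... WHERE (pk1, pk2) IN ((1,2),(1,3),...) into one
    // single-partition DELETE per tuple, so each request can be sent
    // directly to a replica owning that partition.
    public static List<String> expand(String table, int[][] tuples) {
        List<String> stmts = new ArrayList<>();
        for (int[] t : tuples) {
            stmts.add(String.format(
                "DELETE FROM %s WHERE pk1 = %d AND pk2 = %d;",
                table, t[0], t[1]));
        }
        return stmts;
    }
}
```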



Re: DELETE/SELECT with multi-column PK and IN

2017-02-09 Thread Benjamin Roth
Yes, that's the workaround - I'll try that.

Would you agree it would be better for internal optimizations to process
this within a single statement?






Re: DELETE/SELECT with multi-column PK and IN

2017-02-09 Thread Ben Slater
Yep, that makes it clear. I think an unlogged batch of prepared statements
with one statement per PK tuple would be roughly equivalent? And probably
no more complex to generate in the client?

-- 

Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798


Re: DELETE/SELECT with multi-column PK and IN

2017-02-09 Thread Benjamin Roth
Maybe that makes it clear:

DELETE FROM ks.cf WHERE (partitionkey1, partitionkey2) IN ((1, 2), (1, 3),
(2, 3), (3, 4));

I want to delete or select a bunch of records identified by their
multi-partition-key tuples.

2017-02-09 10:18 GMT+01:00 Ben Slater :

> Are you looking this to be equivalent to (PK1=1 AND PK2=2) or are you
> looking for (PK1 IN (1,2) AND PK2 IN (1,2)) or something else?
>
> Cheers
> Ben
>
> On Thu, 9 Feb 2017 at 20:09 Benjamin Roth  wrote:
>
>> Hi Guys,
>>
>> CQL says this is not allowed:
>>
>> DELETE FROM ks.cf WHERE (pk1, pk2) IN ((1, 2));
>>
>> 1. Is there a reason for it? There shouldn't be a performance penalty, it
>> is a PK lookup, the same thing works with a single pk column
>> 2. Is there a known workaround for it?
>>
>> It would be a great help for daily business; IMHO it's a waste of
>> resources to run multiple queries just to fetch a bunch of records
>> by a PK.
>>
>> Thanks in advance for any reply
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
> --
> 
> Ben Slater
> Chief Product Officer
> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
> +61 437 929 798
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: DELETE/SELECT with multi-column PK and IN

2017-02-09 Thread Ben Slater
Are you looking for this to be equivalent to (PK1=1 AND PK2=2), or to
(PK1 IN (1,2) AND PK2 IN (1,2)), or something else?

Cheers
Ben

On Thu, 9 Feb 2017 at 20:09 Benjamin Roth  wrote:

> Hi Guys,
>
> CQL says this is not allowed:
>
> DELETE FROM ks.cf WHERE (pk1, pk2) IN ((1, 2));
>
> 1. Is there a reason for it? There shouldn't be a performance penalty; it
> is a PK lookup, and the same thing works with a single PK column.
> 2. Is there a known workaround for it?
>
> It would be a great help for daily business; IMHO it's a waste of
> resources to run multiple queries just to fetch a bunch of records
> by a PK.
>
> Thanks in advance for any reply
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>
-- 

Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798


DELETE/SELECT with multi-column PK and IN

2017-02-09 Thread Benjamin Roth
Hi Guys,

CQL says this is not allowed:

DELETE FROM ks.cf WHERE (pk1, pk2) IN ((1, 2));

1. Is there a reason for it? There shouldn't be a performance penalty; it
is a PK lookup, and the same thing works with a single PK column.
2. Is there a known workaround for it?
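
A possible workaround, sketched here as an editorial aside under the assumption of the ks.cf schema above, is to issue one fully-specified statement per key pair; for deletes these can be grouped into an UNLOGGED batch:

```sql
-- Workaround sketch: CQL rejects a tuple IN over multiple partition-key
-- columns, so issue one fully-specified statement per key pair instead.
-- An UNLOGGED batch saves client round trips, but each delete still
-- fans out to the replicas of its own partition.
BEGIN UNLOGGED BATCH
  DELETE FROM ks.cf WHERE pk1 = 1 AND pk2 = 2;
  DELETE FROM ks.cf WHERE pk1 = 1 AND pk2 = 3;
APPLY BATCH;
```

For SELECTs a batch does not apply; the usual pattern is to run the per-key queries asynchronously from the client and merge the results.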

It would be a great help for daily business; IMHO it's a waste of resources
to run multiple queries just to fetch a bunch of records by a PK.

Thanks in advance for any reply

-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Error when running nodetool cleanup after adding a new node to a cluster

2017-02-09 Thread Srinath Reddy
Alex,

Thanks for the reply. I will try the workaround and post an update.

Regards,

Srinath Reddy

> On 09-Feb-2017, at 1:44 PM, Oleksandr Shulgin wrote:
> 
> On Thu, Feb 9, 2017 at 6:13 AM, Srinath Reddy wrote:
> Hi,
> 
> Trying to re-balance a Cassandra cluster after adding a new node, and I'm 
> getting this error when running nodetool cleanup. The Cassandra cluster is 
> running in a Kubernetes cluster.
> 
> Cassandra version is 2.2.8
> 
> nodetool cleanup
> error: io.k8s.cassandra.KubernetesSeedProvider
> Fatal configuration error; unable to start server.  See log for stacktrace.
> -- StackTrace --
> org.apache.cassandra.exceptions.ConfigurationException: io.k8s.cassandra.KubernetesSeedProvider
> Fatal configuration error; unable to start server.  See log for stacktrace.
>   at org.apache.cassandra.config.DatabaseDescriptor.applyConfig(DatabaseDescriptor.java:676)
>   at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:119)
>   at org.apache.cassandra.tools.NodeProbe.checkJobs(NodeProbe.java:256)
>   at org.apache.cassandra.tools.NodeProbe.forceKeyspaceCleanup(NodeProbe.java:262)
>   at org.apache.cassandra.tools.nodetool.Cleanup.execute(Cleanup.java:55)
>   at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:244)
>   at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:158)
> 
> Hi,
> 
> From the above stacktrace it looks like you're hitting the following TODO 
> item:
> 
> https://github.com/apache/cassandra/blob/98d74ed998706e9e047dc0f7886a1e9b18df3ce9/src/java/org/apache/cassandra/tools/NodeProbe.java#L282
> 
> That is, nodetool needs to know concurrent_compactors setting's value before 
> starting cleanup, but doesn't use JMX and tries to parse the configuration 
> file instead.  That fails because your custom SeedProvider class is not on 
> classpath for nodetool.
> 
> A workaround: make sure io.k8s.cassandra.KubernetesSeedProvider can be found 
> by java when running nodetool script, see 
> https://github.com/apache/cassandra/blob/98d74ed998706e9e047dc0f7886a1e9b18df3ce9/bin/nodetool#L108
> 
> Proper fix: get rid of the TODO and really query the value using JMX, 
> especially since the latest tick-tock release of Cassandra (3.10) added a way 
> to modify it with JMX.
> 
> --
> Alex





Re: Error when running nodetool cleanup after adding a new node to a cluster

2017-02-09 Thread Oleksandr Shulgin
On Thu, Feb 9, 2017 at 6:13 AM, Srinath Reddy  wrote:

> Hi,
>
> Trying to re-balance a Cassandra cluster after adding a new node, and I'm
> getting this error when running nodetool cleanup. The Cassandra cluster
> is running in a Kubernetes cluster.
>
> Cassandra version is 2.2.8
>
> nodetool cleanup
> error: io.k8s.cassandra.KubernetesSeedProvider
> Fatal configuration error; unable to start server.  See log for stacktrace.
> -- StackTrace --
> org.apache.cassandra.exceptions.ConfigurationException: io.k8s.cassandra.KubernetesSeedProvider
> Fatal configuration error; unable to start server.  See log for stacktrace.
> at org.apache.cassandra.config.DatabaseDescriptor.applyConfig(DatabaseDescriptor.java:676)
> at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:119)
> at org.apache.cassandra.tools.NodeProbe.checkJobs(NodeProbe.java:256)
> at org.apache.cassandra.tools.NodeProbe.forceKeyspaceCleanup(NodeProbe.java:262)
> at org.apache.cassandra.tools.nodetool.Cleanup.execute(Cleanup.java:55)
> at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:244)
> at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:158)
>

Hi,

From the above stacktrace it looks like you're hitting the following TODO
item:

https://github.com/apache/cassandra/blob/98d74ed998706e9e047dc0f7886a1e9b18df3ce9/src/java/org/apache/cassandra/tools/NodeProbe.java#L282

That is, nodetool needs to know concurrent_compactors setting's value
before starting cleanup, but doesn't use JMX and tries to parse the
configuration file instead.  That fails because your custom SeedProvider
class is not on classpath for nodetool.

A workaround: make sure io.k8s.cassandra.KubernetesSeedProvider can be
found by java when running nodetool script, see
https://github.com/apache/cassandra/blob/98d74ed998706e9e047dc0f7886a1e9b18df3ce9/bin/nodetool#L108
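
As a concrete sketch of that workaround (the paths and jar name below are assumptions; adjust them to wherever the seed-provider jar lives in your image), placing the jar under Cassandra's lib directory lets the nodetool launcher pick it up:

```shell
# Sketch only: paths and jar name are hypothetical -- adjust to your image.
# The nodetool launcher builds its JVM classpath from the jars under
# Cassandra's lib directory, so placing the seed-provider jar there lets
# nodetool resolve io.k8s.cassandra.KubernetesSeedProvider.
CASSANDRA_LIB="${CASSANDRA_HOME:-/opt/cassandra}/lib"
PROVIDER_JAR="/extra-jars/kubernetes-cassandra.jar"   # hypothetical location

if [ -f "$PROVIDER_JAR" ]; then
    cp "$PROVIDER_JAR" "$CASSANDRA_LIB/"
    nodetool cleanup
else
    echo "seed-provider jar not found at $PROVIDER_JAR; adjust the path" >&2
fi
```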

Proper fix: get rid of the TODO and really query the value using JMX,
especially since the latest tick-tock release of Cassandra (3.10) added a
way to modify it with JMX.

--
Alex