Re: more nodes than vnodes

2022-06-15 Thread Hannu Kröger
Adding a token (which in essence is a vnode) means that the token range it 
lands in is split into two, and the data range that gets a new owner is 
replicated to the new owner node. If there are a lot of tokens (= vnodes) in the 
cluster, adding some number of tokens (e.g. num_tokens=16) affects only that 
many (e.g. 16) existing ranges, and because there are a lot of tokens, each 
range is relatively small and distributed across the cluster.


A very naive example:
The cluster has 100 nodes and 100GB of data with replication factor = 3 => 
300GB of data altogether, so each node holds ~3GB. num_tokens is, let’s say, 
256, so there are 256*100 => 25,600 tokens in the cluster altogether.
You add one more node; imagining that tokens are perfectly distributed, each 
node will afterwards contain ~2.97GB of data.

When that new node is joining, its 256 tokens are (hopefully) distributed 
evenly, and each of the 100 existing nodes replicates ~0.03GB of data to the 
new node so that it eventually holds that 2.97GB. The cluster has 25,856 tokens 
after the scale-out, and only 256 existing token ranges change when the new 
node joins, not all 25,600.

So you see that each node only has to replicate ~30 MB to the new node. Not 
very expensive, right?

In real life, it’s not so precise and all but the basic idea is the same.
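The arithmetic above can be sketched in a few lines. This is a naive model that assumes perfectly even token distribution, exactly as the example does; real clusters only approximate this:

```python
# Naive model of scaling out a vnode cluster by one node (idealized,
# perfectly even token distribution -- real clusters are only roughly balanced).
def scale_out(nodes, num_tokens, data_gb, rf):
    total_gb = data_gb * rf                  # data including replicas
    per_node_before = total_gb / nodes       # GB per node before the join
    per_node_after = total_gb / (nodes + 1)  # GB per node after the join
    # The joining node's data is streamed roughly evenly from all others:
    streamed_per_node = per_node_after / nodes
    ranges_changed = num_tokens              # only the new node's tokens split ranges
    total_tokens = (nodes + 1) * num_tokens
    return per_node_before, per_node_after, streamed_per_node, ranges_changed, total_tokens

before, after, streamed, changed, tokens = scale_out(100, 256, 100, 3)
print(f"{before:.2f} GB -> {after:.2f} GB per node")                  # 3.00 GB -> 2.97 GB
print(f"~{streamed * 1024:.0f} MB streamed from each existing node")  # ~30 MB
print(f"{changed} of {tokens - 256} pre-existing ranges affected")    # 256 of 25600
```

Plugging in the thread's numbers (100 nodes, num_tokens=256, 100GB at RF=3) reproduces the ~30 MB per node figure.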

Cheers,
Hannu

> On 15. Jun 2022, at 10.32, Luca Rondanini  wrote:
> 
> Thanks a lot Hannu,
> 
> really helpful! But isn't that crazy expensive? adding a vnode means that 
> every vnode in the cluster will have a different range of tokens which means 
> a lot of data will need to be moved around. 
> 
> Thanks again, 
> Luca
> 
> 
> 
> On Wed, Jun 15, 2022 at 12:25 AM Hannu Kröger  <mailto:hkro...@gmail.com>> wrote:
> When a node joins a cluster, it gets (semi-)random tokens based on num_tokens 
> value.
> 
> The total number of vnodes is not fixed. I don’t remember off the top of my head if 
> num_tokens can be different on each node but whenever you add a node, new 
> vnodes get “created”. Existing token ranges will be split and some range will 
> be allocated for the new node and data is being replicated to the joining 
> node. So if you have num_tokens set to a higher value like 16 or so, adding 
> and removing a single node in a cluster is standard operation and although it 
> causes some load on the cluster, it should be somewhat evenly distributed 
> among other nodes. If you have just a single token per node then scaling up 
> or down has a bit different effects due to balancing issues etc. So there is 
> a reason why default num_tokens is 16 currently.
> 
> Cheers,
> Hannu
> 
>> On 15. Jun 2022, at 10.12, Luca Rondanini > <mailto:luca.rondan...@gmail.com>> wrote:
>> 
>> ok, that makes sense, but does the partitioner add vnodes? is the number of 
>> vnodes fixed in a cluster?
>> 
>> On Wed, Jun 15, 2022 at 12:10 AM Hannu Kröger > <mailto:hkro...@gmail.com>> wrote:
>> Hey,
>> 
>> num_tokens is tokens per node.
>> 
>> So in your case you would have 15 vnodes altogether.
>> 
>> Cheers,
>> Hannu
>> 
>> > On 15. Jun 2022, at 10.08, Luca Rondanini > > <mailto:luca.rondan...@gmail.com>> wrote:
>> > 
>> > Hi all,
>> > 
>> > I'm just trying to understand better how cassandra works. 
>> > 
>> > My understanding is that, once set, the number of vnodes does not change 
>> > in a cluster. The partitioner allocates vnodes to nodes ensuring 
>> > replication data are not stored on the same node.
>> > 
>> > But what happens if there are more nodes than vnodes? If I set num_tokens 
>> > to 3 and I have 5 servers? Unless the partitioner adds vnodes and moves 
>> > data around but it seems an extremely expensive operation. I'm sure I'm 
>> > missing something, I'm not quite sure what! :)
>> > 
>> > Thanks,
>> > Luca
>> > 
>> 
> 



Re: more nodes than vnodes

2022-06-15 Thread Hannu Kröger
When a node joins a cluster, it gets (semi-)random tokens based on num_tokens 
value.

The total number of vnodes is not fixed. I don’t remember off the top of my head if 
num_tokens can be different on each node but whenever you add a node, new 
vnodes get “created”. Existing token ranges will be split and some range will 
be allocated for the new node and data is being replicated to the joining node. 
So if you have num_tokens set to a higher value like 16 or so, adding and 
removing a single node in a cluster is standard operation and although it 
causes some load on the cluster, it should be somewhat evenly distributed among 
other nodes. If you have just a single token per node then scaling up or down 
has a bit different effects due to balancing issues etc. So there is a reason 
why default num_tokens is 16 currently.
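As a toy illustration (hypothetical code, not Cassandra internals): model the ring as a sorted list of tokens and watch which ranges a joining node's tokens split. Only the ranges a new token lands in change; everything else is untouched:

```python
import random

# A token ring is just a sorted list of tokens; each token owns the range
# from the previous token up to itself, wrapping around at the smallest token.
def ranges(tokens):
    ts = sorted(tokens)
    return {(ts[i - 1], ts[i]) for i in range(len(ts))}

random.seed(1)
ring = random.sample(range(10_000), 3 * 16)   # 3 nodes with num_tokens=16
# A joining node picks 16 fresh tokens:
joiner = random.sample(sorted(set(range(10_000)) - set(ring)), 16)

before, after = ranges(ring), ranges(ring + joiner)
split = before - after       # only the ranges a new token landed in change owner
untouched = before & after   # everything else is completely unaffected

print(f"{len(split)} of {len(before)} ranges split; {len(untouched)} untouched")
```

At most 16 of the 48 pre-existing ranges are split (fewer if two new tokens happen to land in the same range); the rest keep their owners.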

Cheers,
Hannu

> On 15. Jun 2022, at 10.12, Luca Rondanini  wrote:
> 
> ok, that makes sense, but does the partitioner add vnodes? is the number of 
> vnodes fixed in a cluster?
> 
> On Wed, Jun 15, 2022 at 12:10 AM Hannu Kröger  <mailto:hkro...@gmail.com>> wrote:
> Hey,
> 
> num_tokens is tokens per node.
> 
> So in your case you would have 15 vnodes altogether.
> 
> Cheers,
> Hannu
> 
> > On 15. Jun 2022, at 10.08, Luca Rondanini  > <mailto:luca.rondan...@gmail.com>> wrote:
> > 
> > Hi all,
> > 
> > I'm just trying to understand better how cassandra works. 
> > 
> > My understanding is that, once set, the number of vnodes does not change in 
> > a cluster. The partitioner allocates vnodes to nodes ensuring replication 
> > data are not stored on the same node.
> > 
> > But what happens if there are more nodes than vnodes? If I set num_tokens 
> > to 3 and I have 5 servers? Unless the partitioner adds vnodes and moves 
> > data around but it seems an extremely expensive operation. I'm sure I'm 
> > missing something, I'm not quite sure what! :)
> > 
> > Thanks,
> > Luca
> > 
> 



Re: more nodes than vnodes

2022-06-15 Thread Hannu Kröger
Hey,

num_tokens is tokens per node.

So in your case you would have 15 vnodes altogether.

Cheers,
Hannu

> On 15. Jun 2022, at 10.08, Luca Rondanini  wrote:
> 
> Hi all,
> 
> I'm just trying to understand better how cassandra works. 
> 
> My understanding is that, once set, the number of vnodes does not change in a 
> cluster. The partitioner allocates vnodes to nodes ensuring replication data 
> are not stored on the same node.
> 
> But what happens if there are more nodes than vnodes? If I set num_tokens to 
> 3 and I have 5 servers? Unless the partitioner adds vnodes and moves data 
> around but it seems an extremely expensive operation. I'm sure I'm missing 
> something, I'm not quite sure what! :)
> 
> Thanks,
> Luca
> 



Re: Data not persisted in Cassandra docker

2020-03-09 Thread Hannu Kröger
You need to mount volumes from the host system or docker volumes to container 
to have data persisted.

See section "Where to Store Data” in https://hub.docker.com/_/cassandra 


Hannu

> On 9. Mar 2020, at 11.25, Valentina Ivanova  wrote:
> 
> Hello!
> 
> I am using Cassandra 3.11.5 from docker. I created a keyspace and a table in 
> it and inserted some data into the table. Before stopping the container I 
> executed nodetool flush to persist the data. However, upon starting the 
> container after the weekend, the keyspace, table and data were not existent. 
> What shall I do in order to persist the data?
> 
> Many thanks & have a great week!
> 
> Valentina



Re: Cassandra going OOM due to tombstones (heapdump screenshots provided)

2020-01-29 Thread Hannu Kröger
It means that you are using 5-10GB of memory just to hold information about 
tables. Memtables hold the data that is written to the database until it is 
flushed to disk, which happens when memory is low or some other threshold is 
reached.

Every table will have a memtable that takes at least 1MB memory.

E.g. if you need 8GB of heap to run a system with 1 table, you would need ~18GB 
of heap to run 10,000 tables.
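As a rough sketch of that sizing math (the ~1 MB-per-memtable figure is the estimate from this thread, not an exact constant):

```python
# Rough heap sizing when a cluster has many tables: each table's memtable
# pins at least ~1 MB of heap even when idle (this thread's estimate).
MEMTABLE_MIN_MB = 1

def heap_needed_gb(base_heap_gb, n_tables):
    overhead_gb = n_tables * MEMTABLE_MIN_MB / 1024
    return base_heap_gb + overhead_gb

# 1000 keyspaces x 5-10 tables each => 5,000-10,000 tables
print(f"{heap_needed_gb(8, 5_000):.1f} GB")   # ~12.9 GB
print(f"{heap_needed_gb(8, 10_000):.1f} GB")  # ~17.8 GB
```

With an 8GB baseline, 10,000 tables push the requirement to roughly 18GB, which matches the OOM reports in this thread for 8GB and 12GB heaps.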

Hannu

> On 29. Jan 2020, at 16.03, Behroz Sikander  wrote:
> 
> It doesn't seem to be the problem but I do not have deep knowledge of C* 
> internals.
> 
> When do memtable come into play? Only at startup?
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
> 





Re: Cassandra going OOM due to tombstones (heapdump screenshots provided)

2020-01-29 Thread Hannu Kröger
IIRC there is an overhead of about 1MB per table, and you have about 
5,000-10,000 tables => 5GB-10GB of overhead just from having that many tables. 
To me it looks like you need to increase the heap size and later potentially 
work on the data models to have fewer tables.

Hannu

> On 29. Jan 2020, at 15.50, Behroz Sikander  wrote:
> 
>>> Some environment details like Cassandra version, amount of physical RAM, 
> JVM configs (heap and others), and any other non-default cassandra.yaaml 
> configs would help. The amount of data, number of keyspaces & tables, 
> since you mention "clients", would also be helpful for people to suggest 
> tuning improvements.
> 
> We are more or less using the default properties.
> Here are some more details
> 
> - Total nodes in the cluster - 9
> - Disk for each node is 2 TB
> - Number of keyspaces - 1000
> - Each keyspace has 5-10 tables
> - We observed this problem on a c4.4xlarge (AWS EC2) instance having 30GB RAM 
> with 8GB heap
> - We observed the same problem on a c4.8xlarge having 60GB RAM with 12GB heap
> 
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
> 





Re: [E] bug in cluster key push down

2020-01-13 Thread Hannu Kröger
No, I think it was originally correct.

If the partition key has multiple columns, you need parentheses around the 
partition-key columns, e.g. PRIMARY KEY ((pk1, pk2), clustering1). With a 
single-column partition key, PRIMARY KEY (partition, clustering1, clustering2) 
is already correct.

Hannu

> On 13. Jan 2020, at 14.30, Saha, Sushanta K 
>  wrote:
> 
>> primary key (partition, clustering1, clustering2)
>> 
>> So, the partitioning key has three columns. You need to specify values for 
>> all three columns. For clustering columns, you need another parenthesis like 
>> primary key (partition, (clustering1, clustering2))
>> 
>>  Sushanta
> 
> On Sun, Jan 12, 2020 at 10:52 AM Jeff Jirsa  > wrote:
> Can you open a jira so someone can investigate ? It’s probably just a logging 
> / visibility problem, but we should confirm 
> 
> Sent from my iPhone
> 
>> On Jan 12, 2020, at 6:04 AM, onmstester onmstester 
>>  wrote:
>> 
>> 
>> Using Apache Cassandra 3.11.2, defined a table like this:
>> 
>> create table my_table(
>>partition text,
>>clustering1 int,
>>   clustering2 text,
>>   data set,
>> primary key (partition, clustering1, clustering2))
>> 
>> and configured slow queries threshold to 1ms in yaml to see how queries 
>> passed to cassandra. Query below:
>> 
>> select * from my_table where partition='a' and clustering1= 1 and 
>> clustering2='b'
>> 
>> would be like this in debug.log of cassandra:
>> 
>> select * from my_table where partition='a' LIMIT 100>  (it means that the 
>> two cluster key restriction did not push down to storage engine and the 
>> whole partition been retrieved)
>> 
>> but this query:
>> 
>> select * from my_table where partition='a' and clustering1= 1
>> 
>> would be 
>> 
>> select * from my_table where partition='a' and clustering1= 1 LIMIT 100> 
>> (single cluster key been pushed down to storage engine)
>> 
>> 
>> So it seems to me that, we could not restrict multiple clustering keys in 
>> select because it would retrieve the whole partition ?!
>> Sent using Zoho Mail 
>> 
>> 
>> 
>> 
> 
> 
> -- 
> 
> Sushanta Saha|MTS IV-Cslt-Sys Engrg|WebIaaS_DB Group|HQ - VerizonWireless 
> O 770.797.1260  C 770.714.6555 Iaas Support Line 949-286-8810
> 



Re: Exact use case for CustomPayloads in v4 protocol version

2020-01-10 Thread Hannu Kröger
For example using it to pass distributed tracing token to Cassandra which can 
then later be used to track operations end to end across the whole stack from 
api entry point to Cassandra query traces. 

This explains how it is done: 
https://thelastpickle.com/blog/2015/12/07/using-zipkin-for-full-stack-tracing-including-cassandra.html

Cheers,
Hannu

> Goutham reddy  kirjoitti 10.1.2020 kello 22.11:
> 
> 
> Hello all,
> I was trying to explore more about the custom payloads which was introduced 
> in protocol version v4. I could not get the actual use case using custom 
> payload. Can somebody shed some light on this? Appreciate your help:)
> 
> Thanks and regards,
> Goutham


Re: Securing cluster communication

2019-06-28 Thread Hannu Kröger
I would start checking this page: 
http://cassandra.apache.org/doc/latest/operating/security.html

Then move to this:
https://thelastpickle.com/blog/2015/09/30/hardening-cassandra-step-by-step-part-1-server-to-server.html

Cheers,
Hannu

> Marc Richter  kirjoitti 28.6.2019 kello 16.55:
> 
> Hi everyone,
> 
> I'm completely new to Cassandra DB, so please do not roast me for asking 
> obvious stuff.
> 
> I managed to setup one Cassandra node and enter some data to it, 
> successfully. Next, I installed a second node, which connects to that first 
> one via port 7000 and sync all that data from it. This worked fine as well.
> 
> But doing so, it leaves me puzzled a bit because of the security aspect of 
> this: Neither did I need to authenticate to the seeding (first) node, nor did 
> I find a resource which describes how to secure that cluster communication by 
> implementing some kind of authentication, which prevents everyone on the same 
> net to connect to the nodes.
> 
> How is this dealt with in Cassandra? Is setting up firewalls the only way to 
> allow only some nodes to connect to the ports 7000/7001?
> 
> BR,
> Marc
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
> 


Re: Disable Truststore CA check for internode_encryption

2019-02-27 Thread Hannu Kröger
I was using this as reference: 
https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__SecurityProps

And there I see “require client authentication” also in the server options, 
i.e. internode encryption.

However I am not sure if this is what the OP is after. 

Hannu

> Jeff Jirsa  kirjoitti 28.2.2019 kello 9.01:
> 
> That’s client to server - internode is different
> 
> Don’t think it’s possible without code modifications - please opens JIRA
> 
> -- 
> Jeff Jirsa
> 
> 
>> On Feb 27, 2019, at 10:21 PM, Hannu Kröger  wrote:
>> 
>> Is server encryption option ”require_client_auth: false” what you are after?
>> 
>> Hannu
>> 
>>> Jai Bheemsen Rao Dhanwada  kirjoitti 28.2.2019 kello 
>>> 1.57:
>>> 
>>> Hello,
>>> 
>>> Is it possible to disable truststore CA check for the cassandra 
>>> internode_encyrption? if yes, is there a config property to do that?
>> 
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org
>> 
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
> 


Re: Disable Truststore CA check for internode_encryption

2019-02-27 Thread Hannu Kröger
Is server encryption option ”require_client_auth: false” what you are after?

Hannu

> Jai Bheemsen Rao Dhanwada  kirjoitti 28.2.2019 kello 
> 1.57:
> 
> Hello,
> 
> Is it possible to disable truststore CA check for the cassandra 
> internode_encyrption? if yes, is there a config property to do that?




Restore a table with dropped columns to a new cluster fails

2019-02-19 Thread Hannu Kröger
Hi,

I would like to bring this issue to your attention.

Link to the ticket:
https://issues.apache.org/jira/browse/CASSANDRA-14336 
<https://issues.apache.org/jira/browse/CASSANDRA-14336>

Basically if a table contains dropped columns and you try to restore a snapshot 
to a new cluster, that will fail because of an error like 
"java.lang.RuntimeException: Unknown column XXX during deserialization”.

I feel this is quite serious problem for backup and restore functionality of 
Cassandra. You cannot restore a backup to a new cluster if columns have been 
dropped.

There have been other similar tickets that have been apparently closed but 
based on my test with 3.11.4, the issue still persists.

Best Regards,
Hannu Kröger

Potential bootstrap failure bugs

2019-01-30 Thread Hannu Kröger
Hi,

I have tested Cassandra 3.0.10 and 3.0.17 and I have found three potential bugs 
there.

The test steps with CCM are as follows:
1) Start 3 node cluster
2) Create enough data with cassandra-stress
3) Kill node3
4) Create + Start node4  with replace_address=node3
5) Wait for streaming to start (Check node4 logs for "Prepare completed. 
Receiving N files”)
6) While node is streaming, drop keyspace1.standard1

Based on my findings, what happens after this are the following not-so-great things:
1) Bootstrapping fails (which it really shouldn’t, because in bigger, always-online, 
multi-user clusters schema changes can happen at any time, and the same goes for 
bootstrapping new nodes / replacing old nodes)
2) At least in my test scenario the bootstrap fails / halts and the node is left in 
the UJ state, BUT the CQL port is opened and clients can connect normally, which is 
weird. It is both joining and accepting CQL connections at the same time.
3) If I run “nodetool bootstrap resume” twice after the failure, or once before the 
bootstrap has failed, nodetool reports that the bootstrap is already done (same goes 
for the logs). Which is not true: the bootstrap is ongoing.

What do you think, should I open tickets about these?

Best regards,
Hannu Kröger



Re: Scale SASI index

2018-09-18 Thread Hannu Kröger
You shouldn’t need to. You just scale up and run ”nodetool cleanup” and that 
will take care of it. 

Hannu

> onmstester onmstester  kirjoitti 18.9.2018 kello 8.52:
> 
> By adding new nodes to cluster, should i rebuild SASI indexes on all nodes ?
> 
> 


Re: [EXTERNAL] full text search on some text columns

2018-08-01 Thread Hannu Kröger
Does someone know if you can do an online upgrade of Elassandra? With the 
Lucene plugin you cannot really, because you need to drop and recreate indexes 
if Lucene has been updated. 

Hannu

> Octavian Rinciog  kirjoitti 1.8.2018 kello 12.49:
> 
> Hello!
> 
> Maybe this will work? https://github.com/strapdata/elassandra (I haven't 
> tested this plugin)
> 
> 2018-08-01 12:17 GMT+03:00 Hannu Kröger :
>> 3.11.1 plugin works with 3.11.2. But yes, original maintainer is not 
>> maintaining the project anymore. At least not actively. 
>> 
>> Hannu
>> 
>>> Ben Slater  kirjoitti 1.8.2018 kello 7.16:
>>> 
>>> We (Instaclustr) will be submitting a PR for 3.11.3 support for 
>>> cassandra-lucene-index once 3.11.3 is officially released as we offer it as 
>>> part of our service and have customers using it.
>>> 
>>> Cheers
>>> Ben
>>> 
>>>> On Wed, 1 Aug 2018 at 14:06 onmstester onmstester  
>>>> wrote:
>>>> It seems to be an interesting project but sort of abandoned. No update in 
>>>> last 8 Months and not supporting Cassandra 3.11.2  (the version i 
>>>> currently use)
>>>> 
>>>> Sent using Zoho Mail
>>>> 
>>>> 
>>>> 
>>>>  Forwarded message 
>>>> From : Andrzej Śliwiński 
>>>> To : 
>>>> Date : Wed, 01 Aug 2018 08:16:06 +0430
>>>> Subject : Re: [EXTERNAL] full text search on some text columns
>>>>  Forwarded message 
>>>> 
>>>> Maybe this plugin could do the job: 
>>>> https://github.com/Stratio/cassandra-lucene-index
>>>> 
>>>> On Tue, 31 Jul 2018 at 22:37, onmstester onmstester  
>>>> wrote:
>>>> 
>>>> 
>>> -- 
>>> Ben Slater
>>> Chief Product Officer
>>> 
>>> 
>>> Read our latest technical blog posts here.
>>> This email has been sent on behalf of Instaclustr Pty. Limited (Australia) 
>>> and Instaclustr Inc (USA).
>>> This email and any attachments may contain confidential and legally 
>>> privileged information.  If you are not the intended recipient, do not copy 
>>> or disclose its content, but please reply to this email immediately and 
>>> highlight the error to the sender and then immediately delete the message.
> 
> 
> 
> -- 
> Octavian Rinciog


Re: [EXTERNAL] full text search on some text columns

2018-08-01 Thread Hannu Kröger
3.11.1 plugin works with 3.11.2. But yes, original maintainer is not 
maintaining the project anymore. At least not actively. 

Hannu

> Ben Slater  kirjoitti 1.8.2018 kello 7.16:
> 
> We (Instaclustr) will be submitting a PR for 3.11.3 support for 
> cassandra-lucene-index once 3.11.3 is officially released as we offer it as 
> part of our service and have customers using it.
> 
> Cheers
> Ben
> 
>> On Wed, 1 Aug 2018 at 14:06 onmstester onmstester  
>> wrote:
>> It seems to be an interesting project but sort of abandoned. No update in 
>> last 8 Months and not supporting Cassandra 3.11.2  (the version i currently 
>> use)
>> 
>> Sent using Zoho Mail
>> 
>> 
>> 
>>  Forwarded message 
>> From : Andrzej Śliwiński 
>> To : 
>> Date : Wed, 01 Aug 2018 08:16:06 +0430
>> Subject : Re: [EXTERNAL] full text search on some text columns
>>  Forwarded message 
>> 
>> Maybe this plugin could do the job: 
>> https://github.com/Stratio/cassandra-lucene-index
>> 
>> On Tue, 31 Jul 2018 at 22:37, onmstester onmstester  
>> wrote:
>> 
>> 
> -- 
> Ben Slater
> Chief Product Officer
> 
> 
> Read our latest technical blog posts here.
> This email has been sent on behalf of Instaclustr Pty. Limited (Australia) 
> and Instaclustr Inc (USA).
> This email and any attachments may contain confidential and legally 
> privileged information.  If you are not the intended recipient, do not copy 
> or disclose its content, but please reply to this email immediately and 
> highlight the error to the sender and then immediately delete the message.


Re: Reading cardinality from Statistics.db failed

2018-07-25 Thread Hannu Kröger
What version of Cassandra are you running? There is a bug in 3.10 and certain 
3.0.x versions that occurs under certain conditions and corrupts that file. 

Hannu

> Vitali Dyachuk  kirjoitti 25.7.2018 kello 10.48:
> 
> Hi,
> I have noticed in the cassandra system.log that there is some issue with 
> sstable metadata, the messages says:
> WARN  [Thread-6] 2018-07-25 07:12:47,928 SSTableReader.java:249 - Reading 
> cardinality from Statistics.db failed for 
> /opt/data/disk5/data/keyspace/table/mc-big-Data.db
> Although there is no such file. The message has appeared after i've changed 
> the compaction strategy from SizeTiered to Leveled.
> Currently I'm running nodetool scrub to rebuild the sstables, and it takes a 
> lot of time to scrub them all.
> Reading the code it is said that if this metada is broken, then estimating 
> the keys will be done using index summary. How expensive it is ?
> https://github.com/apache/cassandra/blob/cassandra-3.0.15/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L245
> 
> The main question is why has this happened?
> 
> Thanks,
> Vitali Djatsuk.


Re: Compaction out of memory

2018-07-12 Thread Hannu Kröger
Could the problem be that the process ran out of file handles? The 
recommendation is to tune that limit higher than the default. 

Hannu

> onmstester onmstester  kirjoitti 12.7.2018 kello 12.44:
> 
> Cassandra crashed in Two out of 10 nodes in my cluster within 1 day, the 
> error is:
> 
> ERROR [CompactionExecutor:3389] 2018-07-10 11:27:58,857 
> CassandraDaemon.java:228 - Exception in thread 
> Thread[CompactionExecutor:3389,1,main]
> org.apache.cassandra.io.FSReadError: java.io.IOException: Map failed
> at 
> org.apache.cassandra.io.util.ChannelProxy.map(ChannelProxy.java:157) 
> ~[apache-cassandra-3.11.2.jar:3.11.2]
> at 
> org.apache.cassandra.io.util.MmappedRegions$State.add(MmappedRegions.java:310)
>  ~[apache-cassandra-3.11.2.jar:3.11.2]
> at 
> org.apache.cassandra.io.util.MmappedRegions$State.access$400(MmappedRegions.java:246)
>  ~[apache-cassandra-3.11.2.jar:3.11.2]
> at 
> org.apache.cassandra.io.util.MmappedRegions.updateState(MmappedRegions.java:170)
>  ~[apache-cassandra-3.11.2.jar:3.11.2]
> at 
> org.apache.cassandra.io.util.MmappedRegions.<init>(MmappedRegions.java:73) 
> ~[apache-cassandra-3.11.2.jar:3.11.2]
> at 
> org.apache.cassandra.io.util.MmappedRegions.<init>(MmappedRegions.java:61) 
> ~[apache-cassandra-3.11.2.jar:3.11.2]
> at 
> org.apache.cassandra.io.util.MmappedRegions.map(MmappedRegions.java:104) 
> ~[apache-cassandra-3.11.2.jar:3.11.2]
> at 
> org.apache.cassandra.io.util.FileHandle$Builder.complete(FileHandle.java:362) 
> ~[apache-cassandra-3.11.2.jar:3.11.2]
> at 
> org.apache.cassandra.io.sstable.format.big.BigTableWriter.openEarly(BigTableWriter.java:290)
>  ~[apache-cassandra-3.11.2.jar:3.11.2]
> at 
> org.apache.cassandra.io.sstable.SSTableRewriter.maybeReopenEarly(SSTableRewriter.java:179)
>  ~[apache-cassandra-3.11.2.jar:3.11.2]
> at 
> org.apache.cassandra.io.sstable.SSTableRewriter.append(SSTableRewriter.java:134)
>  ~[apache-cassandra-3.11.2.jar:3.11.2]
> at 
> org.apache.cassandra.db.compaction.writers.DefaultCompactionWriter.realAppend(DefaultCompactionWriter.java:65)
>  ~[apache-cassandra-3.11.2.jar:3.11.2]
> at 
> org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.append(CompactionAwareWriter.java:142)
>  ~[apache-cassandra-3.11.2.jar:3.11.2]
> at 
> org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:201)
>  ~[apache-cassandra-3.11.2.jar:3.11.2]
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) 
> ~[apache-cassandra-3.11.2.jar:3.11.2]
> at 
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:85)
>  ~[apache-cassandra-3.11.2.jar:3.11.2]
> at 
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:61)
>  ~[apache-cassandra-3.11.2.jar:3.11.2]
> at 
> org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:275)
>  ~[apache-cassandra-3.11.2.jar:3.11.2]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_65]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[na:1.8.0_65]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_65]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_65]
> at 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
>  [apache-cassandra-3.11.2.jar:3.11.2]
> at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_65]
> Caused by: java.io.IOException: Map failed
> at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:939) 
> ~[na:1.8.0_65]
> at 
> org.apache.cassandra.io.util.ChannelProxy.map(ChannelProxy.java:153) 
> ~[apache-cassandra-3.11.2.jar:3.11.2]
> ... 23 common frames omitted
> Caused by: java.lang.OutOfMemoryError: Map failed
> at sun.nio.ch.FileChannelImpl.map0(Native Method) ~[na:1.8.0_65]
> at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:936) 
> ~[na:1.8.0_65]
> ... 24 common frames omitted
> 
> Each node has 128 GB ram which 32 GB allocated as Cassandra Heap. 
> Sent using Zoho Mail
> 
> 
> 


Re: rebuild on running node

2018-07-05 Thread Hannu Kröger
You have just some extra data on those machines where you ran rebuild. 
Compaction will eventually take care of that. 

Nothing really harmful if you have the disk space available. 

Hannu

> Randy Lynn  kirjoitti 5.7.2018 kello 19.19:
> 
> Anyone ever make stupid mistakes? :)
> 
> TL/DR: I ran rebuild on a node that is already up and running in an existing 
> data center.. what happens?
> 
> This is what I did...
> Assume I have DC_syndey and adding DC_sydney_new
> But also have a DC_us..
> 
> from a node in DC_sydney_new I intended to type
> "rebuild -- DC_sydney"
> 
> What I actually did was from one of the nodes in DC_us I typed
> "rebuild -- DC_sydney
> 
> I cancelled the process 
> The node in US is showing a increased "load" right now..
> 
> What are the consequences?
> Should I run rebuild again?
> should I run repair?
> Or is all OK?
> 
> 
> 
> -- 
> Randy Lynn 
> rl...@getavail.com 
> 
> office: 
> 859.963.1616 ext 202 
> 163 East Main Street - Lexington, KY 40507 - USA 
> 
>   getavail.com


Re: Problem with dropped mutations

2018-07-02 Thread Hannu Kröger
Yes, there are timeouts sometimes but more on the read side. And yes, there are 
certain data modeling problems which will be soon addressed but we need to keep 
things steady before we get there. 

I guess many write timeouts go unnoticed due to consistency level != ALL. 

Network looks to be working fine. 

Hannu

> ZAIDI, ASAD A  kirjoitti 26.6.2018 kello 21.42:
> 
> Are you also seeing time-outs on certain Cassandra operations? If yes, you 
> may have to tweak the *request_timeout parameters in order to get rid of dropped 
> mutation messages if the application data model is not up to the mark!
> 
> You can also check if network isn't dropping packets (ifconfig  -a tool) +  
> storage (dstat tool) isn't reporting too slow disks.
> 
> Cheers/Asad
> 
> 
> -----Original Message-----
> From: Hannu Kröger [mailto:hkro...@gmail.com] 
> Sent: Tuesday, June 26, 2018 9:49 AM
> To: user 
> Subject: Problem with dropped mutations
> 
> Hello,
> 
> We have a cluster with somewhat heavy load and we are seeing dropped 
> mutations (variable amount and not all nodes have those).
> 
> Are there some clear trigger which cause those? What would be the best 
> pragmatic approach to start debugging those? We have already added more 
> memory which seemed to help somewhat but not completely.
> 
> Cheers,
> Hannu
> 
> 
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
> 
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
> 




Problem with dropped mutations

2018-06-26 Thread Hannu Kröger
Hello,

We have a cluster with somewhat heavy load and we are seeing dropped mutations 
(variable amount and not all nodes have those).

Are there some clear trigger which cause those? What would be the best 
pragmatic approach to start debugging those? We have already added more memory 
which seemed to help somewhat but not completely.

Cheers,
Hannu






Re: Client ID logging

2018-05-21 Thread Hannu Kröger
Hmm, I think that by default it does not, but you can create a hook to log 
that: create a wrapper for the PasswordAuthenticator class, for example, and 
use that. Or, if you don’t use authentication, you can create your own query 
handler.

Hannu

> James Lovato  kirjoitti 21.5.2018 kello 21.37:
> 
> Hi guys,
>  
> Can standard OSS Cassandra 3 do logging of who connects to it?  We have a 
> cluster in 3 DCs and our devs want to see if the client is crossing across DC 
> (even though they have DCLOCAL set from their DS driver).
>  
> Thanks,
> James


Re: Error after 3.1.0 to 3.11.2 upgrade

2018-05-11 Thread Hannu Kröger
Hi,

Did you check replication strategy and amounts of replicas of system_auth 
keyspace?

Hannu

> Abdul Patel  kirjoitti 12.5.2018 kello 5.21:
> 
> No, the application isn’t impacted... no complaints...
> Also, it’s a 4 node cluster in a lower, non-production environment, and all 
> nodes are on the same version.
> 
>> On Friday, May 11, 2018, Jeff Jirsa  wrote:
>> The read is timing out - is the cluster healthy? Is it fully upgraded or 
>> mixed versions? Repeated isn’t great, but is the application impacted? 
>> 
>> -- 
>> Jeff Jirsa
>> 
>> 
>>> On May 12, 2018, at 6:17 AM, Abdul Patel  wrote:
>>> 
>>> Seems its coming from 3.10, got bunch of them today for 3.11.2, so if this 
>>> is repeatedly coming , whats solution for this?
>>> 
>>> WARN  [Native-Transport-Requests-24] 2018-05-11 16:46:20,938 
>>> CassandraAuthorizer.java:96 - CassandraAuthorizer failed to authorize 
>>> # for 
>>> ERROR [Native-Transport-Requests-24] 2018-05-11 16:46:20,940 
>>> ErrorMessage.java:384 - Unexpected exception during request
>>> com.google.common.util.concurrent.UncheckedExecutionException: 
>>> java.lang.RuntimeException: 
>>> org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - 
>>> received only 0 responses.
>>> at 
>>> com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203) 
>>> ~[guava-18.0.jar:na]
>>> at com.google.common.cache.LocalCache.get(LocalCache.java:3937) 
>>> ~[guava-18.0.jar:na]
>>> at 
>>> com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941) 
>>> ~[guava-18.0.jar:na]
>>> at 
>>> com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824)
>>>  ~[guava-18.0.jar:na]
>>> at org.apache.cassandra.auth.AuthCache.get(AuthCache.java:108) 
>>> ~[apache-cassandra-3.11.2.jar:3.11.2]
>>> at 
>>> org.apache.cassandra.auth.PermissionsCache.getPermissions(PermissionsCache.java:45)
>>>  ~[apache-cassandra-3.11.2.jar:3.11.2]
>>> at 
>>> org.apache.cassandra.auth.AuthenticatedUser.getPermissions(AuthenticatedUser.java:104)
>>>  ~[apache-cassandra-3.11.2.jar:3.11.2]
>>> at 
>>> org.apache.cassandra.service.ClientState.authorize(ClientState.java:439) 
>>> ~[apache-cassandra-3.11.2.jar:3.11.2]
>>> at 
>>> org.apache.cassandra.service.ClientState.checkPermissionOnResourceChain(ClientState.java:368)
>>>  ~[apache-cassandra-3.11.2.jar:3.11.2]
>>> at 
>>> org.apache.cassandra.service.ClientState.ensureHasPermission(ClientState.java:345)
>>>  ~[apache-cassandra-3.11.2.jar:3.11.2]
>>> at 
>>> org.apache.cassandra.service.ClientState.hasAccess(ClientState.java:332) 
>>> ~[apache-cassandra-3.11.2.jar:3.11.2]
>>> at 
>>> org.apache.cassandra.service.ClientState.hasColumnFamilyAccess(ClientState.java:310)
>>>  ~[apache-cassandra-3.11.2.jar:3.11.2]
>>> at 
>>> org.apache.cassandra.cql3.statements.SelectStatement.checkAccess(SelectStatement.java:260)
>>>  ~[apache-cassandra-3.11.2.jar:3.11.2]
>>> at 
>>> org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:221)
>>>  ~[apache-cassandra-3.11.2.jar:3.11.2]
>>> at 
>>> org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:530)
>>>  ~[apache-cassandra-3.11.2.jar:3.11.2]
>>> at 
>>> org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:507)
>>>  ~[apache-cassandra-3.11.2.jar:3.11.2]
>>> 
 On Fri, May 11, 2018 at 8:30 PM, Jeff Jirsa  wrote:
 That looks like Cassandra 3.10 not 3.11.2
 
 It’s also just the auth cache failing to refresh - if it’s transient it’s 
 probably not a big deal. If it continues then there may be an issue with 
 the cache refresher.
 
 -- 
 Jeff Jirsa
 
 
> On May 12, 2018, at 5:55 AM, Abdul Patel  wrote:
> 
> HI All,
> 
> Seen below stack trace messages , in errorlog  one day after upgrade.
> one of the blogs said this might be due to old drivers, but not sure on 
> it.
> 
> FYI :
> 
> INFO  [HANDSHAKE-/10.152.205.150] 2018-05-09 10:22:27,160 
> OutboundTcpConnection.java:510 - Handshaking version with /10.152.205.150
> DEBUG [MessagingService-Outgoing-/10.152.205.150-Gossip] 2018-05-09 
> 10:22:27,160 OutboundTcpConnection.java:482 - Done connecting to 
> /10.152.205.150
> ERROR [Native-Transport-Requests-1] 2018-05-09 10:22:29,971 
> ErrorMessage.java:384 - Unexpected exception during request
> com.google.common.util.concurrent.UncheckedExecutionException: 
> com.google.common.util.concurrent.UncheckedExecutionException: 
> java.lang.RuntimeException: 
> org.apache.cassandra.exceptions.UnavailableException: Cannot achieve 
> consistency level LOCAL_ONE
> at 
> com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203) 
> ~[guava-18.0.jar:na]
> at 

Re: Does LOCAL_ONE still replicate data?

2018-05-08 Thread Hannu Kröger
Writes are always replicated to all replica nodes (if they are online).

LOCAL_ONE in writes just means that the client will get an “OK” for the write 
only after at least one node in the local datacenter has acknowledged that the 
write is done.

If all local replicas are offline, then the write will fail even if it gets 
written in your other DC.
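
A minimal sketch of this acknowledgement rule (a Python simplification, not 
Cassandra’s actual internals; the datacenter names are made up):

```python
# Sketch of LOCAL_ONE write semantics: the coordinator sends the write to
# ALL online replicas (in every DC), but the client gets an "OK" as soon
# as at least one replica in the local datacenter has acknowledged it.
def local_one_write(replicas, local_dc):
    """replicas: list of (dc, is_online). Returns (success, replicas_written)."""
    written = [(dc, up) for dc, up in replicas if up]  # every online replica stores it
    local_acks = sum(1 for dc, _ in written if dc == local_dc)
    return local_acks >= 1, len(written)

# One replica per DC (RF=1 per DC), both online: write succeeds and is
# replicated to both DCs.
ok, n = local_one_write([("dc1", True), ("dc2", True)], local_dc="dc1")
assert ok and n == 2

# All local replicas down: the write fails at LOCAL_ONE even though the
# remote DC could still store it.
ok, _ = local_one_write([("dc1", False), ("dc2", True)], local_dc="dc1")
assert not ok
```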

Hannu

> On 8 May 2018, at 13:24, Jakub Lida  wrote:
> 
> Hi,
> 
> I want to add a new DC to an existing cluster (RF=1 per DC).
> Will setting consistency to LOCAL_ONE on all machines make it still replicate 
> write requests sent to online DCs to all DCs (including the new one being 
> rebuilt) and only isolate read requests from reaching the new DC? That is 
> basically want I want to accomplish.
> 
> Thanks in advance, Jakub



Re: upgrade from 3.9 to 3.11.2

2018-05-03 Thread Hannu Kröger
Hi,

It depends on your replication factor and consistency levels used.

If you are not using consistency level of ALL in your applications and your 
replication factor is 3, then you usually don’t need to stop your frontend 
applications for the upgrade.

If replication factor = 2, then you need to run consistency level ONE (or 
LOCAL_ONE or ANY) in your applications. You cannot use QUORUM or ALL, or your 
queries will start failing.

If replication factor = 1, your queries will start failing when you upgrade one 
node.
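
The rules above can be written out as a small availability check (a Python 
simplification that assumes the node being upgraded holds a replica of the 
queried data and ignores token ranges):

```python
# Simplified availability check while one replica is offline during a
# rolling upgrade.  required() maps a consistency level to the number of
# replica responses needed; a query succeeds if the remaining live
# replicas can still meet that number.
def required(cl, rf):
    return {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[cl]

def query_ok(rf, nodes_down, cl):
    return rf - nodes_down >= required(cl, rf)

assert query_ok(rf=3, nodes_down=1, cl="QUORUM")    # RF=3: QUORUM still fine
assert not query_ok(rf=3, nodes_down=1, cl="ALL")   # ... but ALL fails
assert query_ok(rf=2, nodes_down=1, cl="ONE")       # RF=2: must drop to ONE
assert not query_ok(rf=2, nodes_down=1, cl="QUORUM")
assert not query_ok(rf=1, nodes_down=1, cl="ONE")   # RF=1: queries fail
```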

Hannu

> On 3 May 2018, at 10:12, Xiangfei Ni <xiangfei...@cm-dt.com> wrote:
> 
> Thanks Hannu,
> Another question is that,do I need to stop the frontend application during 
> the upgrade?
> I have 3 nodes cluster,let’s say:
> Cassandra01
> Cassandra02
> Cassandra03
>  
> First I upgrade the cassandra01 node,
> 1,nodetool drain
> 2,backup data
> 3,install new binary
> 4,configure the configuration file
> 5,start Cassandra service
> 6,nodetool upgrade
>  
> Then another node one by one
>  
> do I need to stop the frontend application during the upgrade?
>  
>  
>  
> Best Regards,
>  
> 倪项菲/ David Ni
> 中移德电网络科技有限公司
> Virtue Intelligent Network Ltd, co.
> 
> Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei
> Mob: +86 13797007811|Tel: + 86 27 5024 2516
>  
> 发件人: Hannu Kröger <hkro...@gmail.com> 
> 发送时间: 2018年5月3日 15:00
> 收件人: user <user@cassandra.apache.org>
> 主题: Re: upgrade from 3.9 to 3.11.2
>  
> Hello,
>  
> it never hurts to run “nodetool upgradesstables" after the upgrade. It’s a 
> no-op if there is nothing to upgrade.
>  
> Hannu
> 
> 
> On 3 May 2018, at 09:57, Xiangfei Ni <xiangfei...@cm-dt.com 
> <mailto:xiangfei...@cm-dt.com>> wrote:
>  
> Hi Community
>   I have a question regarding upgrading Cassandra from 3.9 to 3.11.2,
>   Do I need to run nodetool upgradesstables when I do the upgrade?we know 
> that we don’t need to run this command when we do minor version upgrade.But 
> from 3.9 to 3.11.2,I have no idea.
>   Also I suggest that the community should have official article about the 
> upgrading,everytime I do upgrade we can just google the posts via internet.
>  
>  
> Best Regards,
>  
> 倪项菲/ David Ni
> 中移德电网络科技有限公司
> Virtue Intelligent Network Ltd, co.
> 
> Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei
> Mob: +86 13797007811|Tel: + 86 27 5024 2516



Re: upgrade from 3.9 to 3.11.2

2018-05-03 Thread Hannu Kröger
Hello,

it never hurts to run “nodetool upgradesstables" after the upgrade. It’s a 
no-op if there is nothing to upgrade.

Hannu

> On 3 May 2018, at 09:57, Xiangfei Ni  wrote:
> 
> Hi Community
>   I have a question regarding upgrading Cassandra from 3.9 to 3.11.2,
>   Do I need to run nodetool upgradesstables when I do the upgrade?we know 
> that we don’t need to run this command when we do minor version upgrade.But 
> from 3.9 to 3.11.2,I have no idea.
>   Also I suggest that the community should have official article about the 
> upgrading,everytime I do upgrade we can just google the posts via internet.
>  
>  
> Best Regards,
>  
> 倪项菲/ David Ni
> 中移德电网络科技有限公司
> Virtue Intelligent Network Ltd, co.
> 
> Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei
> Mob: +86 13797007811|Tel: + 86 27 5024 2516



Re: GUI clients for Cassandra

2018-05-02 Thread Hannu Kröger
Ah, you are correct! 

However, it’s not being updated anymore AFAIK. Do you know if it supports the 
latest 3.x features? SASI, MV, etc.?

Hannu

> On 24 Apr 2018, at 03:45, Christophe Schmitz  
> wrote:
> 
> Hi Hannu ;)
> 
>  
> 
>>> I have been asked many times that what is a good GUI client for Cassandra. 
>>> DevCenter is not available anymore and DataStax has a DevStudio but that’s 
>>> for DSE only.
> 
> 
>  DevCenter is still available, I just downloaded it.
> 
> Cheers,
> Christophe
> 
> 
> 
> -- 
> Christophe Schmitz - VP Consulting
> AU: +61 4 03751980 / FR: +33 7 82022899
>   
>    
> 
> Read our latest technical blog posts here 
> . This email has been sent on behalf of 
> Instaclustr Pty. Limited (Australia) and Instaclustr Inc (USA). This email 
> and any attachments may contain confidential and legally privileged 
> information.  If you are not the intended recipient, do not copy or disclose 
> its content, but please reply to this email immediately and highlight the 
> error to the sender and then immediately delete the message.



GUI clients for Cassandra

2018-04-22 Thread Hannu Kröger
Hello everyone!

I have been asked many times what a good GUI client for Cassandra is. 
DevCenter is not available anymore and DataStax has a DevStudio but that’s for 
DSE only.

Are there some 3rd party GUI tools that you are using a lot? I always use the 
command line client myself. I have tried to look for some Cassandra related 
tools but I haven’t found any good one yet.

Cheers,
Hannu



Re: Nodetool Repair --full

2018-03-17 Thread Hannu Kröger
Hi Jonathan,

If you want to repair just one node (for example if it has been down for more 
than 3h), run “nodetool repair -full” on that node. This will bring all data on 
that node up to date.

If you want to repair all data on the cluster, run “nodetool repair -full -pr” 
on each node. This will run a full repair on every node, but in such a way that 
only the primary range of each node is repaired. If you do it on all nodes, 
effectively the whole token range is repaired. You can run the same without -pr 
to get the same effect, but it’s not efficient, because then you are repairing 
all data RF times instead of repairing the whole data set once.
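
The -pr arithmetic can be sketched like this (a deliberately simplified 
single-token ring in Python; clusters with vnodes behave the same way per 
token range):

```python
# Simplified single-token ring: each token range is primarily owned by one
# node, and with replication factor `rf` the next rf-1 nodes on the ring
# also hold replicas of that range.
nodes, rf = 6, 3

def replicas(rng):                 # range `rng`'s primary owner is node `rng`
    return {(rng + i) % nodes for i in range(rf)}

# "nodetool repair -full" (no -pr) on every node repairs every range the
# node replicates, so each range ends up repaired RF times.
full = [sum(node in replicas(rng) for node in range(nodes)) for rng in range(nodes)]
assert full == [rf] * nodes

# "nodetool repair -full -pr" on every node repairs only that node's
# primary range, so each range is repaired exactly once.
pr = [sum(node == rng for node in range(nodes)) for rng in range(nodes)]
assert pr == [1] * nodes
```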

I hope this clarifies,
Hannu

> On 17 Mar 2018, at 17:20, Jonathan Baynes  
> wrote:
> 
> Hi Community,
>  
> Can someone confirm, as the documentation out on the web is so contradictory 
> and vague.
>  
> Nodetool repair –full if I call this, do I need to run this on ALL my nodes 
> or is just the once sufficient?
>  
> Thanks
> J
>  
> Jonathan Baynes
> DBA
> Tradeweb Europe Limited
> Moor Place  •  1 Fore Street Avenue  •  London EC2Y 9DT
> P +44 (0)20 77760988  •  F +44 (0)20 7776 3201  •  M +44 (0)7884111546
> jonathan.bay...@tradeweb.com 
>  
>     follow us:   
> 
> 
> —
> A leading marketplace  for 
> electronic fixed income, derivatives and ETF trading
>  
> 
> 
> This e-mail may contain confidential and/or privileged information. If you 
> are not the intended recipient (or have received this e-mail in error) please 
> notify the sender immediately and destroy it. Any unauthorized copying, 
> disclosure or distribution of the material in this e-mail is strictly 
> forbidden. Tradeweb reserves the right to monitor all e-mail communications 
> through its networks. If you do not wish to receive marketing emails about 
> our products / services, please let us know by contacting us, either by email 
> at contac...@tradeweb.com  or by writing to us 
> at the registered office of Tradeweb in the UK, which is: Tradeweb Europe 
> Limited (company number 3912826), 1 Fore Street Avenue London EC2Y 9DT. To 
> see our privacy policy, visit our website @ www.tradeweb.com 
> .
> 



Re: What versions should the documentation support now?

2018-03-12 Thread Hannu Kröger
In my opinion, good documentation should somehow include version-specific 
pieces of information, whether it is a nodetool command that came in a certain 
version, a parameter for something, or something else.

That would be very useful. It’s confusing if I see documentation talking about 
4.0 specifics and then try to find that in my 3.11.x.

Hannu

> On 12 Mar 2018, at 16:38, Kenneth Brotman  
> wrote:
> 
> I’m unclear what versions are most popular right now? What version are you 
> running?
>  
> What version should still be supported in the documentation?  For example, 
> I’m turning my attention back to writing a section on adding a data center.  
> What versions should I support in that information?
>  
> I’m working on it right now.  Thanks,
>  
> Kenneth Brotman



Re: Row cache functionality - Some confusion

2018-03-12 Thread Hannu Kröger

> On 12 Mar 2018, at 14:45, Rahul Singh  wrote:
> 
> I may be wrong, but what I’ve read and used in the past assumes that the 
> “first” N rows are cached and the clustering key design is how I change what 
> N rows are put into memory. Looking at the code, it seems that’s the case. 

So we agree that the row cache stores only N rows from the beginning of the 
partition. So if only the last row in a partition is read, then it probably 
doesn’t get cached, assuming there are more than N rows in the partition?
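
A toy Python model of that reading (purely illustrative, not Cassandra’s 
implementation): a “head” row cache holds the first N rows of the partition, 
so a query that only touches rows past position N never gets a hit:

```python
# Toy model of a "head of partition" row cache with rows_per_partition = N:
# the cache holds the first N rows of the partition, not the first N rows
# that happened to be queried.
N = 3
partition = ["row0", "row1", "row2", "row3", "row4"]

def read(row_index, cache):
    if row_index < len(cache):            # row is inside the cached head
        return cache[row_index], True     # cache hit
    # cache miss: read from memtable/sstables; what gets (re)populated is
    # the head of the partition (first N rows), not the queried row itself
    cache[:] = partition[:N]
    return partition[row_index], False

cache = []
value, hit = read(4, cache)               # query only the last row
assert value == "row4" and not hit
assert cache == ["row0", "row1", "row2"]  # cache now holds the head...
_, hit = read(4, cache)
assert not hit                            # ...so the last row is still never cached
```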

> The language of the comment basically says that it holds in cache what 
> satisfies the query if and only if it’s the head of the partition, if not it 
> fetches it and saves it - I dont interpret it differently from what I have 
> seen in the documentation. 

Hmm, I’m trying to understand this. Does it mean that it stores the results in 
the cache if the query hits the head of the partition, and if not, it will 
fetch the head and store that (instead of the results for the query)?

Hannu

Re: Row cache functionality - Some confusion

2018-03-12 Thread Hannu Kröger
Hi,

My goal is to make sure that I understand the functionality correctly and that 
the documentation is accurate. 

The question, in other words: is the documentation or the comment in the code 
wrong (or inaccurate)?

Hannu

> On 12 Mar 2018, at 13:00, Rahul Singh <rahul.xavier.si...@gmail.com> wrote:
> 
> What’s the goal? How big are your partitions , size in MB and in rows?
> 
> --
> Rahul Singh
> rahul.si...@anant.us
> 
> Anant Corporation
> 
> On Mar 12, 2018, 6:37 AM -0400, Hannu Kröger <hkro...@gmail.com>, wrote:
>> Anyone?
>> 
>>> On 4 Mar 2018, at 20:45, Hannu Kröger <hkro...@gmail.com 
>>> <mailto:hkro...@gmail.com>> wrote:
>>> 
>>> Hello,
>>> 
>>> I am trying to verify and understand fully the functionality of row cache 
>>> in Cassandra.
>>> 
>>> I have been using mainly two different sources for information:
>>> https://github.com/apache/cassandra/blob/0db88242c66d3a7193a9ad836f9a515b3ac7f9fa/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java#L476
>>>  
>>> <https://github.com/apache/cassandra/blob/0db88242c66d3a7193a9ad836f9a515b3ac7f9fa/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java#L476>
>>> AND
>>> http://cassandra.apache.org/doc/latest/cql/ddl.html#caching-options 
>>> <http://cassandra.apache.org/doc/latest/cql/ddl.html#caching-options>
>>> 
>>> and based on what I read documentation is not correct. 
>>> 
>>> Documentation says like this:
>>> “rows_per_partition: The amount of rows to cache per partition (“row 
>>> cache”). If an integer n is specified, the first n queried rows of a 
>>> partition will be cached. Other possible options are ALL, to cache all rows 
>>> of a queried partition, or NONE to disable row caching.”
>>> 
>>> The problematic part is "the first n queried rows of a partition will be 
>>> cached”. Shouldn’t it be that the first N rows in a partition will be 
>>> cached? Not first N that are queried?
>>> 
>>> If this is the case, I’m more than happy to create a ticket (and maybe even 
>>> create a patch) for the doc update.
>>> 
>>> BR,
>>> Hannu
>>> 
>> 



Re: Row cache functionality - Some confusion

2018-03-12 Thread Hannu Kröger
Anyone?

> On 4 Mar 2018, at 20:45, Hannu Kröger <hkro...@gmail.com> wrote:
> 
> Hello,
> 
> I am trying to verify and understand fully the functionality of row cache in 
> Cassandra.
> 
> I have been using mainly two different sources for information:
> https://github.com/apache/cassandra/blob/0db88242c66d3a7193a9ad836f9a515b3ac7f9fa/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java#L476
>  
> <https://github.com/apache/cassandra/blob/0db88242c66d3a7193a9ad836f9a515b3ac7f9fa/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java#L476>
> AND
> http://cassandra.apache.org/doc/latest/cql/ddl.html#caching-options 
> <http://cassandra.apache.org/doc/latest/cql/ddl.html#caching-options>
> 
> and based on what I read documentation is not correct. 
> 
> Documentation says like this:
> “rows_per_partition: The amount of rows to cache per partition (“row cache”). 
> If an integer n is specified, the first n queried rows of a partition will be 
> cached. Other possible options are ALL, to cache all rows of a queried 
> partition, or NONE to disable row caching.”
> 
> The problematic part is "the first n queried rows of a partition will be 
> cached”. Shouldn’t it be that the first N rows in a partition will be cached? 
> Not first N that are queried?
> 
> If this is the case, I’m more than happy to create a ticket (and maybe even 
> create a patch) for the doc update.
> 
> BR,
> Hannu
> 



Re: vnodes: high availability

2018-03-12 Thread Hannu Kröger
If this is a universal recommendation, then should that actually be the default 
in Cassandra? 

Hannu

> On 18 Jan 2018, at 00:49, Jon Haddad  wrote:
> 
> I *strongly* recommend disabling dynamic snitch.  I’ve seen it make latency 
> jump 10x.  
> 
> dynamic_snitch: false is your friend.
> 
> 
> 
>> On Jan 17, 2018, at 2:00 PM, Kyrylo Lebediev > > wrote:
>> 
>> Avi, 
>> If we prefer to have better balancing [like absence of hotspots during a 
>> node down event etc], large number of vnodes is a good solution.
>> Personally, I wouldn't prefer any balancing over overall resiliency  (and in 
>> case of non-optimal setup, larger number of nodes in a cluster decreases 
>> overall resiliency, as far as I understand.) 
>> 
>> Talking about hotspots, there is a number of features helping to mitigate 
>> the issue, for example:
>>   - dynamic snitch [if a node overloaded it won't be queried]
>>   - throttling of streaming operations
>> 
>> Thanks, 
>> Kyrill
>> 
>> From: Avi Kivity >
>> Sent: Wednesday, January 17, 2018 2:50 PM
>> To: user@cassandra.apache.org ; kurt 
>> greaves
>> Subject: Re: vnodes: high availability
>>  
>> On the flip side, a large number of vnodes is also beneficial. For example, 
>> if you add a node to a 20-node cluster with many vnodes, each existing node 
>> will contribute 5% of the data towards the new node, and all nodes will 
>> participate in streaming (meaning the impact on any single node will be 
>> limited, and completion time will be faster).
>> 
>> With a low number of vnodes, only a few nodes participate in streaming, 
>> which means that the cluster is left unbalanced and the impact on each 
>> streaming node is greater (or that completion time is slower).
>> 
>> Similarly, with a high number of vnodes, if a node is down its work is 
>> distributed equally among all nodes. With a low number of vnodes the cluster 
>> becomes unbalanced.
>> 
>> Overall I recommend high vnode count, and to limit the impact of failures in 
>> other ways (smaller number of large nodes vs. larger number of small nodes).
>> 
>> btw, rack-aware topology improves the multi-failure problem but at the cost 
>> of causing imbalance during maintenance operations. I recommend using 
>> rack-aware topology only if you really have racks with 
>> single-points-of-failure, not for other reasons.
>> 
>> On 01/17/2018 05:43 AM, kurt greaves wrote:
>>> Even with a low amount of vnodes you're asking for a bad time. Even if you 
>>> managed to get down to 2 vnodes per node, you're still likely to include 
>>> double the amount of nodes in any streaming/repair operation which will 
>>> likely be very problematic for incremental repairs, and you still won't be 
>>> able to easily reason about which nodes are responsible for which token 
>>> ranges. It's still quite likely that a loss of 2 nodes would mean some 
>>> portion of the ring is down (at QUORUM). At the moment I'd say steer clear 
>>> of vnodes and use single tokens if you can; a lot of work still needs to be 
>>> done to ensure smooth operation of C* while using vnodes, and they are much 
>>> more difficult to reason about (which is probably the reason no one has 
>>> bothered to do the math). If you're really keen on the math your best bet 
>>> is to do it yourself, because it's not a point of interest for many C* devs 
>>> plus probably a lot of us wouldn't remember enough math to know how to 
>>> approach it.
>>> 
>>> If you want to get out of this situation you'll need to do a DC migration 
>>> to a new DC with a better configuration of snitch/replication 
>>> strategy/racks/tokens.
>>> 
>>> 
>>> On 16 January 2018 at 21:54, Kyrylo Lebediev >> > wrote:
>>> Thank you for this valuable info, Jon.
>>> I guess both you and Alex are referring to improved vnodes allocation 
>>> method  https://issues.apache.org/jira/browse/CASSANDRA-7032 
>>>  which was 
>>> implemented in 3.0.
>>> Based on your info and comments in the ticket it's really a bad idea to 
>>> have small number of vnodes for the versions using old allocation method 
>>> because of hot-spots, so it's not an option for my particular case (v.2.1) 
>>> :( 
>>> 
>>> [As far as I can see from the source code this new method wasn't backported 
>>> to 2.1.]
>>> 
>>> 
>>> Regards, 
>>> Kyrill
>>> [CASSANDRA-7032] Improve vnode allocation - ASF JIRA 
>>> 
>>> issues.apache.org 
>>> It's been known for a little while that random vnode allocation causes 
>>> hotspots of ownership. It should be possible to improve dramatically on 
>>> this with deterministic ...
>>> 
>>> From: Jon Haddad >> > on 

Re: How do counter updates work?

2018-03-05 Thread Hannu Kröger
So just to clarify, we have two different use cases:
- TIMEUUID is there for client-side generation of unique row IDs. It’s great 
for that.
- Cassandra counters are not very good for row ID generation and are better 
suited to e.g. the use cases I listed before.
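
The TIMEUUID point can be illustrated with Python’s standard library: 
uuid.uuid1() produces version-1 (time-based) UUIDs, the same format as CQL’s 
timeuuid, so each client can mint unique, roughly time-ordered row IDs with no 
coordination with the database:

```python
import uuid

# Client-side generation of unique, time-based row IDs — no round trip to
# the database and no shared counter needed, unlike auto-increment IDs.
ids = [uuid.uuid1() for _ in range(1000)]

assert len(set(ids)) == 1000              # all unique
assert all(u.version == 1 for u in ids)   # version 1 = time-based (CQL timeuuid)
```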

Hannu


> On 5 Mar 2018, at 16:34, Javier Pareja <pareja.jav...@gmail.com> wrote:
> 
> Doesn't cassandra have TIMEUUID for these use cases?
> 
> Anyways, hopefully someone can help me better understand possible delays when 
> writing a counter.
> 
> F Javier Pareja
> 
> On Mon, Mar 5, 2018 at 1:54 PM, Hannu Kröger <hkro...@gmail.com 
> <mailto:hkro...@gmail.com>> wrote:
> Traditionally auto increment counters have been used to generate SQL row IDs. 
> This is what Kyrylo probably is here referring to.
> 
> Cassandra counters are better for tracking e.g. usage patterns, web site 
> visitors, statistics, etc. 
> 
> For accurate counting (e.g. for generating IDs) those counters are not good 
> because they are inaccurate in certain cases.
> 
> Hannu
> 
>> On 5 Mar 2018, at 15:50, Javier Pareja <pareja.jav...@gmail.com 
>> <mailto:pareja.jav...@gmail.com>> wrote:
>> 
>> Hi Kyrulo,
>> 
>> I don't understand how UUIDs are related to counters, but I use counters to 
>> increment the value of a cell in an atomic manner. I could try reading the 
>> value and then writing to the cell but then I would lose the atomicity of 
>> the update.
>> 
>> F Javier Pareja
>> 
>> On Mon, Mar 5, 2018 at 1:32 PM, Kyrylo Lebediev <kyrylo_lebed...@epam.com 
>> <mailto:kyrylo_lebed...@epam.com>> wrote:
>> Hello!
>> 
>> Can't answer your question but there is another one: "why do we need to 
>> maintain counters with their known limitations (and I've heard of some 
>> issues with implementation of counters in Cassandra), when there exist 
>> really effective uuid generation algorithms which allow us to generate 
>> unique values?"
>> https://begriffs.com/posts/2018-01-01-sql-keys-in-depth.html 
>> <https://begriffs.com/posts/2018-01-01-sql-keys-in-depth.html>
>>  <https://begriffs.com/posts/2018-01-01-sql-keys-in-depth.html>(the article 
>> is about keys in RDBMS's, but its statements are true for NoSQL as well)
>> 
>> Regards, 
>> Kyrill
>> From: jpar...@gmail.com <mailto:jpar...@gmail.com> <jpar...@gmail.com 
>> <mailto:jpar...@gmail.com>> on behalf of Javier Pareja 
>> <pareja.jav...@gmail.com <mailto:pareja.jav...@gmail.com>>
>> Sent: Monday, March 5, 2018 1:55:14 PM
>> To: user@cassandra.apache.org <mailto:user@cassandra.apache.org>
>> Subject: How do counter updates work?
>>  
>> Hello everyone,
>> 
>> I am trying to understand how cassandra counter writes work in more detail 
>> but all that I could find is this: 
>> https://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters
>>  
>> <https://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters>
>> From there I was able to extract the following process:
> (click here to edit)
Re: How do counter updates work?

2018-03-05 Thread Hannu Kröger
Traditionally, auto-increment counters have been used to generate SQL row IDs. 
This is probably what Kyrylo is referring to here.

Cassandra counters are better for tracking e.g. usage patterns, web site 
visitors, statistics, etc. 

For accurate counting (e.g. for generating IDs) those counters are not good 
because they are inaccurate in certain cases.

Hannu

> On 5 Mar 2018, at 15:50, Javier Pareja  wrote:
> 
> Hi Kyrulo,
> 
> I don't understand how UUIDs are related to counters, but I use counters to 
> increment the value of a cell in an atomic manner. I could try reading the 
> value and then writing to the cell but then I would lose the atomicity of the 
> update.
> 
> F Javier Pareja
> 
> On Mon, Mar 5, 2018 at 1:32 PM, Kyrylo Lebediev  > wrote:
> Hello!
> 
> Can't answer your question but there is another one: "why do we need to 
> maintain counters with their known limitations (and I've heard of some issues 
> with implementation of counters in Cassandra), when there exist really 
> effective uuid generation algorithms which allow us to generate unique 
> values?"
> https://begriffs.com/posts/2018-01-01-sql-keys-in-depth.html 
> 
>  (the article 
> is about keys in RDBMS's, but its statements are true for NoSQL as well)
> 
> Regards, 
> Kyrill
> From: jpar...@gmail.com   > on behalf of Javier Pareja 
> >
> Sent: Monday, March 5, 2018 1:55:14 PM
> To: user@cassandra.apache.org 
> Subject: How do counter updates work?
>  
> Hello everyone,
> 
> I am trying to understand how cassandra counter writes work in more detail 
> but all that I could find is this: 
> https://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters
>  
> 
> From there I was able to extract the following process:
> (click here to edit 
> ).
>  
>  
> 
> PATH 1 will be much quicker than PATH 2 and its bottleneck (assuming HDD 
> drives) will be the commitlog
> PATH 2 will need at least an access to disk to do a read (potentially even in 
> a different machine) and an access to disk to do a write to the commitlog. 
> This is at least twice as slow as PATH 1.
> 
> This is all the info that I could get from the internet but a lot is missing. 
> For example, there is no information about how the counter lock is acquired, 
> is there a shared lock across 

Row cache functionality - Some confusion

2018-03-04 Thread Hannu Kröger
Hello,

I am trying to verify and understand fully the functionality of row cache in 
Cassandra.

I have been using mainly two different sources for information:
https://github.com/apache/cassandra/blob/0db88242c66d3a7193a9ad836f9a515b3ac7f9fa/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java#L476
 

AND
http://cassandra.apache.org/doc/latest/cql/ddl.html#caching-options 


and based on what I read, the documentation is not correct. 

Documentation says like this:
“rows_per_partition: The amount of rows to cache per partition (“row cache”). 
If an integer n is specified, the first n queried rows of a partition will be 
cached. Other possible options are ALL, to cache all rows of a queried 
partition, or NONE to disable row caching.”

The problematic part is "the first n queried rows of a partition will be 
cached”. Shouldn’t it be that the first N rows in a partition will be cached? 
Not the first N that are queried?

If this is the case, I’m more than happy to create a ticket (and maybe even 
create a patch) for the doc update.

BR,
Hannu



Re: How to Parse raw CQL text?

2018-02-26 Thread Hannu Kröger
If this is needed functionality, shouldn’t it be available as a public method 
or something? Maybe write a patch, etc.?

> Ariel Weisberg  kirjoitti 26.2.2018 kello 18.47:
> 
> Hi,
> 
> I took a similar approach and it worked fine. I was able to build a tool that 
> parsed production query logs.
> 
> I used a helper method that would just grab a private field out of an object 
> by name using reflection.
> 
> Ariel
> 
>> On Sun, Feb 25, 2018, at 11:58 PM, Jonathan Haddad wrote:
>> I had to do something similar recently.  Take a look at 
>> org.apache.cassandra.cql3.QueryProcessor.parseStatement().  I've got some 
>> sample code here [1] as well as a blog post [2] that explains how to access 
>> the private variables, since there's no access provided.  It wasn't really 
>> designed to be used as a library, so YMMV with future changes.  
>> 
>> [1] 
>> https://github.com/rustyrazorblade/rustyrazorblade-examples/blob/master/privatevaraccess/src/main/kotlin/com/rustyrazorblade/privatevaraccess/CreateTableParser.kt
>> [2] 
>> http://rustyrazorblade.com/post/2018/2018-02-25-accessing-private-variables-in-jvm/
>> 
>> On Mon, Feb 5, 2018 at 2:27 PM Kant Kodali  wrote:
>> I just did some trial and error. Looks like this would work:
>> 
>> public class Test {
>> 
>>     public static void main(String[] args) throws Exception {
>> 
>>         String stmt = "create table if not exists test_keyspace.my_table "
>>             + "(field1 text, field2 int, field3 set, field4 map, "
>>             + "primary key (field1) );";
>> 
>>         ANTLRStringStream stringStream = new ANTLRStringStream(stmt);
>>         CqlLexer cqlLexer = new CqlLexer(stringStream);
>>         CommonTokenStream token = new CommonTokenStream(cqlLexer);
>>         CqlParser parser = new CqlParser(token);
>>         ParsedStatement query = parser.cqlStatement();
>> 
>>         if (query.getClass().getDeclaringClass() == CreateTableStatement.class) {
>>             CreateTableStatement.RawStatement cts =
>>                 (CreateTableStatement.RawStatement) query;
>> 
>>             CFMetaData
>>                 .compile(stmt, cts.keyspace())
>>                 .getColumnMetadata()
>>                 .values()
>>                 .stream()
>>                 .forEach(cd -> System.out.println(cd));
>>         }
>>     }
>> }
>> 
>> 
>> On Mon, Feb 5, 2018 at 2:13 PM, Kant Kodali  wrote:
>> Hi Anant,
>> 
>> I just have CQL create table statement as a string I want to extract all the 
>> parts like, tableName, KeySpaceName, regular Columns,  partitionKey, 
>> ClusteringKey, Clustering Order and so on. Thats really  it!
>> 
>> Thanks!
>> 
>> On Mon, Feb 5, 2018 at 1:50 PM, Rahul Singh  
>> wrote:
>> I think I understand what you are trying to do … but what is your goal? What 
>> do you mean “use it for different” queries… Maybe you want to do an event 
>> and have an event processor? Seems like you are trying to basically by pass 
>> that pattern and parse a query and split it into several actions? 
>> 
>> Did you look into this unit test folder? 
>> 
>> https://github.com/apache/cassandra/blob/trunk/test/unit/org/apache/cassandra/cql3/CQLTester.java
>> 
>> --
>> Rahul Singh
>> rahul.si...@anant.us
>> 
>> Anant Corporation
>> 
>> On Feb 5, 2018, 4:06 PM -0500, Kant Kodali , wrote:
>> 
>>> Hi All,
>>> 
>>> I have a need where I get a raw CQL create table statement as a String and 
>>> I need to parse the keyspace, tablename, columns and so on..so I can use it 
>>> for various queries and send it to C*. I used the example below from this 
>>> link. I get the following error.  And I thought maybe someone in this 
>>> mailing list will be more familiar with internals.  
>>> 
>>> Exception in thread "main" 
>>> org.apache.cassandra.exceptions.ConfigurationException: Keyspace 
>>> test_keyspace doesn't exist
>>> at 
>>> org.apache.cassandra.cql3.statements.CreateTableStatement$RawStatement.prepare(CreateTableStatement.java:200)
>>> at com.hello.world.Test.main(Test.java:23)
>>> 
>>> 
>>> Here is my code.
>>> 
>>> package com.hello.world;
>>> 
>>> import org.antlr.runtime.ANTLRStringStream;
>>> import org.antlr.runtime.CommonTokenStream;
>>> import org.apache.cassandra.cql3.CqlLexer;
>>> import org.apache.cassandra.cql3.CqlParser;
>>> import org.apache.cassandra.cql3.statements.CreateTableStatement;
>>> import org.apache.cassandra.cql3.statements.ParsedStatement;
>>> 
>>> public class Test {
>>> 
>>>     public static void main(String[] args) throws Exception {
>>> 
>>>         String stmt = "create table if not exists test_keyspace.my_table "
>>>             + "(field1 text, field2 int, field3 set, field4 map, "
>>>             + "primary key (field1) );";
>>> 
>>>         ANTLRStringStream stringStream = new ANTLRStringStream(stmt);
>>>         CqlLexer cqlLexer = new 

Re: vnodes status verification

2018-02-26 Thread Hannu Kröger
Hello,

you can always run “nodetool ring” to see all tokens.
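
If the ring is large, counting tokens per node is easier with a small script. This sketch assumes the usual `nodetool ring` layout (node address in the first column, token in the last); column layout varies a bit between Cassandra versions, so adjust as needed:

```python
from collections import Counter

def tokens_per_node(ring_output):
    """Count tokens per node address from `nodetool ring` output.
    Assumes the common layout: address first, token last; header and
    separator lines are skipped. Adjust for your Cassandra version."""
    counts = Counter()
    for line in ring_output.splitlines():
        parts = line.split()
        # Data lines start with an IPv4 address and end with a numeric token.
        if parts and parts[0].count(".") == 3 and parts[-1].lstrip("-").isdigit():
            counts[parts[0]] += 1
    return counts

# Made-up sample output for illustration:
sample = """\
Address    Rack  Status  State   Load       Owns   Token
10.0.0.1   RAC1  Up      Normal  1.2 GiB    ?      -9182216887343892345
10.0.0.2   RAC1  Up      Normal  1.1 GiB    ?      -4611686018427387904
10.0.0.1   RAC1  Up      Normal  1.2 GiB    ?      0
10.0.0.2   RAC1  Up      Normal  1.1 GiB    ?      4611686018427387904
"""
print(tokens_per_node(sample))  # each node owns 2 tokens in this sample
```

With vnodes enabled and num_tokens=256, each node should show up ~256 times.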

Hannu

> On 26 Feb 2018, at 12:32, Ivan Iliev  wrote:
> 
> Hello C* Gurus,
> 
> I am quite new to cassandra so I am struggling over the concept of vnodes and 
> how to verify if those are properly enabled on my cluster.
> 
> I have the num_tokens in yaml set to 256 across all nodes and initial_token 
> is commented out of the config.
> 
> Apart from that I cannot find any other configuration needed to be applied 
> for this to work, but I'm also not sure how to verify if this is OK apart 
> from doing "nodetool status" which shows this:
> 
> Status=Up/Down 
> |/ State=Normal/Leaving/Joining/Moving 
> --  AddressLoad   Tokens   OwnsHost ID
>Rack 
> DN  10.32.16.48125.08 GiB  256  ?   
> 0c90ef09-097f-4e24-80ca-fc87e3c44e3c  RAC1 
> UN  10.32.16.194   116.3 GiB  256  ?   
> 59e194df-dc44-4a82-a95c-2aac3d5292dc  RAC1 
> UN  10.32.17.4 38.89 GiB  256  ?   
> 87745b2e-3be9-4dc9-a39d-298830657d05  RAC1 
> UN  10.32.16.31111.93 GiB  256  ?   
> a4666251-816a-424b-a374-63da2b9a3dab  RAC1
> 
> Is there anything else I can check to verify if vnodes are "created" ?
> 
> Also is there a way to get a list of the vnodes in a cluster or any other 
> info about it that may be valuable for managing those ?
> 
> Or is the concept I have about vnodes wrong and there are no such 
> representations as seperate nodes in the cluster for example?
> 
> Thank you,
> Ivan
> -- 
> Best regards
> Ivan I. Iliev
> System Administrator
> 
> Melexis Bulgaria Ltd.
> 2 Samokovsko shose Blvd.
> 1138 Sofia
> Bulgaria
> 
> Mobile:+359 88 9221923
> E-mail: iai @melexis.com 
> Website: www.melexis.com 
> 
> The contents of this e-mail are CONFIDENTIAL AND PROPRIETARY. Please read our 
> disclaimer at http://www.melexis.com/mailpolicy 
> .



Re: Setting min_index_interval to 1?

2018-02-02 Thread Hannu Kröger
Wouldn’t that still read the index on disk? You would potentially have all keys 
both in memory and on disk: a read would first hit the in-memory summary, then 
the on-disk index, and only after that the sstable data itself.

So you wouldn’t gain much, right?
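
A back-of-envelope sketch of the memory side (illustrative only; the real index summary also resizes itself under memory pressure, bounded by min/max_index_interval):

```python
def index_summary_entries(partitions, min_index_interval):
    """Rough in-memory index summary size: Cassandra keeps roughly one
    sampled partition-index entry per `min_index_interval` partitions.
    Back-of-envelope only -- the real summary adapts at runtime."""
    return -(-partitions // min_index_interval)  # ceiling division

n = 10_000_000  # partitions in an sstable (made-up number)
for interval in (128, 16, 1):
    print(interval, index_summary_entries(n, interval))
# At interval 1 every key is sampled into memory, yet a read still has
# to consult the on-disk index file for the exact row position -- hence
# the point above that you would not gain much.
```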

Hannu

> On 2 Feb 2018, at 02:25, Nate McCall  wrote:
> 
> 
> Another was the crazy idea I started with of setting min_index_interval to 1. 
> My guess was that this would cause it to read all index entries, and 
> effectively have them all cached permanently. And it would read them straight 
> out of the SSTables on every restart. Would this work? Other than probably 
> causing a really long startup time, are there issues with this?
> 
> 
> I've never tried that. It sounds like you understand the potential impact on 
> memory and startup time. If you have the data in such a way that you can 
> easily experiment, I would like to see a breakdown of the impact on response 
> time vs. memory usage as well as where the point of diminishing returns is on 
> turning this down towards 1 (I think there will be a sweet spot somewhere). 
> 



Re: Repair fails for unknown reason

2018-01-09 Thread Hannu Kröger
We have run restarts on the cluster and that doesn’t seem to help at all.

We ran repair separately for each table, and that usually goes through, but 
running a repair on the whole keyspace doesn’t. 

Anything anyone?

Hannu

> On 3 Jan 2018, at 23:24, Hannu Kröger <hkro...@gmail.com> wrote:
> 
> I can certainly try that. No problem there.
> 
> However wouldn’t we then get this kind of errors if that was the case:
> java.lang.RuntimeException: Cannot start multiple repair sessions over the 
> same sstables
> ?
> 
> Hannu
> 
>> On 3 Jan 2018, at 20:50, Nandakishore Tokala <nandakishore.tok...@gmail.com 
>> <mailto:nandakishore.tok...@gmail.com>> wrote:
>> 
>> hi Hannu,
>> 
>> I think some of the repairs are hanging there. please restart all the nodes 
>> in the  cluster and start the repair 
>> 
>> 
>> Thanks
>> Nanda
>> 
>> On Wed, Jan 3, 2018 at 9:35 AM, Hannu Kröger <hkro...@gmail.com 
>> <mailto:hkro...@gmail.com>> wrote:
>> Additional notes:
>> 
>> 1) If I run the repair just on those tables, it works fine
>> 2) Those tables are empty
>> 
>> Hannu
>> 
>> > On 3 Jan 2018, at 18:23, Hannu Kröger <hkro...@gmail.com 
>> > <mailto:hkro...@gmail.com>> wrote:
>> >
>> > Hello,
>> >
>> > Situation is as follows:
>> >
>> > Repair was started on node X on this keyspace with —full —pr. Repair fails 
>> > on node Y.
>> >
>> > Node Y has debug logging on (DEBUG on org.apache.cassandra) and I’m 
>> > looking at the debug.log. I see following messages related to this repair 
>> > request:
>> >
>> > ---
>> > DEBUG [AntiEntropyStage:1] 2018-01-02 17:52:12,530 
>> > RepairMessageVerbHandler.java:114 - Validating 
>> > ValidationRequest{gcBefore=1511473932} 
>> > org.apache.cassandra.repair.messages.ValidationRequest@5a17430c
>> > DEBUG [ValidationExecutor:4] 2018-01-02 17:52:12,531 
>> > StorageService.java:3321 - Forcing flush on keyspace mykeyspace, CF mytable
>> > DEBUG [MemtablePostFlush:54] 2018-01-02 17:52:12,531 
>> > ColumnFamilyStore.java:954 - forceFlush requested but everything is clean 
>> > in mytable
>> > ERROR [ValidationExecutor:4] 2018-01-02 17:52:12,532 Validator.java:268 - 
>> > Failed creating a merkle tree for [repair 
>> > #1df000a0-effa-11e7-8361-b7c9edfbfc33 on mykeyspace/mytable, 
>> > [(6917529027641081856,-9223372036854775808]]], /123.123.123.123 
>> > <http://123.123.123.123/> (see log for details)
>> > ---
>> >
>> > then the same about another table and after that which indicates that 
>> > repair “master” has told to abort basically, right?
>> >
>> > ---
>> > DEBUG [AntiEntropyStage:1] 2018-01-02 17:52:12,563 
>> > RepairMessageVerbHandler.java:142 - Got anticompaction request 
>> > AnticompactionRequest{parentRepairSession=1de949e0-effa-11e7-8361-b7c9edfbfc33}
>> >  org.apache.cassandra.repair.messages.AnticompactionRequest@5dc8be
>> > ea
>> > ERROR [AntiEntropyStage:1] 2018-01-02 17:52:12,563 
>> > RepairMessageVerbHandler.java:168 - Got error, removing parent repair 
>> > session
>> > ERROR [AntiEntropyStage:1] 2018-01-02 17:52:12,564 
>> > CassandraDaemon.java:228 - Exception in thread 
>> > Thread[AntiEntropyStage:1,5,main]
>> > java.lang.RuntimeException: java.lang.RuntimeException: Parent repair 
>> > session with id = 1de949e0-effa-11e7-8361-b7c9edfbfc33 has failed.
>> >at 
>> > org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:171)
>> >  ~[apache-cassandra-3.11.0.jar:3.11.0]
>> >at org.apache.cassandra.net 
>> > <http://org.apache.cassandra.net/>.MessageDeliveryTask.run(MessageDeliveryTask.java:66)
>> >  ~[apache-cassandra-3.11.0.jar:3.11.0]
>> >at 
>> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
>> > ~[na:1.8.0_111]
>> >at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
>> > ~[na:1.8.0_111]
>> >at 
>> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> >  ~[na:1.8.0_111]
>> >at 
>> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> >  [na:1.8.0_111]
>> >at 
>> > org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadL

Re: Repair fails for unknown reason

2018-01-03 Thread Hannu Kröger
I can certainly try that. No problem there.

However wouldn’t we then get this kind of errors if that was the case:
java.lang.RuntimeException: Cannot start multiple repair sessions over the same 
sstables
?

Hannu

> On 3 Jan 2018, at 20:50, Nandakishore Tokala <nandakishore.tok...@gmail.com> 
> wrote:
> 
> hi Hannu,
> 
> I think some of the repairs are hanging there. please restart all the nodes 
> in the  cluster and start the repair 
> 
> 
> Thanks
> Nanda
> 
> On Wed, Jan 3, 2018 at 9:35 AM, Hannu Kröger <hkro...@gmail.com 
> <mailto:hkro...@gmail.com>> wrote:
> Additional notes:
> 
> 1) If I run the repair just on those tables, it works fine
> 2) Those tables are empty
> 
> Hannu
> 
> > On 3 Jan 2018, at 18:23, Hannu Kröger <hkro...@gmail.com 
> > <mailto:hkro...@gmail.com>> wrote:
> >
> > Hello,
> >
> > Situation is as follows:
> >
> > Repair was started on node X on this keyspace with —full —pr. Repair fails 
> > on node Y.
> >
> > Node Y has debug logging on (DEBUG on org.apache.cassandra) and I’m looking 
> > at the debug.log. I see following messages related to this repair request:
> >
> > ---
> > DEBUG [AntiEntropyStage:1] 2018-01-02 17:52:12,530 
> > RepairMessageVerbHandler.java:114 - Validating 
> > ValidationRequest{gcBefore=1511473932} 
> > org.apache.cassandra.repair.messages.ValidationRequest@5a17430c
> > DEBUG [ValidationExecutor:4] 2018-01-02 17:52:12,531 
> > StorageService.java:3321 - Forcing flush on keyspace mykeyspace, CF mytable
> > DEBUG [MemtablePostFlush:54] 2018-01-02 17:52:12,531 
> > ColumnFamilyStore.java:954 - forceFlush requested but everything is clean 
> > in mytable
> > ERROR [ValidationExecutor:4] 2018-01-02 17:52:12,532 Validator.java:268 - 
> > Failed creating a merkle tree for [repair 
> > #1df000a0-effa-11e7-8361-b7c9edfbfc33 on mykeyspace/mytable, 
> > [(6917529027641081856,-9223372036854775808]]], /123.123.123.123 
> > <http://123.123.123.123/> (see log for details)
> > ---
> >
> > then the same about another table and after that which indicates that 
> > repair “master” has told to abort basically, right?
> >
> > ---
> > DEBUG [AntiEntropyStage:1] 2018-01-02 17:52:12,563 
> > RepairMessageVerbHandler.java:142 - Got anticompaction request 
> > AnticompactionRequest{parentRepairSession=1de949e0-effa-11e7-8361-b7c9edfbfc33}
> >  org.apache.cassandra.repair.messages.AnticompactionRequest@5dc8be
> > ea
> > ERROR [AntiEntropyStage:1] 2018-01-02 17:52:12,563 
> > RepairMessageVerbHandler.java:168 - Got error, removing parent repair 
> > session
> > ERROR [AntiEntropyStage:1] 2018-01-02 17:52:12,564 CassandraDaemon.java:228 
> > - Exception in thread Thread[AntiEntropyStage:1,5,main]
> > java.lang.RuntimeException: java.lang.RuntimeException: Parent repair 
> > session with id = 1de949e0-effa-11e7-8361-b7c9edfbfc33 has failed.
> >at 
> > org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:171)
> >  ~[apache-cassandra-3.11.0.jar:3.11.0]
> >at org.apache.cassandra.net 
> > <http://org.apache.cassandra.net/>.MessageDeliveryTask.run(MessageDeliveryTask.java:66)
> >  ~[apache-cassandra-3.11.0.jar:3.11.0]
> >at 
> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> > ~[na:1.8.0_111]
> >at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> > ~[na:1.8.0_111]
> >at 
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >  ~[na:1.8.0_111]
> >at 
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >  [na:1.8.0_111]
> >at 
> > org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
> >  [apache-cassandra-3.11.0.jar:3.11.0]
> >at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_111]
> > Caused by: java.lang.RuntimeException: Parent repair session with id = 
> > 1de949e0-effa-11e7-8361-b7c9edfbfc33 has failed.
> >at 
> > org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:409)
> >  ~[apache-cassandra-3.11.0.jar:3.11.0]
> >at 
> > org.apache.cassandra.service.ActiveRepairService.doAntiCompaction(ActiveRepairService.java:444)
> >  ~[apache-cassandra-3.11.0.jar:3.11.0]
> >at 
> > org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:143)
> >

Re: Repair fails for unknown reason

2018-01-03 Thread Hannu Kröger
Additional notes:

1) If I run the repair just on those tables, it works fine
2) Those tables are empty

Hannu

> On 3 Jan 2018, at 18:23, Hannu Kröger <hkro...@gmail.com> wrote:
> 
> Hello,
> 
> Situation is as follows:
> 
> Repair was started on node X on this keyspace with —full —pr. Repair fails on 
> node Y.
> 
> Node Y has debug logging on (DEBUG on org.apache.cassandra) and I’m looking 
> at the debug.log. I see following messages related to this repair request:
> 
> ---
> DEBUG [AntiEntropyStage:1] 2018-01-02 17:52:12,530 
> RepairMessageVerbHandler.java:114 - Validating 
> ValidationRequest{gcBefore=1511473932} 
> org.apache.cassandra.repair.messages.ValidationRequest@5a17430c
> DEBUG [ValidationExecutor:4] 2018-01-02 17:52:12,531 StorageService.java:3321 
> - Forcing flush on keyspace mykeyspace, CF mytable
> DEBUG [MemtablePostFlush:54] 2018-01-02 17:52:12,531 
> ColumnFamilyStore.java:954 - forceFlush requested but everything is clean in 
> mytable
> ERROR [ValidationExecutor:4] 2018-01-02 17:52:12,532 Validator.java:268 - 
> Failed creating a merkle tree for [repair 
> #1df000a0-effa-11e7-8361-b7c9edfbfc33 on mykeyspace/mytable, 
> [(6917529027641081856,-9223372036854775808]]], /123.123.123.123 (see log for 
> details)
> ---
> 
> then the same about another table and after that which indicates that repair 
> “master” has told to abort basically, right?
> 
> ---
> DEBUG [AntiEntropyStage:1] 2018-01-02 17:52:12,563 
> RepairMessageVerbHandler.java:142 - Got anticompaction request 
> AnticompactionRequest{parentRepairSession=1de949e0-effa-11e7-8361-b7c9edfbfc33}
>  org.apache.cassandra.repair.messages.AnticompactionRequest@5dc8be
> ea
> ERROR [AntiEntropyStage:1] 2018-01-02 17:52:12,563 
> RepairMessageVerbHandler.java:168 - Got error, removing parent repair session
> ERROR [AntiEntropyStage:1] 2018-01-02 17:52:12,564 CassandraDaemon.java:228 - 
> Exception in thread Thread[AntiEntropyStage:1,5,main]
> java.lang.RuntimeException: java.lang.RuntimeException: Parent repair session 
> with id = 1de949e0-effa-11e7-8361-b7c9edfbfc33 has failed.
>at 
> org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:171)
>  ~[apache-cassandra-3.11.0.jar:3.11.0]
>at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) 
> ~[apache-cassandra-3.11.0.jar:3.11.0]
>at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_111]
>at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[na:1.8.0_111]
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_111]
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_111]
>at 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
>  [apache-cassandra-3.11.0.jar:3.11.0]
>at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_111]
> Caused by: java.lang.RuntimeException: Parent repair session with id = 
> 1de949e0-effa-11e7-8361-b7c9edfbfc33 has failed.
>at 
> org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:409)
>  ~[apache-cassandra-3.11.0.jar:3.11.0]
>at 
> org.apache.cassandra.service.ActiveRepairService.doAntiCompaction(ActiveRepairService.java:444)
>  ~[apache-cassandra-3.11.0.jar:3.11.0]
>at 
> org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:143)
>  ~[apache-cassandra-3.11.0.jar:3.11.0]
>... 7 common frames omitted
> ---
> 
> But that is almost all in the log and I don’t really see what the original 
> problem here is. 
> 
> Cassandra flushes the table to start building merkle tree and on next 
> millisecond it already fails the repair but without proper exception or error 
> logging about the problem.
> 
> Cassandra version is the 3.11.0.
> 
> Any ideas?
> 
> Cheers,
> Hannu


-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Repair fails for unknown reason

2018-01-03 Thread Hannu Kröger
Hello,

Situation is as follows:

Repair was started on node X on this keyspace with —full —pr. Repair fails on 
node Y.

Node Y has debug logging on (DEBUG on org.apache.cassandra) and I’m looking at 
the debug.log. I see following messages related to this repair request:

---
DEBUG [AntiEntropyStage:1] 2018-01-02 17:52:12,530 
RepairMessageVerbHandler.java:114 - Validating 
ValidationRequest{gcBefore=1511473932} 
org.apache.cassandra.repair.messages.ValidationRequest@5a17430c
DEBUG [ValidationExecutor:4] 2018-01-02 17:52:12,531 StorageService.java:3321 - 
Forcing flush on keyspace mykeyspace, CF mytable
DEBUG [MemtablePostFlush:54] 2018-01-02 17:52:12,531 ColumnFamilyStore.java:954 
- forceFlush requested but everything is clean in mytable
ERROR [ValidationExecutor:4] 2018-01-02 17:52:12,532 Validator.java:268 - 
Failed creating a merkle tree for [repair #1df000a0-effa-11e7-8361-b7c9edfbfc33 
on mykeyspace/mytable, [(6917529027641081856,-9223372036854775808]]], 
/123.123.123.123 (see log for details)
---

then the same about another table and after that which indicates that repair 
“master” has told to abort basically, right?

---
DEBUG [AntiEntropyStage:1] 2018-01-02 17:52:12,563 
RepairMessageVerbHandler.java:142 - Got anticompaction request 
AnticompactionRequest{parentRepairSession=1de949e0-effa-11e7-8361-b7c9edfbfc33} 
org.apache.cassandra.repair.messages.AnticompactionRequest@5dc8be
ea
ERROR [AntiEntropyStage:1] 2018-01-02 17:52:12,563 
RepairMessageVerbHandler.java:168 - Got error, removing parent repair session
ERROR [AntiEntropyStage:1] 2018-01-02 17:52:12,564 CassandraDaemon.java:228 - 
Exception in thread Thread[AntiEntropyStage:1,5,main]
java.lang.RuntimeException: java.lang.RuntimeException: Parent repair session 
with id = 1de949e0-effa-11e7-8361-b7c9edfbfc33 has failed.
at 
org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:171)
 ~[apache-cassandra-3.11.0.jar:3.11.0]
at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) 
~[apache-cassandra-3.11.0.jar:3.11.0]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[na:1.8.0_111]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
~[na:1.8.0_111]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
~[na:1.8.0_111]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_111]
at 
org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
 [apache-cassandra-3.11.0.jar:3.11.0]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_111]
Caused by: java.lang.RuntimeException: Parent repair session with id = 
1de949e0-effa-11e7-8361-b7c9edfbfc33 has failed.
at 
org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:409)
 ~[apache-cassandra-3.11.0.jar:3.11.0]
at 
org.apache.cassandra.service.ActiveRepairService.doAntiCompaction(ActiveRepairService.java:444)
 ~[apache-cassandra-3.11.0.jar:3.11.0]
at 
org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:143)
 ~[apache-cassandra-3.11.0.jar:3.11.0]
... 7 common frames omitted
---

But that is almost all in the log and I don’t really see what the original 
problem here is. 

Cassandra flushes the table to start building merkle tree and on next 
millisecond it already fails the repair but without proper exception or error 
logging about the problem.

Cassandra version is the 3.11.0.

Any ideas?

Cheers,
Hannu



Re: Nodetool compactionstats hangs

2017-12-19 Thread Hannu Kröger
Hi,

Sure! I attached the jstack dumps on the ticket.

Hannu

On 19 December 2017 at 14:38:45, Jeff Jirsa (jji...@gmail.com) wrote:

Can you grab a thread dump with jstack as well?

--
Jeff Jirsa


On Dec 19, 2017, at 3:32 AM, Hannu Kröger <hkro...@gmail.com> wrote:

Hi,

I opened a ticket about nodetool compactionstats hanging:
https://issues.apache.org/jira/browse/CASSANDRA-14130

The root cause seems to be JMX metric fetching hanging. I was able to
replicate it on the problematic node like this:

Welcome to JMX terminal. Type "help" for available commands.
$>open localhost:7199
#Connection to localhost:7199 is opened
$>bean org.apache.cassandra.metrics:type=Compaction,name=PendingTasksByTableName
#bean is set to
org.apache.cassandra.metrics:type=Compaction,name=PendingTasksByTableName
$>get Value
#mbean = 
org.apache.cassandra.metrics:type=Compaction,name=PendingTasksByTableName:

// This command never finishes.


This happens only on one node AFAIK. What could possibly be the problem?

Hannu


Nodetool compactionstats hangs

2017-12-19 Thread Hannu Kröger
Hi,

I opened a ticket about nodetool compactionstats hanging:
https://issues.apache.org/jira/browse/CASSANDRA-14130

The root cause seems to be JMX metric fetching hanging. I was able to
replicate it on the problematic node like this:

Welcome to JMX terminal. Type "help" for available commands.
$>open localhost:7199
#Connection to localhost:7199 is opened
$>bean org.apache.cassandra.metrics:type=Compaction,name=PendingTasksByTableName
#bean is set to
org.apache.cassandra.metrics:type=Compaction,name=PendingTasksByTableName
$>get Value
#mbean = 
org.apache.cassandra.metrics:type=Compaction,name=PendingTasksByTableName:

// This command never finishes.


This happens only on one node AFAIK. What could possibly be the problem?

Hannu


TWCS on partitions spanning multiple time windows

2017-12-14 Thread Hannu Kröger
Hi,

I have been reading a bit about TWCS to understand how it functions.

Current assumption: TWCS uses same tombstone checks as any other compaction
strategy to make sure that it doesn’t remove tombstones unless it is safe
to do so.

Scenario 1:

So let’s assume that I have a tables like this:

CREATE TABLE twcs.twcs (
user_id int,
id int,
value int,
text_value text,
PRIMARY KEY (user_id, id, value)
)

I insert data for multiple users but also multiple events per user with TTL
of 10 days and have time window set to 1 day.

Basically when 10 days are up the first sstable contains just TTL'd data.
However TWCS cannot just drop sstables because same partition exists in
sstables of other windows. Otherwise it wouldn’t be safe, right?

Scenario 2:

Table is as follows

CREATE TABLE twcs.twcs2 (
user_id int,
day int,
id int,
value int,
text_value text,
PRIMARY KEY ((user_id, day), id, value)
)

I insert data with TTL of 10 days and have time window set to 2 days.

Basically when 10 days are up the first sstable contains just TTL’d data.
In this case TWCS can drop the whole sstable because the whole partition is
in the same time window and same sstable. Correct?

If we for some reason have time window and partition time buckets
misaligned, e.g. time window is 25 hours and time bucket is 24 hours, then
we end up in situation where we will actually never get rid of all
tombstones because same partition data will be across multiple time windows
which won’t be compacted together. So we would be in trouble, right?
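
That misalignment is easy to see numerically; a small sketch with hour-granularity timestamps and made-up numbers:

```python
# Hour-granularity sketch of TWCS window assignment (illustrative only).
def window_id(ts_hours, window_hours):
    return ts_hours // window_hours

writes_in_day0 = [0, 6, 12, 23]   # writes to one day-bucketed partition

# 24h windows aligned with 24h day buckets: one window per day bucket.
print({window_id(t, 24) for t in writes_in_day0})          # {0}

# 25h windows vs 24h day buckets: the buckets drift across window borders.
day = 10
writes_in_day10 = [day * 24 + h for h in (0, 6, 12, 23)]
print({window_id(t, 25) for t in writes_in_day10})         # {9, 10}
# The same partition's data lands in two windows that TWCS will not
# compact together, so its tombstones can linger -- as described above.
```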

Did I get it right?

Hannu


Re: Upgrade using rebuild

2017-12-14 Thread Hannu Kröger
If you want to do a version upgrade, you basically need to follow these steps
node by node:

0) stop repairs
1) make sure your sstables are at the latest version (nodetool
upgradesstables can do it)
2) stop cassandra
3) update cassandra software and update cassandra.yaml and cassandra-env.sh
files
4) start cassandra

After all nodes are up, run “nodetool upgradesstables” on each node to
update your sstables to the latest version.

Also please note that when you upgrade, you need to upgrade only between
compatible versions.

E.g. 2.2.x -> 3.0.x  but not 1.2 to 3.11

Cheers,
Hannu

On 14 December 2017 at 12:33:49, Anshu Vajpayee (anshu.vajpa...@gmail.com)
wrote:

Hi -

Is it possible to upgrade a  cluster ( DC wise) using nodetool rebuild ?



--
Cheers,
Anshu V


Re: Best approach to prepare to shutdown a cassandra node

2017-10-12 Thread Hannu Kröger
Hi,

Drain should be enough.  It stops accepting writes and after that cassandra
can be safely shut down.

Hannu

On 12 October 2017 at 20:24:41, Javier Canillas (javier.canil...@gmail.com)
wrote:

Hello everyone,

I've been working with Cassandra for some time, but every time I need to shut
down a node (for any reason, like upgrading the version or moving the instance
to another host) I see several errors on the client applications (yes, I'm
using the official java driver).

By the way, I'm starting C* as a stand-alone process, and C* version is 3.11.0.

The way I have implemented the shutdown process is something like the
following:

# Drain all information from commitlog into sstables
bin/nodetool drain

cassandra_pid=`ps -ef|grep "java.*apache-cassandra"|grep -v "grep"|awk '{print $2}'`
if [ ! -z "$cassandra_pid" ] && [ "$cassandra_pid" -ne "1" ]; then
    echo "Asking Cassandra to shutdown (nodetool drain doesn't stop cassandra)"
    kill $cassandra_pid

    echo -n "+ Checking it is down. "
    counter=10
    # Loop while the process is still alive and we still have retries left.
    while [ "$counter" -ne 0 ] && kill -0 $cassandra_pid > /dev/null 2>&1
    do
        echo -n ". "
        ((counter--))
        sleep 1s
    done
    echo ""
    if ! kill -0 $cassandra_pid > /dev/null 2>&1; then
        echo "+ It's down."
    else
        echo "- Killing Cassandra."
        kill -9 $cassandra_pid
    fi
else
    echo "Care: there was a problem finding the Cassandra PID"
fi

Should I add the following lines at the beginning?

echo "shutting down cassandra gracefully with: nodetool disablegossip"
$CASSANDRA_HOME/$CASSANDRA_APP/bin/nodetool disablegossip
echo "shutting down cassandra gracefully with: nodetool disablebinary"
$CASSANDRA_HOME/$CASSANDRA_APP/bin/nodetool disablebinary
echo "shutting down cassandra gracefully with: nodetool disablethrift"
$CASSANDRA_HOME/$CASSANDRA_APP/bin/nodetool disablethrift

The shutdown log is the following:

WARN  [RMI TCP Connection(10)-127.0.0.1] 2017-10-12 14:20:52,343 StorageService.java:321 - Stopping gossip by operator request
INFO  [RMI TCP Connection(10)-127.0.0.1] 2017-10-12 14:20:52,344 Gossiper.java:1532 - Announcing shutdown
INFO  [RMI TCP Connection(10)-127.0.0.1] 2017-10-12 14:20:52,355 StorageService.java:2268 - Node /10.254.169.36 state jump to shutdown
INFO  [RMI TCP Connection(12)-127.0.0.1] 2017-10-12 14:20:56,141 Server.java:176 - Stop listening for CQL clients
INFO  [RMI TCP Connection(16)-127.0.0.1] 2017-10-12 14:20:59,472 StorageService.java:1442 - DRAINING: starting drain process
INFO  [RMI TCP Connection(16)-127.0.0.1] 2017-10-12 14:20:59,474 HintsService.java:220 - Paused hints dispatch
INFO  [RMI TCP Connection(16)-127.0.0.1] 2017-10-12 14:20:59,477 Gossiper.java:1532 - Announcing shutdown
INFO  [RMI TCP Connection(16)-127.0.0.1] 2017-10-12 14:20:59,480 StorageService.java:2268 - Node /127.0.0.1 state jump to shutdown
INFO  [RMI TCP Connection(16)-127.0.0.1] 2017-10-12 14:21:01,483 MessagingService.java:984 - Waiting for messaging service to quiesce
INFO  [ACCEPT-/192.168.6.174] 2017-10-12 14:21:01,485 MessagingService.java:1338 - MessagingService has terminated the accept() thread
INFO  [RMI TCP Connection(16)-127.0.0.1] 2017-10-12 14:21:02,095 HintsService.java:220 - Paused hints dispatch
INFO  [RMI TCP Connection(16)-127.0.0.1] 2017-10-12 14:21:02,111 StorageService.java:1442 - DRAINED

Disabling gossip seemed like a good idea, but watching the logs, Cassandra
appears to use it to gracefully tell the other nodes it is going down, so I
don't know whether it's a good or a bad idea.

Disabling the Thrift and binary protocols should only prevent new connections;
the ones already established and running should be allowed to finish.

Any thoughts or comments?

Thanks

Javier.


Re: [RELEASE] Apache Cassandra 3.11.1 released

2017-10-11 Thread Hannu Kröger
Hi,

Isn’t that already here:
http://dl.bintray.com/apache/cassandra/dists/311x/main/binary-amd64/ ?

Hannu

On 11 October 2017 at 16:33:27, Lucas Benevides (lu...@maurobenevides.com.br)
wrote:

Hello Michael Schuler,

When will this version become available for upgrade via apt-get? I visited
the address http://www.apache.org/dist/cassandra/debian and there was no
version 3.11.1.

To me it is easier to upgrade the nodes this way as I am in a lab, not in a
production site.

Thanks in advance,
Lucas Benevides


2017-10-10 18:14 GMT-03:00 Michael Shuler :

> The Cassandra team is pleased to announce the release of Apache
> Cassandra version 3.11.1.
>
> Apache Cassandra is a fully distributed database. It is the right choice
> when you need scalability and high availability without compromising
> performance.
>
>  http://cassandra.apache.org/
>
> Downloads of source and binary distributions are listed in our download
> section:
>
>  http://cassandra.apache.org/download/
>
> This version is a bug fix release[1] on the 3.11 series. As always,
> please pay attention to the release notes[2] and Let us know[3] if you
> were to encounter any problem.
>
> Enjoy!
>
> [1]: (CHANGES.txt) https://goo.gl/QFBuPn
> [2]: (NEWS.txt) https://goo.gl/vHd41x
> [3]: https://issues.apache.org/jira/browse/CASSANDRA
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>


Re: Materialized views stability

2017-10-04 Thread Hannu Kröger
Ok, thanks for the info. This is what I wanted to know.

Cheers,
Hannu

On 2 October 2017 at 18:27:16, Carlos Rolo (r...@pythian.com) wrote:

I've been dealing with MV extensively, and I second Blake. MVs are not
suitable for production. Unless you're ready for the pain (The out of sync
is a major pain point), I would not go that way.

Regards,

Carlos Juzarte Rolo
Cassandra Consultant / Datastax Certified Architect / Cassandra MVP

Pythian - Love your data

rolo@pythian | Twitter: @cjrolo | Skype: cjr2k3 | Linkedin:
*linkedin.com/in/carlosjuzarterolo
<http://linkedin.com/in/carlosjuzarterolo>*
Mobile: +351 918 918 100
www.pythian.com

On Mon, Oct 2, 2017 at 4:50 PM, Blake Eggleston <beggles...@apple.com>
wrote:

> Hi Hannu,
>
> There are more than a few committers that don't think MVs are currently
> suitable for production use. I'm not involved with MV development, so this
> may not be 100% accurate, but the problems as I understand them are:
>
> There's no way to determine if a view is out of sync with the base table.
> If you do determine that a view is out of sync, the only way to fix it is
> to drop and rebuild the view.
> There are liveness issues with updates being reflected in the view.
>
> Any one of these issues makes it difficult to recommend for general
> application development. I'd say that if you're not super familiar with
> their shortcomings and are confident you can fit your use case in them,
> you're probably better off not using them.
>
> Thanks,
>
> Blake
>
> On October 2, 2017 at 6:55:52 AM, Hannu Kröger (hkro...@gmail.com) wrote:
>
> Hello,
>
> I have seen some discussions around Materialized Views and stability of
> that functionality.
>
> There are some open issues around repairs:
> https://issues.apache.org/jira/browse/CASSANDRA-13810?
> jql=project%20%3D%20CASSANDRA%20AND%20issuetype%20%3D%20Bug%
> 20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%
> 20Reopened%2C%20%22Patch%20Available%22%2C%20Testing%
> 2C%20%22Ready%20to%20Commit%22%2C%20%22Awaiting%20Feedback%22)%20AND%
> 20component%20%3D%20%22Materialized%20Views%22
>
> Is it so that the current problems are mostly related to incremental
> repairs or are there also other major concerns why some people don’t
> consider them to be safe for production use?
>
> Cheers,
> Hannu
>
>

--


Materialized views stability

2017-10-02 Thread Hannu Kröger
Hello,

I have seen some discussions around Materialized Views and stability of
that functionality.

There are some open issues around repairs:
https://issues.apache.org/jira/browse/CASSANDRA-13810?jql=project%20%3D%20CASSANDRA%20AND%20issuetype%20%3D%20Bug%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22%2C%20Testing%2C%20%22Ready%20to%20Commit%22%2C%20%22Awaiting%20Feedback%22)%20AND%20component%20%3D%20%22Materialized%20Views%22

Is it so that the current problems are mostly related to incremental
repairs or are there also other major concerns why some people don’t
consider them to be safe for production use?

Cheers,
Hannu


Re: network down between DCs

2017-09-21 Thread Hannu Kröger
Hi,

That’s correct.

You need to run repairs only after a node/DC/connection is down for more
then max_hint_window_in_ms.

Cheers,
Hannu



On 21 September 2017 at 11:30:44, Peng Xiao (2535...@qq.com) wrote:

Hi there,

We have two DCs for a Cassandra cluster. If the network is down for less than
3 hours (the default hint window), with my understanding it will recover
automatically, right? Do we need to run repair manually?

Thanks,
Peng Xiao


Re: Historical data movement to other cluster

2017-09-13 Thread Hannu Kröger
Hi,

If you have that data in different tables, then it's a relatively
straightforward operation to load only certain tables with sstableloader.

If not, then you could use spark to read and filter data from one cluster
and store that into another cluster.

Hannu

On 13 September 2017 at 08:59:18, Harika Vangapelli -T (hvangape - AKRAYA
INC at Cisco) (hvang...@cisco.com) wrote:

Is there a way to move past 3 months data from one cassandra cluster to
other cluster?



Thanks,

Harika




*Harika Vangapelli*

Engineer - IT

hvang...@cisco.com

Tel:

*Cisco Systems, Inc.*




United States
cisco.com




This email may contain confidential and privileged material for the sole
use of the intended recipient. Any review, use, distribution or disclosure
by others is strictly prohibited. If you are not the intended recipient (or
authorized to receive for the recipient), please contact the sender by
reply email and delete all copies of this message.





Re: Rebalance a cassandra cluster

2017-09-13 Thread Hannu Kröger
Hi,

you should make sure that token range is evenly distributed if you have a
single token configured per node. You can use e.g. this tool to calculate
tokens:
https://www.geroba.com/cassandra/cassandra-token-calculator/
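For reference, the calculation such a tool performs is essentially an even split of the partitioner's token range. A minimal sketch, assuming Murmur3Partitioner (whose token range is -2^63 .. 2^63 - 1); the helper name is made up for illustration:

```python
def even_tokens(node_count):
    # Evenly spaced initial_token values for a single-token-per-node
    # cluster using Murmur3Partitioner (token range -2**63 .. 2**63 - 1).
    total_range = 2 ** 64
    return [(-2 ** 63) + (total_range * i) // node_count
            for i in range(node_count)]

print(even_tokens(4))
# [-9223372036854775808, -4611686018427387904, 0, 4611686018427387904]
```

Each node's initial_token in cassandra.yaml would then be set to one of these values.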

Also, make sure that none of the partitions in your data model are hotspots
that contain a lot more data than on average. Check also materialized views
if you use them.

Also, due to the way compactions work, it's normal that disk usage goes up
and down. Since nodes often do that in different rhythms, you will always see
that some node(s) are using more disk space than others at some point in time,
especially if you do updates and not just inserts.

Cheers,
Hannu

On 13 September 2017 at 07:47:09, Akshit Jain (akshit13...@iiitd.ac.in)
wrote:

Hi,
Can a Cassandra cluster be unbalanced in terms of data?
If yes, how does one rebalance a Cassandra cluster?


Do not use Cassandra 3.11.0+ or Cassandra 3.0.12+

2017-08-28 Thread Hannu Kröger
Hello,

The current latest Cassandra version (3.11.0, possibly also 3.0.12+) has a race
condition that causes Cassandra to create broken sstables (the stats file in
the sstables, to be precise).

Bug described here:
https://issues.apache.org/jira/browse/CASSANDRA-13752

This change might be causing it (but not sure):
https://issues.apache.org/jira/browse/CASSANDRA-13038

Other related issues:
https://issues.apache.org/jira/browse/CASSANDRA-13718
https://issues.apache.org/jira/browse/CASSANDRA-13756

I would not recommend using 3.11.0 nor upgrading to 3.0.12 or higher before
this is fixed.

Cheers,
Hannu


Re: Corrupted commit log prevents Cassandra start

2017-07-07 Thread Hannu Kröger
Hello,

yes, that’s what we do when things like this happen.

My thinking is just that when the commit log is corrupted, you cannot really do
anything else but exactly those steps: delete the corrupted file and run a
repair after starting. At least I haven't heard of any tools for salvaging
commit log sections.

The current behaviour gives the DBA control over when to do those things, and
of course this way the DBA realizes that things didn't go OK, but that's about
it. There is no alternative way of healing the system.

Hannu

On 7 July 2017 at 12:03:06, benjamin roth (brs...@gmail.com) wrote:

Hi Hannu,

I remember there have been discussions about this in the past. Most
probably there is already a JIRA for this.
I roughly remember a consensus like this:
- Default behaviour should remain
- It should be configurable to the needs and preferences of the DBA
- It should at least spit out errors in the logs

... of course it would be even better to have the underlying issue fixed so
that commit logs do not become corrupt, but I remember that this is not so
easy due to some "architectural implications" of Cassandra. IIRC Ed
Capriolo posted something related to that some months ago.

For a quick fix, I'd recommend:
- Delete the affected log file
- Start the node
- Run a full-range (not -pr) repair on that node

2017-07-07 10:57 GMT+02:00 Hannu Kröger <hkro...@gmail.com>:

> Hello,
>
> We had a test server crashing for some reason (not related to Cassandra
> probably) and now when trying to start cassandra, it gives following error:
>
> ERROR [main] 2017-07-06 09:29:56,140 JVMStabilityInspector.java:82 -
> Exiting due to error while processing commit log during initialization.
> org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException:
> Mutation checksum failure at 24240116 in Next section at 24239690 in
> CommitLog-6-1498576271195.log
> at org.apache.cassandra.db.commitlog.CommitLogReader.
> readSection(CommitLogReader.java:332) [apache-cassandra-3.10.jar:3.10]
> at 
> org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:201)
> [apache-cassandra-3.10.jar:3.10]
> at org.apache.cassandra.db.commitlog.CommitLogReader.
> readAllFiles(CommitLogReader.java:84) [apache-cassandra-3.10.jar:3.10]
> at org.apache.cassandra.db.commitlog.CommitLogReplayer.
> replayFiles(CommitLogReplayer.java:140) [apache-cassandra-3.10.jar:3.10]
> at org.apache.cassandra.db.commitlog.CommitLog.
> recoverFiles(CommitLog.java:177) [apache-cassandra-3.10.jar:3.10]
> at 
> org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:158)
> [apache-cassandra-3.10.jar:3.10]
> at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:326)
> [apache-cassandra-3.10.jar:3.10]
> at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:601)
> [apache-cassandra-3.10.jar:3.10]
> at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:735)
> [apache-cassandra-3.10.jar:3.10]
>
> Shouldn’t Cassandra tolerate this situation?
>
> Of course we can delete commit logs and life goes on. But isn’t this a bug
> or something?
>
> Hannu
>
>


Corrupted commit log prevents Cassandra start

2017-07-07 Thread Hannu Kröger
Hello,

We had a test server crashing for some reason (not related to Cassandra
probably) and now when trying to start cassandra, it gives following error:

ERROR [main] 2017-07-06 09:29:56,140 JVMStabilityInspector.java:82 -
Exiting due to error while processing commit log during initialization.
org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException:
Mutation checksum failure at 24240116 in Next section at 24239690 in
CommitLog-6-1498576271195.log
at
org.apache.cassandra.db.commitlog.CommitLogReader.readSection(CommitLogReader.java:332)
[apache-cassandra-3.10.jar:3.10]
at
org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:201)
[apache-cassandra-3.10.jar:3.10]
at
org.apache.cassandra.db.commitlog.CommitLogReader.readAllFiles(CommitLogReader.java:84)
[apache-cassandra-3.10.jar:3.10]
at
org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:140)
[apache-cassandra-3.10.jar:3.10]
at
org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:177)
[apache-cassandra-3.10.jar:3.10]
at
org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:158)
[apache-cassandra-3.10.jar:3.10]
at
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:326)
[apache-cassandra-3.10.jar:3.10]
at
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:601)
[apache-cassandra-3.10.jar:3.10]
at
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:735)
[apache-cassandra-3.10.jar:3.10]

Shouldn’t Cassandra tolerate this situation?

Of course we can delete commit logs and life goes on. But isn’t this a bug
or something?

Hannu


Re: Repair on system_auth

2017-07-06 Thread Hannu Kröger
You can also stop repair using JMX without restarting. There are scripts to do 
that.

Hannu

> On 6 Jul 2017, at 23.24, Fay Hou [Storage Service]  
> wrote:
> 
> There is a bug when repairing the system_auth keyspace. We just skip the repair on 
> system_auth. Yes, it is OK to kill the running repair job
> 
>> On Thu, Jul 6, 2017 at 1:14 PM, Subroto Barua  
>> wrote:
>> you can check the status via nodetool netstats
>> to kill the repair job, restart the instance
>> 
>> 
>> On Thursday, July 6, 2017, 1:09:42 PM PDT, Mark Furlong 
>>  wrote:
>> 
>> 
>> I have started a repair on my system_auth keyspace. The repair has started 
>> and the process shows as running with ps, but I am not seeing any CPU usage 
>> with top. I'm also not seeing any anti-entropy sessions building merkle trees 
>> in the log. Can I safely kill a repair, and how?
>> 
>>  
>> 
>>  
>> 
>> Mark Furlong
>> 
>> Sr. Database Administrator
>> 
>> mfurl...@ancestry.com
>> M: 801-859-7427
>> 
>> O: 801-705-7115
>> 
>> 1300 W Traverse Pkwy
>> 
>> Lehi, UT 84043
>> 
>> 
>> 
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org
> 


Re: Linux version update on DSE

2017-06-27 Thread Hannu Kröger
So,

If you are just moving data to another node, you can just start the new C*
node up without any replace_address parameters. Cassandra will treat it as an
existing node with a changed IP address (assuming all files within the data
directory are moved).

Hannu


On 27 June 2017 at 20:43:18, Anuj Wadehra (anujw_2...@yahoo.co.in) wrote:

Hi Nitan,

I think it would be simpler to take one node down at a time and replace it
by bringing the new node up after the Linux upgrade, doing the same Cassandra
setup, using the replace_address option, and setting auto_bootstrap=false (as
the data is already there). No downtime, as it would be a rolling upgrade. No
streaming, as the same tokens would work.

If you have a recent C*, use replace_address_first_boot. If that option is not
available, use replace_address and make sure you remove it once the new node
is up.

Try it and let us know if it works for you.

Thanks
Anuj



On Tue, Jun 27, 2017 at 4:56 AM, Nitan Kainth
<ni...@bamlabs.com> wrote:
Right, we are just upgrading Linux on AWS. C* will remain at same version.


On Jun 26, 2017, at 6:05 PM, Hannu Kröger <hkro...@gmail.com> wrote:

I understood he is updating linux, not C*

Hannu

On 27 June 2017 at 02:04:34, Jonathan Haddad (j...@jonhaddad.com) wrote:

It sounds like you're suggesting adding new nodes in to replace existing
ones.  You can't do that because it requires streaming between versions,
which isn't supported.

You need to take a node down, upgrade the C* version, then start it back
up.

Jon

On Mon, Jun 26, 2017 at 3:56 PM Nitan Kainth <ni...@bamlabs.com> wrote:

It's vnodes. We will add to replace new ip in yaml as well.

Thank you.

Sent from my iPhone

> On Jun 26, 2017, at 4:47 PM, Hannu Kröger <hkro...@gmail.com> wrote:
>
> Looks Ok. Step 1.5 would be to stop cassandra on existing node but apart
from that looks fine. Assuming you are using same configs and if you have
hard coded the token(s), you use the same.
>
> Hannu
>
>> On 26 Jun 2017, at 23.24, Nitan Kainth <ni...@sleepiqlabs.com> wrote:
>>
>> Hi,
>>
>> We are planning to update linux for C* nodes version 3.0. Anybody has
steps who did it recent past.
>>
>> Here are draft steps, we are thinking:
>> 1. Create new node. It might have a different IP address.
>> 2. Detach mounts from existing node
>> 3. Attach mounts to new Node
>> 4. Start C*

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org


Re: Hints files are not getting truncated

2017-06-27 Thread Hannu Kröger
Hi,

First of all, I don’t know why they get delivered so slowly.

However, if your gc_grace_seconds is the default 10 days, then those hints
from May are no longer needed and could/should be truncated. If the hint
delivery is causing problems, then one option is to just disable it and rely
on periodic repair doing its job and keeping the data synchronized.
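To make the 10-day point concrete, here is a small sketch; the default gc_grace_seconds of 864000 (10 days) is the only Cassandra-specific value, and the helper name is made up for illustration:

```python
DEFAULT_GC_GRACE_SECONDS = 10 * 24 * 3600  # 864000, Cassandra's default

def hint_worth_replaying(hint_age_seconds,
                         gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    # A hint older than gc_grace_seconds may re-apply data whose tombstone
    # has already been purged, so it should be dropped, not delivered.
    return hint_age_seconds < gc_grace_seconds

# A hint from May that is still undelivered in late June is ~45 days old:
print(hint_worth_replaying(45 * 24 * 3600))  # False
```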

Now my question to you: Do you have repairs running periodically across the
cluster and actually succeeding if the connection is flaky?

Hannu

On 27 June 2017 at 19:17:26, Meg Mara (mm...@digitalriver.com) wrote:

Hello,



I am facing an issue with hinted handoff files in Cassandra v3.0.10. A DC1
node is storing a large number of hints for DC2 nodes (we are facing
connection timeout issues). The problem is that the hint files which are
created on DC1 are not getting deleted after the 3-hour window. Hints are
now being stored as flat files in the Cassandra home directory, and I can
see that old hints are being deleted, but at a very slow pace. It still
contains hints from May.

max_hint_window_in_ms: 1080

max_hints_delivery_threads: 2



Why do you suppose this is happening? Any suggestions or recommendations
would be much appreciated.



Thanks for your time.

Meg Mara


Re: Linux version update on DSE

2017-06-26 Thread Hannu Kröger
I understood he is updating linux, not C*

Hannu

On 27 June 2017 at 02:04:34, Jonathan Haddad (j...@jonhaddad.com) wrote:

It sounds like you're suggesting adding new nodes in to replace existing
ones.  You can't do that because it requires streaming between versions,
which isn't supported.

You need to take a node down, upgrade the C* version, then start it back
up.

Jon

On Mon, Jun 26, 2017 at 3:56 PM Nitan Kainth <ni...@bamlabs.com> wrote:

> It's vnodes. We will add to replace new ip in yaml as well.
>
> Thank you.
>
> Sent from my iPhone
>
> > On Jun 26, 2017, at 4:47 PM, Hannu Kröger <hkro...@gmail.com> wrote:
> >
> > Looks Ok. Step 1.5 would be to stop cassandra on existing node but apart
> from that looks fine. Assuming you are using same configs and if you have
> hard coded the token(s), you use the same.
> >
> > Hannu
> >
> >> On 26 Jun 2017, at 23.24, Nitan Kainth <ni...@sleepiqlabs.com> wrote:
> >>
> >> Hi,
> >>
> >> We are planning to update linux for C* nodes version 3.0. Anybody has
> steps who did it recent past.
> >>
> >> Here are draft steps, we are thinking:
> >> 1. Create new node. It might have a different IP address.
> >> 2. Detach mounts from existing node
> >> 3. Attach mounts to new Node
> >> 4. Start C*
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Re: Linux version update on DSE

2017-06-26 Thread Hannu Kröger
Looks Ok. Step 1.5 would be to stop cassandra on existing node but apart from 
that looks fine. Assuming you are using same configs and if you have hard coded 
the token(s), you use the same. 

Hannu

> On 26 Jun 2017, at 23.24, Nitan Kainth  wrote:
> 
> Hi,
> 
> We are planning to update linux for C* nodes version 3.0. Anybody has steps 
> who did it recent past.
> 
> Here are draft steps, we are thinking:
> 1. Create new node. It might have a different IP address.
> 2. Detach mounts from existing node
> 3. Attach mounts to new Node
> 4. Start C*

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Incorrect quorum count in driver error logs

2017-06-26 Thread Hannu Kröger
Just to be sure: you have only one datacenter configured in Cassandra?

Hannu

> On 27 Jun 2017, at 0.02, Rutvij Bhatt  wrote:
> 
> Hi guys,
> 
> I observed some odd behaviour with our Cassandra cluster the other day while 
> doing some maintenance operation and was wondering if anyone would be able to 
> provide some insight.
> 
> Initially, I started a node up to join the cluster. That node appeared to be 
> having issues joining due to some SSTable corruption it encountered. Since it 
> was still in early stages and I had never seen this failure before, I decided 
> to take it out of commission and just try again. However, since it was in a 
> bad state, I decided to issue a "nodetool removenode " on a peer 
> rather than a "nodetool decommission" on the node itself.
> 
> The removenode command hung indefinitely - my guess is that this is related 
> to https://issues.apache.org/jira/browse/CASSANDRA-6542. We are using 2.1.11.
> 
> While this was happening, the driver in the application started logging error 
> messages about not being able to reach a quorum of 4. This, to me, was 
> mysterious as none of my keyspaces have an RF > 3. That quorum count in the 
> error implied an RF of 6 or 7.
> 
> I eventually forced that node out of the ring with "nodetool removenode 
> force". This seemed to mostly fix the issue, though there seems to have been 
> enough of a load spike to cause some of the machines' JVMs to accumulate a 
> lot of garbage very fast and spit out a ton of "Not marking nodes down due to 
> local pause of ... " while trying to clean it up. Some of these nodes seemed 
> unresponsive to their peers, who marked them DOWN (as indicated by "nodetool 
> status" and the cassandra log file on those machines), further exacerbating 
> the situation on the nodes that were still up.
> 
> I guess my question is two-fold. First, can anyone provide some insight into 
> what may have happened? Second, what do you consider good practices when 
> dealing with such issues? Any advice is greatly appreciated!
> 
> Thanks,
> Rutvij
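The quorum arithmetic behind the deduction above can be sketched as follows; Cassandra computes a QUORUM of replicas as floor(RF / 2) + 1, so a reported quorum of 4 does indeed imply an RF of 6 or 7:

```python
def quorum(replication_factor):
    # QUORUM = floor(RF / 2) + 1
    return replication_factor // 2 + 1

print({rf: quorum(rf) for rf in range(1, 8)})
# {1: 1, 2: 2, 3: 2, 4: 3, 5: 3, 6: 4, 7: 4}
```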


Re: SASI index on datetime column does not filter on minutes

2017-06-19 Thread Hannu Kröger
Hello,

I tried the same thing with 3.10 which I happened to have at hand and that
seems to work.

cqlsh:test> select lastname,firstname,dateofbirth from individuals where
dateofbirth < '2001-01-01T10:00:00' and dateofbirth > '2000-11-18 17:59:18';

 lastname | firstname | dateofbirth
--+---+-
  Jimmie2 |Lundin | 2000-12-19 17:55:17.00+
  Jimmie3 |Lundin | 2000-11-18 17:55:18.00+
   Jimmie |Lundin | 2000-11-18 17:55:17.00+

(3 rows)
cqlsh:test> select lastname,firstname,dateofbirth from individuals where
dateofbirth < '2001-01-01T10:00:00+' and dateofbirth >
'2000-11-18T17:59:18+';

 lastname | firstname | dateofbirth
--+---+-
  Jimmie2 |Lundin | 2000-12-19 17:55:17.00+

(1 rows)
cqlsh:test>

Maybe you have a timezone issue?
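To illustrate the suspected timezone issue: cqlsh interprets a timestamp literal without an explicit offset in the client's local time zone, so the two query forms above can denote different instants. A sketch using Python's datetime, with UTC+3 as a hypothetical client zone:

```python
from datetime import datetime, timezone, timedelta

local_zone = timezone(timedelta(hours=3))  # hypothetical client zone, UTC+3

# '2000-11-18 17:59:18' with no offset is parsed in the local zone:
implicit = datetime(2000, 11, 18, 17, 59, 18, tzinfo=local_zone)
# '2000-11-18T17:59:18+0000' is an explicit UTC instant:
explicit = datetime(2000, 11, 18, 17, 59, 18, tzinfo=timezone.utc)

print((explicit - implicit).total_seconds())  # 10800.0, i.e. three hours apart
```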

Best Regards,
Hannu

On 19 June 2017 at 17:09:10, Tobias Eriksson (tobias.eriks...@qvantel.com)
wrote:

Hi

I have a table like this (Cassandra 3.5)

Table

id uuid,

lastname text,

firstname text,

address_id uuid,

dateofbirth timestamp,



PRIMARY KEY (id, lastname, firstname)



And a SASI index like this

create custom index indv_birth ON playground.individual(dateofbirth) USING
'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode':
'SPARSE'};



The data



lastname | firstname | dateofbirth

--+---+-

   Lundin |Jimmie | 2000-11-18 17:55:17.00+

  Jansson |   Karolin | 2000-12-19 17:55:17.00+

Öberg |Louisa | 2000-11-18 17:55:18.00+





Now if I do this

select lastname,firstname,dateofbirth from playground.individual where
dateofbirth < '2001-01-01T10:00:00' and dateofbirth > '2000-11-18 17:59:18';



I should only get ONE row, right?

lastname | firstname | dateofbirth

--+---+-

Jansson |   Karolin | 2000-12-19 17:55:17.00+





But instead I get all 3 rows !!!



Why is that ?



-Tobias


Re: Node replacement strategy with AWS EBS

2017-06-14 Thread Hannu Kröger
Hi,

So, if it works, great.

auto_bootstrap: false is not needed when you have the system keyspace, as also
mentioned in the article. Now you are likely to have different tokens than the
previous node (unless those were manually configured to match the old node),
and repair and cleanup are needed to get that node into the "right" state.
But if the tokens were configured correctly, then repair & cleanup are not
needed.

Cheers,
Hannu

On 14 June 2017 at 13:37:29, Rutvij Bhatt (rut...@sense.com) wrote:

Thanks again for your help! To summarize for anyone who stumbles onto this
in the future, this article covers the procedure well:
https://www.eventbrite.com/engineering/changing-the-ip-address-of-a-cassandra-node-with-auto_bootstrapfalse/

It is more or less what Hannu suggested.

I carried out the following steps:
1. safely stop the cassandra instance (nodetool drain + service cassandra
stop)
2. Shut down the ec2 instance.
3. detach the storage volume from old instance.
4. attach to new instance.
5. point cassandra configuration on new instance to this drive and set
auto_bootstrap: false
6. start cassandra on new instance. Once it has established connection with
peers, you will notice that it takes over the token ranges on its own.
Doing a select on the system.peers table will show that the old node is
gone.
7. Run nodetool repair if need be.

On Tue, Jun 13, 2017 at 1:01 PM Rutvij Bhatt <rut...@sense.com> wrote:

> Nevermind, I misunderstood the first link. In this case, the replacement
> would just be leaving the listen_address as is (to
> InetAddress.getLocalHost()) and just start the new instance up as you
> pointed out in your original answer Hannu.
>
> Thanks.
>
> On Tue, Jun 13, 2017 at 12:35 PM Rutvij Bhatt <rut...@sense.com> wrote:
>
>> Hannu/Nitan,
>>
>> Thanks for your help so far! From what you said in your first response, I
>> can get away with just attaching the EBS volume to Cassandra and starting
>> it with the old node's private IP as my listen_address because it will take
>> over the token assignment from the old node using the data files? With
>> regards to "Cassandra automatically realizes that have just effectively
>> changed IP address.", it says in the first link to change this manually to
>> the desired address - does this not apply in my case if I'm replacing the
>> old node?
>>
>> As for the plan I outlined earlier, is this more for DR scenarios where I
>> have lost a node due to hardware failure and I need to recover the data in
>> a safe manner by requesting a stream from the other replicas?  Am I
>> understanding this right?
>>
>>
>> On Tue, Jun 13, 2017 at 11:59 AM Hannu Kröger <hkro...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> So the local information about tokens is stored in the system keyspace.
>>> Also the host id and all that.
>>>
>>> Also documented here:
>>>
>>> https://support.datastax.com/hc/en-us/articles/204289959-Changing-IP-addresses-in-DSE
>>>
>>> If for any reason that causes issues, you can also check this:
>>> https://issues.apache.org/jira/browse/CASSANDRA-8382
>>>
>>> If you copy all the Cassandra data, you are on the safe side. A good point in
>>> the links is that if you have IP addresses in topology or other files, then
>>> update those as well.
>>>
>>> Hannu
>>>
>>> On 13 June 2017 at 11:53:13, Nitan Kainth (ni...@bamlabs.com) wrote:
>>>
>>> Hannu,
>>>
>>> "Cassandra automatically realizes that you have just effectively changed the
>>> IP address" —> are you sure C* will take care of the IP change as is? How will
>>> it know which token range to assign to this new IP address?
>>>
>>> On Jun 13, 2017, at 10:51 AM, Hannu Kröger <hkro...@gmail.com> wrote:
>>>
>>> Cassandra automatically realizes that you have just effectively changed the
>>> IP address
>>>
>>>
>>>


Re: Node replacement strategy with AWS EBS

2017-06-13 Thread Hannu Kröger
Hello,

So the local information about tokens is stored in the system keyspace.
Also the host id and all that.

Also documented here:
https://support.datastax.com/hc/en-us/articles/204289959-Changing-IP-addresses-in-DSE

If for any reason that causes issues, you can also check this:
https://issues.apache.org/jira/browse/CASSANDRA-8382

If you copy all the Cassandra data, you are on the safe side. A good point in
the links is that if you have IP addresses in topology or other files, then
update those as well.

Hannu

On 13 June 2017 at 11:53:13, Nitan Kainth (ni...@bamlabs.com) wrote:

Hannu,

"Cassandra automatically realizes that you have just effectively changed the IP
address” —> are you sure C* will take care of the IP change as is? How will it
know which token range to assign to this new IP address?

On Jun 13, 2017, at 10:51 AM, Hannu Kröger <hkro...@gmail.com> wrote:

Cassandra automatically realizes that you have just effectively changed the IP
address


Re: Node replacement strategy with AWS EBS

2017-06-13 Thread Hannu Kröger
Hello,

I think that’s not the optimal way to handle it.

If you are just attaching the same EBS volume to a new node you can do like
this:
1) nodetool drain on old
2) stop cassandra on old
3) Attach EBS to new node
4) Start Cassandra on new node

Cassandra automatically realizes that you have just effectively changed the IP
address.

replace_address will also stream all the data, so that's an inefficient way
to do it if you already have all the data.

Hannu

On 13 June 2017 at 11:23:56, Rutvij Bhatt (rut...@sense.com) wrote:

Hi!

We're running a Cassandra cluster on AWS. I want to replace an old node
with EBS storage with a new one. The steps I'm following are as follows and
I want to get a second opinion on whether this is the right thing to do:

1. Remove old node from gossip.
2. Run nodetool drain
3. Stop cassandra
4. Create new new node and update JVM_OPTS in cassandra-env.sh with
cassandra.replace_address= as instructed
here -
http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsReplaceNode.html
5. Attach the EBS volume from the old node at the same mount point.
6. Start cassandra on the new node.
7. Run nodetool repair to catch the replacing node up on whatever it has
missed.

Thanks!


Re: Is DataStax's DSE better than cassandra's free open source for a newbie developer's good start for cassandra?

2017-05-30 Thread Hannu Kröger
Hello,

DSE is commercial and costs money to use in production. More info from DataStax:
http://www.datastax.com/products/subscriptions 


RPMs are currently not available for the latest version. There is 3.0.13, but
newer versions than that are not available from Apache AFAIK:
http://cassandra.apache.org/download/ 

Build instructions are here:
https://github.com/apache/cassandra/tree/trunk/redhat 


I hope those are helpful!

Cheers,
Hannu


> On 30 May 2017, at 13:30, gloCalHelp.com  wrote:
> 
> Dear sir,
> 
> Good evening, this is Georgelin from the biggest market of ShangHai, China,
> 
> I know how to download an odd-numbered (bug-fix) version of the Cassandra 
> source, but not an RPM package.
> 
> Would you give me step-by-step guidance, from compiling to distributing the 
> compiled classes to several computers, to set up a Cassandra cluster? I am 
> starting to focus on Cassandra rather than other big data platforms such as 
> Greenplum, CGE, HDP, etc.
> 
> I am a quick learner and a deserving student (I have worked at IBM too) in 
> the big market of China; is there a senior Cassandra system designer who 
> would like to be my Cassandra teacher?
> 
> And a starting-point question: is it better for a newbie to download a 
> third-party Cassandra distribution such as DSE to dig into, or the original 
> Cassandra source? DSE has Spark, Solr, and a graph DB (more functions than 
> the original Cassandra), but does anyone know DSE's license?
> 
> This is my personal mobile phone: 
> 0086 180 5004 2436
> Besides Cassandra, you are welcome to ask me about any big data problems on 
> MPP Greenplum and CGE; we can exchange knowledge of big data systems too.
> 
> Sincerely yours,
> Georgelin
> www_8ems_...@sina.com
> mobile:0086 180 5004 2436
> 



Weirdest problem on this mailing list

2017-05-22 Thread Hannu Kröger
Hello,

For some reason the emails I send to this Cassandra mailing list end up at a
PayPal support email address. Can some list admin check whether there is
something weird in the list configuration, or whether some funny person added
a PayPal support address to the mailing list?

Cheers,
Hannu
-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Cassandra Server 3.10 unable to Start after crash - commitlog needs to be removed

2017-05-19 Thread Hannu Kröger
I have seen this happen as well. Deleting the commit logs helps Cassandra
start, but of course if you are very unlucky you might lose some data.

Hannu

> On 19 May 2017, at 18.13, Haris Altaf  wrote:
> 
> Hi All,
> I am using Cassandra 3.10 for my project and whenever my local windows 
> system, which is my development environment, crashes then cassandra server is 
> unable to start. I have to delete commitlog directory after every system 
> crash. This is actually annoying and what's the purpose of commitlog if it 
> itself gets crashed. I have uploaded the entire dump of Cassandra Server 
> (along with logs, commitlogs, data, configs etc) at the link below. Kindly 
> share its solution. I believe it needs to be fixed.
> 
> Crashed Cassandra 3.10 Server Link: 
> https://drive.google.com/open?id=0BxE52j6oo6cEYXJvdGhBNHNQd0E
> 
> regards,
> Haris
> -- 
> regards,
> Haris


Re: sstablesplit - status

2017-05-17 Thread Hannu Kröger
Basically meaning that if you run a major compaction (=nodetool compact), you 
will end up with an even bigger file, and that file is likely to never get 
compacted again without running another major compaction. Therefore it is not 
recommended for a production system.
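
To illustrate (a toy sketch of size-tiered bucketing, not Cassandra's actual 
SizeTieredCompactionStrategy code): STCS only compacts groups of similarly 
sized SSTables, so the single giant file a major compaction produces never 
finds enough peers and just sits there:

```python
def size_tiered_buckets(sizes_mb, low=0.5, high=1.5, min_threshold=4):
    """Group SSTable sizes into buckets of 'similar' size, roughly in the
    spirit of size-tiered compaction; return buckets eligible for compaction."""
    buckets = []
    for size in sorted(sizes_mb):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if low * avg <= size <= high * avg:
                bucket.append(size)
                break
        else:
            buckets.append([size])
    # Only buckets with at least min_threshold tables get compacted.
    return [b for b in buckets if len(b) >= min_threshold]

# Many similar small tables -> a compaction candidate exists.
print(size_tiered_buckets([100, 110, 95, 105, 4000]))  # [[95, 100, 105, 110]]
# After a major compaction: one 4 GB table plus a few new small ones -> the
# big one sits alone in its bucket and is never selected again.
print(size_tiered_buckets([4000, 100, 110]))           # []
```

(The thresholds mirror STCS's bucket_low/bucket_high/min_threshold defaults, 
but the selection logic here is deliberately simplified.)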

Hannu
 
> On 17 May 2017, at 19:46, Nitan Kainth  wrote:
> 
> You can try running a major compaction to get rid of duplicate data and deleted 
> data. But it will become the routine for the future.
> 
>> On May 17, 2017, at 10:23 AM, Jan Kesten wrote:
>> 
> 



Re: Range deletes, wide partitions, and reverse iterators

2017-05-16 Thread Hannu Kröger
Yes, I agree. I would say it cannot skip those cells because it doesn’t check 
the max timestamp of the cells of the sstable and therefore scans them one by 
one.

Hannu
 
> On 16 May 2017, at 19:48, Stefano Ortolani <ostef...@gmail.com> wrote:
> 
> But it should skip those records since they are sorted. My understanding 
> would be something like:
> 
> 1) read sstable 2
> 2) read the range tombstone
> 3) skip records from sstable2 and sstable1 within the range boundaries
> 4) read remaining records from sstable1
> 5) no records, return
> 
> On Tue, May 16, 2017 at 5:43 PM, Hannu Kröger <hkro...@gmail.com 
> <mailto:hkro...@gmail.com>> wrote:
> This is a bit of guessing but it probably reads sstables in some sort of 
> sequence, so even if sstable 2 contains the tombstone, it still scans through 
> the sstable 1 for possible data to be read.
> 
> BR,
> Hannu
> 
>> On 16 May 2017, at 19:40, Stefano Ortolani <ostef...@gmail.com 
>> <mailto:ostef...@gmail.com>> wrote:
>> 
>> Little update: also the following query timeouts, which is weird since the 
>> range tombstone should have been read by then...
>> 
>> SELECT * 
>> FROM test_cql.test_cf 
>> WHERE hash = 0x963204d451de3e611daf5e340c3594acead0eaaf 
>> AND timeid < the_oldest_deleted_timeid
>> ORDER BY timeid DESC;
>> 
>> 
>> 
>> On Tue, May 16, 2017 at 5:17 PM, Stefano Ortolani <ostef...@gmail.com 
>> <mailto:ostef...@gmail.com>> wrote:
>> Yes, that was my intention but I wanted to cross-check with the ML and the 
>> devs keeping an eye on it first.
>> 
>> On Tue, May 16, 2017 at 5:10 PM, Hannu Kröger <hkro...@gmail.com 
>> <mailto:hkro...@gmail.com>> wrote:
>> Well,
>> 
>> sstables contain some statistics about the cell timestamps and using that 
>> information and the tombstone timestamp it might be possible to skip some 
>> data but I’m not sure that Cassandra currently does that. Maybe it would be 
>> worth a JIRA ticket and see what the devs think about it. If optimizing this 
>> case would make sense.
>> 
>> Hannu
>> 
>>> On 16 May 2017, at 18:03, Stefano Ortolani <ostef...@gmail.com 
>>> <mailto:ostef...@gmail.com>> wrote:
>>> 
>>> Hi Hannu,
>>> 
>>> the piece of data in question is older. In my example the tombstone is the 
>>> newest piece of data.
>>> Since a range tombstone has information re the clustering key ranges, and 
>>> the data is clustering key sorted, I would expect a linear scan not to be 
>>> necessary.
>>> 
>>> On Tue, May 16, 2017 at 3:46 PM, Hannu Kröger <hkro...@gmail.com 
>>> <mailto:hkro...@gmail.com>> wrote:
>>> Well, as mentioned, probably Cassandra doesn’t have logic and data to skip 
>>> bigger regions of deleted data based on range tombstone. If some piece of 
>>> data in a partition is newer than the tombstone, then it cannot be skipped. 
>>> Therefore some partition level statistics of cell ages would need to be 
>>> kept in the column index for the skipping and that is probably not there.
>>> 
>>> Hannu 
>>> 
>>>> On 16 May 2017, at 17:33, Stefano Ortolani <ostef...@gmail.com 
>>>> <mailto:ostef...@gmail.com>> wrote:
>>>> 
>>>> That is another way to see the question: are reverse iterators range 
>>>> tombstone aware? Yes.
>>>> That is why I am puzzled by this afore-mentioned behavior. 
>>>> I would expect them to handle this case more gracefully.
>>>> 
>>>> Cheers,
>>>> Stefano
>>>> 
>>>> On Tue, May 16, 2017 at 3:29 PM, Nitan Kainth <ni...@bamlabs.com 
>>>> <mailto:ni...@bamlabs.com>> wrote:
>>>> Hannu,
>>>> 
>>>> How can you read a partition in reverse?
>>>> 
>>>> Sent from my iPhone
>>>> 
>>>> > On May 16, 2017, at 9:20 AM, Hannu Kröger <hkro...@gmail.com 
>>>> > <mailto:hkro...@gmail.com>> wrote:
>>>> >
>>>> > Well, I’m guessing that Cassandra doesn't really know if the range 
>>>> > tombstone is useful for this or not.
>>>> >
>>>> > In many cases it might be that the partition contains data that is 
>>>> > within the range of the tombstone but is newer than the tombstone and 
>>>> > therefore it might be still be returned. Scanning through deleted data 
>>>> > can be avoided by reading the partition in reverse (if all the deleted 
>>>> > data is in the beginning of the partition).

Re: Range deletes, wide partitions, and reverse iterators

2017-05-16 Thread Hannu Kröger
This is a bit of guessing but it probably reads sstables in some sort of 
sequence, so even if sstable 2 contains the tombstone, it still scans through 
the sstable 1 for possible data to be read.

BR,
Hannu

> On 16 May 2017, at 19:40, Stefano Ortolani <ostef...@gmail.com> wrote:
> 
> Little update: also the following query timeouts, which is weird since the 
> range tombstone should have been read by then...
> 
> SELECT * 
> FROM test_cql.test_cf 
> WHERE hash = 0x963204d451de3e611daf5e340c3594acead0eaaf 
> AND timeid < the_oldest_deleted_timeid
> ORDER BY timeid DESC;
> 
> 
> 
> On Tue, May 16, 2017 at 5:17 PM, Stefano Ortolani <ostef...@gmail.com 
> <mailto:ostef...@gmail.com>> wrote:
> Yes, that was my intention but I wanted to cross-check with the ML and the 
> devs keeping an eye on it first.
> 
> On Tue, May 16, 2017 at 5:10 PM, Hannu Kröger <hkro...@gmail.com 
> <mailto:hkro...@gmail.com>> wrote:
> Well,
> 
> sstables contain some statistics about the cell timestamps and using that 
> information and the tombstone timestamp it might be possible to skip some 
> data but I’m not sure that Cassandra currently does that. Maybe it would be 
> worth a JIRA ticket and see what the devs think about it. If optimizing this 
> case would make sense.
> 
> Hannu
> 
>> On 16 May 2017, at 18:03, Stefano Ortolani <ostef...@gmail.com 
>> <mailto:ostef...@gmail.com>> wrote:
>> 
>> Hi Hannu,
>> 
>> the piece of data in question is older. In my example the tombstone is the 
>> newest piece of data.
>> Since a range tombstone has information re the clustering key ranges, and 
>> the data is clustering key sorted, I would expect a linear scan not to be 
>> necessary.
>> 
>> On Tue, May 16, 2017 at 3:46 PM, Hannu Kröger <hkro...@gmail.com 
>> <mailto:hkro...@gmail.com>> wrote:
>> Well, as mentioned, probably Cassandra doesn’t have logic and data to skip 
>> bigger regions of deleted data based on range tombstone. If some piece of 
>> data in a partition is newer than the tombstone, then it cannot be skipped. 
>> Therefore some partition level statistics of cell ages would need to be kept 
>> in the column index for the skipping and that is probably not there.
>> 
>> Hannu 
>> 
>>> On 16 May 2017, at 17:33, Stefano Ortolani <ostef...@gmail.com 
>>> <mailto:ostef...@gmail.com>> wrote:
>>> 
>>> That is another way to see the question: are reverse iterators range 
>>> tombstone aware? Yes.
>>> That is why I am puzzled by this afore-mentioned behavior. 
>>> I would expect them to handle this case more gracefully.
>>> 
>>> Cheers,
>>> Stefano
>>> 
>>> On Tue, May 16, 2017 at 3:29 PM, Nitan Kainth <ni...@bamlabs.com 
>>> <mailto:ni...@bamlabs.com>> wrote:
>>> Hannu,
>>> 
>>> How can you read a partition in reverse?
>>> 
>>> Sent from my iPhone
>>> 
>>> > On May 16, 2017, at 9:20 AM, Hannu Kröger <hkro...@gmail.com 
>>> > <mailto:hkro...@gmail.com>> wrote:
>>> >
>>> > Well, I’m guessing that Cassandra doesn't really know if the range 
>>> > tombstone is useful for this or not.
>>> >
>>> > In many cases it might be that the partition contains data that is within 
>>> > the range of the tombstone but is newer than the tombstone and therefore 
>>> > it might be still be returned. Scanning through deleted data can be 
>>> > avoided by reading the partition in reverse (if all the deleted data is 
>>> > in the beginning of the partition). Eventually you will still end up 
>>> > reading a lot of tombstones but you will get a lot of live data first and 
>>> > the implicit query limit of 10,000 probably is reached before you get to 
>>> > the tombstones. Therefore you will get an immediate answer.
>>> >
>>> > Does it make sense?
>>> >
>>> > Hannu
>>> >
>>> >> On 16 May 2017, at 16:33, Stefano Ortolani <ostef...@gmail.com 
>>> >> <mailto:ostef...@gmail.com>> wrote:
>>> >>
>>> >> Hi all,
>>> >>
>>> >> I am seeing inconsistencies when mixing range tombstones, wide 
>>> >> partitions, and reverse iterators.
>>> >> I still have to understand if the behaviour is to be expected hence the 
>>> >> message on the mailing list.
>>> >>
>>> >> The situation is conceptually simple. 

Re: Range deletes, wide partitions, and reverse iterators

2017-05-16 Thread Hannu Kröger
Well,

sstables contain some statistics about the cell timestamps and using that 
information and the tombstone timestamp it might be possible to skip some data 
but I’m not sure that Cassandra currently does that. Maybe it would be worth a 
JIRA ticket and see what the devs think about it. If optimizing this case would 
make sense.

Hannu

> On 16 May 2017, at 18:03, Stefano Ortolani <ostef...@gmail.com> wrote:
> 
> Hi Hannu,
> 
> the piece of data in question is older. In my example the tombstone is the 
> newest piece of data.
> Since a range tombstone has information re the clustering key ranges, and the 
> data is clustering key sorted, I would expect a linear scan not to be 
> necessary.
> 
> On Tue, May 16, 2017 at 3:46 PM, Hannu Kröger <hkro...@gmail.com 
> <mailto:hkro...@gmail.com>> wrote:
> Well, as mentioned, probably Cassandra doesn’t have logic and data to skip 
> bigger regions of deleted data based on range tombstone. If some piece of 
> data in a partition is newer than the tombstone, then it cannot be skipped. 
> Therefore some partition level statistics of cell ages would need to be kept 
> in the column index for the skipping and that is probably not there.
> 
> Hannu 
> 
>> On 16 May 2017, at 17:33, Stefano Ortolani <ostef...@gmail.com 
>> <mailto:ostef...@gmail.com>> wrote:
>> 
>> That is another way to see the question: are reverse iterators range 
>> tombstone aware? Yes.
>> That is why I am puzzled by this afore-mentioned behavior. 
>> I would expect them to handle this case more gracefully.
>> 
>> Cheers,
>> Stefano
>> 
>> On Tue, May 16, 2017 at 3:29 PM, Nitan Kainth <ni...@bamlabs.com 
>> <mailto:ni...@bamlabs.com>> wrote:
>> Hannu,
>> 
>> How can you read a partition in reverse?
>> 
>> Sent from my iPhone
>> 
>> > On May 16, 2017, at 9:20 AM, Hannu Kröger <hkro...@gmail.com 
>> > <mailto:hkro...@gmail.com>> wrote:
>> >
>> > Well, I’m guessing that Cassandra doesn't really know if the range 
>> > tombstone is useful for this or not.
>> >
>> > In many cases it might be that the partition contains data that is within 
>> > the range of the tombstone but is newer than the tombstone and therefore 
>> > it might be still be returned. Scanning through deleted data can be 
>> > avoided by reading the partition in reverse (if all the deleted data is in 
>> > the beginning of the partition). Eventually you will still end up reading 
>> > a lot of tombstones but you will get a lot of live data first and the 
>> > implicit query limit of 10,000 probably is reached before you get to the 
>> > tombstones. Therefore you will get an immediate answer.
>> >
>> > Does it make sense?
>> >
>> > Hannu
>> >
>> >> On 16 May 2017, at 16:33, Stefano Ortolani <ostef...@gmail.com 
>> >> <mailto:ostef...@gmail.com>> wrote:
>> >>
>> >> Hi all,
>> >>
>> >> I am seeing inconsistencies when mixing range tombstones, wide 
>> >> partitions, and reverse iterators.
>> >> I still have to understand if the behaviour is to be expected hence the 
>> >> message on the mailing list.
>> >>
>> >> The situation is conceptually simple. I am using a table defined as 
>> >> follows:
>> >>
>> >> CREATE TABLE test_cql.test_cf (
>> >>  hash blob,
>> >>  timeid timeuuid,
>> >>  PRIMARY KEY (hash, timeid)
>> >> ) WITH CLUSTERING ORDER BY (timeid ASC)
>> >>  AND compaction = {'class' : 'LeveledCompactionStrategy'};
>> >>
>> >> I then proceed by loading 2/3GB from 3 sstables which I know contain a 
>> >> really wide partition (> 512 MB) for `hash = x`. I then delete the oldest 
>> >> _half_ of that partition by executing the query below, and restart the 
>> >> node:
>> >>
>> >> DELETE
>> >> FROM test_cql.test_cf
>> >> WHERE hash = x AND timeid < y;
>> >>
>> >> If I keep compactions disabled the following query timeouts (takes more 
>> >> than 10 seconds to
>> >> succeed):
>> >>
>> >> SELECT *
>> >> FROM test_cql.test_cf
>> >> WHERE hash = 0x963204d451de3e611daf5e340c3594acead0eaaf
>> >> ORDER BY timeid ASC;
>> >>
>> >> While the following returns immediately (obviously because no deleted 
>> >> data is ever read):
>> >>

Re: Decommissioned node cluster shows as down

2017-05-16 Thread Hannu Kröger
That’s weird. I thought decommission would ultimately remove the node from the 
cluster because the token(s) should be removed from the ring and data should be 
streamed to new owners. “DN” is IMHO not a state the node should end up in. 

Hannu

> On 16 May 2017, at 19:05, suraj pasuparthy  wrote:
> 
> Yes, you have to run a nodetool removenode to decommission completely. This 
> will also allow another node with the same IP but a different host ID to join 
> the cluster.
> 
> Thanks
> -suraj
> On Tue, May 16, 2017 at 9:01 AM Mark Furlong wrote:
> 
> I have a node I decommissioned on a large ring using 2.1.12. The node 
> completed the decommission process and is no longer communicating with the 
> rest of the cluster. However when I run a nodetool status on any node in the 
> cluster it shows the node as ‘DN’. Why is this and should I just run a 
> removenode now?
> 
> Thanks,
> Mark Furlong
> Sr. Database Administrator
> mfurl...@ancestry.com
> M: 801-859-7427
> O: 801-705-7115
> 1300 W Traverse Pkwy
> Lehi, UT 84043



Re: Range deletes, wide partitions, and reverse iterators

2017-05-16 Thread Hannu Kröger
Well, as mentioned, probably Cassandra doesn’t have logic and data to skip 
bigger regions of deleted data based on range tombstone. If some piece of data 
in a partition is newer than the tombstone, then it cannot be skipped. 
Therefore some partition level statistics of cell ages would need to be kept in 
the column index for the skipping and that is probably not there.
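
As a toy model of that idea (this is NOT Cassandra's actual read path, just a 
sketch of the proposed optimization): a reader could consult per-SSTable 
max-timestamp metadata and skip a whole SSTable when a newer range tombstone 
fully shadows everything the SSTable could contribute:

```python
from dataclasses import dataclass

@dataclass
class SSTable:
    rows: list          # (clustering_key, timestamp, value)
    max_timestamp: int  # per-SSTable metadata

@dataclass
class RangeTombstone:
    end_key: int        # deletes clustering keys < end_key
    timestamp: int

def read_partition(sstables, tombstone):
    live = []
    for t in sstables:
        # The proposed skip: nothing in this SSTable can survive the tombstone,
        # so don't scan its cells one by one.
        if (t.max_timestamp < tombstone.timestamp
                and all(k < tombstone.end_key for k, _, _ in t.rows)):
            continue
        for key, ts, value in t.rows:
            if key < tombstone.end_key and ts < tombstone.timestamp:
                continue  # shadowed by the tombstone
            live.append((key, value))
    return sorted(live)

old = SSTable(rows=[(1, 10, 'a'), (2, 11, 'b')], max_timestamp=11)
new = SSTable(rows=[(3, 30, 'c')], max_timestamp=30)
rt = RangeTombstone(end_key=3, timestamp=20)  # like DELETE ... WHERE timeid < y
print(read_partition([old, new], rt))  # [(3, 'c')] — 'old' skipped wholesale
```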

Hannu 

> On 16 May 2017, at 17:33, Stefano Ortolani <ostef...@gmail.com> wrote:
> 
> That is another way to see the question: are reverse iterators range 
> tombstone aware? Yes.
> That is why I am puzzled by this afore-mentioned behavior. 
> I would expect them to handle this case more gracefully.
> 
> Cheers,
> Stefano
> 
> On Tue, May 16, 2017 at 3:29 PM, Nitan Kainth <ni...@bamlabs.com 
> <mailto:ni...@bamlabs.com>> wrote:
> Hannu,
> 
> How can you read a partition in reverse?
> 
> Sent from my iPhone
> 
> > On May 16, 2017, at 9:20 AM, Hannu Kröger <hkro...@gmail.com 
> > <mailto:hkro...@gmail.com>> wrote:
> >
> > Well, I’m guessing that Cassandra doesn't really know if the range 
> > tombstone is useful for this or not.
> >
> > In many cases it might be that the partition contains data that is within 
> > the range of the tombstone but is newer than the tombstone and therefore it 
> > might be still be returned. Scanning through deleted data can be avoided by 
> > reading the partition in reverse (if all the deleted data is in the 
> > beginning of the partition). Eventually you will still end up reading a lot 
> > of tombstones but you will get a lot of live data first and the implicit 
> > query limit of 10,000 probably is reached before you get to the tombstones. 
> > Therefore you will get an immediate answer.
> >
> > Does it make sense?
> >
> > Hannu
> >
> >> On 16 May 2017, at 16:33, Stefano Ortolani <ostef...@gmail.com 
> >> <mailto:ostef...@gmail.com>> wrote:
> >>
> >> Hi all,
> >>
> >> I am seeing inconsistencies when mixing range tombstones, wide partitions, 
> >> and reverse iterators.
> >> I still have to understand if the behaviour is to be expected hence the 
> >> message on the mailing list.
> >>
> >> The situation is conceptually simple. I am using a table defined as 
> >> follows:
> >>
> >> CREATE TABLE test_cql.test_cf (
> >>  hash blob,
> >>  timeid timeuuid,
> >>  PRIMARY KEY (hash, timeid)
> >> ) WITH CLUSTERING ORDER BY (timeid ASC)
> >>  AND compaction = {'class' : 'LeveledCompactionStrategy'};
> >>
> >> I then proceed by loading 2/3GB from 3 sstables which I know contain a 
> >> really wide partition (> 512 MB) for `hash = x`. I then delete the oldest 
> >> _half_ of that partition by executing the query below, and restart the 
> >> node:
> >>
> >> DELETE
> >> FROM test_cql.test_cf
> >> WHERE hash = x AND timeid < y;
> >>
> >> If I keep compactions disabled the following query timeouts (takes more 
> >> than 10 seconds to
> >> succeed):
> >>
> >> SELECT *
> >> FROM test_cql.test_cf
> >> WHERE hash = 0x963204d451de3e611daf5e340c3594acead0eaaf
> >> ORDER BY timeid ASC;
> >>
> >> While the following returns immediately (obviously because no deleted data 
> >> is ever read):
> >>
> >> SELECT *
> >> FROM test_cql.test_cf
> >> WHERE hash = 0x963204d451de3e611daf5e340c3594acead0eaaf
> >> ORDER BY timeid DESC;
> >>
> >> If I force a compaction the problem is gone, but I presume just because 
> >> the data is rearranged.
> >>
> >> It seems to me that reading by ASC does not make use of the range 
> >> tombstone until C* reads the
> >> last sstables (which actually contains the range tombstone and is flushed 
> >> at node restart), and it wastes time reading all rows that are actually 
> >> not live anymore.
> >>
> >> Is this expected? Should the range tombstone actually help in these cases?
> >>
> >> Thanks a lot!
> >> Stefano
> >
> >
> > -
> > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org 
> > <mailto:user-unsubscr...@cassandra.apache.org>
> > For additional commands, e-mail: user-h...@cassandra.apache.org 
> > <mailto:user-h...@cassandra.apache.org>
> >
> 



Re: Range deletes, wide partitions, and reverse iterators

2017-05-16 Thread Hannu Kröger
Hello,

If you mean how to construct a query like that: you use an ORDER BY clause in the 
SELECT which is the reverse of the default, just like in the example below. If the 
table is created with “CLUSTERING ORDER BY (timeid ASC)” and you query 
“SELECT ... ORDER BY timeid DESC”, then the partition is read backwards. I 
don’t know how it is technically done, but it is apparently slightly slower than 
reading the partition normally.

Hannu 

> On 16 May 2017, at 17:29, Nitan Kainth <ni...@bamlabs.com> wrote:
> 
> Hannu,
> 
> How can you read a partition in reverse? 
> 
> Sent from my iPhone
> 
>> On May 16, 2017, at 9:20 AM, Hannu Kröger <hkro...@gmail.com> wrote:
>> 
>> Well, I’m guessing that Cassandra doesn't really know if the range tombstone 
>> is useful for this or not. 
>> 
>> In many cases it might be that the partition contains data that is within 
>> the range of the tombstone but is newer than the tombstone and therefore it 
>> might be still be returned. Scanning through deleted data can be avoided by 
>> reading the partition in reverse (if all the deleted data is in the 
>> beginning of the partition). Eventually you will still end up reading a lot 
>> of tombstones but you will get a lot of live data first and the implicit 
>> query limit of 10,000 probably is reached before you get to the tombstones. 
>> Therefore you will get an immediate answer.
>> 
>> Does it make sense?
>> 
>> Hannu
>> 
>>> On 16 May 2017, at 16:33, Stefano Ortolani <ostef...@gmail.com> wrote:
>>> 
>>> Hi all,
>>> 
>>> I am seeing inconsistencies when mixing range tombstones, wide partitions, 
>>> and reverse iterators.
>>> I still have to understand if the behaviour is to be expected hence the 
>>> message on the mailing list.
>>> 
>>> The situation is conceptually simple. I am using a table defined as follows:
>>> 
>>> CREATE TABLE test_cql.test_cf (
>>> hash blob,
>>> timeid timeuuid,
>>> PRIMARY KEY (hash, timeid)
>>> ) WITH CLUSTERING ORDER BY (timeid ASC)
>>> AND compaction = {'class' : 'LeveledCompactionStrategy'};
>>> 
>>> I then proceed by loading 2/3GB from 3 sstables which I know contain a 
>>> really wide partition (> 512 MB) for `hash = x`. I then delete the oldest 
>>> _half_ of that partition by executing the query below, and restart the node:
>>> 
>>> DELETE 
>>> FROM test_cql.test_cf 
>>> WHERE hash = x AND timeid < y;
>>> 
>>> If I keep compactions disabled the following query timeouts (takes more 
>>> than 10 seconds to 
>>> succeed):
>>> 
>>> SELECT * 
>>> FROM test_cql.test_cf 
>>> WHERE hash = 0x963204d451de3e611daf5e340c3594acead0eaaf 
>>> ORDER BY timeid ASC;
>>> 
>>> While the following returns immediately (obviously because no deleted data 
>>> is ever read):
>>> 
>>> SELECT * 
>>> FROM test_cql.test_cf 
>>> WHERE hash = 0x963204d451de3e611daf5e340c3594acead0eaaf 
>>> ORDER BY timeid DESC;
>>> 
>>> If I force a compaction the problem is gone, but I presume just because the 
>>> data is rearranged.
>>> 
>>> It seems to me that reading by ASC does not make use of the range tombstone 
>>> until C* reads the
>>> last sstables (which actually contains the range tombstone and is flushed 
>>> at node restart), and it wastes time reading all rows that are actually not 
>>> live anymore. 
>>> 
>>> Is this expected? Should the range tombstone actually help in these cases?
>>> 
>>> Thanks a lot!
>>> Stefano
>> 
>> 
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org
>> 


-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Range deletes, wide partitions, and reverse iterators

2017-05-16 Thread Hannu Kröger
Well, I’m guessing that Cassandra doesn't really know if the range tombstone is 
useful for this or not. 

In many cases it might be that the partition contains data that is within the 
range of the tombstone but is newer than the tombstone and therefore it might 
be still be returned. Scanning through deleted data can be avoided by reading 
the partition in reverse (if all the deleted data is in the beginning of the 
partition). Eventually you will still end up reading a lot of tombstones but 
you will get a lot of live data first and the implicit query limit of 10,000 
probably is reached before you get to the tombstones. Therefore you will get an 
immediate answer.
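
A rough sketch of that effect (a simplified model, not Cassandra's read path): 
with the oldest half of a clustering-sorted partition deleted, a forward scan 
wades through every tombstoned row before finding live data, while a reverse 
scan fills its page immediately:

```python
def paged_read(rows, deleted_before, limit, reverse=False):
    """rows: sorted clustering keys; keys < deleted_before are tombstoned.
    Returns the first page of live keys and how many rows were scanned."""
    scanned = 0
    page = []
    for key in (reversed(rows) if reverse else rows):
        scanned += 1
        if key >= deleted_before:
            page.append(key)
        if len(page) == limit:
            break
    return page, scanned

rows = list(range(100_000))   # one wide partition
deleted_before = 50_000       # like DELETE ... WHERE timeid < y

page, scanned = paged_read(rows, deleted_before, limit=100)
print(scanned)  # 50100 — forward read scans 50k tombstoned rows first

page, scanned = paged_read(rows, deleted_before, limit=100, reverse=True)
print(scanned)  # 100 — reverse read fills the page with live rows immediately
```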

Does it make sense?

Hannu

> On 16 May 2017, at 16:33, Stefano Ortolani  wrote:
> 
> Hi all,
> 
> I am seeing inconsistencies when mixing range tombstones, wide partitions, 
> and reverse iterators.
> I still have to understand if the behaviour is to be expected hence the 
> message on the mailing list.
> 
> The situation is conceptually simple. I am using a table defined as follows:
> 
> CREATE TABLE test_cql.test_cf (
>   hash blob,
>   timeid timeuuid,
>   PRIMARY KEY (hash, timeid)
> ) WITH CLUSTERING ORDER BY (timeid ASC)
>   AND compaction = {'class' : 'LeveledCompactionStrategy'};
> 
> I then proceed by loading 2/3GB from 3 sstables which I know contain a really 
> wide partition (> 512 MB) for `hash = x`. I then delete the oldest _half_ of 
> that partition by executing the query below, and restart the node:
> 
> DELETE 
> FROM test_cql.test_cf 
> WHERE hash = x AND timeid < y;
> 
> If I keep compactions disabled the following query timeouts (takes more than 
> 10 seconds to 
> succeed):
> 
> SELECT * 
> FROM test_cql.test_cf 
> WHERE hash = 0x963204d451de3e611daf5e340c3594acead0eaaf 
> ORDER BY timeid ASC;
> 
> While the following returns immediately (obviously because no deleted data is 
> ever read):
> 
> SELECT * 
> FROM test_cql.test_cf 
> WHERE hash = 0x963204d451de3e611daf5e340c3594acead0eaaf 
> ORDER BY timeid DESC;
> 
> If I force a compaction the problem is gone, but I presume just because the 
> data is rearranged.
> 
> It seems to me that reading by ASC does not make use of the range tombstone 
> until C* reads the
> last sstables (which actually contains the range tombstone and is flushed at 
> node restart), and it wastes time reading all rows that are actually not live 
> anymore. 
> 
> Is this expected? Should the range tombstone actually help in these cases?
> 
> Thanks a lot!
> Stefano


-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Reg:- DSE 5.1.0 Issue

2017-05-16 Thread Hannu Kröger
Hello,

DataStax is probably more than happy to answer your particular DataStax Enterprise 
related questions here (I don’t know if that is 100% the right place but…):
https://support.datastax.com/hc/en-us 

This mailing list is for open source Cassandra, and DSE issues are mostly out of 
scope here. Hadoop is one of the DSE-only features.

Cheers,
Hannu

> On 16 May 2017, at 14:01, @Nandan@  wrote:
> 
> Hi ,
> Sorry in Advance if I am posting here .
> 
> I am stuck at some particular steps. 
> 
> I was using DSE 4.8 on Single DC with 3 nodes. Today I upgraded my all 3 
> nodes to DSE 5.1
> Issue is when I am trying to start SERVICE DSE RESTART i am getting error 
> message as 
> 
> Hadoop functionality has been removed from DSE.
> Please try again without the HADOOP_ENABLED set in /etc/default/dse.
> 
> Even in the /etc/default/dse file, HADOOP_ENABLED is set to 0. 
> 
> For testing ,Once I changed my HADOOP_ENABLED = 1 , 
> 
> I  am getting error as 
> 
> Found multiple DSE core jar files in /usr/share/dse/lib 
> /usr/share/dse/resources/dse/lib /usr/share/dse /usr/share/dse/common . 
> Please make sure there is only one.
> 
> I searched so many article , but till now not able to find the solution. 
> Please help me to get out of this mess. 
> 
> Thanks and Best Regards,
> Nandan Priyadarshi.



Re: Exceptions when upgrade from 2.1.14 to 2.2.5

2017-04-18 Thread Hannu Kröger
Hello,

It seems that the commit log is broken. One way to fix this would be to remove 
the commit logs and then restart.

This will cause you to lose the writes that were in the commit log but 
hopefully the data is on the other replicas.

In the future to avoid this: before you kill Cassandra, run “nodetool drain”. 
This will flush the memtables to disk and clear out the commit logs.
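
In the same spirit, a minimal toy model (purely illustrative, not Cassandra's 
actual classes) of why draining first makes deleting the commit log safe:

```python
class Node:
    def __init__(self):
        self.commitlog = []
        self.memtable = []
        self.sstables = []

    def write(self, mutation):
        self.commitlog.append(mutation)  # durability first
        self.memtable.append(mutation)

    def drain(self):
        self.sstables.extend(self.memtable)  # flush memtable to disk
        self.memtable = []
        self.commitlog = []                  # safe to clear: data is on disk

    def restart_after_crash(self, delete_commitlog=False):
        self.memtable = []                   # RAM contents are gone
        if delete_commitlog:
            lost = list(self.commitlog)
            self.commitlog = []
            return lost                      # these writes are lost locally
        self.sstables.extend(self.commitlog) # normal path: replay the log
        return []

node = Node()
node.write("w1"); node.write("w2")
node.drain()                   # clean shutdown path
node.write("w3")               # one write after the drain
lost = node.restart_after_crash(delete_commitlog=True)
print(node.sstables, lost)     # ['w1', 'w2'] ['w3'] — only post-drain writes lost
```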

Cheers,
Hannu

> On 19 Apr 2017, at 4:57, Dikang Gu  wrote:
> 
> Hello there,
> 
> We are upgrading one of our cluster from 2.1.14 to 2.2.5, but cassandra had 
> problems replaying the commit logs...
> 
> Here is the exception, does anyone experience similar before?
> 
> 2017-04-19_00:22:21.69943 DEBUG 00:22:21 [main]: Finished reading 
> /data/cassandra/commitlog/CommitLog-4-1487900877734.log
> 2017-04-19_00:22:21.69960 DEBUG 00:22:21 [main]: Replaying 
> /data/cassandra/commitlog/CommitLog-4-1487900877735.log (CL version 4, 
> messaging version 8, compression null)
> 2017-04-19_00:22:22.26911 ERROR 00:22:22 [main]: Exiting due to error while 
> processing commit log during initialization.
> 2017-04-19_00:22:22.26912 
> org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: 
> Unexpected end of segment
> 2017-04-19_00:22:22.26912   at 
> org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:623)
>  [apache-cassandra-2.2.5+git20170404.5f2187b.jar:2.2.5+git20170
> 2017-04-19_00:22:22.26913   at 
> org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:484)
>  [apache-cassandra-2.2.5+git20170404.5f2187b.jar:2.2.5+git20170
> 2017-04-19_00:22:22.26913   at 
> org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:389)
>  [apache-cassandra-2.2.5+git20170404.5f2187b.jar:2.2.5+git20170404.5f2187
> 2017-04-19_00:22:22.26913   at 
> org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:147)
>  [apache-cassandra-2.2.5+git20170404.5f2187b.jar:2.2.5+git20170404.5f2187
> 2017-04-19_00:22:22.26913   at 
> org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:189) 
> [apache-cassandra-2.2.5+git20170404.5f2187b.jar:2.2.5+git20170404.5f2187b]
> 2017-04-19_00:22:22.26913   at 
> org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:169) 
> [apache-cassandra-2.2.5+git20170404.5f2187b.jar:2.2.5+git20170404.5f2187b]
> 2017-04-19_00:22:22.26914   at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:302) 
> [apache-cassandra-2.2.5+git20170404.5f2187b.jar:2.2.5+git20170404.5f2187b]
> 2017-04-19_00:22:22.26914   at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:544)
>  [apache-cassandra-2.2.5+git20170404.5f2187b.jar:2.2.5+git20170404.5f2187b]
> 2017-04-19_00:22:22.26914   at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:607) 
> [apache-cassandra-2.2.5+git20170404.5f2187b.jar:2.2.5+git20170404.5f2187b]
> 
> Thanks
> 
> -- 
> Dikang
> 



Re: Slow writes and Frequent timeouts

2017-04-17 Thread Hannu Kröger
It would help to know what kind of queries are slow.

Hannu

> On 17 Apr 2017, at 18:42, Akshay Suresh  wrote:
> 
> Hi
> 
> I have set up a cassandra cluster of 8 nodes.
> 
> I am using Apache Cassandra 3.9
> 
> While using cassandra-stress tool for load testing, I am getting really slow 
> writes (as low as 10-20 writes per second) along with frequent timeouts 
> and out of heap space errors.
> 
> Kindly let me know how do I resolve this issue.
> 
> 
> Disclaimer : The contents of this e-mail and attachment(s) thereto are 
> confidential and intended for the named recipient(s) only. It shall not 
> attach any liability on the originator or Unotech Software Pvt. Ltd. or its 
> affiliates. Any views or opinions presented in this email are solely those of 
> the author and may not necessarily reflect the opinions of Unotech Software 
> Pvt. Ltd. or its affiliates. Any form of reproduction, dissemination, 
> copying, disclosure, modification, distribution and / or publication of this 
> message without the prior written consent of the author of this e-mail is 
> strictly prohibited. If you have received this email in error please delete 
> it and notify the sender immediately.



Re: Making a Cassandra node cluster unique

2017-04-05 Thread Hannu Kröger
Hi,

Cluster name should be unique because with a misconfiguration you might make
the nodes connect to either cluster, and then you will have nodes in the
wrong clusters.

Theoretically it can work with same names as well but to be on the safe
side, make the cluster names unique.

Hannu
On Wed, 5 Apr 2017 at 8.36, William Boutin 
wrote:

> Someone on my team asked me a question that I could not find an easy
> answer and I was hoping someone could answer for me.
>
> When we configure Cassandra, we use the Cluster Name, Data Center, and
> Rack to define the group of Cassandra nodes involved in holding our
> keyspace records. If a second set of nodes had the same Cluster Name, Data
> Center, and Rack values, is there a chance that CRUD actions directed at
> the first cluster of nodes could somehow end up at the second cluster of
> nodes?
>
> Thank you in advance.
>
>
>
>
>
>
> *WILLIAM L. BOUTIN *
> Engineer IV - Sftwr
> BMDA PADB DSE DU CC NGEE
>
>
> *Ericsson*
> 1 Ericsson Drive, US PI06 1.S747
> Piscataway, NJ, 08854, USA
> Phone (913) 241-5574
> Mobile (732) 213-1368
> Emergency (732) 354-1263
> william.bou...@ericsson.com
> www.ericsson.com
>
> 
>
> Legal entity: EUS - ERICSSON INC., registered office in US PI01 4A242.
> This Communication is Confidential. We only send and receive email on the
> basis of the terms set out at www.ericsson.com/email_disclaimer
>
>
>


Re: Can I do point in time recover using nodetool

2017-03-08 Thread Hannu Kröger
Yes,

It's possible. I haven't seen good instructions online though. The
Cassandra docs are quite bad as well.

I think I asked about it in this list and therefore I suggest you check the
mailing list archive as Mr. Roth suggested.

Hannu
On Wed, 8 Mar 2017 at 10.50, benjamin roth  wrote:

> I remember a very similar question on the list some months ago.
> The short answer is that there is no short answer. I'd recommend you
> search the mailing list archive for "backup" or "recover".
>
> 2017-03-08 10:17 GMT+01:00 Bhardwaj, Rahul :
>
> Hi All,
>
>
>
> Is there any possibility of restoring cassandra snapshots to point in time
> without using opscenter ?
>
>
>
>
>
>
>
>
>
> *Thanks and Regards*
>
> *Rahul Bhardwaj*
>
>
>
>
>


Re: Any way to control/limit off-heap memory?

2017-03-05 Thread Hannu Kröger
If bloom filters are taking too much memory, you can tune them:
https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_tuning_bloom_filters_c.html
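For example, a higher false-positive chance shrinks the filter at the cost of extra disk seeks on non-matching reads. A sketch (the keyspace name is made up; adjust to your schema):

```sql
-- Raise bloom_filter_fp_chance (default 0.01 for size-tiered
-- compaction, 0.1 for leveled) to use less off-heap memory,
-- trading it for more false positives on reads.
ALTER TABLE myks.logs_by_user
    WITH bloom_filter_fp_chance = 0.1;
```

Note that the new value only applies to SSTables written after the change; existing SSTables keep their filters until they are recompacted or upgraded.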
 


Hannu

> On 4 Mar 2017, at 22:54, Thakrar, Jayesh  wrote:
> 
> I have a situation where the off-heap memory is bloating the jvm process 
> memory, making it a candidate to be killed by the oom_killer.
> My server has 256 GB RAM and Cassandra heap memory of 16 GB
>  
> Below is the output of "nodetool info" and nodetool compactionstats for a 
> culprit table which causes bloom filter bloat.
> Of course one option is to turn off bloom filters, but I need to look into 
> application access patterns, etc.
>  
>  
> xss =  -ea -Dorg.xerial.snappy.tempdir=/home/vchadoop/var/tmp 
> -javaagent:/home/vchadoop/apps/apache-cassandra-2.2.5//lib/jamm-0.3.0.jar 
> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms16G -Xmx16G 
> -Xmn4800M -XX:+HeapDumpOnOutOfMemoryError -Xss256k
> ID : 2b9b4252-0760-49c1-8d14-544be0183271
> Gossip active  : true
> Thrift active  : false
> Native Transport active: true
> Load   : 953.19 GB
> Generation No  : 1488641545
> Uptime (seconds)   : 15706
> Heap Memory (MB)   : 7692.93 / 16309.00
> Off Heap Memory (MB)   : 175115.07
> Data Center: ord
> Rack   : rack3
> Exceptions : 0
> Key Cache  : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 
> requests, NaN recent hit rate, 14400 save period in seconds
> Row Cache  : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 
> requests, NaN recent hit rate, 0 save period in seconds
> Counter Cache  : entries 0, size 0 bytes, capacity 50 MB, 0 hits, 0 
> requests, NaN recent hit rate, 7200 save period in seconds
> Token  : (invoke with -T/--tokens to see all 256 tokens)
>  
>  
> Table: logs_by_user
> SSTable count: 622
> SSTables in each level: [174/4, 447/10, 0, 0, 0, 0, 0, 0, 0]
> Space used (live): 313156769247
> Space used (total): 313156769247
> Space used by snapshots (total): 0
> Off heap memory used (total): 180354511884
> SSTable Compression Ratio: 0.25016314078395613
> Number of keys (estimate): 147261312
> Memtable cell count: 44796
> Memtable data size: 57578717
> Memtable off heap memory used: 0
> Memtable switch count: 21
> Local read count: 0
> Local read latency: NaN ms
> Local write count: 1148687
> Local write latency: 0.123 ms
> Pending flushes: 0
> Bloom filter false positives: 0
> Bloom filter false ratio: 0.0
> Bloom filter space used: 180269125192
> Bloom filter off heap memory used: 180269120216
> Index summary off heap memory used: 24335340
> Compression metadata off heap memory used: 61056328
> Compacted partition minimum bytes: 150
> Compacted partition maximum bytes: 668489532
> Compacted partition mean bytes: 3539
> Average live cells per slice (last five minutes): NaN
> Maximum live cells per slice (last five minutes): 0
> Average tombstones per slice (last five minutes): NaN
> Maximum tombstones per slice (last five minutes): 0
>  
>  
> From: Conversant <jthak...@conversantmedia.com>


Current data density limits with Open Source Cassandra

2017-02-08 Thread Hannu Kröger
Hello,

Back in the day it was recommended that max disk density per node for Cassandra 
1.2 was at around 3-5TB of uncompressed data. 

IIRC it was mostly because of heap memory limitations? Now that off-heap 
support is there for certain data and 3.x has different data storage format, is 
that 3-5TB still a valid limit?

Does anyone have experience on running Cassandra with 3-5TB compressed data ?

Cheers,
Hannu

Re: Strange issue wherein cassandra not being started from cron

2017-01-11 Thread Hannu Kröger
One possible reason is that the cassandra process runs as a different user when 
started from cron than when started from the command line. Check who owns the 
data files, and also check what gets written into /var/log/cassandra/system.log 
(or wherever that was).

Hannu

> On 11 Jan 2017, at 16.42, Ajay Garg  wrote:
> 
> Tried everything.
> Every other cron job/script I try works, just the cassandra-service does not.
> 
> On Wed, Jan 11, 2017 at 8:51 AM, Edward Capriolo  > wrote:
> 
> 
> On Tuesday, January 10, 2017, Jonathan Haddad  > wrote:
> Last I checked, cron doesn't load the same, full environment you see when you 
> log in. Also, why put Cassandra on a cron?
> On Mon, Jan 9, 2017 at 9:47 PM Bhuvan Rawal > wrote:
> Hi Ajay,
> 
> Have you had a look at cron logs? - mine is in path /var/log/cron
> 
> Thanks & Regards,
> 
> On Tue, Jan 10, 2017 at 9:45 AM, Ajay Garg > wrote:
> Hi All.
> 
> Facing a very weird issue, wherein the command
> 
> /etc/init.d/cassandra start
> 
> causes cassandra to start when the command is run from command-line.
> 
> 
> However, if I put the above as a cron job
> 
> * * * * * /etc/init.d/cassandra start
> 
> cassandra never starts.
> 
> 
> I have checked, and "cron" service is running.
> 
> 
> Any ideas what might be wrong?
> I am pasting the cassandra script for brevity.
> 
> 
> Thanks and Regards,
> Ajay
> 
> 
> 
> #! /bin/sh
> ### BEGIN INIT INFO
> # Provides:  cassandra
> # Required-Start:$remote_fs $network $named $time
> # Required-Stop: $remote_fs $network $named $time
> # Should-Start:  ntp mdadm
> # Should-Stop:   ntp mdadm
> # Default-Start: 2 3 4 5
> # Default-Stop:  0 1 6
> # Short-Description: distributed storage system for structured data
> # Description:   Cassandra is a distributed (peer-to-peer) system for
> #the management and storage of structured data.
> ### END INIT INFO
> 
> # Author: Eric Evans >
> 
> DESC="Cassandra"
> NAME=cassandra
> PIDFILE=/var/run/$NAME/$NAME.pid
> SCRIPTNAME=/etc/init.d/$NAME
> CONFDIR=/etc/cassandra
> WAIT_FOR_START=10
> CASSANDRA_HOME=/usr/share/cassandra
> FD_LIMIT=10
> 
> [ -e /usr/share/cassandra/apache-cassandra.jar ] || exit 0
> [ -e /etc/cassandra/cassandra.yaml ] || exit 0
> [ -e /etc/cassandra/cassandra-env.sh ] || exit 0
> 
> # Read configuration variable file if it is present
> [ -r /etc/default/$NAME ] && . /etc/default/$NAME
> 
> # Read Cassandra environment file.
> . /etc/cassandra/cassandra-env.sh
> 
> if [ -z "$JVM_OPTS" ]; then
> echo "Initialization failed; \$JVM_OPTS not set!" >&2
> exit 3
> fi
> 
> export JVM_OPTS
> 
> # Export JAVA_HOME, if set.
> [ -n "$JAVA_HOME" ] && export JAVA_HOME
> 
> # Load the VERBOSE setting and other rcS variables
> . /lib/init/vars.sh
> 
> # Define LSB log_* functions.
> # Depend on lsb-base (>= 3.0-6) to ensure that this file is present.
> . /lib/lsb/init-functions
> 
> #
> # Function that returns 0 if process is running, or nonzero if not.
> #
> # The nonzero value is 3 if the process is simply not running, and 1 if the
> # process is not running but the pidfile exists (to match the exit codes for
> # the "status" command; see LSB core spec 3.1, section 20.2)
> #
> CMD_PATT="cassandra.+CassandraDaemon"
> is_running()
> {
> if [ -f $PIDFILE ]; then
> pid=`cat $PIDFILE`
> grep -Eq "$CMD_PATT" "/proc/$pid/cmdline" 2>/dev/null && return 0
> return 1
> fi
> return 3
> }
> #
> # Function that starts the daemon/service
> #
> do_start()
> {
> # Return
> #   0 if daemon has been started
> #   1 if daemon was already running
> #   2 if daemon could not be started
> 
> ulimit -l unlimited
> ulimit -n "$FD_LIMIT"
> 
> cassandra_home=`getent passwd cassandra | awk -F ':' '{ print $6; }'`
> heap_dump_f="$cassandra_home/java_`date +%s`.hprof"
> error_log_f="$cassandra_home/hs_err_`date +%s`.log"
> 
> [ -e `dirname "$PIDFILE"` ] || \
> install -d -ocassandra -gcassandra -m755 `dirname $PIDFILE`
> 
> 
> 
> start-stop-daemon -S -c cassandra -a /usr/sbin/cassandra -q -p "$PIDFILE" 
> -t >/dev/null || return 1
> 
> start-stop-daemon -S -c cassandra -a /usr/sbin/cassandra -b -p "$PIDFILE" 
> -- \
> -p "$PIDFILE" -H "$heap_dump_f" -E "$error_log_f" >/dev/null || 
> return 2
> 
> }
> 
> #
> # Function that stops the daemon/service
> #
> do_stop()
> {
> # Return
> #   0 if daemon has been stopped
> #   1 if daemon was already stopped
> #   2 if daemon could not be stopped
> #   other if a failure occurred
> start-stop-daemon -K -p "$PIDFILE" -R TERM/30/KILL/5 >/dev/null
> RET=$?
> rm -f "$PIDFILE"
> return $RET
> }
> 
> case "$1" in
>   start)
>   

Re: Is this normal!?

2017-01-11 Thread Hannu Kröger
Just to understand:

What exactly is the problem?

Cheers,
Hannu

> On 11 Jan 2017, at 16.07, Cogumelos Maravilha  
> wrote:
> 
> Cassandra 3.9.
> 
> nodetool status
> Datacenter: dc1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address   Load   Tokens   Owns (effective)  Host
> ID   Rack
> UN  10.0.120.145  1.21 MiB   256  49.5%
> da6683cd-c3cf-4c14-b3cc-e7af4080c24f  rack1
> UN  10.0.120.179  1020.51 KiB  256  48.1%
> fb695bea-d5e8-4bde-99db-9f756456a035  rack1
> UN  10.0.120.55   1.02 MiB   256  53.3%
> eb911989-3555-4aef-b11c-4a684a89a8c4  rack1
> UN  10.0.120.46   1.01 MiB   256  49.1%
> 8034c30a-c1bc-44d4-bf84-36742e0ec21c  rack1
> 
> nodetool repair
> [2017-01-11 13:58:27,274] Replication factor is 1. No repair is needed
> for keyspace 'system_auth'
> [2017-01-11 13:58:27,284] Starting repair command #4, repairing keyspace
> system_traces with repair options (parallelism: parallel, primary range:
> false, incremental: true, job threads: 1, ColumnFamilies: [],
> dataCenters: [], hosts: [], # of ranges: 515)
> [2017-01-11 14:01:55,628] Repair session
> 82a25960-d806-11e6-8ac4-73b93fe4986d for range
> [(-1278992819359672027,-1209509957304098060],
> (-2593749995021251600,-2592266543457887959],
> (-6451044457481580778,-6438233936014720969],
> (-1917989291840804877,-1912580903456869648],
> (-3693090304802198257,-3681923561719364766],
> (-380426998894740867,-350094836653869552],
> (1890591246410309420,1899294587910578387],
> (6561031217224224632,6580230317350171440],
> ... 4 pages of data
> , (6033828815719998292,6079920177089043443]] finished (progress: 1%)
> [2017-01-11 13:58:27,986] Repair completed successfully
> [2017-01-11 13:58:27,988] Repair command #4 finished in 0 seconds
> 
> nodetool gcstats
> Interval (ms)  Max GC Elapsed (ms)  Total GC Elapsed (ms)  Stdev GC Elapsed (ms)  GC Reclaimed (MB)  Collections  Direct Memory Bytes
>        360134                   23                    23                      0          333975216            1                   -1
> 
> (wait)
> nodetool gcstats
> Interval (ms)  Max GC Elapsed (ms)  Total GC Elapsed (ms)  Stdev GC Elapsed (ms)  GC Reclaimed (MB)  Collections  Direct Memory Bytes
>         60016                    0                     0                    NaN                  0            0                   -1
> 
> nodetool repair
> [2017-01-11 14:00:45,888] Replication factor is 1. No repair is needed
> for keyspace 'system_auth'
> [2017-01-11 14:00:45,896] Starting repair command #5, repairing keyspace
> system_traces with repair options (parallelism: parallel, primary range:
> false, incremental: true, job threads: 1, ColumnFamilies: [],
> dataCenters: [], hosts: [], # of ranges: 515)
> ... 4 pages of data
> , (94613607632078948,219237792837906432],
> (6033828815719998292,6079920177089043443]] finished (progress: 1%)
> [2017-01-11 14:00:46,567] Repair completed successfully
> [2017-01-11 14:00:46,576] Repair command #5 finished in 0 seconds
> 
> nodetool gcstats
> Interval (ms)  Max GC Elapsed (ms)  Total GC Elapsed (ms)  Stdev GC Elapsed (ms)  GC Reclaimed (MB)  Collections  Direct Memory Bytes
>          9169                   25                    25                      0          330518688            1                   -1
> 
> 
> Always in loop, I think!
> 
> Thanks in advance.
> 



Point in time restore

2017-01-10 Thread Hannu Kröger
Hello,

Are there any guides how to do a point-in-time restore for Cassandra?

All I have seen is this:
http://docs.datastax.com/en/archived/cassandra/2.0/cassandra/configuration/configLogArchive_t.html
 


That gives an idea how to store the data for restore but how to do an actual 
restore is still a mystery to me.

Any pointers?

Cheers,
Hannu

Re: cassandra documentation (Multiple datacenter write requests) question

2016-11-22 Thread Hannu Kröger
Looks like the graph is wrong.

Hannu

> On 22 Nov 2016, at 15.43, CHAUMIER, RAPHAËL  
> wrote:
> 
> Hello everyone,
>  
> I don’t know if you have access to DataStax documentation. I don’t understand 
> the example about Multiple datacenter write requests 
> (http://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlClientRequestsMultiDCWrites.html
>  
> ).
>  The graph shows there’s 3 nodes making up of QUORUM, but based on the quorum 
> computation rule 
> (http://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlConfigConsistency.html#dmlConfigConsistency__about-the-quorum-level
>  
> )
>  
> quorum = (sum_of_replication_factors / 2) + 1
> 
> sum_of_replication_factors = datacenter1_RF + datacenter2_RF + . . . + 
> datacentern_RF
>  
> If I have 2 DC of 3 replica nodes so the quorum should be = ( 3+3 /2) +1 = 
> (6/2) + 1 = 3 + 1 = 4
>  
> Am I missing something ?
>  
> Thanks for your response.
>  
> Regards,
>  
> 
> 
> L'intégrité de ce message n'étant pas assurée sur internet, la société 
> expéditrice ne peut être tenue responsable de son contenu ni de ses pièces 
> jointes. Toute utilisation ou diffusion non autorisée est interdite. Si vous 
> n'êtes pas destinataire de ce message, merci de le détruire et d'avertir 
> l'expéditeur.
> 
> The integrity of this message cannot be guaranteed on the Internet. The 
> company that sent this message cannot therefore be held liable for its 
> content nor attachments. Any unauthorized use or dissemination is prohibited. 
> If you are not the intended recipient of this message, then please delete it 
> and notify the sender.



smime.p7s
Description: S/MIME cryptographic signature


Re: Improving performance where a lot of updates and deletes are required?

2016-11-08 Thread Hannu Kröger
Also if they are being read before compaction:
http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_expire_c.html 
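As a sketch of the lifecycle (keyspace and table names are made up):

```sql
-- The cell expires 86400 seconds (one day) after the write.
INSERT INTO myks.sensor_readings (sensor_id, read_time, value)
VALUES (42, '2016-11-08 12:00:00+0000', 21.5)
USING TTL 86400;
```

After the TTL passes, reads no longer return the cell, and at the next compaction it is turned into a tombstone that is purged once gc_grace_seconds has elapsed.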


Hannu

> On 8 Nov 2016, at 16.36, DuyHai Doan  wrote:
> 
> "Does TTL also cause tombstones?" --> Yes, after the TTL expires, at the next 
> compaction the TTLed column is replaced by a tombstone, as per my 
> understanding
> 
> On Tue, Nov 8, 2016 at 3:32 PM, Ali Akhtar  > wrote:
> Does TTL also cause tombstones?
> 
> On Tue, Nov 8, 2016 at 6:57 PM, Vladimir Yudovin  > wrote:
> >The deletes will be done at a scheduled time, probably at the end of the 
> >day, each day.
> 
> Probably you can use TTL? 
> http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_expire_c.html 
> 
> 
> Best regards, Vladimir Yudovin, 
> Winguzone  - Hosted Cloud Cassandra
> Launch your cluster in minutes.
> 
> 
>  On Tue, 08 Nov 2016 05:04:12 -0500Ali Akhtar  > wrote 
> 
> I have a use case where a lot of updates and deletes to a table will be 
> necessary.
> 
> The deletes will be done at a scheduled time, probably at the end of the day, 
> each day.
> 
> Updates will be done throughout the day, as new data comes in.
> 
> Are there any guidelines on improving cassandra's performance for this use 
> case? Any caveats to be aware of? Any tips, like running nodetool repair 
> every X days?
> 
> Thanks.
> 
> 
> 



smime.p7s
Description: S/MIME cryptographic signature


Re: Transparent Fail-Over for Java Driver to survive Cluster Rolling

2016-10-24 Thread Hannu Kröger
Hi,

QUORUM would go through if only one node is down any given time.

Depending on the consistency requirements of your application, you could also 
use ONE (or LOCAL_ONE) as well for sensor reading storage. That would go 
through even if two nodes are down any given time.

BR,
Hannu

> On 24 Oct 2016, at 15:27, Andreas Fritzler <andreas.fritz...@gmail.com> wrote:
> 
> Thanks a lot Hannu! 
> 
> How about the write path though. What consistency level would I choose if I 
> want to insert e.g. sensor data into my cluster without the app crashing 
> every time I up update the cluster? I would assume that QUORUM (replication 
> of 3) might not always go through?
> 
> 
> 
> On Mon, Oct 24, 2016 at 2:09 PM, Hannu Kröger <hkro...@gmail.com 
> <mailto:hkro...@gmail.com>> wrote:
> Hi,
> 
> Once the client is connected, it will automatically connect to many nodes in 
> the cluster. Therefore once the app is running the amount of contact points 
> doesn’t matter and if you have consistency level < ALL (or QUORUM where 
> replication factor is <= 2), your app should tolerate rolling restart ok.
> 
> That being said, you should have more than one contact point because if you 
> restart your application and the node indicated in contact point happens to 
> be down, the application cannot connect to cluster and fails. Having two 
> contact points is a good start.
> 
> Cheers,
> Hannu
> 
> 
>> On 24 Oct 2016, at 15:04, Andreas Fritzler <andreas.fritz...@gmail.com 
>> <mailto:andreas.fritz...@gmail.com>> wrote:
>> 
>> Hi,
>> 
>> I was wondering if it is enough to set a list of contact points via:
>> 
>> Cluster.builder().addContactPoint("host1").addContactPoint("host2")...;
>> to survive a cluster rolling while inserting/reading from the cluster.
>> 
>> Regards,
>> Andreas
> 
> 



Re: Transparent Fail-Over for Java Driver to survive Cluster Rolling

2016-10-24 Thread Hannu Kröger
Hi,

Once the client is connected, it will automatically connect to many nodes in 
the cluster. Therefore once the app is running the amount of contact points 
doesn’t matter and if you have consistency level < ALL (or QUORUM where 
replication factor is <= 2), your app should tolerate rolling restart ok.

That being said, you should have more than one contact point because if you 
restart your application and the node indicated in contact point happens to be 
down, the application cannot connect to cluster and fails. Having two contact 
points is a good start.

Cheers,
Hannu


> On 24 Oct 2016, at 15:04, Andreas Fritzler  wrote:
> 
> Hi,
> 
> I was wondering if it is enough to set a list of contact points via:
> 
> Cluster.builder().addContactPoint("host1").addContactPoint("host2")...;
> to survive a cluster rolling while inserting/reading from the cluster.
> 
> Regards,
> Andreas



Re: Row cache not working

2016-10-03 Thread Hannu Kröger
If I remember correctly, the row cache stores only the first N rows of a 
partition, N being a configurable number.

See this link which is suggesting that:
http://www.datastax.com/dev/blog/row-caching-in-cassandra-2-1
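That N is the rows_per_partition caching option, e.g. (illustrative; using the table from your schema):

```sql
-- Cache at most the first 10 rows of each partition, plus all keys.
ALTER TABLE test.reads
    WITH caching = {'keys': 'ALL', 'rows_per_partition': '10'};
```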

Br,
Hannu

> On 4 Oct 2016, at 1.32, Edward Capriolo  wrote:
> 
> Since the feature is off by default, the coverage might only be as deep 
> as the specific tests that exercise it.
> 
>> On Mon, Oct 3, 2016 at 4:54 PM, Jeff Jirsa  
>> wrote:
>> Seems like it’s probably worth opening a jira issue to track it (either to 
>> confirm it’s a bug, or to be able to better explain if/that it’s working as 
>> intended – the row cache is probably missing because trace indicates the 
>> read isn’t cacheable, but I suspect it should be cacheable).
>> 
>>  
>>  
>>  
>> 
>> 
>> Do note, though, that setting rows_per_partition to ALL can be very very 
>> very dangerous if you have very wide rows in any of your tables with row 
>> cache enabled.
>> 
>>  
>> 
>>  
>> 
>>  
>> 
>> From: Abhinav Solan 
>> Reply-To: "user@cassandra.apache.org" 
>> Date: Monday, October 3, 2016 at 1:38 PM
>> To: "user@cassandra.apache.org" 
>> Subject: Re: Row cache not working
>> 
>>  
>> 
>> It's cassandra 3.0.7, 
>> 
>> I had to set caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}, then 
>> only it works don't know why.
>> 
>> If I set 'rows_per_partition':'1' then it does not work.
>> 
>>  
>> 
>> Also wanted to ask one thing, if I set row_cache_save_period: 60 then this 
>> cache would be refreshed automatically or it would be lazy, whenever the 
>> fetch call is made then only it caches it.
>> 
>>  
>> 
>> On Mon, Oct 3, 2016 at 1:31 PM Jeff Jirsa  wrote:
>> 
>> Which version of Cassandra are you running (I can tell it’s newer than 2.1, 
>> but exact version would be useful)?
>> 
>>  
>> 
>> From: Abhinav Solan 
>> Reply-To: "user@cassandra.apache.org" 
>> Date: Monday, October 3, 2016 at 11:35 AM
>> To: "user@cassandra.apache.org" 
>> Subject: Re: Row cache not working
>> 
>>  
>> 
>> Hi, can anyone please help me with this
>> 
>>  
>> 
>> Thanks,
>> 
>> Abhinav
>> 
>>  
>> 
>> On Fri, Sep 30, 2016 at 6:20 PM Abhinav Solan  
>> wrote:
>> 
>> Hi Everyone,
>> 
>>  
>> 
>> My table looks like this -
>> 
>> CREATE TABLE test.reads (
>> 
>> svc_pt_id bigint,
>> 
>> meas_type_id bigint,
>> 
>> flags bigint,
>> 
>> read_time timestamp,
>> 
>> value double,
>> 
>> PRIMARY KEY ((svc_pt_id, meas_type_id))
>> 
>> ) WITH bloom_filter_fp_chance = 0.1
>> 
>> AND caching = {'keys': 'ALL', 'rows_per_partition': '10'}
>> 
>> AND comment = ''
>> 
>> AND compaction = {'class': 
>> 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
>> 
>> AND compression = {'chunk_length_in_kb': '64', 'class': 
>> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>> 
>> AND crc_check_chance = 1.0
>> 
>> AND dclocal_read_repair_chance = 0.1
>> 
>> AND default_time_to_live = 0
>> 
>> AND gc_grace_seconds = 864000
>> 
>> AND max_index_interval = 2048
>> 
>> AND memtable_flush_period_in_ms = 0
>> 
>> AND min_index_interval = 128
>> 
>> AND read_repair_chance = 0.0
>> 
>> AND speculative_retry = '99PERCENTILE';
>> 
>>  
>> 
>> Have set up the C* nodes with
>> 
>> row_cache_size_in_mb: 1024
>> 
>> row_cache_save_period: 14400
>> 
>>  
>> 
>> and I am making this query 
>> 
>> select svc_pt_id, meas_type_id, read_time, value FROM 
>> cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id = 146;
>> 
>>  
>> 
>> with tracing on every time it says Row cache miss
>> 
>>  
>> 
>> activity 
>>  
>> | timestamp  | source  | source_elapsed
>> 
>> ---++-+
>> 
>>  
>>Execute 
>> CQL3 query | 2016-09-30 18:15:00.446000 |  192.168.199.75 |  0
>> 
>>  Parsing select svc_pt_id, meas_type_id, read_time, value FROM 
>> cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id = 
>> 146; [SharedPool-Worker-1] | 

Re: Maximum number of columns in a table

2016-09-15 Thread Hannu Kröger
I do agree on that.

> On 15 Sep 2016, at 16:23, DuyHai Doan <doanduy...@gmail.com> wrote:
> 
> I'd advise anyone against using the old native secondary index ... You'll get 
> poor performance (that's the main reason why some people developed SASI).
> 
> On Thu, Sep 15, 2016 at 10:20 PM, Hannu Kröger <hkro...@gmail.com 
> <mailto:hkro...@gmail.com>> wrote:
> Hi,
> 
> The ‘old-fashioned’ secondary indexes do support index of collection values:
> https://docs.datastax.com/en/cql/3.1/cql/ddl/ddlIndexColl.html 
> <https://docs.datastax.com/en/cql/3.1/cql/ddl/ddlIndexColl.html>
> 
> Br,
> Hannu
> 
>> On 15 Sep 2016, at 15:59, DuyHai Doan <doanduy...@gmail.com 
>> <mailto:doanduy...@gmail.com>> wrote:
>> 
>> "But the problem is I can't use secondary indexing "where int25=5", while 
>> with normal columns I can."
>> 
>> You have many objectives that contradict themselves in term of impl.
>> 
>> Right now you're unlucky, SASI does not support indexing collections yet (it 
>> may come in future, when ?  ¯\_(ツ)_/¯ )
>> 
>> If you're using DSE Search or Stratio Lucene Index, you can index map values 
>> 
>> On Thu, Sep 15, 2016 at 9:53 PM, Dorian Hoxha <dorian.ho...@gmail.com 
>> <mailto:dorian.ho...@gmail.com>> wrote:
>> Yes that makes more sense. But the problem is I can't use secondary indexing 
>> "where int25=5", while with normal columns I can.
>> 
>> On Thu, Sep 15, 2016 at 8:23 PM, sfesc...@gmail.com 
>> <mailto:sfesc...@gmail.com> <sfesc...@gmail.com <mailto:sfesc...@gmail.com>> 
>> wrote:
>> I agree a single blob would also work (I do that in some cases). The reason 
>> for the map is if you need more flexible updating. I think your solution of 
>> a map/data type works well.
>> 
>> On Thu, Sep 15, 2016 at 11:10 AM DuyHai Doan <doanduy...@gmail.com 
>> <mailto:doanduy...@gmail.com>> wrote:
>> "But I need rows together to work with them (indexing etc)"
>> 
>> What do you mean rows together ? You mean that you want to fetch a single 
>> row instead of 1 row per property right ?
>> 
>> In this case, the map might be the solution:
>> 
>> CREATE TABLE generic_with_maps(
>>object_id uuid
>>boolean_map map<text, boolean>
>>text_map map<text, text>
>>long_map map<text, long>,
>>...
>>PRIMARY KEY(object_id)
>> );
>> 
>> The trick here is to store all the fields of the object in different map, 
>> depending on the type of the field.
>> 
>> The map key is always text and it contains the name of the field.
>> 
>> Example
>> 
>> {
>>"id": ,
>> "name": "John DOE",
>> "age":  32,
>> "last_visited_date":  "2016-09-10 12:01:03", 
>> }
>> 
>> INSERT INTO generic_with_maps(id, map_text, map_long, map_date)
>> VALUES(xxx, {'name': 'John DOE'}, {'age': 32}, {'last_visited_date': 
>> '2016-09-10 12:01:03'});
>> 
>> When you do a select, you'll get a SINGLE row returned. But then you need to 
>> extract all the properties from different maps, not a big deal
>> 
>> On Thu, Sep 15, 2016 at 7:54 PM, Dorian Hoxha <dorian.ho...@gmail.com 
>> <mailto:dorian.ho...@gmail.com>> wrote:
>> @DuyHai
>> Yes, that's another case, the "entity" model used in rdbms. But I need rows 
>> together to work with them (indexing etc).
>> 
>> @sfespace
>> The map is needed when you have a dynamic schema. I don't have a dynamic 
>> schema (may have, and will use the map if I do). I just have thousands of 
>> schemas. One user needs 10 integers, while another user needs 20 booleans, 
>> and another needs 30 integers, or a combination of them all.
>> 
>> On Thu, Sep 15, 2016 at 7:46 PM, DuyHai Doan <doanduy...@gmail.com 
>> <mailto:doanduy...@gmail.com>> wrote:
>> "Another possible alternative is to use a single map column"
>> 
>> --> how do you manage the different types then ? Because maps in Cassandra 
>> are strongly typed
>> 
>> Unless you set the type of map value to blob, in this case you might as well 
>> store all the object as a single blob column
>> 
>> On Thu, Sep 15, 2016 at 6:13 PM, sfesc...@gmail.com 
>> <mailto:sfesc...@gmail.com> <sfesc...@gmail.com <mailto:sfesc...@gmail.com>> 
>> wrote:
>> Another possible alternative is to use a single map column.
>> 

Re: Maximum number of columns in a table

2016-09-15 Thread Hannu Kröger
Hi,

The ‘old-fashioned’ secondary indexes do support index of collection values:
https://docs.datastax.com/en/cql/3.1/cql/ddl/ddlIndexColl.html 
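A minimal sketch (hypothetical table and index names):

```sql
CREATE TABLE myks.users (
    id uuid PRIMARY KEY,
    attributes map<text, text>
);

-- A plain index on a map column indexes its values:
CREATE INDEX user_attr_values ON myks.users (attributes);

-- KEYS() indexes the map keys instead:
CREATE INDEX user_attr_keys ON myks.users (KEYS(attributes));

-- These queries can then use the indexes:
SELECT * FROM myks.users WHERE attributes CONTAINS 'blue';
SELECT * FROM myks.users WHERE attributes CONTAINS KEY 'eye_color';
```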


Br,
Hannu

> On 15 Sep 2016, at 15:59, DuyHai Doan  wrote:
> 
> "But the problem is I can't use secondary indexing "where int25=5", while 
> with normal columns I can."
> 
> You have many objectives that contradict themselves in term of impl.
> 
> Right now you're unlucky, SASI does not support indexing collections yet (it 
> may come in future, when ?  ¯\_(ツ)_/¯ )
> 
> If you're using DSE Search or Stratio Lucene Index, you can index map values 
> 
> On Thu, Sep 15, 2016 at 9:53 PM, Dorian Hoxha  > wrote:
> Yes that makes more sense. But the problem is I can't use secondary indexing 
> "where int25=5", while with normal columns I can.
> 
> On Thu, Sep 15, 2016 at 8:23 PM, sfesc...@gmail.com 
>  > 
> wrote:
> I agree a single blob would also work (I do that in some cases). The reason 
> for the map is if you need more flexible updating. I think your solution of a 
> map/data type works well.
> 
> On Thu, Sep 15, 2016 at 11:10 AM DuyHai Doan  > wrote:
> "But I need rows together to work with them (indexing etc)"
> 
> What do you mean rows together ? You mean that you want to fetch a single row 
> instead of 1 row per property right ?
> 
> In this case, the map might be the solution:
> 
> CREATE TABLE generic_with_maps(
>object_id uuid
>boolean_map map<text, boolean>
>text_map map<text, text>
>long_map map<text, long>,
>...
>PRIMARY KEY(object_id)
> );
> 
> The trick here is to store all the fields of the object in different map, 
> depending on the type of the field.
> 
> The map key is always text and it contains the name of the field.
> 
> Example
> 
> {
>"id": ,
> "name": "John DOE",
> "age":  32,
> "last_visited_date":  "2016-09-10 12:01:03", 
> }
> 
> INSERT INTO generic_with_maps(id, map_text, map_long, map_date)
> VALUES(xxx, {'name': 'John DOE'}, {'age': 32}, {'last_visited_date': 
> '2016-09-10 12:01:03'});
> 
> When you do a select, you'll get a SINGLE row returned. But then you need to 
> extract all the properties from different maps, not a big deal
> 
> On Thu, Sep 15, 2016 at 7:54 PM, Dorian Hoxha  > wrote:
> @DuyHai
> Yes, that's another case, the "entity" model used in rdbms. But I need rows 
> together to work with them (indexing etc).
> 
> @sfespace
> The map is needed when you have a dynamic schema. I don't have a dynamic 
> schema (may have, and will use the map if I do). I just have thousands of 
> schemas. One user needs 10 integers, while another user needs 20 booleans, 
> and another needs 30 integers, or a combination of them all.
> 
> On Thu, Sep 15, 2016 at 7:46 PM, DuyHai Doan  > wrote:
> "Another possible alternative is to use a single map column"
> 
> --> how do you manage the different types then ? Because maps in Cassandra 
> are strongly typed
> 
> Unless you set the type of map value to blob, in this case you might as well 
> store all the object as a single blob column
> 
> On Thu, Sep 15, 2016 at 6:13 PM, sfesc...@gmail.com 
>  > 
> wrote:
> Another possible alternative is to use a single map column.
> 
> 
> On Thu, Sep 15, 2016 at 7:19 AM Dorian Hoxha  > wrote:
> Since I will only have 1 table with that many columns, and the other tables 
> will be "normal" tables with max 30 columns, and the memory of 2K columns 
> won't be that big, I'm gonna guess I'll be fine.
> 
> The data model is too dynamic, the alternative would be to create a table for 
> each user which will have even more overhead since the number of users is in 
> the several thousands/millions.
> 
> 
> On Thu, Sep 15, 2016 at 3:04 PM, DuyHai Doan  > wrote:
> There is no real limit in term of number of columns in a table, I would say 
> that the impact of having a lot of columns is the amount of meta data C* 
> needs to keep in memory for encoding/decoding each row.
> 
> Now, if you have a table with 1000+ columns, the problem is probably your 
> data model...
> 
> On Thu, Sep 15, 2016 at 2:59 PM, Dorian Hoxha  > wrote:
> Is there a lot of overhead in having a big number of columns in a table? 
> Not unbounded, but say, would 2000 be a problem (I think that's the maximum 
> I'll need)?
> 
> Thank You
> 
> 



Re: Is it safe to change RF in this situation?

2016-09-08 Thread Hannu Kröger
Ok, I have to say that I'm not 100% sure how many replicas of the data it is 
trying to maintain, but it should not blow up (if the repair crashes or something, 
that's ok). So it should be safe to do.

When the repair has finished, you can proceed with the plan I suggested and run 
repairs afterwards.

Hannu

> On 8 Sep 2016, at 18:01, Benyi Wang <bewang.t...@gmail.com> wrote:
> 
> Thanks. What about this situation:
> 
> * Change RF 2 => 3
> * Start repair
> * Roll back RF 3 => 2
> * repair is still running
> 
> I'm wondering what the repair is trying to do. Is it trying to repair as if 
> RF=2, or still as if RF=3?
> 
> On Thu, Sep 8, 2016 at 2:53 PM, Hannu Kröger <hkro...@gmail.com 
> <mailto:hkro...@gmail.com>> wrote:
> Yep, you can fix it by running repair or even faster by changing the 
> consistency level to local_quorum and deploying the new version of the app.
> 
> Hannu
> 
>> On 8 Sep 2016, at 17:51, Benyi Wang <bewang.t...@gmail.com 
>> <mailto:bewang.t...@gmail.com>> wrote:
>> 
>> Thanks Hannu,
>> 
>> Unfortunately, we started changing RF from 2 to 3, and did see the empty 
>> result rate going higher. I assume that "if the LOCAL_ONE read hits the 
>> new replica, which is not there yet, the CQL query will return nothing." Is 
>> my assumption correct?
>> 
>> On Thu, Sep 8, 2016 at 11:49 AM, Hannu Kröger <hkro...@gmail.com 
>> <mailto:hkro...@gmail.com>> wrote:
>> Hi,
>> 
>> If you change RF=2 -> 3 first, the LOCAL_ONE reads might hit the new replica 
>> which is not there yet. So I would change LOCAL_ONE -> LOCAL_QUORUM first 
>> and then change the RF and then run the repair. LOCAL_QUORUM is effectively 
>> ALL in your case (RF=2) if you have just one DC, so you can change the batch 
>> CL later.
>> 
>> Cheers,
>> Hannu
>> 
>> > On 8 Sep 2016, at 14:42, Benyi Wang <bewang.t...@gmail.com 
>> > <mailto:bewang.t...@gmail.com>> wrote:
>> >
>> > * I have a keyspace with RF=2;
>> > * The client read the table using LOCAL_ONE;
>> > * There is a batch job loading data into the tables using ALL.
>> >
>> > I want to change RF to 3 and both the client and the batch job use 
>> > LOCAL_QUORUM.
>> >
>> > My question is "Will the client still read the correct data when the 
>> > repair is running at the time my batch job loading is running too?"
>> >
>> > Or should I change to LOCAL_QUORUM first?
>> >
>> > Thanks.
>> 
>> 
> 
> 



Re: Is it safe to change RF in this situation?

2016-09-08 Thread Hannu Kröger
Yep, you can fix it by running repair or even faster by changing the 
consistency level to local_quorum and deploying the new version of the app.

Hannu

> On 8 Sep 2016, at 17:51, Benyi Wang <bewang.t...@gmail.com> wrote:
> 
> Thanks Hannu,
> 
> Unfortunately, we started changing RF from 2 to 3, and did see the empty 
> result rate going higher. I assume that "if the LOCAL_ONE read hits the 
> new replica, which is not there yet, the CQL query will return nothing." Is my 
> assumption correct?
> 
> On Thu, Sep 8, 2016 at 11:49 AM, Hannu Kröger <hkro...@gmail.com 
> <mailto:hkro...@gmail.com>> wrote:
> Hi,
> 
> If you change RF=2 -> 3 first, the LOCAL_ONE reads might hit the new replica 
> which is not there yet. So I would change LOCAL_ONE -> LOCAL_QUORUM first and 
> then change the RF and then run the repair. LOCAL_QUORUM is effectively ALL 
> in your case (RF=2) if you have just one DC, so you can change the batch CL 
> later.
> 
> Cheers,
> Hannu
> 
> > On 8 Sep 2016, at 14:42, Benyi Wang <bewang.t...@gmail.com 
> > <mailto:bewang.t...@gmail.com>> wrote:
> >
> > * I have a keyspace with RF=2;
> > * The client read the table using LOCAL_ONE;
> > * There is a batch job loading data into the tables using ALL.
> >
> > I want to change RF to 3 and both the client and the batch job use 
> > LOCAL_QUORUM.
> >
> > My question is "Will the client still read the correct data when the repair 
> > is running at the time my batch job loading is running too?"
> >
> > Or should I change to LOCAL_QUORUM first?
> >
> > Thanks.
> 
> 



Re: Is it safe to change RF in this situation?

2016-09-08 Thread Hannu Kröger
Hi,

If you change RF=2 -> 3 first, the LOCAL_ONE reads might hit the new replica 
which is not there yet. So I would change LOCAL_ONE -> LOCAL_QUORUM first and 
then change the RF and then run the repair. LOCAL_QUORUM is effectively ALL in 
your case (RF=2) if you have just one DC, so you can change the batch CL later.
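
The arithmetic behind "LOCAL_QUORUM is effectively ALL at RF=2" can be sketched as follows (a minimal illustration, not driver code):

```python
def quorum(rf: int) -> int:
    # A quorum is a strict majority of the replicas: floor(rf/2) + 1.
    return rf // 2 + 1

# With RF=2, LOCAL_QUORUM needs 2 of 2 replicas -- the same as ALL,
# so a single replica being down fails the query.
assert quorum(2) == 2
# After bumping to RF=3, it still needs only 2 replicas, so the
# cluster now tolerates one replica being down per query.
assert quorum(3) == 2
```

This is why raising RF from 2 to 3 makes LOCAL_QUORUM strictly more available while keeping the same read/write overlap guarantee.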

Cheers,
Hannu

> On 8 Sep 2016, at 14:42, Benyi Wang  wrote:
> 
> * I have a keyspace with RF=2;
> * The client read the table using LOCAL_ONE;
> * There is a batch job loading data into the tables using ALL.
> 
> I want to change RF to 3 and both the client and the batch job use 
> LOCAL_QUORUM.
> 
> My question is "Will the client still read the correct data when the repair 
> is running at the time my batch job loading is running too?"
> 
> Or should I change to LOCAL_QUORUM first?
> 
> Thanks.



Re: Query regarding spark on cassandra

2016-04-28 Thread Hannu Kröger
Ok, then I don’t understand the problem.

Hannu

> On 28 Apr 2016, at 11:19, Siddharth Verma <verma.siddha...@snapdeal.com> 
> wrote:
> 
> Hi Hannu,
> 
> Had the issue been caused by reads, the insert and delete statements would 
> have been erroneous.
> "I saw the stdout from the web UI of Spark, and the query along with true was 
> printed for both queries."
> The statements were correct as seen on the UI.
> Thanks,
> Siddharth Verma
> 
> 
> 
> On Thu, Apr 28, 2016 at 1:22 PM, Hannu Kröger <hkro...@gmail.com 
> <mailto:hkro...@gmail.com>> wrote:
> Hi,
> 
> could it be a consistency level issue? If you use ONE for reads and writes, it 
> might be that sometimes you don't see what you are writing.
> 
> See:
> https://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
>  
> <https://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html>
> 
> Br,
> Hannu
> 
> 
> 2016-04-27 20:41 GMT+03:00 Siddharth Verma <verma.siddha...@snapdeal.com 
> <mailto:verma.siddha...@snapdeal.com>>:
> Hi,
> I don't know if anyone has faced this problem or not.
> I am running a job where some data is loaded from a Cassandra table. From that 
> data, I build some insert and delete statements
> and execute them (using forEach).
> 
> Code snippet:
> boolean deleteStatus = connector.openSession().execute(delete).wasApplied();
> boolean insertStatus = connector.openSession().execute(insert).wasApplied();
> System.out.println(delete + ":" + deleteStatus);
> System.out.println(insert + ":" + insertStatus);
> 
> When I run it locally, I see the expected results in the table.
> 
> However, when I run it on a cluster, sometimes the changes take effect and 
> sometimes they don't.
> I saw the stdout from the web UI of Spark, and the query along with true was 
> printed for both queries.
> 
> I can't understand, what could be the issue.
> 
> Any help would be appreciated.
> 
> Thanks,
> Siddharth Verma
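
If consistency levels are indeed the culprit here, the underlying rule is that a read is only guaranteed to see the latest write when the read and write replica sets must overlap, i.e. R + W > RF. A hedged sketch of that rule (the RF=3 figures are illustrative assumptions, not taken from this thread):

```python
def read_sees_latest_write(rf: int, write_replicas: int, read_replicas: int) -> bool:
    # Strong consistency is guaranteed only when any read replica set
    # must intersect any write replica set: R + W > RF.
    return read_replicas + write_replicas > rf

# CL=ONE for both reads and writes at RF=3: 1 + 1 <= 3, so there is no
# guaranteed overlap and a read may miss a recent write -- matching the
# "sometimes the changes don't take effect" symptom.
assert not read_sees_latest_write(rf=3, write_replicas=1, read_replicas=1)

# QUORUM reads and writes at RF=3: 2 + 2 > 3, replica sets always intersect.
assert read_sees_latest_write(rf=3, write_replicas=2, read_replicas=2)
```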
> 
> 


