Re: Null pointer exception after delete in a table with statics

2015-08-19 Thread Sebastian Estevez
Can you include your read code?
On Aug 18, 2015 5:50 AM, Hervé Rivière herve.rivi...@zenika.com wrote:

 Hello,





 I have an issue with an ErrorMessage code= [Server error]
 message=java.lang.NullPointerException when I query a table with static
 fields (without a WHERE clause) on a 2-node Cassandra 2.1.8 cluster.



 There is no further indication in the log:

 ERROR [SharedPool-Worker-1] 2015-08-18 10:39:02,549 QueryMessage.java:132
 - Unexpected error during query

 java.lang.NullPointerException: null

 ERROR [SharedPool-Worker-1] 2015-08-18 10:39:02,550 ErrorMessage.java:251
 - Unexpected exception during request

 java.lang.NullPointerException: null





 The scenario was:

 1) Load data into the table with Spark (~12 million rows).

 2) Make some deletes by primary key, using the static fields to keep a
 certain state for each partition.



 The null pointer exception occurs when I query the whole table after making
 some deletions.
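 To make the pattern concrete, step 2 looked roughly like the following
 minimal CQL sketch against the table defined below (the key and state
 values here are hypothetical placeholders):

 -- keep a per-partition state in the static columns
 UPDATE my_table SET staticField1 = 'state-A'
 WHERE pk1 = 'p1' AND pk2 = 'p2';

 -- delete individual clustering rows of that partition
 DELETE FROM my_table
 WHERE pk1 = 'p1' AND pk2 = 'p2'
 AND ck1 = '2015-08-18 00:00:00' AND ck2 = 'c2' AND ck3 = 'c3';

 -- the full-table read that then fails
 SELECT * FROM my_table;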



 I observed that:

 - Before the delete statements, the table is perfectly readable.

 - It's repeatable (I managed to isolate ~20 delete statements that trigger
 the null pointer exception when executed from cqlsh).

 - It occurs only with some rows (nothing special about these rows compared
 to the others).

 - I did not succeed in reproducing the problem with the problematic rows in
 a toy table.

 - Running repair/compact and scrub on each node before and after the delete
 statements changed nothing (still the null pointer exception after the
 deletes).

 - Maybe it is related to the static columns?



 The table structure is:

 CREATE TABLE my_table (
     pk1 text,
     pk2 text,
     ck1 timestamp,
     ck2 text,
     ck3 text,
     valuefield text,
     staticField1 text static,
     staticField2 text static,
     PRIMARY KEY ((pk1, pk2), ck1, ck2, ck3)
 ) WITH CLUSTERING ORDER BY (pk1 DESC, pk2 ASC, ck1 ASC)
     AND bloom_filter_fp_chance = 0.01
     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
     AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
     AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
     AND dclocal_read_repair_chance = 0.1
     AND default_time_to_live = 0
     AND gc_grace_seconds = 0
     AND max_index_interval = 2048
     AND memtable_flush_period_in_ms = 0
     AND min_index_interval = 128
     AND read_repair_chance = 0.0
     AND speculative_retry = '99.0PERCENTILE';









 Has anyone already met this issue, or does anyone have an idea how to solve
 or investigate this exception?





 Thank you





 Regards





 --

 Hervé



Truncate Table - What happens to indexes?

2015-08-19 Thread Rahul Gupta
What happens to indexes when a table is truncated?
Are the indexes removed, or do they stay around?


Rahul Gupta
DEKA Research & Development <http://www.dekaresearch.com/>
340 Commercial St, Manchester, NH 03101
P: 603.666.3908 extn. 6504 | C: 603.718.9676



Re: Truncate Table - What happens to indexes?

2015-08-19 Thread Robert Coli
On Wed, Aug 19, 2015 at 11:05 AM, Rahul Gupta rgu...@dekaresearch.com
wrote:

 What happens to indexes when a table is truncated?

 Are the indexes removed, or do they stay around?


Secondary indexes are stored on disk in the same data directory as the data
and are truncated when the data they index is truncated.
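
For illustration, a minimal cqlsh sketch of that behavior (the keyspace,
table, and index names are hypothetical):

CREATE TABLE ks.users (id uuid PRIMARY KEY, email text);
CREATE INDEX users_by_email ON ks.users (email);

TRUNCATE ks.users;

-- the index data is removed together with the table data,
-- so an indexed query now matches nothing
SELECT * FROM ks.users WHERE email = 'a@example.com';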

=Rob


Question about how to remove data

2015-08-19 Thread Analia Lorenzatto
Hello guys,

I have a Cassandra 2.1 cluster comprised of 4 nodes.

I removed a lot of data in a column family, then manually ran a compaction
on this column family on every node. After doing that, if I query that data,
Cassandra correctly says the data is not there. But the space on disk is
exactly the same as before removing that data.

Also, I realized that gc_grace_seconds = 0. Some people on the internet say
that it could produce zombie data; what do you think?

I do not have a TTL defined on the column family, and I do not have the
possibility to create one. So my question is: given that I do not have a TTL
defined, is the data going to be removed, or will the deleted data never
actually be purged because there is no TTL?
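
For context: a TTL is not required for deleted data to be removed. Deletes
write tombstones, and tombstones (together with the data they shadow) are
purged by compaction once gc_grace_seconds has elapsed. A minimal sketch of
adjusting that window, with hypothetical keyspace and column family names:

-- 864000 s (10 days) is the common default; with gc_grace_seconds = 0,
-- tombstones are purgeable at the very next compaction, which risks
-- "zombie" data if a replica missed the delete and is repaired later
ALTER TABLE mykeyspace.mycf WITH gc_grace_seconds = 864000;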


Thanks in advance!

-- 
Saludos / Regards.

Analía Lorenzatto.

“It's possible to commit no errors and still lose. That is not weakness.
That is life.” By Captain Jean-Luc Picard.


Re: Question about how to remove data

2015-08-19 Thread Laing, Michael
Possibly you have snapshots? If so, use nodetool to clear them.
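
On Cassandra 2.1, something like the following, run on each node, should
show and clear them (the keyspace name is a placeholder):

nodetool listsnapshots             # lists snapshots and the space they hold
nodetool clearsnapshot mykeyspace  # removes the snapshots for that keyspace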

On Wed, Aug 19, 2015 at 4:54 PM, Analia Lorenzatto 
analialorenza...@gmail.com wrote:

 Hello guys,

 I have a Cassandra 2.1 cluster comprised of 4 nodes.

 I removed a lot of data in a column family, then manually ran a compaction
 on this column family on every node. After doing that, if I query that data,
 Cassandra correctly says the data is not there. But the space on disk is
 exactly the same as before removing that data.

 Also, I realized that gc_grace_seconds = 0. Some people on the internet say
 that it could produce zombie data; what do you think?

 I do not have a TTL defined on the column family, and I do not have the
 possibility to create one. So my question is: given that I do not have a TTL
 defined, is the data going to be removed, or will the deleted data never
 actually be purged because there is no TTL?


 Thanks in advance!

 --
 Saludos / Regards.

 Analía Lorenzatto.

 “It's possible to commit no errors and still lose. That is not weakness.
 That is life.” By Captain Jean-Luc Picard.



Re: Question about how to remove data

2015-08-19 Thread Analia Lorenzatto
Hello Michael,

Thanks for responding!

I do not have snapshots on any node of the cluster.

Saludos / Regards.

Analía Lorenzatto.

“Happiness is not something ready-made. It comes from your own actions.” By
the Dalai Lama.


On 19 Aug 2015 6:19 pm, Laing, Michael michael.la...@nytimes.com wrote:

 Possibly you have snapshots? If so, use nodetool to clear them.

 On Wed, Aug 19, 2015 at 4:54 PM, Analia Lorenzatto 
 analialorenza...@gmail.com wrote:

 Hello guys,

 I have a Cassandra 2.1 cluster comprised of 4 nodes.

 I removed a lot of data in a column family, then manually ran a compaction
 on this column family on every node. After doing that, if I query that data,
 Cassandra correctly says the data is not there. But the space on disk is
 exactly the same as before removing that data.

 Also, I realized that gc_grace_seconds = 0. Some people on the internet say
 that it could produce zombie data; what do you think?

 I do not have a TTL defined on the column family, and I do not have the
 possibility to create one. So my question is: given that I do not have a TTL
 defined, is the data going to be removed, or will the deleted data never
 actually be purged because there is no TTL?


 Thanks in advance!

 --
 Saludos / Regards.

 Analía Lorenzatto.

 “It's possible to commit no errors and still lose. That is not weakness.
 That is life.” By Captain Jean-Luc Picard.





Re: Nodetool repair with Load times 5

2015-08-19 Thread Jean Tremblay
Dear Alain,

Thanks again for your precious help.

> I might help, but I need to know what you have done recently (changed the
> RF, added/removed nodes, cleanups, anything else, as much as possible...)

I have a cluster of 5 nodes, all running Cassandra 2.1.8.
I have a fixed schema which never changes. I have not changed the RF; it is
3. I have not removed any nodes and have done no cleanups.

Basically here are the important operations I have done:

- Installed Cassandra 2.1.7 on a cluster of 5 nodes with RF 3, using
Size-Tiered compaction.
- Inserted 2 billion rows (bulk load).
- Made loads of SELECT statements… verified that the data is good.
- Did some deletes and a bit more inserts.
- Eventually migrated to 2.1.8.
- Then only very few deletes/inserts.
- Did a few snapshots.

When I was doing “nodetool status” I always got a load of about 200 GB on 
**all** nodes.

- Then I did a “nodetool -h node0 repair -par -pr -inc” and after that I had a 
completely different picture.

nodetool -h zennode0 status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns  Host ID                               Rack
UN  192.168.2.104  941.49 GB  256     ?     c13e0858-091c-47c4-8773-6d6262723435  rack1
UN  192.168.2.100  1.07 TB    256     ?     c32a9357-e37e-452e-8eb1-57d86314b419  rack1
UN  192.168.2.101  189.72 GB  256     ?     9af90dea-90b3-4a8a-b88a-0aeabe3cea79  rack1
UN  192.168.2.102  948.61 GB  256     ?     8eb7a5bb-6903-4ae1-a372-5436d0cc170c  rack1
UN  192.168.2.103  197.27 GB  256     ?     9efc6f13-2b02-4400-8cde-ae831feb86e9  rack1


> Also, could you please run nodetool status myks for your keyspace(s)? We
> will then be able to know the theoretical ownership of each node for your
> distinct (or unique) keyspace(s)?

nodetool -h zennode0 status XYZdata
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  192.168.2.104  941.49 GB  256     62.5%             c13e0858-091c-47c4-8773-6d6262723435  rack1
UN  192.168.2.100  1.07 TB    256     58.4%             c32a9357-e37e-452e-8eb1-57d86314b419  rack1
UN  192.168.2.101  189.72 GB  256     58.4%             9af90dea-90b3-4a8a-b88a-0aeabe3cea79  rack1
UN  192.168.2.102  948.61 GB  256     60.1%             8eb7a5bb-6903-4ae1-a372-5436d0cc170c  rack1
UN  192.168.2.103  197.27 GB  256     60.6%             9efc6f13-2b02-4400-8cde-ae831feb86e9  rack1


> Some ideas:
>
> You repaired only the primary range (-pr) of one node; with an RF of 3 and
> 3 big nodes, if you are not using vnodes this would be almost normal
> (except for the gap from 200 GB to 1 TB, which is huge, unless you messed
> up the RF). So, are you using them?

My schema is totally fixed and I have used RF 3 since the beginning. Sorry,
I’m not too acquainted with vnodes. I have not changed anything in
cassandra.yaml except the seeds and the name of the cluster.

> 2/ Load is basically the size of the data on each node.

If it is the size of the data, how can it fit on the disk?
My 5 nodes each have a 1 TB SSD drive, and here is the disk usage for each
of them:

node0: 25%
node1: 25%
node2: 24%
node3: 26%
node4: 29%

nodetool status says that the load for node0 is 1.07 TB. That is more than
fits on its disk, and yet the disk usage for node0 is 25%.

This is not clear to me… the Load in the nodetool status output seems to be
more than “the size of the data on a node”.
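
One way to cross-check this on each node, assuming the default data
directory location (adjust the path if yours differs):

du -sh /var/lib/cassandra/data   # actual bytes on disk
nodetool status                  # the reported Load, for comparison
nodetool listsnapshots           # snapshots also live under the data directory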


On 18 Aug 2015, at 19:29, Alain RODRIGUEZ arodr...@gmail.com wrote:

> Hi Jean,
>
> I might help, but I need to know what you have done recently (changed the
> RF, added/removed nodes, cleanups, anything else, as much as possible...)
>
> Also, could you please run nodetool status myks for your keyspace(s)? We
> will then be able to know the theoretical ownership of each node for your
> distinct (or unique) keyspace(s)?
>
> Some ideas:
>
> You repaired only the primary range (-pr) of one node; with an RF of 3 and
> 3 big nodes, if you are not using vnodes this would be almost normal
> (except for the gap from 200 GB to 1 TB, which is huge, unless you messed
> up the RF). So, are you using them?
>
> Answers:
>
> 1/ It depends on what has happened to this cluster (see my questions above).
> 2/ Load is basically the size of the data on each node.
> 3/ No, this is not a normal nor a stable situation.
> 4/ No, -pr means you repaired only the partition ranges that node is
> responsible for (depending on its tokens); you have to run this on all
> nodes. But I would wait to find out first what's happening, to avoid
> hitting the threshold on disk space or whatever.

I guess I was confused by the -par switch, which suggested to me that the
work would be done in parallel and therefore on all nodes.

So if I understand correctly, one should run “nodetool repair -par -pr -inc”
on all nodes, one after the other? Is this correct?
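
If that reading is right, the whole ring would be covered by something like
the following sketch (hostnames are placeholders): -pr repairs only each
node's primary token ranges, so it must be run on every node in turn, while
-par only parallelizes the work across the replicas of each range.

for h in node0 node1 node2 node3 node4; do
    nodetool -h "$h" repair -par -inc -pr
done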


I have a second cluster, a smaller one, 

RE: Null pointer exception after delete in a table with statics

2015-08-19 Thread Hervé Rivière
Hello Doan,



Thank you for your answer !



In my Spark job I changed spark.cassandra.input.split.size
(spark.cassandra.input.fetch.size_in_rows isn’t recognized by my v1.2.3
spark-cassandra-connector) from 8,000 down to 200 (so that creates many more
tasks per node), but I still get the null pointer exception (at the same row
as before).



Actually my Spark job does two things: 1/ load the table from another
Cassandra table; 2/ update the two static fields according to specific rules.



I noticed that there is no problem making deletes after step 1/ (when all
the static fields are null).



The null pointer exception occurs only after step 2/ (when there are some
non-null statics in the table).



I will try to merge steps 1 and 2 into one, and therefore make only one
INSERT per row when I load the table, and see what happens.
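
That merged write would be something like the following single statement per
row, setting the regular columns and the partition's static columns together
(values are placeholders):

INSERT INTO my_table (pk1, pk2, ck1, ck2, ck3, valuefield, staticField1, staticField2)
VALUES ('p1', 'p2', '2015-08-18 00:00:00', 'c2', 'c3', 'v1', 'state-A', 'state-B');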





--

Hervé





*From:* DuyHai Doan [mailto:doanduy...@gmail.com]
*Sent:* Tuesday, August 18, 2015 15:25
*To:* user@cassandra.apache.org
*Subject:* Re: Null pointer exception after delete in a table with statics



Weird, your issue reminds me of
https://issues.apache.org/jira/browse/CASSANDRA-8502, but it seems that it
was fixed in 2.1.6 and you're using 2.1.8.



Can you try to reproduce it using a small page size with Spark
(spark.cassandra.input.fetch.size_in_rows)?



On Tue, Aug 18, 2015 at 11:50 AM, Hervé Rivière herve.rivi...@zenika.com
wrote:

Hello,





I have an issue with an ErrorMessage code= [Server error]
message=java.lang.NullPointerException when I query a table with static
fields (without a WHERE clause) on a 2-node Cassandra 2.1.8 cluster.



There is no further indication in the log:

ERROR [SharedPool-Worker-1] 2015-08-18 10:39:02,549 QueryMessage.java:132 -
Unexpected error during query

java.lang.NullPointerException: null

ERROR [SharedPool-Worker-1] 2015-08-18 10:39:02,550 ErrorMessage.java:251 -
Unexpected exception during request

java.lang.NullPointerException: null





The scenario was:

1) Load data into the table with Spark (~12 million rows).

2) Make some deletes by primary key, using the static fields to keep a
certain state for each partition.



The null pointer exception occurs when I query the whole table after making
some deletions.



I observed that:

- Before the delete statements, the table is perfectly readable.

- It's repeatable (I managed to isolate ~20 delete statements that trigger
the null pointer exception when executed from cqlsh).

- It occurs only with some rows (nothing special about these rows compared
to the others).

- I did not succeed in reproducing the problem with the problematic rows in
a toy table.

- Running repair/compact and scrub on each node before and after the delete
statements changed nothing (still the null pointer exception after the
deletes).

- Maybe it is related to the static columns?



The table structure is:

CREATE TABLE my_table (
    pk1 text,
    pk2 text,
    ck1 timestamp,
    ck2 text,
    ck3 text,
    valuefield text,
    staticField1 text static,
    staticField2 text static,
    PRIMARY KEY ((pk1, pk2), ck1, ck2, ck3)
) WITH CLUSTERING ORDER BY (pk1 DESC, pk2 ASC, ck1 ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 0
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';









Has anyone already met this issue, or does anyone have an idea how to solve
or investigate this exception?





Thank you





Regards





--

Hervé