Re: Incorrect progress percentage while repairing

2016-06-02 Thread Stefano Ortolani
Forgot to add the C* version. That would be 3.0.6.

Regards,
Stefano Ortolani

On Thu, Jun 2, 2016 at 3:55 PM, Stefano Ortolani  wrote:

> Hi,
>
> While running incremental (parallel) repairs on the first partition range
> (-pr), I rarely see the progress percentage going over 20%/25%.
>
> [2016-06-02 14:12:23,207] Repair session
> cceae4c0-28b0-11e6-86d1-0550db2f124e for range
> [(8861148493126800521,8883879502599079650]] finished (progress: 22%)
>
> Nodetool returns normally and no errors are found in its output or in
> the Cassandra logs.
> Any idea why? Is this behavior expected?
>
> Regards,
> Stefano Ortolani
>
>
>


Re: Library/utility announcements?

2016-06-02 Thread Eric Evans
On Wed, Jun 1, 2016 at 6:48 PM, James Carman  wrote:
> Some user lists allow it. Does the Cassandra community mind folks announcing
> their super cool Cassandra libraries on this list?

I think it's probably OK; people do occasionally announce such things here.

> Is there a page for us to list them?

I guess it depends on what it is, maybe one of:

http://wiki.apache.org/cassandra/ClientOptions
http://wiki.apache.org/cassandra/IntegrationPoints
http://wiki.apache.org/cassandra/Administration%20Tools

Cheers,

-- 
Eric Evans
john.eric.ev...@gmail.com


Incorrect progress percentage while repairing

2016-06-02 Thread Stefano Ortolani
Hi,

While running incremental (parallel) repairs on the first partition range
(-pr), I rarely see the progress percentage going over 20%/25%.

[2016-06-02 14:12:23,207] Repair session
cceae4c0-28b0-11e6-86d1-0550db2f124e for range
[(8861148493126800521,8883879502599079650]] finished (progress: 22%)

Nodetool returns normally and no errors are found in its output or in the
Cassandra logs.
Any idea why? Is this behavior expected?

Regards,
Stefano Ortolani


Re: Internal Handling of Map Updates

2016-06-02 Thread Eric Stevens
If it's only overwrites and appends, with no removes, an UPDATE will let you
do that with standard (non-frozen) collections. Like INSERT, UPDATE acts as an
upsert.
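
For illustration, a minimal sketch against a hypothetical table
my_ks.events (event_id text PRIMARY KEY, attrs map<text,text>) -- the names
are placeholders, not a schema from this thread:

  -- Appending/overwriting map entries via UPDATE only writes the touched
  -- cells, with no collection-level range tombstone:
  UPDATE my_ks.events SET attrs = attrs + {'last_modified': '2016-05-21'}
  WHERE event_id = 'some-key';

  -- Updating a single element behaves the same way:
  UPDATE my_ks.events SET attrs['last_modified'] = '2016-05-22'
  WHERE event_id = 'some-key';

  -- By contrast, writing the whole collection (INSERT, or SET attrs = {...})
  -- first deletes the existing map with a range tombstone:
  INSERT INTO my_ks.events (event_id, attrs)
  VALUES ('some-key', {'last_modified': '2016-05-21'});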

On Thu, Jun 2, 2016, 12:52 AM Matthias Niehoff <
matthias.nieh...@codecentric.de> wrote:

> JSON would be an option, yes. A frozen collection would not work for us,
> as the updates are both overwrites of existing values and appends of new
> values (but never a remove of values).
> So we end up with 3 options:
>
> 1. use clustering columns
> 2. use json
> 3. save the row not with the spark-cassandra-connector's saveToCassandra()
> method (which does an insert of the whole row and map), but with our own
> save method that uses an update on the map (as Eric proposed).
>
> I think we will go for option 1 or 2 as those are the least costly
> solutions.
>
> Nevertheless, it's a pity that an insert on a row with a map will always
> create tombstones :-(
>
>
>
> 2016-06-02 2:02 GMT+02:00 Eric Stevens :
>
>> From that perspective, you could also use a frozen collection which takes
>> away the ability to append, but for which overwrites shouldn't generate a
>> tombstone.
>>
>> On Wed, Jun 1, 2016, 5:54 PM kurt Greaves  wrote:
>>
>>> Is there anything stopping you from using JSON instead of a collection?
>>>
>>> On 27 May 2016 at 15:20, Eric Stevens  wrote:
>>>
 If you aren't removing elements from the map, you should instead be
 able to use an UPDATE statement and append the map. It will have the same
 effect as overwriting it, because all the new keys will take precedence
 over the existing keys. But it'll happen without generating a tombstone
 first.

 If you do have to remove elements from the collection during this
 process, you are either facing tombstones or having to surgically figure
 out which elements ought to be removed (which also involves tombstones,
 though at least not range tombstones, so a bit cheaper).

 On Fri, May 27, 2016, 5:39 AM Matthias Niehoff <
 matthias.nieh...@codecentric.de> wrote:

> We are processing events in Spark and store the resulting entries
> (containing a map) in Cassandra. The results can be new (no entry for this
> key in Cassandra) or an Update (there is already an entry with this key in
> Cassandra). We use the spark-cassandra-connector to store the data in
> Cassandra.
>
> The connector will always do an insert of the data and will rely on
> the upsert capabilities of cassandra. So every time an event is updated 
> the
> complete map is replaced with all the problems of tombstones.
> Seems like we have to implement our own persist logic in which we
> check if an element already exists and, if yes, update the map manually.
> That
> would require a read before write which would be nasty. Another option
> would be not to use a collection but (clustering) columns. Do you have
> another idea of doing this?
>
> (the conclusion of this whole thing for me would be: use upsert, but
> do specific updates on collections as an upsert might replace the whole
> collection and generate tombstones)
>
> 2016-05-25 17:37 GMT+02:00 Tyler Hobbs :
>
>> If you replace an entire collection, whether it's a map, set, or
>> list, a range tombstone will be inserted followed by the new collection.
>> If you only update a single element, no tombstones are generated.
>>
>> On Wed, May 25, 2016 at 9:48 AM, Matthias Niehoff <
>> matthias.nieh...@codecentric.de> wrote:
>>
>>> Hi,
>>>
>>> we have a table with a Map field. We do not delete anything in this
>>> table, but do updates on the values, including the Map field (most of the
>>> time a new value for an existing key, rarely adding new keys). We now
>>> encounter a huge number of tombstones for this table.
>>>
>>> We used sstable2json to take a look into the sstables:
>>>
>>>
>>> {"key": "Betty_StoreCatalogLines:7",
>>>
>>>  "cells": [["276-1-6MPQ0RI-276110031802001001:","",1463820040628001],
>>>
>>>
>>> ["276-1-6MPQ0RI-276110031802001001:last_modified","2016-05-21 
>>> 08:40Z",1463820040628001],
>>>
>>>
>>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463040069753999,"t",1463040069],
>>>
>>>
>>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463120708590002,"t",1463120708],
>>>
>>>
>>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463145700735007,"t",1463145700],
>>>
>>>
>>> 

Re: Error while rebuilding a node: Stream failed

2016-06-02 Thread George Sigletos
I gave up completely on the rebuild.

Now I am running `nodetool repair`, and in case of network issues I retry
the token ranges that failed using the -st and -et options of `nodetool
repair`.

That should be good enough for now, until we fix our network problems.

On Sat, May 28, 2016 at 7:05 PM, George Sigletos 
wrote:

> No luck unfortunately. It seems that the connection to the destination
> node was lost.
>
> However there was progress compared to the previous times. A lot more data
> was streamed.
>
> (From source node)
> INFO  [GossipTasks:1] 2016-05-28 17:53:57,155 Gossiper.java:1008 -
> InetAddress /54.172.235.227 is now DOWN
> INFO  [HANDSHAKE-/54.172.235.227] 2016-05-28 17:53:58,238
> OutboundTcpConnection.java:487 - Handshaking version with /54.172.235.227
> ERROR [STREAM-IN-/54.172.235.227] 2016-05-28 17:54:08,938
> StreamSession.java:505 - [Stream #d25a05c0-241f-11e6-bb50-1b05ac77baf9]
> Streaming error occurred
> java.io.IOException: Connection timed out
> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> ~[na:1.7.0_79]
> at sun.nio.ch.SocketDispatcher.read(Unknown Source) ~[na:1.7.0_79]
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
> ~[na:1.7.0_79]
> at sun.nio.ch.IOUtil.read(Unknown Source) ~[na:1.7.0_79]
> at sun.nio.ch.SocketChannelImpl.read(Unknown Source) ~[na:1.7.0_79]
> at sun.nio.ch.SocketAdaptor$SocketInputStream.read(Unknown Source)
> ~[na:1.7.0_79]
> at sun.nio.ch.ChannelInputStream.read(Unknown Source)
> ~[na:1.7.0_79]
> at java.nio.channels.Channels$ReadableByteChannelImpl.read(Unknown
> Source) ~[na:1.7.0_79]
> at
> org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:51)
> ~[apache-cassandra-2.1.14.jar:2.1.14]
> at
> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:257)
> ~[apache-cassandra-2.1.14.jar:2.1.14]
> at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]
> INFO  [SharedPool-Worker-1] 2016-05-28 17:54:59,612 Gossiper.java:993 -
> InetAddress /54.172.235.227 is now UP
>
> On Fri, May 27, 2016 at 5:37 PM, George Sigletos 
> wrote:
>
>> I am trying once more using more aggressive tcp settings, as recommended
>> here
>> 
>>
>> sudo sysctl -w net.ipv4.tcp_keepalive_time=60 
>> net.ipv4.tcp_keepalive_probes=3 net.ipv4.tcp_keepalive_intvl=10
>>
>> (added to /etc/sysctl.conf and run sysctl -p /etc/sysctl.conf on all
>> nodes)
>>
>> Let's see what happens. I don't know what else to try. I have even
>> further increased streaming_socket_timeout_in_ms
>>
>>
>>
>> On Fri, May 27, 2016 at 4:56 PM, Paulo Motta 
>> wrote:
>>
>>> I'm afraid raising streaming_socket_timeout_in_ms won't help much in
>>> this case because the incoming connection on the source node is timing out
>>> on the network layer, and streaming_socket_timeout_in_ms controls the
>>> socket timeout in the app layer and throws SocketTimeoutException (not 
>>> java.io.IOException:
>>> Connection timed out). So you should probably use more aggressive tcp
>>> keep-alive settings (net.ipv4.tcp_keepalive_*) on both hosts, did you try
>>> tuning that? Even that might not be sufficient as some routers tend to
>>> ignore tcp keep-alives and just kill idle connections.
>>>
>>> As said before, this will ultimately be fixed by adding keep-alive to
>>> the app layer on CASSANDRA-11841. If tuning tcp keep-alives does not help,
>>> one extreme approach would be to backport this to 2.1 (unless some
>>> experienced operator out there has a more creative approach).
>>>
>>> @eevans, I'm not sure he is using a mixed-version cluster; it seems he
>>> finished the upgrade from 2.1.13 to 2.1.14 before performing the rebuild.
>>>
>>> 2016-05-27 11:39 GMT-03:00 Eric Evans :
>>>
 From the various stacktraces in this thread, it's obvious you are
 mixing versions 2.1.13 and 2.1.14.  Topology changes like this aren't
 supported with mixed Cassandra versions.  Sometimes it will work,
 sometimes it won't (and it will definitely not work in this instance).

 You should either upgrade your 2.1.13 nodes to 2.1.14 first, or add
 the new nodes using 2.1.13, and upgrade after.

 On Fri, May 27, 2016 at 8:41 AM, George Sigletos <
 sigle...@textkernel.nl> wrote:

  ERROR [STREAM-IN-/192.168.1.141] 2016-05-26 09:08:05,027
  StreamSession.java:505 - [Stream
 #74c57bc0-231a-11e6-a698-1b05ac77baf9]
  Streaming error occurred
  java.lang.RuntimeException: Outgoing stream handler has been closed
  at
 
 org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:138)
  ~[apache-cassandra-2.1.14.jar:2.1.14]
  at
 
 

Upgradesstables doing 4K reads

2016-06-02 Thread Jacek Luczak
Hi,

I've got a 6-node C* cluster (all nodes are identical in both OS and HW
setup; they are DL380 Gen9 with Smart Array RAID 50,3 on SAS 15K HDDs)
which has recently been upgraded from 2.2.5 to 3.5. As part of the upgrade
I ran upgradesstables.

On 4 nodes the average request size issued to the block device was never
higher than 8 sectors (which maps to 4K reads), while on the remaining 2
nodes it was basically always maxed at 512 (256K reads).

The nodes doing 4K reads were pushing up to 2K read IOPS, while the other 2
nodes never went above 30 IOPS.

We are quite pedantic about OS settings. All nodes have the same settings
and C* configuration. On all nodes the block device uses the noop scheduler
and read-ahead is aligned with the strip size.

During heavy read workloads we've also noticed that those 4 nodes can swing
up to 10K IOPS to get data from storage, while the other 2 stay far below
that.

What could cause such a difference?

-Jacek


Re: (Full) compaction does not delete (all) old files

2016-06-02 Thread Alain RODRIGUEZ
Hi Dongfeng,

3: Restarting the node does NOT remove those files. I stopped and restarted
> C* many times and it did nothing.

Finally, my solution was to manually delete those old files. I actually
> deleted them while C* is running and did not see any errors/warnings in
> system.log. My guess is that those files are not in C* metadata so C* does
> not know their existence.


This was a good move. If all the data is TTLed after 8 days, then any
sstable older than 8 days is no longer relevant; this is a guarantee. I
would probably have stopped the node first, though. Glad it worked.

Automatic compaction by C* does not work in a timely manner for me


You might want to give "unchecked_tombstone_compaction=true" a try in this
table's options. This allows more aggressive tombstone eviction and should be
quite safe. Not sure why this is not yet a Cassandra default.
Single-sstable compactions will then trigger, removing tombstones after 10
days (gc_grace_seconds). So any data older than 8 days (TTL) + 10 days
(gc_grace_seconds) = 18 days should eventually (and quite quickly) be
removed.
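
For reference, a minimal sketch of setting that option (the keyspace/table
name and the SizeTieredCompactionStrategy class are assumptions here -- keep
the compaction class you already use):

  ALTER TABLE my_ks.my_table
  WITH compaction = {'class': 'SizeTieredCompactionStrategy',
                     'unchecked_tombstone_compaction': 'true'};

Note that the compaction map is replaced as a whole, so the 'class' has to be
repeated along with the sub-option you want to change.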

Major compaction (nodetool compact) produces a very big sstable that will
not be compacted again until there are 3 other files of the same size (using
the defaults). I think running a major compaction delays the issue (and might
make it worse) but does not solve it.

It is also good to know that, from my own experience, compaction behaves a
lot better in 2.1.x.

B: We have tested the procedure with 2.1.11 in our DEV environment quite
> some time ago. Due to priority changes, we only started applying it to
> production lately. By rule, I had to re-test it if I switch to 2.1.14, and
> I don't see much benefits doing it.


As an example, if you are planning to take advantage of the incremental
repair features (new in 2.1) or DTCS, you probably want to jump to 2.1.14
because of:

"FIX 2.1.14 - DTCS repair both unrepaired / repaired sstables - incremental
only

https://issues.apache.org/jira/browse/CASSANDRA-3

FIX 2.1.14 - Avoid major compaction mixing repaired and unrepaired sstables
in DTCS

https://issues.apache.org/jira/browse/CASSANDRA-3

FIX 2.1.12 - A lot of sstables using range repairs due to anticompaction
- incremental only

https://issues.apache.org/jira/browse/CASSANDRA-10422

FIX 2.1.12 - repair hang when replica is down - incremental only

https://issues.apache.org/jira/browse/CASSANDRA-10288"


I would probably go through the 2.1.11 --> 2.1.14 changes and see if it is
worth it. I am not saying you shouldn't test it, but if migrating to 2.1.11
worked for you, I guess 2.1.14 will work as well. I am quite confident, but
as I won't be responsible for it or for fixing any issue that might show up,
it is up to you :-). Another option is to do one more step later, from 2.1.11
to 2.1.14, but I see no value in this as you would then have to test the
2.1.11 --> 2.1.14 upgrade anyway.

> Since we are at 2.0.6, we have to migrate twice: from 2.0.6 to 2.0.17,
> then to 2.1.11.


Glad to see you did not miss that. I pointed it out, just in case :-).

Good luck with this all,

C*heers,

---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-06-01 17:52 GMT+01:00 Dongfeng Lu :

> Alain,
>
> Thanks for responding to my question.
>
> 1 & 2: I think it is a bug, but, as you said, maybe no one will dig into
> it. I just hope it has been fixed in later versions.
> 3: Restarting the node does NOT remove those files. I stopped and
> restarted C* many times and it did nothing.
> 4: Thanks for the links. I will probably try DTCS in the near future.
>
> A: Automatic compaction by C* does not work in a timely manner for me. I
> set TTL to 8 days and hoped that I would only have data files with
> timestamps within the last 2 weeks or so. However, I often saw files
> created 2 months ago, 50GB in size.
>
> In the final step of the upgrade, I am supposed to run upgradesstables, which
> is like a compaction. I know compaction takes a long time to run. In order
> to reduce the amount of time during the actual upgrade, I ran a manual
> compaction to cut down the size, by 80% in my case.
>
> B: We have tested the procedure with 2.1.11 in our DEV environment quite
> some time ago. Due to priority changes, we only started applying it to
> production lately. By rule, I had to re-test it if I switch to 2.1.14, and
> I don't see much benefits doing it.
>
> C: Yes, I noticed the statement "When upgrading to Cassandra 2.1 all nodes
> must be on at least Cassandra 2.0.7 to support rolling start." Since we are
> at 2.0.6, we have to migrate twice: from 2.0.6 to 2.0.17, then to 2.1.11.
>
> Finally, my solution was to manually delete those old files. I actually
> deleted them while C* is running and did not see any errors/warnings in
> system.log. My guess is that those files are not in C* metadata so C* does
> not know their existence.
>
> Thanks,
> Dongfeng
>
>
> On Wednesday, 

Re: Internal Handling of Map Updates

2016-06-02 Thread Matthias Niehoff
JSON would be an option, yes. A frozen collection would not work for us, as
the updates are both overwrites of existing values and appends of new values
(but never removes of values).
So we end up with 3 options:

1. use clustering columns
2. use json
3. save the row not with the spark-cassandra-connector's saveToCassandra()
method (which does an insert of the whole row and map), but with our own
save method that uses an update on the map (as Eric proposed).

I think we will go for option 1 or 2 as those are the least costly
solutions (see the sketch of option 1 below).

Nevertheless, it's a pity that an insert on a row with a map will always
create tombstones :-(
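
A rough sketch of option 1 (clustering columns), with placeholder names only:
instead of a map<text, text> column, the map key becomes a clustering column,
so every write is a plain upsert of individual cells and no collection
tombstone is ever created:

  CREATE TABLE my_ks.entity_attributes (
      entity_id  text,
      attr_key   text,
      attr_value text,
      PRIMARY KEY (entity_id, attr_key)
  );

  -- Overwrites and appends are both simple upserts, one row per map entry:
  INSERT INTO my_ks.entity_attributes (entity_id, attr_key, attr_value)
  VALUES ('some-entity', 'last_modified', '2016-05-21 08:40Z');

  -- Reading the whole "map" back:
  SELECT attr_key, attr_value FROM my_ks.entity_attributes
  WHERE entity_id = 'some-entity';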



2016-06-02 2:02 GMT+02:00 Eric Stevens :

> From that perspective, you could also use a frozen collection which takes
> away the ability to append, but for which overwrites shouldn't generate a
> tombstone.
>
> On Wed, Jun 1, 2016, 5:54 PM kurt Greaves  wrote:
>
>> Is there anything stopping you from using JSON instead of a collection?
>>
>> On 27 May 2016 at 15:20, Eric Stevens  wrote:
>>
>>> If you aren't removing elements from the map, you should instead be able
>>> to use an UPDATE statement and append the map. It will have the same effect
>>> as overwriting it, because all the new keys will take precedence over the
>>> existing keys. But it'll happen without generating a tombstone first.
>>>
>>> If you do have to remove elements from the collection during this
>>> process, you are either facing tombstones or having to surgically figure
>>> out which elements ought to be removed (which also involves tombstones,
>>> though at least not range tombstones, so a bit cheaper).
>>>
>>> On Fri, May 27, 2016, 5:39 AM Matthias Niehoff <
>>> matthias.nieh...@codecentric.de> wrote:
>>>
 We are processing events in Spark and store the resulting entries
 (containing a map) in Cassandra. The results can be new (no entry for this
 key in Cassandra) or an Update (there is already an entry with this key in
 Cassandra). We use the spark-cassandra-connector to store the data in
 Cassandra.

 The connector will always do an insert of the data and will rely on the
 upsert capabilities of cassandra. So every time an event is updated the
 complete map is replaced with all the problems of tombstones.
 Seems like we have to implement our own persist logic in which we check
 if an element already exists and, if yes, update the map manually. That would
 require a read before write which would be nasty. Another option would be
 not to use a collection but (clustering) columns. Do you have another idea
 of doing this?

 (the conclusion of this whole thing for me would be: use upsert, but do
 specific updates on collections as an upsert might replace the whole
 collection and generate tombstones)

 2016-05-25 17:37 GMT+02:00 Tyler Hobbs :

> If you replace an entire collection, whether it's a map, set, or list,
> a range tombstone will be inserted followed by the new collection.  If you
> only update a single element, no tombstones are generated.
>
> On Wed, May 25, 2016 at 9:48 AM, Matthias Niehoff <
> matthias.nieh...@codecentric.de> wrote:
>
>> Hi,
>>
>> we have a table with a Map field. We do not delete anything in this
>> table, but do updates on the values, including the Map field (most of the
>> time a new value for an existing key, rarely adding new keys). We now
>> encounter a huge number of tombstones for this table.
>>
>> We used sstable2json to take a look into the sstables:
>>
>>
>> {"key": "Betty_StoreCatalogLines:7",
>>
>>  "cells": [["276-1-6MPQ0RI-276110031802001001:","",1463820040628001],
>>
>>["276-1-6MPQ0RI-276110031802001001:last_modified","2016-05-21 
>> 08:40Z",1463820040628001],
>>
>>
>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463040069753999,"t",1463040069],
>>
>>
>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463120708590002,"t",1463120708],
>>
>>
>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463145700735007,"t",1463145700],
>>
>>
>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463157430862000,"t",1463157430],
>>
>>
>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463164595291002,"t",1463164595],
>>
>> . . .
>>
>>   
>>