Re: Streaming from 1 node only when adding a new DC

2016-06-16 Thread Fabien Rousseau
Thanks,

Created the issue: https://issues.apache.org/jira/browse/CASSANDRA-12015

2016-06-15 15:25 GMT+02:00 Paulo Motta <pauloricard...@gmail.com>:

> For rebuild, replace and -Dcassandra.consistent.rangemovement=false in
> general we currently pick the closest replica (as indicated by the Snitch)
> which has the range, which will often map to the same node due to the
> dynamic snitch, especially when N=RF. This is good for picking a node in the
> same DC or rack for transferring, but we can probably improve this to
> distribute streaming load more evenly within candidate source nodes in the
> same rack/DC.
>
> Would you mind opening a ticket for improving this?
>
>
> 2016-06-14 17:35 GMT-03:00 Fabien Rousseau <fabifab...@gmail.com>:
>
>> We've tested with C* version 2.1.14.
>> Yes, vnodes with 256 tokens.
>> Once all the nodes in dc2 are added, the schema is modified to have RF=3 in
>> dc1 and RF=3 in dc2.
>> Then on each node of dc2:
>> nodetool rebuild dc1
>> On 14 June 2016 at 10:39, "kurt Greaves" <k...@instaclustr.com> wrote:
>>
>>> What version of Cassandra are you using? Also what command are you using
>>> to run the rebuilds? Are you using vnodes?
>>>
>>> On 13 June 2016 at 09:01, Fabien Rousseau <fabifab...@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> We've tested adding a new DC from an existing DC having 3 nodes and
>>>> RF=3 (i.e. all nodes have all data).
>>>> During the rebuild process, only one node of the first DC streamed data
>>>> to the 3 nodes of the second DC.
>>>>
>>>> Our goal is to minimise the time it takes to rebuild a DC, and we would
>>>> like to be able to stream from all nodes.
>>>>
>>>> Starting C* with debug logs, it appears that all nodes, when computing
>>>> their "streaming plan", return the same node for all ranges.
>>>> This is probably because all nodes in DC2 have the same view of the
>>>> ring.
>>>>
>>>> I understand that when bootstrapping a new node, it's preferable to
>>>> stream from the node being replaced, but when rebuilding a new DC, it
>>>> should probably select sources "randomly" (rather than always selecting the
>>>> same source for a specific range).
>>>> What do you think?
>>>>
>>>> Best Regards,
>>>> Fabien
>>>>
>>>
>>>
>>>
>>> --
>>> Kurt Greaves
>>> k...@instaclustr.com
>>> www.instaclustr.com
>>>
>>
>


Re: Streaming from 1 node only when adding a new DC

2016-06-14 Thread Fabien Rousseau
We've tested with C* version 2.1.14.
Yes, vnodes with 256 tokens.
Once all the nodes in dc2 are added, the schema is modified to have RF=3 in dc1
and RF=3 in dc2.
Then on each node of dc2:
nodetool rebuild dc1
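
For reference, the whole sequence boils down to something like this (the
keyspace and DC names are placeholders, adjust to your schema):

  # 1) extend replication of the keyspace to the new DC (or paste into cqlsh)
  cqlsh -e "ALTER KEYSPACE my_ks WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3};"
  # 2) then, on every node of dc2, stream the existing data from dc1
  nodetool rebuild dc1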
On 14 June 2016 at 10:39, "kurt Greaves" <k...@instaclustr.com> wrote:

> What version of Cassandra are you using? Also what command are you using
> to run the rebuilds? Are you using vnodes?
>
> On 13 June 2016 at 09:01, Fabien Rousseau <fabifab...@gmail.com> wrote:
>
>> Hello,
>>
>> We've tested adding a new DC from an existing DC having 3 nodes and RF=3
>> (i.e. all nodes have all data).
>> During the rebuild process, only one node of the first DC streamed data
>> to the 3 nodes of the second DC.
>>
>> Our goal is to minimise the time it takes to rebuild a DC, and we would like
>> to be able to stream from all nodes.
>>
>> Starting C* with debug logs, it appears that all nodes, when computing
>> their "streaming plan", return the same node for all ranges.
>> This is probably because all nodes in DC2 have the same view of the ring.
>>
>> I understand that when bootstrapping a new node, it's preferable to
>> stream from the node being replaced, but when rebuilding a new DC, it
>> should probably select sources "randomly" (rather than always selecting the
>> same source for a specific range).
>> What do you think?
>>
>> Best Regards,
>> Fabien
>>
>
>
>
> --
> Kurt Greaves
> k...@instaclustr.com
> www.instaclustr.com
>


Streaming from 1 node only when adding a new DC

2016-06-13 Thread Fabien Rousseau
Hello,

We've tested adding a new DC from an existing DC having 3 nodes and RF=3
(i.e. all nodes have all data).
During the rebuild process, only one node of the first DC streamed data to
the 3 nodes of the second DC.

Our goal is to minimise the time it takes to rebuild a DC, and we would like
to be able to stream from all nodes.

Starting C* with debug logs, it appears that all nodes, when computing
their "streaming plan", return the same node for all ranges.
This is probably because all nodes in DC2 have the same view of the ring.

I understand that when bootstrapping a new node, it's preferable to stream
from the node being replaced, but when rebuilding a new DC, it should
probably select sources "randomly" (rather than always selecting the same
source for a specific range).
What do you think?

Best Regards,
Fabien


Re: MX4J support broken in cassandra 3.0.5?

2016-04-27 Thread Fabien Rousseau
Hi Robert,

This could be related to:
https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-9242
(Maybe you can try commenting out this option and trying again.)
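
Also, in case it helps, these are the MX4J-related lines to double-check in
cassandra-env.sh (variable names as in the stock file, values are only
examples; mx4j-tools.jar must also be on the classpath, as you said):

  MX4J_ADDRESS="-Dmx4jaddress=0.0.0.0"   # listen address of the MX4J HTTP adaptor (defaults to localhost if unset)
  MX4J_PORT="-Dmx4jport=8081"            # HTTP port, 8081 as in your config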
On 27 April 2016 at 15:21, "Robert Sicoie" wrote:

> Hi guys,
>
> I'm upgrading from Cassandra 2.1 to Cassandra 3.0.5 and MX4J support seems
> to be broken. An empty HTML page is shown:
>
> > GET / HTTP/1.1
> > Host: localhost:8081
> > User-Agent: curl/7.43.0
> > Accept: */*
> >
> * HTTP 1.0, assume close after body
> < HTTP/1.0 200 OK
> < expires: now
> < Server: MX4J-HTTPD/1.0
> < Cache-Control: no-cache
> < pragma: no-cache
> < Content-Type: text/html
>
> This is what I have in cassandra-env.sh
> ...
> MX4J_PORT="-Dmx4jport=8081"
> ...
> And the mx4j-tools.jar is in place.
>
> It worked fine with cassandra 2.1. Is there a new configuration needed in
> 3.0.5?
>
> Any advice?
>
> Thanks,
> Robert
>
>


Re: Network / GC / Latency spike

2015-09-01 Thread Fabien Rousseau
Hi Alain,

Maybe it's possible to confirm this by testing on a small cluster (a rough
shell sketch follows the steps below):
- create a cluster of 2 nodes (using https://github.com/pcmanus/ccm for
example)
- create a fake wide row of a few MB (using the python driver for example)
- drain and stop one of the two nodes
- remove the sstables of the stopped node (to provoke inconsistencies)
- start it again
- select a small portion of the wide row (many times, use nodetool tpstats
to know when a read repair has been triggered)
- nodetool flush (on the previously stopped node)
- check the size of the sstable (if a few KB, then only the selected slice
was repaired, but if a few MB then the whole row was repaired)
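
Roughly, with ccm, something like this (ccm sub-commands and paths are from
memory, cluster/keyspace/table names are examples, so adjust as needed):

  ccm create rrtest -v 2.0.16 -n 2 -s              # 2-node cluster, started
  # create a keyspace with RF=2 and write a wide row of a few MB (python driver or cqlsh)
  ccm node2 nodetool drain
  ccm node2 stop
  rm -rf ~/.ccm/rrtest/node2/data/myks/mytable/*   # drop node2's sstables to create the inconsistency
  ccm node2 start
  # read small slices of the wide row repeatedly until nodetool tpstats shows ReadRepairStage activity
  ccm node2 nodetool flush
  du -sh ~/.ccm/rrtest/node2/data/myks/mytable/    # a few KB => only the slice repaired, a few MB => whole row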

The wild guess was: if a read repair was triggered when reading a small
portion of a wide row and if it resulted in streaming the whole wide row,
it could explain a network burst. (But, on second thought, it makes more
sense to only repair the small portion being read...)



2015-09-01 12:05 GMT+02:00 Alain RODRIGUEZ <arodr...@gmail.com>:

> Hi Fabien, thanks for your help.
>
> I did not mention it but I indeed saw a correlation between latency and
> read repair spikes. Though this is like going from 5 RR per second to 10
> per sec cluster wide according to opscenter: http://img42.com/L6gx1
>
> I have indeed some wide rows and this explanation looks reasonable to me,
> I mean this makes sense. Yet isn't this amount of Read Repair too low to
> induce such a "shitstorm" (even if it spikes x2, I got network x10)? Also
> wide rows are present on heavily used tables (sadly...), so I should be using
> more network all the time (why only a few spikes per day, like 2 / 3 max)?
>
> How could I confirm this, without removing RR and waiting a week I mean,
> is there a way to see the size of the data being repaired through this
> mechanism?
>
> C*heers
>
> Alain
>
> 2015-09-01 0:11 GMT+02:00 Fabien Rousseau <fabifab...@gmail.com>:
>
>> Hi Alain,
>>
>> Could it be wide rows + read repair? (Let's suppose the "read repair"
>> repairs the full row, and it may not be subject to the stream throughput limit.)
>>
>> Best Regards
>> Fabien
>>
>> 2015-08-31 15:56 GMT+02:00 Alain RODRIGUEZ <arodr...@gmail.com>:
>>
>>> I just realised that I have no idea about how this mailing list handles
>>> attached files.
>>>
>>> Please find screenshots there --> http://img42.com/collection/y2KxS
>>>
>>> Alain
>>>
>>> 2015-08-31 15:48 GMT+02:00 Alain RODRIGUEZ <arodr...@gmail.com>:
>>>
>>>> Hi,
>>>>
>>>> Running a 2.0.16 C* on AWS (private VPC, 2 DC).
>>>>
>>>> I am facing an issue on our EU DC where I have a network burst
>>>> (alongside GC and latency increases).
>>>>
>>>> My first thought was a sudden application burst, though, I see no
>>>> corresponding evolution on reads / write or even CPU.
>>>>
>>>> So I thought that this might come from the nodes themselves, as IN almost
>>>> equals OUT network. I tried lowering stream throughput on the whole DC to 1
>>>> Mbps, with ~30 nodes --> 30 Mbps --> ~4 MB/s max. My network went a lot
>>>> higher, about 30 M on both sides (see screenshots attached).
>>>>
>>>> I have tried to use iftop to see where this network is headed to, but
>>>> I was not able to do it because the bursts are very short.
>>>>
>>>> So, questions are:
>>>>
>>>> - Has anyone experienced something similar already? If so, any clue
>>>> would be appreciated :).
>>>> - How can I know (monitor, capture) where this big amount of network
>>>> traffic is headed to or due to?
>>>> - Am I right trying to figure out what this network is, or should I
>>>> follow another lead?
>>>>
>>>> Notes: I also noticed that CPU does not spike nor does R, but disk
>>>> reads also spike!
>>>>
>>>> C*heers,
>>>>
>>>> Alain
>>>>
>>>
>>>
>>
>


Re: Network / GC / Latency spike

2015-08-31 Thread Fabien Rousseau
Hi Alain,

Could it be wide rows + read repair? (Let's suppose the "read repair"
repairs the full row, and it may not be subject to the stream throughput limit.)

Best Regards
Fabien

2015-08-31 15:56 GMT+02:00 Alain RODRIGUEZ :

> I just realised that I have no idea about how this mailing list handles
> attached files.
>
> Please find screenshots there --> http://img42.com/collection/y2KxS
>
> Alain
>
> 2015-08-31 15:48 GMT+02:00 Alain RODRIGUEZ :
>
>> Hi,
>>
>> Running a 2.0.16 C* on AWS (private VPC, 2 DC).
>>
>> I am facing an issue on our EU DC where I have a network burst (alongside
>> GC and latency increases).
>>
>> My first thought was a sudden application burst, though, I see no
>> corresponding evolution on reads / write or even CPU.
>>
>> So I thought that this might come from the nodes themselves, as IN almost
>> equals OUT network. I tried lowering stream throughput on the whole DC to 1
>> Mbps, with ~30 nodes --> 30 Mbps --> ~4 MB/s max. My network went a lot
>> higher, about 30 M on both sides (see screenshots attached).
>>
>> I have tried to use iftop to see where this network is headed to, but I
>> was not able to do it because the bursts are very short.
>>
>> So, questions are:
>>
>> - Has anyone experienced something similar already? If so, any clue
>> would be appreciated :).
>> - How can I know (monitor, capture) where this big amount of network
>> traffic is headed to or due to?
>> - Am I right trying to figure out what this network is, or should I follow
>> another lead?
>>
>> Notes: I also noticed that CPU does not spike nor does R, but disk
>> reads also spike!
>>
>> C*heers,
>>
>> Alain
>>
>
>


Re: sstableloader Could not retrieve endpoint ranges

2015-06-19 Thread Fabien Rousseau
Hi,

I already got this error on a 2.1 cluster because thrift was disabled. So
you should check that thrift is enabled and accessible from the
sstableloader process.
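
Something like this, assuming default ports and paths (adjust host and paths
to your setup):

  grep -E 'start_rpc|rpc_address|rpc_port' /etc/cassandra/cassandra.yaml  # start_rpc should be true
  nodetool enablethrift                 # re-enable thrift if it was disabled at runtime
  nc -zv <seed_host> 9160               # the thrift port must be reachable from the sstableloader machine
  sstableloader -d <seed_host> /path/to/snapshot/<keyspace>/<table>/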

Hope this helps

Fabien
On 19 June 2015 at 05:44, Mitch Gitman mgit...@gmail.com wrote:

 I'm using sstableloader to bulk-load a table from one cluster to another.
 I can't just copy sstables because the clusters have different topologies.
 While we're looking to upgrade soon to Cassandra 2.0.x, we're on Cassandra
 1.2.19. The source data comes from a nodetool snapshot.

 Here's the command I ran:
 sstableloader -d *IP_ADDRESSES_OF_SEED_NODES* */SNAPSHOT_DIRECTORY/*

 Here's the result I got:
 Could not retrieve endpoint ranges:
  -pr,--principal   kerberos principal
  -k,--keytab   keytab location
  --ssl-keystoressl keystore location
  --ssl-keystore-password   ssl keystore password
  --ssl-keystore-type   ssl keystore type
  --ssl-truststore  ssl truststore location
  --ssl-truststore-password ssl truststore password
  --ssl-truststore-type ssl truststore type

 Not sure what to make of this, what with the hints at security arguments
 that pop up. The source and destination clusters have no security.

 Hoping this might ring a bell with someone out there.



Re: Problems after trying a migration

2015-03-18 Thread Fabien Rousseau
Hi David,

There is an excellent article which describes exactly what you want to do
(i.e. migrate from one DC to another DC):
http://planetcassandra.org/blog/cassandra-migration-to-ec2/
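
Also, since one of the problems described below was nodes being started before
cassandra-topology.properties was updated: with the PropertyFileSnitch, that
file must list both datacenters and be identical on every node before the new
nodes are started. A minimal example (IPs and DC/rack names are made up):

  # cassandra-topology.properties
  10.0.0.1=DC-FR:RAC1
  10.0.0.2=DC-FR:RAC1
  10.1.0.1=DC-NEW:RAC1
  10.1.0.2=DC-NEW:RAC1
  # any node not listed above falls back to this
  default=DC-FR:RAC1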

2015-03-18 17:05 GMT+01:00 David CHARBONNIER david.charbonn...@rgsystem.com:

  Hi,



 We’re using Cassandra through the Datastax Enterprise package in version
 4.5.1 (Cassandra version 2.0.8.39) with 7 nodes in a single datacenter.



 We need to move our Cassandra cluster from France to another country. To
 do this, we want to add a second 7-node datacenter to our cluster and
 stream all data between the two countries before dropping the first
 datacenter.



 On January 31st, we tried doing so but we had some problems:

 -  New nodes in the other country have been installed like French
 nodes except for Datastax Enterprise version (4.5.1 in France and 4.6.1 in
 the other country which means Cassandra version 2.0.8.39 in France and
 2.0.12.200 in the other country)

 -  The following procedure has been followed:
 http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html
 but an error occurred during step 3: the new nodes were started before
 the *cassandra-topology.properties* file had been updated on the original
 datacenter, so the new nodes appeared in the original datacenter instead of
 the new one.

 -  To recover our original cluster, we decommissioned every node
 of the new datacenter with the *nodetool decommission* command.



 On February 9th, nodes in the second datacenter were restarted and
 joined the cluster. We had to decommission them just like before.



 On February 11th, we added disk space on our 7 running French nodes. To
 achieve this, we restarted the cluster, but the nodes updated their peering
 information and nodes from Luxembourg (decommissioned on February 9th)
 were present. This behaviour is described here:
 https://issues.apache.org/jira/browse/CASSANDRA-7825. So we cleaned
 the *system.peers* table content.



 On March 11th, we needed to add an 8th node to our existing French
 cluster. We installed the same Datastax Enterprise version (4.5.1 with
 Cassandra 2.0.8.39) and tried to add this node to the cluster with this
 procedure:
 http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html.
 In OPSCenter, the node was joining the cluster and data streaming got stuck
 at 100%. After several hours, *nodetool status* showed us that the node
 was still joining but nothing in the logs let us know there was a problem. We
 restarted the node but it had no effect. Then we cleaned the data and commitlog
 contents and tried to add the node to the cluster again, without result.

 Our last try was to add the node with *auto_bootstrap: false* in order to
 add the node to the cluster manually, but it messed up the data. So we
 shut down the node and decommissioned it (with *nodetool removenode*).
 The whole cluster has been repaired and we stopped doing anything.



 Now, our cluster has only 7 French nodes, to which we can't add any node. The
 OPSCenter data has disappeared and we are working without any information about
 how our cluster is running.



 You’ll find attached to this email our current configuration and a
 screenshot of our OPSCenter metric page.



 Do you have any idea how to clean up the mess and get our cluster
 running cleanly before we start our migration (France to another country,
 as described at the beginning of this email)?



 Thank you.



 Best regards,



 *David CHARBONNIER*

 Sysadmin

 T : +33 411 934 200

 david.charbonn...@rgsystem.com

 ZAC Aéroport

 125 Impasse Adam Smith

 34470 Pérols - France

 *www.rgsystem.com* http://www.rgsystem.com/










-- 
Fabien Rousseau


aur...@yakaz.com
www.yakaz.com


Re: Migrate data to new cluster using datacenters?

2013-12-12 Thread Fabien Rousseau
Hi,

We did it once and it worked well.
Those two links should help (this is more or less what we did):
http://www.datastax.com/documentation/cassandra/1.2/webhelp/cassandra/operations/ops_add_dc_to_cluster_t.html
http://www.datastax.com/documentation/cassandra/1.2/webhelp/cassandra/operations/ops_decomission_dc_t.html
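
For reference, the replication side of your steps 2 and 4 below looks roughly
like this (keyspace and DC names are placeholders, CQL3 syntax):

  -- step 2 (in cqlsh): start replicating the keyspace into the new DC
  ALTER KEYSPACE app_ks WITH replication =
    {'class': 'NetworkTopologyStrategy', 'old_dc': 3, 'new_dc': 3};
  -- then, on every node of the new DC: nodetool rebuild old_dc

  -- step 4 (once the application points at the new DC): drop the old DC
  ALTER KEYSPACE app_ks WITH replication =
    {'class': 'NetworkTopologyStrategy', 'new_dc': 3};
  -- then decommission the old-DC nodes one by one with nodetool decommission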




2013/12/12 Andrew Cooper andrew.coo...@nisc.coop

 Hello,

 We are in the process of isolating multiple applications currently running
 in one large Cassandra cluster into individual smaller clusters.  Each
 application runs in its own keyspace.  In order to reduce/eliminate
 downtime for a migration, I was curious if anyone had attempted the
 following process to migrate data to a new cluster:

 1) Add new cluster nodes as a new datacenter to existing cluster
 2) Set RF for specific keyspace to non-zero for new cluster, use nodetool
 rebuild on new nodes to stream data
 3) Change application node connections to point to new cluster
 4) Set RF to 0 for original cluster (stop new writes from going to
 original cluster)
 5) Break connection between nodes so new nodes become a standalone
 cluster???  -  Is this possible? What would be the high-level steps?

 If this is an extremely bad or misinformed idea, I would like to know that
 as well!

 I am aware of other tools available including sstableloader, etc, but this
 seemed like a more elegant solution, leveraging Cassandra's active-active
 features.

 Thanks,

 -Andrew
 NISC




-- 
Fabien Rousseau


aur...@yakaz.com
www.yakaz.com


Re: OOM while reading key cache

2013-11-14 Thread Fabien Rousseau
A few months ago, we had a similar issue on 1.2.6:
https://issues.apache.org/jira/browse/CASSANDRA-5706

But it has been fixed and we have not encountered this issue since (we're
also on 1.2.10).
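
For reference, the workaround mentioned below boils down to something like
this (assuming the default saved_caches_directory from cassandra.yaml):

  nodetool drain                           # flush memtables and stop accepting writes
  # stop the cassandra process
  rm -f /var/lib/cassandra/saved_caches/*  # delete the saved caches
  # start cassandra again; the key cache will be rebuilt over time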


2013/11/14 olek.stas...@gmail.com olek.stas...@gmail.com

 Yes, as I wrote in the first e-mail.  When I removed the key cache file,
 Cassandra started without further problems.
 regards
 Olek

 2013/11/13 Robert Coli rc...@eventbrite.com:
 
  On Wed, Nov 13, 2013 at 12:35 AM, Tom van den Berge t...@drillster.com
  wrote:
 
  I'm having the same problem, after upgrading from 1.2.3 to 1.2.10.
 
  I can remember this was a bug that was solved in the 1.0 or 1.1 version
  some time ago, but apparently it has come back.
  A workaround is to delete the contents of the saved_caches directory
  before starting up.
 
 
  Yours is not the first report of this I've heard resulting from a 1.2.x to
  1.2.x upgrade. Reports are of the form "I had to nuke my saved_caches or I
  couldn't start my node, it OOMed", etc.
 
  https://issues.apache.org/jira/browse/CASSANDRA-6325
 
  Exists, but doesn't seem  to be the same issue.
 
  https://issues.apache.org/jira/browse/CASSANDRA-5986
 
   Similar, doesn't seem to be an issue triggered by upgrade.
 
  If I were one of the posters on this thread, I would strongly consider
  filing a JIRA on point.
 
  @OP (olek) : did removing the saved_caches also fix your problem?
 
  =Rob
 
 




-- 
Fabien Rousseau


aur...@yakaz.com
www.yakaz.com


Re: disappointed

2013-07-24 Thread Fabien Rousseau
Hi Paul,

Concerning large rows which are not compacting, I've probably managed to
reproduce your problem.
I suppose you're using collections, but also TTLs?

Anyway, I opened an issue here:
https://issues.apache.org/jira/browse/CASSANDRA-5799

Hope this helps


2013/7/24 Christopher Wirt chris.w...@struq.com

 Hi Paul,


 Sorry to hear you’re having a low point.


 We ended up not using the collection features of 1.2. Instead we store a
 compressed string containing the map and handle it client-side.

 We only have fixed-schema short rows, so no experience with large row
 compaction.

 File descriptors have never got that high for us. But if you only have a
 couple of physical nodes with loads of data and small SSTables, maybe they
 could get that high?

 The only time I've had file descriptors get out of hand was when compaction
 got slightly confused with a new schema when I dropped and recreated
 instead of truncating (https://issues.apache.org/jira/browse/CASSANDRA-4857);
 restarting the node fixed the issue.

 From my limited experience I think Cassandra is a dangerous choice for a
 young start-up with limited funding/experience expecting to scale fast. We
 are a fairly mature start-up with funding. We've just spent 3-5 months moving
 from Mongo to Cassandra. It's been expensive and painful getting Cassandra
 to read like Mongo, but we've made it :)

 *From:* Paul Ingalls [mailto:paulinga...@gmail.com]
 *Sent:* 24 July 2013 06:01
 *To:* user@cassandra.apache.org
 *Subject:* disappointed


 I want to check in.  I'm sad, mad and afraid.  I've been trying to get a
 1.2 cluster up and working with my data set for three weeks with no
 success.  I've been running a 1.1 cluster for 8 months now with no hiccups,
 but for me at least 1.2 has been a disaster.  I had high hopes for
 leveraging the new features of 1.2, specifically vnodes and collections.
 But at this point I can't release my system into production, and will
 probably need to find a new back end.  As a small startup, this could be
 catastrophic.  I'm mostly mad at myself.  I took a risk moving to the new
 tech.  I forgot sometimes when you gamble, you lose.


 First, the performance of 1.2.6 was horrible when using collections.  I
 wasn't able to push through 500k rows before the cluster became unusable.
  With a lot of digging, and way too much time, I discovered I was hitting a
 bug that had just been fixed, but was unreleased.  This scared me, because
  the release was already at 1.2.6 and I would have expected something like
  https://issues.apache.org/jira/browse/CASSANDRA-5677 to have been
 addressed long before.  But gamely I grabbed the latest code from the 1.2
 branch, built it and I was finally able to get past half a million rows.
 


 But, then I hit ~4 million rows, and a multitude of problems.  Even with
 the fix above, I was still seeing a ton of compactions failing,
 specifically the ones for large rows.  Not a single large row will compact,
 they all assert with the wrong size.  Worse, and this is what kills the
 whole thing, I keep hitting a wall with open files, even after dumping the
 whole DB, dropping vnodes and trying again.  Seriously, 650k open file
 descriptors?  When it hits this limit, the whole DB craps out and is
 basically unusable.  This isn't that many rows.  I have close to a half a
 billion in 1.1…


 I'm now at a standstill.  I figure I have two options unless someone here
  can help me.  Neither of them involves 1.2.  I can either go back to 1.1 and
 remove the features that collections added to my service, or I find another
 data backend that has similar performance characteristics to cassandra but
 allows collections type behavior in a scalable manner.  Cause as far as I
 can tell, 1.2 doesn't scale.  Which makes me sad, I was proud of what I
 accomplished with 1.1….


 Does anyone know why there are so many open file descriptors?  Any ideas
 on why a large row won't compact?


 Paul




-- 
Fabien Rousseau
aur...@yakaz.com
www.yakaz.com


Re: Performance issues with CQL3 collections?

2013-06-28 Thread Fabien Rousseau
IMHO, having many tombstones can slow down reads and writes in the
following cases:
 - For reads, it is slow if the requested slice contains many tombstones
 - For writes, it is slower if the row in the memtable contains many
tombstones.  This is because, if the IntervalTree contains N intervals and
one tombstone must be added, then a new IntervalTree must be recreated.

But it's true that writes are less impacted than reads.

Sylvain, if you need/want some help/info for CASSANDRA-5677, don't hesitate
to ask.
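
For reference, the three list operations Sylvain mentions below (the ones that
need a read before the write) look like this; table and column names are just
examples:

  UPDATE t SET mylist[2] = 'foo' WHERE k = 0;           -- set by index
  DELETE mylist[2] FROM t WHERE k = 0;                  -- delete by index
  UPDATE t SET mylist = mylist - ['foo'] WHERE k = 0;   -- remove by value

All other collection updates (append/prepend, overwriting the whole
collection, ...) involve no read before the write.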



2013/6/28 Sylvain Lebresne sylv...@datastax.com

 As documented at http://cassandra.apache.org/doc/cql3/CQL.html#collections,
 the lists have 3 operations that require a read before a write (and should
 thus be avoided in performance sensitive code), namely setting and deleting
 by index, and removing by value. Outside of that, collections involves no
 read before writes.

 But, as you said, if you do overwrite a collection, the previous
 collection is removed (using a range tombstone) while the new one is added.
 This should have almost no impact on the insertion itself however (the
 tombstone is in the same internal mutation than the update itself, it's not
 2 operations). But yes, if you do often overwrite collections in the same
 partition, this might have some impact on reads due to CASSANDRA-5677, and
 we'll look at fixing that.

 So in theory collections should have no special impact on writes, at least
 nothing that is by design. If you do observe differently and have a way to
  reproduce, feel free to open a JIRA issue. But I'm afraid we'll need more
  than "two guys on Stack Overflow claim they've seen write performance
  degradation due to collections" to get going.

 --
 Sylvain


 On Fri, Jun 28, 2013 at 7:30 AM, Theo Hultberg t...@iconara.net wrote:

 the thing I was doing was definitely triggering the range tombstone
 issue, this is what I was doing:

 UPDATE clocks SET clock = ? WHERE shard = ?

 in this table:

  CREATE TABLE clocks (shard INT PRIMARY KEY, clock MAP<TEXT, TIMESTAMP>)

 however, from the stack overflow posts it sounds like they aren't
 necessarily overwriting their collections. I've tried to replicate their
 problem with these two statements

 INSERT INTO clocks (shard, clock) VALUES (?, ?)
 UPDATE clocks SET clock = clock + ? WHERE shard = ?

 the first one should create range tombstones because it overwrites the
  map on every insert, and the second should not because it adds to the
 map. neither of those seems to have any performance issues, at least not on
 inserts.

 and it's the slowdown on inserts that confuses me, both the stack
 overflow questioners say that they saw a drop in insert performance. I
 never saw that in my application, I just got slow reads (and Fabien's
 explanation makes complete sense for that). I don't understand how insert
 performance could be affected at all, and I know that for non-counter
 columns cassandra doesn't read before it writes, but is it the same for
 collections too? they are a bit special, but how special are they?

 T#


  On Fri, Jun 28, 2013 at 7:04 AM, aaron morton aa...@thelastpickle.com wrote:

 Can you provide details of the mutation statements you are running ? The
 Stack Overflow posts don't seem to include them.

 Cheers

-
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 27/06/2013, at 5:58 AM, Theo Hultberg t...@iconara.net wrote:

 do I understand it correctly if I think that collection modifications
 are done by reading the collection, writing a range tombstone that would
 cover the collection and then re-writing the whole collection again? or is
 it just the modified parts of the collection that are covered by the range
 tombstones, but you still get massive amounts of them and its just their
 number that is the problem.

 would this explain the slowdown of writes too? I guess it would if
 cassandra needed to read the collection before it wrote the new values,
 otherwise I don't understand how this affects writes, but that only says
 how much I know about how this works.

 T#


  On Wed, Jun 26, 2013 at 10:48 AM, Fabien Rousseau fab...@yakaz.com wrote:

 Hi,

 I'm pretty sure that it's related to this ticket :
 https://issues.apache.org/jira/browse/CASSANDRA-5677

 I'd be happy if someone tests this patch.
  It should apply easily on 1.2.5 & 1.2.6

 After applying the patch, by default, the current implementation is
 still used, but modify your cassandra.yaml to add the following one :
 interval_tree_provider: IntervalTreeAvlProvider

 (Note that implementations should be interchangeable, because they
 share the same serializers and deserializers)

 Also, please note that this patch has not been reviewed nor intensively
 tested... So, it may not be production ready

 Fabien







 2013/6/26 Theo Hultberg t...@iconara.net

 Hi,

 I've seen a couple of people on Stack Overflow having problems with
 performance when

Re: Errors while upgrading from 1.1.10 version to 1.2.4 version

2013-06-28 Thread Fabien Rousseau
Hello,

Have a look at: https://issues.apache.org/jira/browse/CASSANDRA-5476


2013/6/28 Ananth Gundabattula agundabatt...@threatmetrix.com

 Hello Everybody,

  We were performing an upgrade of our cluster from version 1.1.10 to
  1.2.4. We tested the upgrade process in a QA environment and found no issues.
  However, on the production node, we faced loads of errors and had to abort
  the upgrade process.

 I was wondering how we ran into such a situation. The main difference
 between the QA environment and the production environments is the
  Replication Factor. In QA, RF=1 and in production RF=3.

  Example stack traces, as seen on the other nodes:
 http://pastebin.com/fSnMAd8q

  The other observation is that the node which was being upgraded is a seed
  node in 1.1.10. We aborted right after the first node gave the above
  issues. Does this mean that there will be application downtime required
  if we go for a rolling upgrade on a live cluster from version 1.1.10 to
  1.2.4?

 Regards,
 Ananth







-- 
Fabien Rousseau
aur...@yakaz.com
www.yakaz.com


Re: Performance issues with CQL3 collections?

2013-06-26 Thread Fabien Rousseau
Hi,

I'm pretty sure that it's related to this ticket:
https://issues.apache.org/jira/browse/CASSANDRA-5677

I'd be happy if someone tests this patch.
It should apply easily on 1.2.5 & 1.2.6

After applying the patch, by default, the current implementation is still
used, but modify your cassandra.yaml to add the following line:
interval_tree_provider: IntervalTreeAvlProvider
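
For someone wanting to try it, the whole thing is roughly (the patch file name
is hypothetical, take the attachment from CASSANDRA-5677):

  git checkout cassandra-1.2              # from a checkout of the Cassandra sources
  patch -p1 < 5677-interval-tree.patch
  ant                                     # builds the patched jar under build/
  # then add the interval_tree_provider line above to cassandra.yaml and restart the node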

(Note that implementations should be interchangeable, because they share
the same serializers and deserializers)

Also, please note that this patch has not been reviewed nor intensively
tested... So, it may not be production ready

Fabien







2013/6/26 Theo Hultberg t...@iconara.net

 Hi,

 I've seen a couple of people on Stack Overflow having problems with
 performance when they have maps that they continuously update, and in
 hindsight I think I might have run into the same problem myself (but I
 didn't suspect it as the reason and designed differently and by accident
 didn't use maps anymore).

 Is there any reason that maps (or lists or sets) in particular would
 become a performance issue when they're heavily modified? As I've
 understood them they're not special, and shouldn't be any different
  performance-wise than overwriting regular columns. Is there something
 different going on that I'm missing?

 Here are the Stack Overflow questions:


 http://stackoverflow.com/questions/17282837/cassandra-insert-perfomance-issue-into-a-table-with-a-map-type/17290981


 http://stackoverflow.com/questions/17082963/bad-performance-when-writing-log-data-to-cassandra-with-timeuuid-as-a-column-nam/17123236

 yours,
 Theo




-- 
Fabien Rousseau
aur...@yakaz.com
www.yakaz.com