Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-17 Thread Alain RODRIGUEZ
+1

I would guess a lot of C* clusters/tables have this option set to the
default value, and not many of them actually need to read such big chunks
of data. I believe this will greatly limit disk overreads for a fair number
(a big majority?) of new users. It seems fair enough to change this default
value, and 4.0 also seems like a nice place to do it.
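
For anyone who wants to double-check the numbers on their own side, here is a
quick Python sketch (purely illustrative, not from the project code base) that
reproduces the napkin math Ariel quotes further down, assuming one 8-byte
offset kept in memory per compressed chunk:

    # Memory cost of compressed chunk offsets per TB of uncompressed data,
    # assuming 8 bytes per chunk offset (see the napkin math quoted below).
    BYTES_PER_OFFSET = 8

    def offset_overhead_mb(chunk_kb, data_tb=1):
        data_bytes = data_tb * 1024 ** 4
        chunks = data_bytes // (chunk_kb * 1024)
        return chunks * BYTES_PER_OFFSET / 1024 ** 2

    for chunk_kb in (64, 16, 4):
        print(f"chunk_length_in_kb={chunk_kb}: "
              f"~{offset_overhead_mb(chunk_kb):.0f} MB per TB")
    # -> ~128 MB, ~512 MB and ~2048 MB respectively, matching Ariel's figures.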

Thanks for taking care of this, Ariel, and for making sure there is a
consensus here as well,

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

On Sat, Oct 13, 2018 at 08:52, Ariel Weisberg  wrote:

> Hi,
>
> This would only impact new tables, existing tables would get their
> chunk_length_in_kb from the existing schema. It's something we record in a
> system table.
>
> I have an implementation of a compact integer sequence that only requires
> 37% of the memory required today. So we could do this with only slightly
> more than doubling the memory used. I'll post that to the JIRA soon.
>
> Ariel
>
> On Fri, Oct 12, 2018, at 1:56 AM, Jeff Jirsa wrote:
> >
> >
> > I think 16k is a better default, but it should only affect new tables.
> > Whoever changes it, please make sure you think about the upgrade path.
> >
> >
> > > On Oct 12, 2018, at 2:31 AM, Ben Bromhead  wrote:
> > >
> > > This is something that's bugged me for ages, tbh the performance gain
> for
> > > most use cases far outweighs the increase in memory usage and I would
> even
> > > be in favor of changing the default now, optimizing the storage cost
> later
> > > (if it's found to be worth it).
> > >
> > > For some anecdotal evidence:
> > > 4kb is usually what we end up setting it to, 16kb feels more reasonable
> given
> > > the memory impact, but what would be the point if practically, most
> folks
> > > set it to 4kb anyway?
> > >
> > > Note that chunk_length will largely be dependent on your read sizes,
> but 4k
> > > is the floor for most physical devices in terms of their block size.
> > >
> > > +1 for making this change in 4.0 given the small size and the large
> > > improvement to new users' experience (as long as we are explicit in the
> > > documentation about memory consumption).
> > >
> > >
> > >> On Thu, Oct 11, 2018 at 7:11 PM Ariel Weisberg 
> wrote:
> > >>
> > >> Hi,
> > >>
> > >> This is regarding
> https://issues.apache.org/jira/browse/CASSANDRA-13241
> > >>
> > >> This ticket has languished for a while. IMO it's too late in 4.0 to
> > >> implement a more memory efficient representation for compressed chunk
> > >> offsets. However I don't think we should put out another release with
> the
> > >> current 64k default as it's pretty unreasonable.
> > >>
> > >> I propose that we lower the value to 16kb. 4k might never be the
> correct
> > >> default anyways as there is a cost to compression and 16k will still
> be a
> > >> large improvement.
> > >>
> > >> Benedict and Jon Haddad are both +1 on making this change for 4.0. In
> the
> > >> past there has been some consensus about reducing this value although
> maybe
> > >> with more memory efficiency.
> > >>
> > >> The napkin math for what this costs is:
> > >> "If you have 1TB of uncompressed data, with 64k chunks that's 16M
> chunks
> > >> at 8 bytes each (128MB).
> > >> With 16k chunks, that's 512MB.
> > >> With 4k chunks, it's 2G.
> > >> Per terabyte of data (pre-compression)."
> > >>
> > >>
> https://issues.apache.org/jira/browse/CASSANDRA-13241?focusedCommentId=15886621&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15886621
> > >>
> > >> By way of comparison memory mapping the files has a similar cost per
> 4k
> page of 8 bytes. Multiple mappings make this more expensive. With a
> > >> default of 16kb this would be 4x less expensive than memory mapping a
> file.
> > >> I only mention this to give a sense of the costs we are already
> paying. I
> > >> am not saying they are directly related.
> > >>
> > >> I'll wait a week for discussion and if there is consensus make the
> change.
> > >>
> > >> Regards,
> > >> Ariel
> > >>
> > >>
> > >> --
> > > Ben Bromhead
> > > CTO | Instaclustr <https://www.instaclustr.com/>
> > > +1 650 284 9692
> > > Reliability at Scale
> > > Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer
> >
> >
>
>
>


Re: Debug logging enabled by default since 2.2

2018-03-19 Thread Alain RODRIGUEZ
Hello,

I am not developing Cassandra, but I am using it actively and helping people
work with it. My perspective might be missing some code considerations and
history, as I did not go through the ticket where this 'debug' level was
enabled by default. But here is some feedback after upgrading a few clusters
to Cassandra 2.2:

When upgrading a cluster to Cassandra 2.2, 'disable the debug logs' is in
my runbook. Very often, when a cluster is upgraded to Cassandra 2.2 and has
performance problems, the 2 most frequent causes are:

- the DEBUG level being turned on
- and / or dynamic snitching being enabled

This is especially true for high percentiles (very clear on the p99). Let's
put the dynamic snitch aside, as it is not our topic here.

From an operational perspective, I prefer to set the log level to 'DEBUG'
when I need it rather than having, out of the box, something that is
unexpected and impacts performance. Plus, the log level can be changed
without restarting the node, through 'JMX' or even using 'nodetool' now.
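
For instance (a minimal sketch, assuming 'nodetool' is on the PATH and that
org.apache.cassandra is the logger you want to change, adjust as needed):

    # Toggle Cassandra's log level at runtime via nodetool, without a restart.
    import subprocess

    def set_log_level(logger="org.apache.cassandra", level="DEBUG"):
        subprocess.run(["nodetool", "setlogginglevel", logger, level], check=True)

    def reset_log_levels():
        # With no arguments, nodetool resets levels to the logback configuration.
        subprocess.run(["nodetool", "setlogginglevel"], check=True)

    set_log_level(level="DEBUG")  # enable debug logs while investigating
    # ... reproduce / investigate the issue ...
    reset_log_levels()            # then go back to the configured levels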

Also, in most cases the 'INFO' level is good enough for me to detect most of
the issues. I was even able to recreate a detailed history of events for a
customer recently; 'INFO' logs are already very powerful and complete, I
believe (nice work on this, by the way). Monitoring helps a lot too. I have
not had to use debug logs for a long time. It might happen, but I will find
a way to enable them when needed.

Even though it feels great to be able to help people with that easily (the
cause is often the same, and turning off the debug logs is a low hanging
fruit in C* 2.2 clusters, with very nice results for little effort), I would
prefer people not to fall into these performance traps in the first place.
In my head, 'DEBUG' logs should be for debugging purposes (as opposed to
'always on'). It seems legit. I am surprised this brings so much discussion;
I thought this was a common standard, widely accepted beyond Cassandra. That
being said, it is good to see these exchanges happening, so I am sure the
decision that is taken will be a good one. I hope this comment helps; I have
no other goal. I am certainly not willing to feed a conflict, only a
discussion, and I hope no one feels offended by this feedback. I believe this
change was made with the aim of helping/improving things, but it turns out it
is more of an annoyance than truly helpful (my personal perspective).

I would +1 making 'INFO' the default again. If some information is missing
at the 'INFO' level, why not add it there directly and keep log levels
meaningful? We should just make sure, as much as we can, that we do not
promote log statements that degrade performance from 'DEBUG' to 'INFO'.

Hope this is useful,

C*heers,

---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2018-03-19 2:18 GMT+00:00 kurt greaves <k...@instaclustr.com>:

> On the same page as Michael here. We disable debug logs in production due
> to the performance impact. Personally I think if debug logging is necessary
> for users to use the software we're doing something wrong. Also in my
> experience, if something doesn't reproduce it will not get fixed. Debug
> logging helps, but I've never seen a case where an actual bug simply
> *doesn't* reproduce eventually, and I'm sure if this issue appears debug
> logging could be enabled after the fact for the relevant classes and
> eventually it will reoccur and we could solve the problem. I've never seen
> a user say no to helping debug a problem with patched jars/changes to their
> system (like logging). I'd much prefer we pushed that harder rather than
> just saying "Everyone gets debug logging!". I'm also really not sold that
> debug logging saves the day. To me it mostly just speeds up the
> identification process, it generally doesn't expose magical information
> that wasn't available before, you just had to think about it a bit more.
>
>
> In a way the real issue might be that we don’t have nightly performance
> > runs that would make an accidentally introduced debug statement obvious.
>
> This is an issue, but I don't think it's the *real* issue. As already
> noted, debug logging is for debugging, which normal users shouldn't have to
> think about when they are just operating the software. We shouldn't risk
> performance regressions just for developer convenience.
>
> On 19 March 2018 at 00:55, Ariel Weisberg <adwei...@fastmail.fm> wrote:
>
> > Hi,
> >
> > In a way the real issue might be that we don’t have nightly performance
> > runs that would make an accidentally introduced debug statement obvious.
> >
> > A log statement that runs once or more per read or write sho

Re: Cassandra 2017 Wrapup

2018-01-12 Thread Alain RODRIGUEZ
Hello,

This is a good occasion for me (and I think other people around will mostly
agree) to thank you, Jeff, for all the weekly reports / wrap-ups, for all the
time you have been spending on the dev and user mailing lists, and generally
for keeping Apache Cassandra moving forward. You are nowhere in your own
stats even though you are always everywhere around, sharing with people with
very variable levels of understanding of Apache Cassandra, with a lot of
patience and pedagogy.

Jeff, you somewhat forgot yourself in your list, but since you like numbers: I
see 'Jeff Jirsa' referenced in 200 threads on the user mailing list, in about
100 threads on the dev list, and I am not counting commits, reviews or actions
taken as a PMC member, but I know you are there too, really involved. And
statistics only show the volume, not the quality. Having you around during
some polemical discussions to calm things down was also very helpful to the
community, from my perspective.

So, for the huge amount of efficient work you did for Apache Cassandra and
its community this year, thank you too.

C*heers,

Alain

2017-12-22 21:56 GMT+00:00 DuyHai Doan :

> Thanks Jeff for the very comprehensive list of actions taken this year.
> Can't wait to put my hands on 4.0 once it's released
>
>
>
> On Fri, Dec 22, 2017 at 10:20 PM, Jeff Jirsa  wrote:
>
> > Happy holidays all,
> >
> > I imagine most people are about to disappear to celebrate holidays, so I
> > wanted to try to summarize the state of Cassandra dev for 2017, as I see
> > it. Standard disclaimers apply (this is my personal opinion, not that of
> my
> > employer, not officially endorsed by the Apache Cassandra PMC, or the
> ASF).
> >
> > Some quick stats about Cassandra development efforts in 2017 (using
> > imperfect git log | awk/sed counting, only looking at trunk, buyer
> beware,
> > it's probably off by a few):
> >
> > The first commit of 2017 was: Ben Manes, transforming the on-heap cache
> to
> > Caffeine (
> > https://github.com/apache/cassandra/commit/c607d76413be81a0e125c5780e068d7ab7594612
> > )
> > Alex Petrov removed the most code (~7500 lines, according to github)
> > Benjamin Lerer added the most code (~8000 lines, according to github)
> > We put to bed the tick/tock release cycle, but still cut 14 different
> > releases across 5 different branches.
> > We had a total of 136 different contributors, with 48 of those
> contributors
> > contributing more than one patch during the year.
> > We had a total of 47 different reviewers
> > There were 661 non-merge commits to trunk
> > There were 56 non-merge commits to docs/
> > We end the year with roughly 173 pending changes for 4.0
> > We resolved (either fixed or disqualified) 781 issues in JIRA
> > I count something like 273 email threads to dev@, and 903 email threads
> to
> > user@
> > The project added Stefan Podkowinski, Joel Knighton, Ariel Weisberg, Alex
> > Petrov, Blake Eggleston, and Philip Thompson as committers.
> > The project added Josh McKenzie, Marcus Eriksson and Jon Haddad to the
> > Apache Cassandra PMC
> >
> > At NGCC (which Eric and Gary managed to organize with the help of
> > Instaclustr sponsoring, an achievement in itself), we had people talk
> > about:
> > - Two different talks (from Apple and FB/Instagram). I'm struggling to
> > describe these in simple terms, they both sorta involve using hints and
> > changing some of the consistency concepts to help deal with latency /
> > durability / availability, especially in cross-DC workloads. Grouping
> these
> > together isn't really fair, but no one-email summary is going to be fair
> to
> > either of these talks. If you missed NGCC, I guess you get to wait for
> the
> > JIRAs / patches.
> > - A new storage engine (FB/Instagram) using RocksDB
> > - Some notes on using CDC at scale (and some proposed changes to make it
> > easier) from Uber (
> > https://github.com/ngcc/ngcc2017/blob/master/CassandraDataIngestion.pdf
> )
> > - Michael Shuler (Datastax /  Cassandra PMC / release master / etc) spent
> > some time talking about testing and CI.
> >
> > Some other big'ish development efforts worth mentioning (from personal
> > memory, perhaps the worst possible way to create such a list):
> > - We spent a fair amount of time talking about testing. Francois @
> > Instagram led the way in codifying a new set of principles around
> testing
> > and quality (
> > https://lists.apache.org/thread.html/0854341ae3ab41ceed2ae8a03f2486cf2325e4fca6fd800bf4297dd4@%3Cdev.cassandra.apache.org%3E
> > / https://issues.apache.org/jira/browse/CASSANDRA-13497 ).
> > - We've also spent some time making tests work in CircleCI, which should
> > make life much easier for occasional contributors - no need to figure out
> > how to run tests in ASF Jenkins.
> > - The internode messaging rewrite to use async/netty is probably the
> single
> > largest that comes to mind. It went in earlier this year, and should make
> > it easier to have 

Way to unsubscribe from mailing lists

2017-04-25 Thread Alain RODRIGUEZ
Hi,

I am seeing a lot of people trying (and failing) to unsubscribe from the
Cassandra mailing lists lately by sending an email to the list with an
"unsubscribe" message either in the subject or the body of the email, instead
of writing an email to dev-unsubscr...@cassandra.apache.org or
user-unsubscr...@cassandra.apache.org.

Last year I tried to template an answer for those messages, but it was
annoying to answer all the people (I am not a robot) and my message was
considered to be "too French". So I stopped. But at least it opened an
interesting discussion, and suggestions were made but never applied:
http://www.mail-archive.com/user@cassandra.apache.org/msg48355.html.

I know this 'spam' is not a big deal, but seeing this kind of message
regularly for years is somewhat frustrating, and this useless noise is
annoying; it creates more threads on busy mailing lists. Some unsubscribe
messages even land in the middle of other threads. On the other hand, I
believe it might be quite easy to 'fix'.

Should / could we have INFRA automatically unsubscribe people who send those
messages? I believe this would be the best solution, as several people
mentioned a year ago. I would at least like those messages to be filtered;
even if that is a bit more selfish, as it would not end the subscription for
the person sending the message, it would at least reduce the noise.

How do I make this actionable? Should I create a JIRA? Is this message
enough for someone to take over? Should this go through a vote?

C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com


Re: DataStax and Cassandra

2016-11-07 Thread Alain RODRIGUEZ
Hi,

On a personal note, I’d like to thank those in this weekend’s threads who have
> tried to de-escalate tensions rather than inflame them.


I agree. Thanks. Let's all try to learn from that, please. I believe we can
disagree politely and respectfully. And we must do so. We can still express
anything we want to express. It's just a bit harder and takes a bit longer
this way, but it is so much more effective at getting your real message
across to people.

Jeff Jirsa’s diplomacy stands out to me as particularly mature.
>

True, that was my opinion as well while reading the threads. I also want to
take this chance to say that it applies to you as well, Jonathan. IMHO, your
message is just what the community needed right now. So thanks for the
clarification...

... And for what you have done over the last years. As Aaron said at the
Summit, you, as an individual, have changed my life, and many others', in a
very good way. From my first message on the user mailing list, which you
answered, to this last message on the dev list to preserve the community,
even after stepping down as project chair. The amount, variety and quality
of the work you have done, and the way you behave, are inspiring to me.

As has been said before, we’re all on the same team here.  Now let’s get back
> to making Apache Cassandra the best open source distributed database in
> the world!


Sounds good :-).

A community is nothing but what we bring to it as individuals. Looking
forward to working all together, with our disagreements, but peacefully, in
the same direction, once again.

C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-11-06 21:43 GMT+01:00 Jonathan Ellis <jbel...@gmail.com>:

> Hi all,
>
> There’s been some conversation and some acrimony kicked up by my recent
> blog post here
> <http://www.datastax.com/2016/11/serving-customers-serving-the-community>.
> I appreciate the conversation and regret the acrimony!
>
> Fundamentally I was trying to convey two complementary messages:
>
>
>    1. DataStax wants to see Apache Cassandra thrive and will continue to
>       contribute in multiple ways to make that happen, but at the same time
>
>    2. DataStax will be placing more emphasis on DSE and more engineering
>       effort behind it.
>
>
> It’s unfortunate that the timing here coincides with some regrettable
> actions by the Apache Board of Directors, but this change in emphasis is
> primarily driven by business factors unrelated to the ASF.  DataStax shares
> Apache’s commitment to community-led development independent of any single
> vendor.
>
> One friend pointed out to me that any vagueness can be interpreted as
> “weasel words” and turned into alarmist conjectures as to what this really
> means.  I gave several specifics in the post as to how DataStax will
> continue to contribute to Apache Cassandra, but maybe I can simplify
> things:
>
> This has been going on for months.  DataStax’s level of contribution moving
> forward will be nearly indistinguishable from our level in October and
> September.  If that was no cause for alarm then, I hope it will not be
> cause for alarm now that we have articulated how we are moving forward.
>
> To be explicit: DataStax engineers will continue to contribute code
> reviews, bug fixes, and selected new features to Apache Cassandra.  In a
> qualitative sense then, you could almost say that nothing has changed.
>
> On a personal note, I’d like to thank those in this weekend’s threads who
> have tried to de-escalate tensions rather than inflame them.  Jeff Jirsa’s
> diplomacy stands out to me as particularly mature.
>
> As has been said before, we’re all on the same team here.  Now let’s get
> back to making Apache Cassandra the best open source distributed database
> in the world!
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>


Re: Problems with cassandra on AWS

2016-07-11 Thread Alain RODRIGUEZ
Hi Kant.

it looks like ec2 instances cannot talk to each other using public IP's


"talk to each other" --> port 7000 (if not using ssl, 7001 if using it).
Make sure this port is open. From IP_1: telnet ip_2 7000 will tell you if
the port is opened.
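
If telnet is not installed on the instances, the same check can be done with a
few lines of Python (host and port below are just examples):

    # Same check as 'telnet ip_2 7000': can this node reach the other node's
    # storage port? Use 7001 instead if internode SSL is enabled.
    import socket

    def port_open(host, port, timeout=3):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    print(port_open("10.0.0.2", 7000))  # example private IP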

"using public IP's" --> Are you using Ec2Snitch or Ec2MultiRegionSnitch.
The former uses exclusively private IP (listen_address, rpc_address and
broadcast_address empty / commented). The latter uses public ip for
broadcast_address. You need then to make sure ports are open between the
private IPs.

Having a look at 'nodetool status' before running operations should help you
determine the current status of your cluster. Nodes should show a UN status
(Up, Normal).

C*heers,
-------
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-07-11 10:02 GMT+02:00 Riccardo Ferrari <ferra...@gmail.com>:

> I would check your security group settings, you need to allow
> communication on cassandra ports (ie 9042,...)
>
> On Mon, Jul 11, 2016 at 8:17 AM, daemeon reiydelle <daeme...@gmail.com>
> wrote:
>
>> Well, I seem to recall that the private IP's are valid for
>> communications WITHIN one VPC. I assume you can log into one machine and
>> ping (or ssh) the others. If so, check that cassandra.yaml is not set to
>> listen on 127.0.0.1 (localhost).
>>
>>
>> *...*
>>
>>
>>
>> *Daemeon C.M. Reiydelle*
>> USA (+1) 415.501.0198
>> London (+44) (0) 20 8144 9872
>>
>> On Sun, Jul 10, 2016 at 4:54 PM, Kant Kodali <k...@peernova.com> wrote:
>>
>>> Hi Guys,
>>>
>>> I installed a 3 node Cassandra cluster on AWS and my replication factor
>>> is
>>> 3. I am trying to insert some data into a table. I set the consistency
>>> level of QUORUM at a Cassandra Session level. It only inserts into one
>>> node
>>> and unable to talk to other nodes because it is trying to contact other
>>> nodes through private IP and obviously that is failing so I am not sure
>>> how
>>> to change settings in say cassandra.yaml or somewhere such that
>>> rpc_address
>>> in system.peers table is updated to public IP's? I tried changing the
>>> seeds
>>> to all public IP's that didn't work as it looks like ec2 instances cannot
>>> talk to each other using public IP's. any help would be appreciated!
>>>
>>> Thanks,
>>> kant
>>>
>>
>>
>


Re: Cassandra 2.0.x OOM during startup - schema version inconsistency after reboot

2016-05-11 Thread Alain RODRIGUEZ
Hi Michaels :-),

My guess is this ticket will be closed with a "Won't Fix" resolution.

Cassandra 2.0 is no longer supported, and I have seen tickets being rejected
like CASSANDRA-10510 <https://issues.apache.org/jira/browse/CASSANDRA-10510>.

Would you like to upgrade to the latest 2.1.x and see if you still have the
issue?

About your issue, do you stop your node using a command like the following
one?

nodetool disablethrift && nodetool disablebinary && sleep 5 && nodetool
disablegossip && sleep 10 && nodetool drain && sleep 10 && sudo service
cassandra stop

or even flushing:

nodetool disablethrift && nodetool disablebinary && sleep 5 && nodetool
disablegossip && sleep 10 && nodetool flush && nodetool drain && sleep 10
&& sudo service cassandra stop

Are the commit logs empty when you start Cassandra?
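
If it helps, here is a quick sketch (assuming the DataStax Python driver is
installed and the node listens on the default 9042 port) to compare the schema
version of the local node with the versions it sees for its peers after a
restart:

    # Spot schema version disagreement: compare system.local with system.peers.
    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])  # contact point is an example
    session = cluster.connect()

    local = session.execute("SELECT schema_version FROM system.local").one()
    print("local:", local.schema_version)
    for row in session.execute("SELECT peer, schema_version FROM system.peers"):
        print(row.peer, row.schema_version)

    cluster.shutdown()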

C*heers,

---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-05-11 5:35 GMT+02:00 Michael Fong <michael.f...@ruckuswireless.com>:

> Hi,
>
> Thanks for your recommendation.
> I also opened a ticket to keep track @
> https://issues.apache.org/jira/browse/CASSANDRA-11748
> Hope this could brought someone's attention to take a look. Thanks.
>
> Sincerely,
>
> Michael Fong
>
> -Original Message-
> From: Michael Kjellman [mailto:mkjell...@internalcircle.com]
> Sent: Monday, May 09, 2016 11:57 AM
> To: dev@cassandra.apache.org
> Cc: u...@cassandra.apache.org
> Subject: Re: Cassandra 2.0.x OOM during startup - schema version
> inconsistency after reboot
>
> I'd recommend you create a JIRA! That way you can get some traction on the
> issue. Obviously an OOM is never correct, even if your process is wrong in
> some way!
>
> Best,
> kjellman
>
> Sent from my iPhone
>
> > On May 8, 2016, at 8:48 PM, Michael Fong <
> michael.f...@ruckuswireless.com> wrote:
> >
> > Hi, all,
> >
> >
> > Haven't heard any responses so far, and this issue has troubled us for
> quite some time. Here is another update:
> >
> > We have noticed several times that The schema version may change after
> migration and reboot:
> >
> > Here is the scenario:
> >
> > 1.   Two node cluster (1 & 2).
> >
> > 2.   There are some schema changes, i.e. create a few new
> columnfamily. The cluster will wait until both nodes have schema version in
> sync (describe cluster) before moving on.
> >
> > 3.   Right before node2 is rebooted, the schema version is
> consistent; however, after node2 reboots and starts servicing, the
> MigrationManager would gossip different schema version.
> >
> > > 4.   Afterwards, both nodes start exchanging schema messages
> indefinitely until one of the nodes dies.
> >
> > We currently suspect the change of schema is due to replaying the old
> entry in the commit log. We wish to continue digging further, but need
> experts' help on this.
> >
> > I don't know if anyone has seen this before, or if there is anything
> wrong with our migration flow though..
> >
> > Thanks in advance.
> >
> > Best regards,
> >
> >
> > Michael Fong
> >
> > From: Michael Fong [mailto:michael.f...@ruckuswireless.com]
> > Sent: Thursday, April 21, 2016 6:41 PM
> > To: u...@cassandra.apache.org; dev@cassandra.apache.org
> > Subject: RE: Cassandra 2.0.x OOM during bootstrap
> >
> > Hi, all,
> >
> > Here is some more information on before the OOM happened on the rebooted
> node in a 2-node test cluster:
> >
> >
> > 1.   It seems the schema version has changed on the rebooted node
> after reboot, i.e.
> > Before reboot,
> > Node 1: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,326
> > MigrationManager.java (line 328) Gossiping my schema version
> > 4cb463f8-5376-3baf-8e88-a5cc6a94f58f
> > Node 2: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,122
> > MigrationManager.java (line 328) Gossiping my schema version
> > 4cb463f8-5376-3baf-8e88-a5cc6a94f58f
> >
> > After rebooting node 2,
> > Node 2: DEBUG [main] 2016-04-19 11:18:18,016 MigrationManager.java
> > (line 328) Gossiping my schema version
> > f5270873-ba1f-39c7-ab2e-a86db868b09b
> >
> >
> >
> > > 2.   After reboot, both nodes repeatedly send MigrationTask to each
> other - we suspect it is related to the schema version (Digest) mismatch
> after Node 2 rebooted:
> > The node2  keeps submitting the migration task over 100+ times to the
> other node.
> > INFO [GossipStag

Re: How to measure the write amplification of C*?

2016-03-10 Thread Alain RODRIGUEZ
Hi Dikang,

I am not sure exactly what you mean by "amplification", but as sizes depend
highly on the structure, I would probably give it a try using CCM (
https://github.com/pcmanus/ccm) or some test cluster with 'production-like'
settings and schema. You can write a row, flush it, and see how big the data
is cluster-wide / per node.
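
To make the calculation concrete, here is a rough sketch (paths and numbers
are examples only), taking write amplification as the bytes that end up in
the data directory divided by the logical bytes you wrote:

    # Rough write amplification estimate: growth of the data directory after
    # a flush (and ideally once compaction settles) over the payload written.
    import os

    def dir_size(path):
        total = 0
        for root, _dirs, files in os.walk(path):
            for name in files:
                total += os.path.getsize(os.path.join(root, name))
        return total

    data_dir = "/var/lib/cassandra/data/my_ks"  # example keyspace directory
    before = dir_size(data_dir)
    # ... write a known payload, then run 'nodetool flush my_ks' ...
    after = dir_size(data_dir)
    logical_bytes = 100 * 1024 ** 2             # e.g. 100 MB of mutations sent
    print("write amplification ~", (after - before) / logical_bytes)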

Hope this will be of some help.

C*heers,
-------
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-03-10 7:18 GMT+01:00 Dikang Gu <dikan...@gmail.com>:

> Hello there,
>
> I'm wondering is there a good way to measure the write amplification of
> Cassandra?
>
> I'm thinking it could be calculated by (size of mutations written to the
> node)/(number of bytes written to the disk).
>
> Do we already have the metrics of "size of mutations written to the node"?
> I did not find it in jmx metrics.
>
> Thanks
>
> --
> Dikang
>
>