Re: binary protocol server side sockets

2014-04-11 Thread Chris Lohfink
id = 1 -- *Chris Lohfink* Engineer 415.663.6738 | Skype: clohfink.blackbirdit *Blackbird **[image: favicon]* 775.345.3485 | www.blackbirdIT.com http://www.blackbirdit.com/ *Formerly PalominoDB/DriveDev* On Fri, Apr 11, 2014 at 3:04 AM, Phil Luckhurst phil.luckhu...@powerassure.com wrote

Re: GC histogram analysis

2014-04-16 Thread Chris Lohfink
You can take a heap dump and find out who has references to it. That can also tell you which column family they are from. Do you have a lot of tombstones, data that's overwritten a lot, or are you doing a ton of reads? Maybe wide rows that you're querying across, or using filtering? Reads could

Re: Embedded Cassandra Performance

2014-04-16 Thread Chris Lohfink
recommend against it. --- Chris Lohfink On Apr 16, 2014, at 10:13 AM, Sávio Teles savio.te...@lupa.inf.ufg.br wrote: Is it advisable to run the embedded Cassandra in production? 2014-04-16 12:08 GMT-03:00 Sávio Teles savio.te...@lupa.inf.ufg.br: I'm running a cluster with Cassandra and my app

Re: Read Entire row from cassandra

2014-04-17 Thread Chris Lohfink
The java client will automatically page the row for you. If your columns are large may want to tweak the .setFetchSize(##) on your Statement. --- Chris Lohfink On Apr 17, 2014, at 12:36 PM, abhinav chowdary abhinav.chowd...@gmail.com wrote: We have one use case where we need to pull

Re: fixed size collection possible?

2014-04-22 Thread Chris Lohfink
inserted so might have to do some client side filtering to show the latest only using the created field. --- Chris Lohfink On Apr 22, 2014, at 1:51 AM, Jimmy Lin y2klyf+w...@gmail.com wrote: hi, look at the collection type support in cql3, e.g http://www.datastax.com/documentation/cql/3.0/cql

Re: Doubt

2014-04-22 Thread Chris Lohfink
in mind if serializing data though you will have to always maintain code that will be able to read old versions, it can become very complex and lead to weird bugs. --- Chris Lohfink On Apr 21, 2014, at 3:53 AM, Jagan Ranganathan ja...@zohocorp.com wrote: Dear All, We have a requirement to store

Re: nodetool hangs

2014-04-24 Thread Chris Lohfink
name in /etc/cassandra/cassandra-env.sh) and a random port. Likely the 2nd connection is whats timing out. JMX makes firewalls and sysadmins very frustrated :) --- Chris Lohfink On Apr 24, 2014, at 7:05 AM, Jacob Rhoden jacob.rho...@me.com wrote: I’ve done an install on an amazon instance

Re: nodetool hangs

2014-04-24 Thread Chris Lohfink
Wow… wheres this been all my life. I don’t see why this can’t be set by default? https://issues.apache.org/jira/browse/CASSANDRA-7087 --- Chris Lohfink On Apr 24, 2014, at 11:48 AM, Steven A Robenalt srobe...@stanford.edu wrote: There's a little-known change in the way JMX uses ports

Re: Recommended Approach for Config Changes

2014-04-25 Thread Chris Lohfink
Yes. Some changes you can manually have take effect without a restart (i.e. compactionthroughput, things settable from JMX). There are also config changes you can't really make, like switching the snitch and such, without a big to-do. --- Chris On Apr 25, 2014, at 8:53 AM, Phil Burress

Re: : Read a negative frame size (-2113929216)!

2014-04-25 Thread Chris Lohfink
Did you send an enormous write or batch write and it wrapped? Or is your client trying to use non-framed transport? Chris On Apr 25, 2014, at 2:50 PM, Vivek Mishra mishra.v...@gmail.com wrote: This is what i am getting with Cassandra 2.0.7 with Thrift. Caused by:
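The number in the subject line can be unpacked to see which bytes were actually read as the frame length. A minimal sketch, assuming the length is a 4-byte big-endian signed integer as in Thrift's framed transport; the function name is made up for illustration:

```python
import struct


def frame_size_bytes(frame_size: int) -> bytes:
    """Return the 4 bytes that decode to the given signed 32-bit frame size."""
    # ">i" means big-endian, signed, 32 bits wide.
    return struct.pack(">i", frame_size)


raw = frame_size_bytes(-2113929216)
print(raw.hex())  # 82000000
```

The leading 0x82 byte sets the sign bit, so whatever was on the wire was almost certainly not a frame length. That fits the questions above about a wrapped size or a client that is not using framed transport, though the thread snippet does not confirm either cause.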

Re: : Read a negative frame size (-2113929216)!

2014-04-25 Thread Chris Lohfink
what client are you using? On Apr 25, 2014, at 3:01 PM, Vivek Mishra mishra.v...@gmail.com wrote: It's a simple cql3 query to create keyspace. -Vivek On Sat, Apr 26, 2014 at 1:28 AM, Chris Lohfink clohf...@blackbirdit.com wrote: Did you send an enormous write or batch write

Re: : Read a negative frame size (-2113929216)!

2014-04-26 Thread Chris Lohfink
Try running with -verbose:class added to your JVM options on your client. Can you give the output for the jar files for thrift/cassandra? (i.e. cassandra, cassandra-thrift, and thrift lib) --- Chris Lohfink On Apr 25, 2014, at 11:30 PM, Vivek Mishra mishra.v...@gmail.com wrote

Re: How long are expired values actually returned?

2014-05-12 Thread Chris Lohfink
That is not expected. What client are you using and how are you setting the ttls? What version of Cassandra? --- Chris Lohfink On May 8, 2014, at 9:44 AM, Sebastian Schmidt isib...@gmail.com wrote: Hi, I'm using the TTL feature for my application. In my tests, when using a TTL of 5

Re: Storing log structured data in Cassandra without compactions for performance boost.

2014-05-13 Thread Chris Lohfink
for the columns you added then C* will clean up sstables (if size tiered and post 1.2) once the datas been expired. Since you never delete set the gc_grace_seconds to 0 so the ttl expiration doesnt result in tombstones. --- Chris Lohfink On May 6, 2014, at 7:55 PM, Kevin Burton bur

Re: What does the rate signify for latency in the JMX Metrics?

2014-05-16 Thread Chris Lohfink
://dimacs.rutgers.edu/~graham/pubs/papers/fwddecay.pdf --- Chris Lohfink On May 7, 2014, at 1:00 PM, Chris Burroughs chris.burrou...@gmail.com wrote: They are exponential decaying moving averages (like Unix load averages) of the number of events per unit of time. http://wiki.apache.org/cassandra

Re: Mutation messages dropped

2014-05-16 Thread Chris Lohfink
Shameless plug: http://www.evidencebasedit.com/guide-to-cassandra-thread-pools/#droppable On May 15, 2014, at 7:37 PM, Mark Reddy mark.re...@boxever.com wrote: Yes, please see http://wiki.apache.org/cassandra/FAQ#dropped_messages for further details. Mark On Fri, May 9, 2014 at

Re: Tombstones

2014-05-16 Thread Chris Lohfink
It will delete them after gc_grace_seconds (set per table) and a compaction. --- Chris Lohfink On May 16, 2014, at 9:11 AM, Dimetrio dimet...@flysoft.ru wrote: Does cassandra delete tombstones during simple LCS compaction or I should use node tool repair? Thanks. -- View

Re: What % of cassandra developers are employed by Datastax?

2014-05-16 Thread Chris Lohfink
There does seem to be some effort trying to encourage others - DataStax had some talks explaining how to contribute. This year there is even a extra bootcamp http://learn.datastax.com/CassandraSummitBootcampApplication.html On May 16, 2014, at 9:47 AM, Peter Lin wool...@gmail.com wrote:

Re: Couter column family performance problems

2014-05-16 Thread Chris Lohfink
slow. if it shows large pending/blocked in nodetool tpstats might be overrunning your capacity. --- Chris Lohfink On May 12, 2014, at 5:03 PM, Batranut Bogdan batra...@yahoo.com wrote: Hello all, I have a counter CF defined as pk text PRIMARY KEY, a counter, b counter, c counter, d

Re: high pending compactions

2014-06-09 Thread Chris Lohfink
Bean: org.apache.cassandra.db.CompactionManager. Also, nodetool compactionstats gives you how many are in the queue plus an estimate of how many will be needed. In 1.1 you will OOM far before you hit the limit. In theory though, the compaction executor is a little special cased and will actually

Re: HA Proxy

2014-06-27 Thread Chris Lohfink
Hector is the same way: if any node is slow to respond, times out, or dies, Hector will remove it from the pool, making it look like the cluster is dead. The entire fault-tolerant part of Cassandra would be lost. Chris On Jun 27, 2014, at 11:00 AM, Michael Dykman mdyk...@gmail.com wrote: NO,

Re: UnavailableException

2014-07-11 Thread Chris Lohfink
What replication strategy are you using? If using NetworkTopologyStrategy, double check that your DC names match up (case sensitive). Chris On Jul 11, 2014, at 9:38 AM, Ruchir Jha ruchir@gmail.com wrote: Here's the complete stack trace:

Re: UnavailableException

2014-07-14 Thread Chris Lohfink
WITH replication = { 'class': 'NetworkTopologyStrategy', 'datacenter1': '3' }; On Fri, Jul 11, 2014 at 3:48 PM, Chris Lohfink clohf...@blackbirdit.com wrote: What replication strategy are you using? if using NetworkTopologyStrategy double check that your DC names match up (case

Re: UnavailableException

2014-07-14 Thread Chris Lohfink
mean by check that your DC names match up CREATE KEYSPACE prod WITH replication = { 'class': 'NetworkTopologyStrategy', 'datacenter1': '3' }; On Fri, Jul 11, 2014 at 3:48 PM, Chris Lohfink clohf...@blackbirdit.com wrote: What replication strategy are you using? if using

Re: high pending compactions

2014-07-15 Thread Chris Lohfink
many times in the middle of the night. concurrent compactors will likely be too low (depending on number of cores). --- Chris Lohfink On Jul 14, 2014, at 7:31 PM, Greg Bone gbon...@gmail.com wrote: I'm looking into creation of monitoring thresholds for cassandra to report on its health. Does it make

Re: MemtablePostFlusher and FlushWriter

2014-07-15 Thread Chris Lohfink
The MemtablePostFlusher is also used for flushing non-cf backed (solr) indexes. Are you using DSE and solr by chance? Chris On Jul 15, 2014, at 5:01 PM, horschi hors...@gmail.com wrote: I have seen this behaviour when Commitlog files got deleted (or permissions were set to read only).

Re: How to maintain the N-most-recent versions of a value?

2014-07-17 Thread Chris Lohfink
) and accesses storage more directly, which is similar to hbases. You have your column family foo, then just use a composite column to store family, qualifier, and version in column name with value of column being value. row key is your row key. --- Chris Lohfink On Jul 17, 2014, at 6:32 PM, Clint

Re: Error while converting data from sstable to json with sstable2json

2014-07-30 Thread Chris Lohfink
It's stored as bytes, depending completely on what is given to it. If I were to guess I would say this looks like a composite partition key of utf8 values, separated with a control character (0) and a length of the next key, i.e. PRIMARY KEY ((uid, vendor, x), timestamp, y) Chris Lohfink On Jul
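The layout guessed at above can be sketched as a small decoder. This assumes each component is written as a 2-byte big-endian length, the utf8 bytes, and then a single trailing control byte (0); the function name and the sample bytes are made up for illustration and are not taken from the thread:

```python
import struct


def decode_composite(raw: bytes) -> list:
    """Split a composite key into its utf8 components."""
    parts = []
    pos = 0
    while pos < len(raw):
        # 2-byte big-endian length of the next component.
        (length,) = struct.unpack_from(">H", raw, pos)
        pos += 2
        parts.append(raw[pos:pos + length].decode("utf-8"))
        # Skip the component itself plus the trailing control byte.
        pos += length + 1
    return parts


# Hypothetical key with two components, "abc" and "xy".
sample = b"\x00\x03abc\x00" + b"\x00\x02xy\x00"
print(decode_composite(sample))  # ['abc', 'xy']
```

If the components are not utf8 (for example a timestamp or uuid), the decode step would need to change per component type.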

Re: do Cassandra generate a event or log containing key value of column when a column expires due to TTL

2014-08-22 Thread Chris Lohfink
Few options I can think of, probably some better ideas out there. These mostly depending on size of data and how frequently updated. 1) a map reduce or spark job to filter out non-empty rows 2) add some logging and do a custom build of cassandra (ie in removeDeletedCF of ColumnFamilyStore) and

Re: MapReduce Integration?

2014-08-26 Thread Chris Lohfink
There is a Bring your own Hadoop for DSE as well: http://www.datastax.com/documentation/datastax_enterprise/4.5/datastax_enterprise/byoh/byohIntro.html Can also run hadoop against your backup/snapshots: https://github.com/Netflix/aegisthus https://github.com/fullcontact/hadoop-sstable Chris On

Re: Question about MemoryMeter liveRatio

2014-08-26 Thread Chris Lohfink
at real time. This is used to determine how much memory a memtable is taking up and how often to flush it. --- Chris Lohfink On Aug 26, 2014, at 12:20 PM, Leleu Eric eric.le...@worldline.com wrote: Hi, I’m trying to understand what is the liveRatio and if I have to care about it. I

Re: How often are JMX Cassandra metrics reset?

2014-08-28 Thread Chris Lohfink
/repo1.maven.org/maven2/com.yammer.metrics/metrics-core/2.2.0/com/yammer/metrics/core/Timer.java?av=f Chris Lohfink On Aug 28, 2014, at 5:39 PM, Donald Smith donald.sm...@audiencescience.com wrote: The metrics OneMinuteRate, FIveMinuteRate, FifteenMinuteRate, and MeanRate are NOT lifetime

Re: Cassandra JBOD disk configuration

2014-09-09 Thread Chris Lohfink
It can get really unbalanced with STCS. Whats more is even if there was a disk that could fit the 600gb sstable it doesn't pay attention to space (first) so may pick the 75% full one over the 10% one. Its a better idea to use LCS with it unless data model really needs it in which case monitor

Re: hs_err_pid3013.log, out of memory?

2014-09-16 Thread Chris Lohfink
How much memory does your system have? How much memory is system utilizing before starting Cassandra (use command free)? What are the heap setting it tries to use? Chris On Sep 15, 2014, at 8:16 PM, Yatong Zhang bluefl...@gmail.com wrote: It's during the startup. I tried to upgrade cassandra

Re: Trying to understand cassandra gc logs

2014-09-16 Thread Chris Lohfink
-env.sh as well to simplify things a little and make it parsable by gc log visualization tools --- Chris Lohfink On Sep 15, 2014, at 9:40 PM, Donald Smith donald.sm...@audiencescience.com wrote: I understand that cassandra uses ParNew GC for New Gen and CMS for Old Gen (tenured). I’m trying

Re: hs_err_pid3013.log, out of memory?

2014-09-17 Thread Chris Lohfink
) that you have maxed out of instead of memory. --- Chris Lohfink On Sep 17, 2014, at 8:35 PM, Yatong Zhang bluefl...@gmail.com wrote: @Chris Lohfink I have 16G memory per node, all the other settings are default @J. Ryan Earl I am not sure. I am using the default settings. But I've found out

Re: ava.lang.OutOfMemoryError: unable to create new native thread

2014-09-17 Thread Chris Lohfink
priority 0 0 Max realtime timeout unlimited unlimited us --- Chris Lohfink On Sep 17, 2014, at 6:09 PM, Yatong Zhang bluefl...@gmail.com wrote: My sstable size is 192MB. I removed some data directories to reduce the data

Re: no change observed in read latency after switching from EBS to SSD storage

2014-09-17 Thread Chris Lohfink
of nodetool cfstats), may be worth including g to break it up more - but I dont know enough about your data model. --- Chris Lohfink On Sep 17, 2014, at 4:53 PM, Mohammed Guller moham...@glassbeam.com wrote: Thank you all for your responses. Alex – Instance (ephemeral) SSD Ben

Re: CPU consumption of Cassandra

2014-09-22 Thread Chris Lohfink
on the select/read is marked as RUNNABLE but it's really more of a wait state that may throw some profilers off; it may be a red herring. --- Chris Lohfink On Sep 22, 2014, at 11:39 AM, Leleu Eric eric.le...@worldline.com wrote: Hi, I’m currently testing Cassandra 2.0.9 (and since the last

Re: High Compactions Pending

2014-09-22 Thread Chris Lohfink
35 isn't that high really in some scenarios (ie, theres a lot of column families), is it continuing to climb or does it drop down shortly after? --- Chris Lohfink On Sep 22, 2014, at 7:57 PM, arun sirimalla arunsi...@gmail.com wrote: I have a 6 (i2.2xlarge) node cluster on AWS with 4.5 DSE

Re: High Compactions Pending

2014-09-22 Thread Chris Lohfink
Whats the output of 'nodetool compactionstats'? Is concurrent_compactors not set in your cassandra.yaml? Any Exception or Error 's in the system.log or output.log? --- Chris Lohfink On Sep 22, 2014, at 9:50 PM, Arun arunsi...@gmail.com wrote: Its constant since 4 hours. Remaining nodes

Re: CPU consumption of Cassandra

2014-09-23 Thread Chris Lohfink
want to get more out of these systems can do some tuning probably, enable trace to see what's actually the bottleneck. Collections will very likely hurt more than help. --- Chris Lohfink On Sep 23, 2014, at 9:39 AM, Leleu Eric eric.le...@worldline.com wrote: I tried to run “cassandra-stress

Re: CPU consumption of Cassandra

2014-09-23 Thread Chris Lohfink
with yourkit) can give more exposure to the bottleneck. I'd run the test from a separate system first. --- Chris Lohfink On Sep 23, 2014, at 12:48 PM, Leleu Eric eric.le...@worldline.com wrote: First of all, Thanks for your help ! :) Here are some details : With RF=N=2 you're essentially testing

Re: Exploring Simply Queueing

2014-10-05 Thread Chris Lohfink
consumers from reading same message off of a queue? You mention in docs you will address it at a later point in time but its kinda a biggy. Big lock batch reads like astyanax recipe? --- Chris Lohfink On Oct 5, 2014, at 6:03 PM, Jan Algermissen jan.algermis...@nordsc.com wrote: Hi, I have

Re: tuning concurrent_reads param

2014-10-29 Thread Chris Lohfink
thread pool (nodetool tpstats) you can see if they are actually all busy or not. If its near 32 (or whatever you set it at) all the time it may be a bottleneck. --- Chris Lohfink On Wed, Oct 29, 2014 at 10:41 PM, Jimmy Lin y2klyf+w...@gmail.com wrote: Hi, looking at the docs, the default value

Re: Multiple SSD disks per sever? Ideal config?

2014-11-06 Thread Chris Lohfink
(if have network for it) and compaction throughput if you end up with IO to spare. I generally would not recommend putting multiple C* instances on a single box. --- Chris Lohfink On Thu, Nov 6, 2014 at 5:13 PM, Kevin Burton bur...@spinn3r.com wrote: I’m curious what people are doing

Re: query tracing

2014-11-07 Thread Chris Lohfink
It saves a lot of information for each request thats traced so there is significant overhead. If you start at a low probability and move it up based on the load impact it will provide a lot of insight and you can control the cost. --- Chris Lohfink On Fri, Nov 7, 2014 at 11:35 AM, Jimmy Lin

Re: What actually causing java.lang.OutOfMemoryError: unable to create new native thread

2014-11-10 Thread Chris Lohfink
If you're using 64 bit, check the output of: cat /proc/{cassandra pid}/limits. Some older Linux kernels won't work with the above, so if it doesn't exist check the ulimit -a output for the cassandra user. Max processes per user may be your issue as well. --- Chris Lohfink On Mon, Nov 10, 2014 at 11:21 AM
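The limits check above can be scripted. A minimal sketch that parses text in the column layout of /proc/<pid>/limits (columns separated by runs of two or more spaces); the helper name and the sample values are made up for illustration:

```python
import re


def parse_limits(text: str) -> dict:
    """Map each limit name to its (soft, hard) values, kept as strings."""
    limits = {}
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) >= 3:
            limits[fields[0]] = (fields[1], fields[2])
    return limits


# Made-up sample; on a live box read open(f"/proc/{pid}/limits").read() instead.
SAMPLE = """\
Limit                     Soft Limit           Hard Limit           Units
Max processes             1024                 62898                processes
Max open files            4096                 4096                 files
"""

soft, hard = parse_limits(SAMPLE)["Max processes"]
print(soft, hard)  # 1024 62898
```

A low soft value for "Max processes" is the one to look at first for "unable to create new native thread", since threads count against that limit.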

Re: Programmatic Cassandra version detection/extraction

2014-11-13 Thread Chris Lohfink
There is a ReleaseVersion attribute in the org.apache.cassandra.db:StorageService bean --- Chris Lohfink On Wed, Nov 12, 2014 at 5:57 PM, Michael Shuler mich...@pbandjelly.org wrote: On 11/12/2014 04:58 PM, Michael Shuler wrote: On 11/12/2014 04:44 PM, Otis Gospodnetic wrote

Re: Cassandra Doesn't Get Linear Performance Increment in Stress Test on Amazon EC2

2014-12-07 Thread Chris Lohfink
statements, running cql over thrift is far from optimal. I would recommend using the cassandra-stress tool if you want to stress test Cassandra (and not your code) http://www.datastax.com/dev/blog/improved-cassandra-2-1-stress-tool-benchmark-any-schema === Chris Lohfink On Sun, Dec 7, 2014 at 9:48 PM, 孔

Re: Cassandra Doesn't Get Linear Performance Increment in Stress Test on Amazon EC2

2014-12-08 Thread Chris Lohfink
instead of own stuff if you want to test cassandra and not your code. === Chris Lohfink On Mon, Dec 8, 2014 at 4:57 AM, 孔嘉林 kongjiali...@gmail.com wrote: Thanks Chris. I run a *client on a separate* AWS *instance from* the Cassandra cluster servers. At the client side, I create 40 or 50

Re: data distribution along column family partitions

2015-02-04 Thread Chris Lohfink
What about 15 gb? not ok :) don't let a single partition get to 1gb, 100's of mb should be when flares are going up. The main reasoning is compactions would be horrifically slow and there will be a lot of gc pain. Bringing the time bucket to be by day will probably be sufficient. It would take
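Bucketing by day, as suggested above, usually means adding a day value to the partition key so that one source's data is spread over many smaller partitions. A minimal sketch of computing that bucket on the client; the names `day_bucket`, `partition_key`, and `source_id` are hypothetical and not from the thread:

```python
from datetime import datetime, timezone


def day_bucket(epoch_seconds: float) -> str:
    """Return the UTC calendar day (YYYY-MM-DD) that a timestamp falls in."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d")


def partition_key(source_id: str, epoch_seconds: float) -> tuple:
    """Combine the natural key with the day bucket to cap partition growth."""
    return (source_id, day_bucket(epoch_seconds))


print(partition_key("sensor-1", 86400))  # ('sensor-1', '1970-01-02')
```

Reads that span several days then need one query per bucket, which is the trade-off for keeping each partition well under the sizes discussed above.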

Re: High GC activity on node with 4TB on data

2015-02-09 Thread Chris Lohfink
- number of tombstones - how can I reliably find it out? https://github.com/spotify/cassandra-opstools https://github.com/cloudian/support-tools If not getting much compression it may be worth trying to disable it, it may contribute but its very unlikely that its the cause of the gc pressure

Re: Out of Memory Error While Opening SSTables on Startup

2015-02-10 Thread Chris Lohfink
Your cluster is probably having issues with compactions (with STCS you should never have this many). I would probably punt with OpsCenter/rollups60. Turn the node off and move all of the sstables off to a different directory for backup (or just rm if you really don't care about 1 minute metrics),

Re: nodetool status shows large numbers of up nodes are down

2015-02-10 Thread Chris Lohfink
Are you hitting long GCs on your nodes? Can check gc log or look at cassandra log for GCInspector. Chris On Tue, Feb 10, 2015 at 1:28 PM, Cheng Ren cheng@bloomreach.com wrote: Hi Carlos, Thanks for your suggestion. We did check the NTP setting and clock, and they are all working

Re: Out of Memory Error While Opening SSTables on Startup

2015-02-10 Thread Chris Lohfink
nodetool compact. If that goes successfully, then would it be safe to chalk the lack of compaction on this table in the past up to 2.1.2 problems? ~ Paul Nickerson On Tue, Feb 10, 2015 at 3:34 PM, Chris Lohfink clohfin...@gmail.com wrote: Your cluster is probably having issues

Re: How to remove obsolete error message in Datastax Opscenter?

2015-02-09 Thread Chris Lohfink
Restarting opscenter service will get rid of it. Chris On Mon, Feb 9, 2015 at 3:01 AM, Björn Hachmann bjoern.hachm...@metrigo.de wrote: Good morning, unfortunately my last rolling restart of our Cassandra cluster issued from OpsCenter (5.0.2) failed. No big deal, but since then OpsCenter is

Re: Really high read latency

2015-03-23 Thread Chris Lohfink
Compacted partition maximum bytes: 36904729268 thats huge... 36gb rows are gonna cause a lot of problems, even when you specify a precise cell under this it still is going to have an enormous column index to deserialize on every read for the partition. As mentioned above, you should include

Re: cfstats ERROR

2015-06-20 Thread Chris Lohfink
Issue here: https://issues.apache.org/jira/browse/CASSANDRA-9580 Fixed in 2.1.7. Chris On Sat, Jun 20, 2015 at 1:40 PM, 曹志富 cao.zh...@gmail.com wrote: error: /home/ant/apache-cassandra-2.1.6/bin/../data/data/blogger/edgestore/blogger-edgestore-tmplink-ka-146100-Data.db -- StackTrace --

Re: Error Code

2015-10-29 Thread Chris Lohfink
It means a response (opcode 8) message couldn't be decoded. What driver are you using? What version? What version of C*? Chris On Thu, Oct 29, 2015 at 9:19 AM, Eduardo Alfaia wrote: > yes, but what does it mean? > > On 29 Oct 2015, at 15:18, Kai Wang

Re: Last two metrics of cfstats

2015-09-02 Thread Chris Lohfink
It's the number of cells and tombstones seen on the partitions during reads. Just ignore the "last five minutes" part though since that's incorrect. It being zero probably means there have been no actual reads off of disk on that node. Might want to check if "Local read count" is non-zero which

Re: confusion about nodetool cfstats

2015-09-10 Thread Chris Lohfink
DSE you can use the performance service to get some of the metrics (including aggregates across dc, keyspace, cluster etc) from CQL. Chris Lohfink On Thu, Sep 10, 2015 at 9:38 PM, Shuo Chen <chenatu2...@gmail.com> wrote: > Sorry to send the previous message. > > I want to monit

Re: Infinite loop in SliceQueryFilter

2015-12-04 Thread Chris Lohfink
May just be going over a lot of data. Does output of 'nodetool cfstats' show large partitions? (partition maximum bytes). "collecting 1 of 2147483647" is suspicious. Are your queries using ALLOW FILTERING or have very high limits? If trying to read 2 billion entries in 1 query you will have memory

Re: sstabledump failing for system keyspace tables

2016-06-11 Thread Chris Lohfink
related to https://issues.apache.org/jira/browse/CASSANDRA-11330, most of the system tables will work but batches are kinda special cased and uses the localpartitioner (see:

Re: CRT

2016-02-23 Thread Chris Lohfink
Check out http://www.datastax.com/dev/blog/testing-apache-cassandra-with-jepsen. You can run it yourself to test as well. Chris On Tue, Feb 23, 2016 at 7:02 PM, Rakesh Kumar wrote: > https://www.aphyr.com/posts/294-jepsen-cassandra > > How much of this is still valid in ver

Re: Estimated key count from nodetool tablestats

2016-01-24 Thread Chris Lohfink
index and could be off by a lot in wide rows/updated/many sstable use cases. --- Chris Lohfink On Sun, Jan 24, 2016 at 6:32 PM, Jack Krupansky <jack.krupan...@gmail.com> wrote: > Does the nodetool tablestats output line for "Number of keys (estimate)" > indicate partition

Re: opscenter doesn't work with cassandra 3.0

2016-01-26 Thread Chris Lohfink
DataStax has a free program for startups http://www.datastax.com/datastax-enterprise-for-startups On Tue, Jan 26, 2016 at 9:42 AM, Otis Gospodnetić < otis.gospodne...@gmail.com> wrote: > Hi Duyhai, > > SPM is not free, but there is a free plan, plus we have special pricing > for startups,

Re: Latency overhead on Cassandra cluster deployed on multiple AZs (AWS)

2016-04-11 Thread Chris Lohfink
Where do you get the ~1ms latency between AZs? Comparing a short term average to a 99th percentile isn't very fair. "Over the last month, the median is 2.09 ms, 90th percentile is 20ms, 99th percentile is 47ms." - per

Re: Approximate row count

2016-07-27 Thread Chris Lohfink
the number of keys are the number of *partition keys, *not row keys. You have ~39434 partitions, ranging from 311 bytes to 386mb. Looks like you have some wide partitions that contain many of your rows. Chris Lohfink On Wed, Jul 27, 2016 at 1:44 PM, Luke Jolly <l...@getadmiral.com> wrote

Re: a solution of getting cassandra cross-datacenter latency at a certain time

2016-08-08 Thread Chris Lohfink
bins range during the period. Also can wait for CASSANDRA-11752 <https://issues.apache.org/jira/browse/CASSANDRA-11752> for the a "recent" histogram (although would need to apply it to this histogram as well). Chris Lohfink On Mon, Aug 8, 2016 at 8:50 AM, Ryan Svihla <r...

Re: Hintedhandoff mutation

2016-08-17 Thread Chris Lohfink
Probably a question better suited for the dev@ list. But afaik the answer is there is no way to tell the difference, but it's probably safe to look at the created time; HHs tend to be older. Chris On Wed, Aug 17, 2016 at 5:02 AM, Stone Fang wrote: > Hi All, > > I want to

Re: partition sizes reported by nodetool tablehistograms

2017-02-24 Thread Chris Lohfink
Its the decompressed size of the partitions. Each sstable has stats component that contains histograms for the size and number of columns in the partitions (among other things, can see with sstablemetadata tool), tablehistograms merges it for each sstable and gives the results. Chris On Fri, Feb

Re: How to get information of each read/write request?

2016-08-30 Thread Chris Lohfink
Running a query with trace (`TRACING ON` in cqlsh) can give you a lot of the information for an individual request. There has been a ticket to track time in queue (https://issues.apache.org/jira/browse/CASSANDRA-8398) but no ones worked on it yet. Chris On Tue, Aug 30, 2016 at 12:20 PM, Jun Wu

Re: system_distributed.repair_history table

2016-10-05 Thread Chris Lohfink
The only current solution is to truncate it periodically. I opened https://issues.apache.org/jira/browse/CASSANDRA-12701 about it if interested in following On Wed, Oct 5, 2016 at 4:23 PM, Saladi Naidu wrote: > We are seeing following warnings in system.log, As >

Re: system_distributed.repair_history table

2016-10-06 Thread Chris Lohfink
It makes sense to periodically truncate as it is > only for debugging purposes > > Naidu Saladi > > > On Wednesday, October 5, 2016 8:03 PM, Chris Lohfink <clohfin...@gmail.com> > wrote: > > > The only current solution is to truncate it periodically. I o

Re: repair_history maintenance

2016-09-23 Thread Chris Lohfink
Probably should just periodically truncate/clear snapshots when gets too big (will probably take months before noticeable). I opened https://issues.apache.org/jira/browse/CASSANDRA-12701 for discussion on if it should use TTLs Chris On Thu, Sep 22, 2016 at 1:28 PM, sfesc...@gmail.com

Re: metrics not resetting after running proxyhistograms or cfhistograms

2016-10-25 Thread Chris Lohfink
That behavior went away with 2.2. https://issues.apache.org/jira/browse/CASSANDRA-11752 adds decay to it to make it recent data, which is much better than just resetting on reads. Chris On Tue, Oct 25, 2016 at 2:06 PM, Andrew Bialecki < andrew.biale...@klaviyo.com> wrote: > We're running 3.6.

Re: Can a Select Count(*) Affect Writes in Cassandra?

2016-11-10 Thread Chris Lohfink
count(*) actually pages through all the data. So a select count(*) without a limit would be expected to cause a lot of load on the system. The hit is more than just IO load and CPU, it also creates a lot of garbage that can cause pauses slowing down the entire JVM. Some details here:

Re: Java GC pauses, reality check

2016-11-25 Thread Chris Lohfink
No tuning will eliminate gcs. 20-30 seconds is horrific and out of the ordinary. Most likely implementing antipatterns and/or poorly configured. Sub 1s is realistic but with some workloads still may require some tuning to maintain. Some workloads are very unfriendly to GCs though (ie heavy

Re: Help

2017-01-09 Thread Chris Lohfink
Do you have any monitoring setup around garbage collections? A GC + network latency > write timeout will cause intermittent hints. On Sun, Jan 8, 2017 at 10:30 PM, Anshu Vajpayee wrote: > Gossip shows - all nodes are up. > > But when we perform writes , coordinator

Re: system_auth replication strategy

2017-04-01 Thread Chris Lohfink
You should use a network topology strategy with high RF in each DC or something like the everywhere strategy. You should never really use SimpleStrategy, especially if you have multiple DCs and are using LOCAL or EACH consistencies. It's more for test and dev setups than a prod environment.

Re: nodes are always out of sync

2017-04-01 Thread Chris Lohfink
Repairs do not have the ability to instantly build a perfect view of the data between your 3 nodes at an exact time. When a piece of data is written there is a delay between when it is applied on each of the nodes, even if it's just 500ms. So if a request to read the data and build the merkle tree of

Re: Understanding of cassandra metrics

2017-07-07 Thread Chris Lohfink
The coordinator read/scan (Scan is just different naming for the Range, so coordinator view of RangeLatency) is the latencies from the coordinator perspective, so it includes network latency between replicas and such. This which is actually added for speculative retry (why there is no

Re: reduced num_token = improved performance ??

2017-07-12 Thread Chris Lohfink
Probably worth mentioning that some operational procedures like repairs, bootstrapping etc are helped massively by using fewer tokens. Incremental repairs are one of the things I would say is most impacted by it, since fewer tokens will mean fewer local ranges to iterate through and less anti

Re: Nodetool tablehistograms

2017-07-19 Thread Chris Lohfink
It's the number of sstables that may have been read from. This includes sstables that had their bloom filters checked (which may hit disk). This changes a bit in https://issues.apache.org/jira/browse/CASSANDRA-13120 to be only the sstables that it's actually reading from. On Wed, Jul 19, 2017 at

Re: what is MemtableReclaimMemory mean ??

2017-05-01 Thread Chris Lohfink
Theres a read barrier to stop reclaiming a memtable when there are requests actively reading it. The *MemtableReclaimMemory* pool offloads that wait instead of blocking the caller. It in itself is not going to use any cpu or increase load. It will however block the releasing of the memtable

Re: what is MemtableReclaimMemory mean ??

2017-05-01 Thread Chris Lohfink
Question though, how many tables do you have? If you have more than a few hundreds it could be bottlenecking the flushing if it is flushing very frequently. On Mon, May 1, 2017 at 9:32 PM, Chris Lohfink <clohfin...@gmail.com> wrote: > Theres a read barrier to stop reclaiming a memt

Re: Increasing VNodes

2017-10-04 Thread Chris Lohfink
Increasing the number of tokens will make repairs worse, not better. You can just split the sub ranges into smaller chunks; you don't need to use vnodes to do that. A simple approach is to iterate through each host token range, split by N, and repair them (ie
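The split-by-N approach described above can be sketched as follows. This assumes a non-wrapping token range with integer bounds where start is less than end; the function names and the keyspace name are hypothetical, and the generated strings use nodetool repair's start-token (-st) and end-token (-et) options:

```python
def split_range(start: int, end: int, n: int) -> list:
    """Split the token range (start, end] into at most n contiguous subranges."""
    if n < 1 or start >= end:
        raise ValueError("need n >= 1 and start < end (wrapping ranges not handled)")
    width = end - start
    # Integer arithmetic keeps the bounds exact even for 64-bit token values.
    bounds = [start + (width * i) // n for i in range(n + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(n) if bounds[i] != bounds[i + 1]]


def repair_commands(start: int, end: int, n: int, keyspace: str) -> list:
    """Build one subrange repair command per chunk (keyspace name is an assumption)."""
    return [
        f"nodetool repair -st {s} -et {e} {keyspace}"
        for s, e in split_range(start, end, n)
    ]


for command in repair_commands(0, 100, 4, "my_keyspace"):
    print(command)
```

The range that wraps around the end of the ring would have to be split into its two halves before being passed in; that case is left out here to keep the sketch short.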

Re: Cassandra and G1 Garbage collector stop the world event (STW)

2017-10-09 Thread Chris Lohfink
Can you share your schema and cfstats? This sounds kinda like a wide partition, backed up compactions, or tombstone issue for it to create so much and have issues like that so quickly with those settings. A heap dump would be most telling but they are rather large and hard to share. Chris On

Re: Do not use Cassandra 3.11.0+ or Cassandra 3.0.12+

2017-09-12 Thread Chris Lohfink
Last I've seen, OpsCenter does not collect this metric. I don't think any monitoring tools do. Chris > On Sep 11, 2017, at 4:06 PM, CPC wrote: > > Hi, > > Is this bug fixed in dse 5.1.3? As I understand calling jmx getTombStoneRatio > triggers that bug. We are using

Re: Read-/ Write Latency - Cassandra 2.1 .15 vs 3.10

2017-10-03 Thread Chris Lohfink
RecentReadLatency metrics has been deprecated for years (1.1 or 1.2) and were removed in 2.2. It was a very misleading metric. Instead pull from the Table's ReadLatency metrics from the org.apache.cassandra.metrics domain.

Re: Cassandra - Nodes can't restart due to java.lang.OutOfMemoryError: Direct buffer memory

2017-08-31 Thread Chris Lohfink
What version of java are you running? There is a "kinda leak" in jvm around this you may run into, can try with -Djdk.nio.maxCachedBufferSize=262144 if above 8u102. You can also try increasing the size allowed for direct byte buffers. It defaults to size of heap -XX:MaxDirectMemorySize=?G Some

Re: Cassandra CF Level Metrics (Read, Write Count and Latency)

2017-09-01 Thread Chris Lohfink
To be future compatible should consider using `type=Table` instead of `type=ColumnFamily` depending on your version. > not matching with the total read requests the table level metrics for Read/Write latencies will not match the number of requests you've made. This metric is the amount of time

Re: [EXTERNAL] Re: Increasing VNodes

2017-10-04 Thread Chris Lohfink
ith the docs) is probably more helpful to learn about how > reaper works: http://cassandra-reaper.io/

Re: Inter Data Center Latency calculation of a Multi DC cluster running in AWS

2017-10-17 Thread Chris Lohfink
An alternative if using >3.8 you can use the org.apache.cassandra.metrics:type=Messaging,name=[DC]-Latency mbean where [DC] is the name of the DC and you can get the inter DC latency per node (to that node). This does not account for NTP drift though, just how long it takes messages (ie mutations)

Re: gc causes C* node hang

2017-11-30 Thread Chris Lohfink
Mail client may be changing the char if you're copying and pasting; it's a - "hyphen", not the Unicode en dash –. I would recommend adding it to jvm options like oleksandr pointed out Chris On Thu, Nov 30, 2017 at 1:50 AM, Oleksandr Shulgin < oleksandr.shul...@zalando.de> wrote: > On Thu, Nov

Re: What is OneMinuteRate in Write Latency?

2017-11-03 Thread Chris Lohfink
It's from the metrics library's Meter object, which tracks the exponentially weighted moving average of
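To make the moving-average idea concrete, here is a minimal sketch of an exponentially weighted moving average of an event rate. It is modeled on the general technique, not copied from the metrics library; the class name and the 5-second tick interval are assumptions for illustration:

```python
import math


class Ewma:
    """Exponentially weighted moving average of events per second."""

    def __init__(self, window_seconds: float = 60.0, tick_seconds: float = 5.0):
        # Weight given to the newest sample; older samples decay exponentially.
        self.alpha = 1.0 - math.exp(-tick_seconds / window_seconds)
        self.tick_seconds = tick_seconds
        self.uncounted = 0
        self.rate = None  # no rate until the first tick

    def update(self, n: int = 1) -> None:
        """Record n events since the last tick."""
        self.uncounted += n

    def tick(self) -> float:
        """Fold the events seen during the last interval into the average."""
        instant_rate = self.uncounted / self.tick_seconds
        self.uncounted = 0
        if self.rate is None:
            self.rate = instant_rate
        else:
            self.rate += self.alpha * (instant_rate - self.rate)
        return self.rate


meter = Ewma()
meter.update(50)     # 50 events during the first 5-second interval
print(meter.tick())  # 10.0
print(meter.tick())  # lower: the idle interval decays the rate
```

With a 60-second window the value behaves like a one-minute rate: a burst shows up quickly and then fades over the next few minutes instead of being averaged over the lifetime of the process.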

Re: Cassandra Compaction Metrics - CompletedTasks vs TotalCompactionCompleted

2017-10-31 Thread Chris Lohfink
CompactionMetrics is a combination of the compaction executor (sstable compactions, secondary index build, view building, relocate, garbagecollect, cleanup, scrub etc) and validation executor (repairs). Keep in mind not all jobs execute 1 task per operation, things that use the

Re: G1GC CPU Spike

2018-06-13 Thread Chris Lohfink
The gc log file is best to share when asking for help with tuning. The top of the file has all the computed args it ran with, and it gives details on what part of the GC is taking time. I would guess the CPU spike is from full GCs, which with that small of a heap is probably from evacuation

Re: G1GC CPU Spike

2018-06-13 Thread Chris Lohfink
13, 2018, at 9:51 AM, rajpal reddy wrote: > > jvm_gc_collection_seconds_count{gc="G1 Young Generation”} and also young > generation seconds count keep increasing > > > >> On Jun 13, 2018, at 9:52 AM, Chris Lohfink > <mailto:clohf...@apple.com>> wrote:
