Re: Switching to Incremental Repair

2024-02-15 Thread Chris Lohfink
I would recommend adding something to C* to be able to flip the repaired state on all sstables quickly (with default OSS you can turn nodes off one at a time and use sstablerepairedset). It's a lifesaver to be able to revert to non-IR if the migration is going south. The same can be used to quickly switch

Re: Nodetool command to pre-load the chunk cache

2023-03-24 Thread Chris Lohfink
Something additional to consider (outside C* fix) is using a tool like happycache to have consistent pagecache between them. Might be sufficient if the data is in memory already. Chris On Tue, Mar 21, 2023 at 2:48 PM Jeff Jirsa wrote: > We

Re: oversized partition detection ? monitoring the partitions growth ?

2019-11-01 Thread Chris Lohfink
You can set compaction_large_partition_warning_threshold_mb and alert on the logs: Writing large partition {}/{}:{} ({}) to sstable {}
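A minimal sketch of how alerting on that log message might look, assuming the warning lines follow the template quoted above. The meaning given to each placeholder (keyspace, table, partition key, size, sstable path) and the function name are assumptions for illustration, not something confirmed in this thread.

```python
import re

# Pattern built from the message template quoted above:
#   Writing large partition {}/{}:{} ({}) to sstable {}
# The group names are assumptions about what each placeholder holds.
LARGE_PARTITION = re.compile(
    r"Writing large partition (?P<keyspace>[^/\s]+)/(?P<table>[^:\s]+):(?P<key>.*) "
    r"\((?P<size>[^()]*)\) to sstable (?P<sstable>\S+)"
)


def find_large_partitions(lines):
    """Return one dict per log line that matches the warning template."""
    hits = []
    for line in lines:
        match = LARGE_PARTITION.search(line)
        if match:
            hits.append(match.groupdict())
    return hits
```

Feeding it the lines of system.log and raising an alert whenever the returned list is non-empty would be one way to wire this into monitoring.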

Re: GC Tuning https://thelastpickle.com/blog/2018/04/11/gc-tuning.html

2019-10-19 Thread Chris Lohfink
"It depends" on your version and heap size but G1 is easier to get right so probably wanna stick with that unless you are using small heaps or really interested in tuning it (likely for massively smaller gains then tuning your data model). There is no GC algo that is strictly better than others in

Re: loosing data during saving data from java

2019-10-19 Thread Chris Lohfink
If the writes are coming in fast enough that the commitlog can't keep up, it will block applying mutations to the memtable (even with periodic, once it hits >1.5x flush time). Things will queue up and possibly time out, but they will not be acknowledged until applied. If you do it enough fast enough

Re: Collecting Latency Metrics

2019-05-30 Thread Chris Lohfink
For what it is worth, generally I would recommend just using the mean vs calculating it yourself. It's a lot easier and averages are meaningless for anything besides trending anyway (which is really what this is useful for, finding issues on the larger scale), especially with high volume clusters

Re: Collecting Latency Metrics

2019-05-30 Thread Chris Lohfink
> > org.apache.cassandra.metrics.ClientRequest.Latency.Read these measure the > latency in milliseconds > Its actually in microseconds, unless calling the values() operation which gives the histogram in nanoseconds On Wed, May 29, 2019 at 4:34 PM Paul Chandler wrote: > There are various

Re: Collecting Latency Metrics

2019-05-29 Thread Chris Lohfink
To answer your question org.apache.cassandra.metrics:type=Table,name=ReadTotalLatency can give you the total local read latency in microseconds and you can get the count from the Latency read metric. If you are going to do that be sure to do it on the delta from previous query (new - last) for
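The delta approach described above can be sketched as follows. This assumes you already poll the total latency (in microseconds) and the request count at two points in time; how the values are fetched over JMX is not shown, and the function name is hypothetical.

```python
def mean_latency_micros(prev_total, prev_count, new_total, new_count):
    """Average latency over the interval between two polls (new - last).

    Returns None when no requests happened in the interval or when the
    counters went backwards (for example after a node restart).
    """
    delta_count = new_count - prev_count
    delta_total = new_total - prev_total
    if delta_count <= 0 or delta_total < 0:
        return None
    return delta_total / delta_count
```

Dividing the raw totals instead of the deltas would give the average since the node started, which hides recent changes.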

Re: Cassandra config in table

2019-02-25 Thread Chris Lohfink
In 4.0+ you can SELECT * FROM system_views.settings; Chris On Mon, Feb 25, 2019 at 9:22 AM Abdul Patel wrote: > Do we have any system table which stores all config details which we have > in yaml or cassandra env.sh?

Re: Cassandra collection tombstones

2019-01-25 Thread Chris Lohfink
> The "estimated droppable tombstone" value is actually always wrong. Because > it's an estimate that does not consider overlaps (and I'm not sure about the > fact it considers the gc_grace_seconds either). It considers the time the tombstone was created and the gc_grace_seconds, it doesn't

Re: Compact storage removal effect

2019-01-22 Thread Chris Lohfink
> > CREATE TABLE ks.cf2 ( > key bigint, > column1 text, > value blob, > PRIMARY KEY (key, column1) > ) WITH COMPACT STORAGE > > CREATE TABLE ks.cf3 ( > key text, > column1 timestamp, > value int, > PRIMARY KEY (key, column1) > )

Re: Compact storage removal effect

2019-01-22 Thread Chris Lohfink
What version are you running? Did you include an upgradesstables -a or something to rebuild without the compact storage in your migration? After 3.0 the new format can be more or less the same size as the 2.x compact storage tables depending on schema (which can impact things a lot). Chris >

Re: High CPU usage on some of the nodes due to message coalesce

2018-10-20 Thread Chris Lohfink
1s young gcs are horrible and likely the cause of some of your bad metrics. How large are your mutations/query results and what gc/heap settings are you using? You can use https://github.com/aragozin/jvm-tools to see the threads generating allocation

Re: jmxterm "#NullPointerException: No such PID "

2018-09-20 Thread Chris Lohfink
For what it's worth, I highly recommend you remove that option in all cassandra clusters first thing. A possibly nonexistent improvement (ie /tmp on different low throughput drive) vs being able to diagnose issues is a no-brainer. You can measure or monitor gc logs for your safepoint pauses to see

Re: Setting up rerouting java/python driver read requests from unresponsive nodes to good ones

2018-08-15 Thread Chris Lohfink
That’s what the retry handler does (see Horia’s response). You can also use the speculative retry to possibly send requests to multiple coordinators a little earlier as well to reduce the impact of the slow requests (ie a GC).

Re: Cassandra Compaction Metrics - CompletedTasks vs TotalCompactionCompleted

2018-08-10 Thread Chris Lohfink
If it's occurring that often you can monitor nodetool compactionstats to see what's running > On Aug 10, 2018, at 11:35 AM, Dionne Cloudoupoulos > wrote: > > On 2017/10/31 16:56:29, Chris Lohfink wrote: >> The "CompletedTasks" metric is a measure of how many tasks r

Re: concurrent_compactors via JMX

2018-07-18 Thread Chris Lohfink
Refer to Alain's email, but to strictly answer the question of increasing concurrent_compactors via jmx: There are two attributes you can increase that would set the maximum number of concurrent compactions. org.apache.cassandra.db:type=CompactionManager,name=MaximumCompactorThreads -> 6

Re: Compaction process stuck

2018-07-05 Thread Chris Lohfink
> Hi Chris, > Thanks for reply. > > Unfortunately, our servers do not have jstack installed. > I tried "kill -3 " option but that is also not generating thread dump. > > Is there any other way I can generate thread dump? > > Thanks & Regards, > At

Re: Compaction process stuck

2018-07-04 Thread Chris Lohfink
Can you take a thread dump (jstack) and share the state of the compaction threads? Also check for “Exception” in logs Chris Sent from my iPhone > On Jul 4, 2018, at 8:37 AM, atul atri wrote: > > Hi, > > On one of our server, compaction process is hanging. It's stuck at 80%. It > was stuck

Re: G1GC CPU Spike

2018-06-15 Thread Chris Lohfink
at the same time and also attached the gc.log. grafana > dashboard and gc.log timing are 4hours apart gc can be see 06/12th around > 22:50 > > rate(jvm_gc_collection_seconds_sum{"}[5m]) > > > On Jun 13, 2018, at 5:26 PM, Chris Lohfink > <mailto:clohf...@apple.com>

Re: G1GC CPU Spike

2018-06-13 Thread Chris Lohfink
There is not even a 100ms GC pause in that; are you certain there's a problem? > On Jun 13, 2018, at 3:00 PM, rajpal reddy wrote: > > Thanks Chris I did attached the gc logs already. reattaching them > now. > > it started yesterday around 11:54PM >> On Jun 13, 2018, a

Re: G1GC CPU Spike

2018-06-13 Thread Chris Lohfink
What is the criteria for picking up the value for G1ReservePercent? > > Subroto > >> On Jun 13, 2018, at 6:52 AM, Chris Lohfink wrote: >> >> G1ReservePercent > > - > To unsubscribe, e-mail: user-unsu

Re: G1GC CPU Spike

2018-06-13 Thread Chris Lohfink
13, 2018, at 9:51 AM, rajpal reddy wrote: > > jvm_gc_collection_seconds_count{gc="G1 Young Generation"} and also young > generation seconds count keep increasing > > > >> On Jun 13, 2018, at 9:52 AM, Chris Lohfink > <mailto:clohf...@apple.com>> wrote:

Re: G1GC CPU Spike

2018-06-13 Thread Chris Lohfink
The gc log file is best to share when asking for help with tuning. The top of the file has all the computed args it ran with and it gives details on what part of the GC is taking time. I would guess the CPU spike is from full GCs, which with that small of a heap is probably from evacuation

Re: nodetool (2.1.18) - Xmx, ParallelGCThreads, High CPU usage

2018-05-29 Thread Chris Lohfink
Might be better to disable explicit gcs so the full gcs don’t even occur. It’s likely from the rmi dgc or directbytebuffers not any actual need to do gcs or the concurrent gc threads would be an issue as well. Nodetool also has no excuse to use that big of a heap so it should have max size

Re: tablestats and gossip

2018-04-06 Thread Chris Lohfink
Yes, it's the count of all locally applied writes to that table. An insert to a table with RF=3 should increase the local write count by 1 on 3 different nodes. Chris > On Apr 6, 2018, at 5:00 AM, Grzegorz Pietrusza wrote: > > Hi all > > Does local write count provided

Re: Understanding Blocked and All Time Blocked columns in tpstats

2018-03-23 Thread Chris Lohfink
sing > the queue size just exacerbate the problem? > > On Fri, Mar 23, 2018 at 11:51 AM, Chris Lohfink <clohf...@apple.com > <mailto:clohf...@apple.com>> wrote: > It blocks the caller attempting to add the task until theres room in queue, > applying back pressure.

Re: Understanding Blocked and All Time Blocked columns in tpstats

2018-03-23 Thread Chris Lohfink
It blocks the caller attempting to add the task until there's room in the queue, applying back pressure. It does not reject it. It mimics the behavior of the pre-SEP DebuggableThreadPoolExecutor's RejectedExecutionHandler that the other thread pools use (the exception being sampling/trace, which just throw

Re: Delete System_Traces Table

2018-03-19 Thread Chris Lohfink
it, > no? > > Also, if I intend to upgrade to version 3.11.2, will the existence of the > table cause any issues? > > Thanks! > > On Mon, Mar 19, 2018 at 4:30 PM, Chris Lohfink <clohf...@apple.com > <mailto:clohf...@apple.com>> wrote: > Oh I misread o

Re: Delete System_Traces Table

2018-03-19 Thread Chris Lohfink
ailto:rahul.xavier.si...@gmail.com>> wrote: > I think he just wants to delete the test table not the whole keyspace. Is > that correct? > > -- > Rahul Singh > rahul.si...@anant.us <mailto:rahul.si...@anant.us> > > Anant Corporation > > On Mar 19, 2018, 9:0

Re: Delete System_Traces Table

2018-03-19 Thread Chris Lohfink
No. Why do you want to? If you don't use tracing they will be empty, and if you were able to drop them you would no longer be able to use tracing in debugging. Chris > On Mar 19, 2018, at 7:52 AM, shalom sagges wrote: > > Hi All, > > I accidentally created a test table

Re: WARN [PERIODIC-COMMIT-LOG-SYNCER] .. exceeded the configured commit interval by an average of...

2018-03-16 Thread Chris Lohfink
If you just want to make it work, increase commitlog_segment_size_in_mb to 64. A single mutation cannot exceed 1/2 the segment size. If you want to actually fix your problem decrease the size of the mutations and limit the size of the value blob. <== recommended Chris > On Mar 16, 2018, at
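As a rough worked example of the half-segment rule stated above: with commitlog_segment_size_in_mb set to 64, the largest mutation that fits is 32 MB. The helper below is hypothetical and assumes 1 MB = 1024 * 1024 bytes.

```python
def max_mutation_bytes(commitlog_segment_size_in_mb):
    """Largest mutation size in bytes, taken as half of one commitlog segment.

    Assumes 1 MB = 1024 * 1024 bytes.
    """
    return commitlog_segment_size_in_mb * 1024 * 1024 // 2


def mutation_fits(mutation_bytes, commitlog_segment_size_in_mb):
    """True when a mutation of the given size stays within the limit."""
    return mutation_bytes <= max_mutation_bytes(commitlog_segment_size_in_mb)
```

Comparing your largest value blob against this limit gives a quick sense of how much headroom a given segment size leaves.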

Re: system.size_estimates - safe to remove sstables?

2018-03-06 Thread Chris Lohfink
p or spark or any such tool. > > So, do you think we can just remove the cf and restart the service? > > Thanks, > Kunal > > On 5 March 2018 at 21:52, Chris Lohfink <clohf...@apple.com > <mailto:clohf...@apple.com>> wrote: > Any chance space used by snapshots? W

Re: cfhistograms InstanceNotFoundException EstimatePartitionSizeHistogram

2018-03-06 Thread Chris Lohfink
Make sure you're using the same version of nodetool as your version of Cassandra. That metric was renamed from EstimatedRowSize, so if using a version of nodetool made for a more recent version you would get this error since EstimatePartitionSizeHistogram doesn’t exist on the older Cassandra host.

Re: system.size_estimates - safe to remove sstables?

2018-03-05 Thread Chris Lohfink
Any chance the space is used by snapshots? What files exist there that are taking up space? > On Mar 5, 2018, at 1:02 AM, Kunal Gangakhedkar > wrote: > > Hi all, > > I have a 2-node cluster running cassandra 2.1.18. > One of the nodes has run out of disk space and died -

Re: system.size_estimates - safe to remove sstables?

2018-03-05 Thread Chris Lohfink
Unless you are using spark or hadoop, nothing consumes the data in that table (unless you have tooling that may use it, like opscenter or something), so you're safe to just truncate it or rm the sstables while the instance is offline. If you do use that table you can then do a `nodetool

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Chris Lohfink
Instead of saying "Make X better" you can quantify "Here's how we can make X better" in a jira and the conversation will continue with interested parties (opening jiras is free!). Being combative and insulting the project on the mailing list may help vent some frustrations but it is counterproductive

Re: Commitlogs are filling the Full Disk space and nodes are down

2018-01-30 Thread Chris Lohfink
The commitlog growing is often a symptom of a problem. If the memtable flush or post flush fails in any way, the commitlogs will not be recycled/deleted and will continue to pile up. Might want to go back in the logs earlier to make sure there's nothing like the postmemtable flusher getting a

Re: sstabledump tries to delete a file

2018-01-10 Thread Chris Lohfink
Yes it should be read only, open a jira please. It does look like if the fp changed it would rebuild or if your missing. When it builds the table metadata from the sstable it can just set the properties to match that of the sstable to prevent this. Chris On Wed, Jan 10, 2018 at 4:16 AM,

Re: sstable

2017-12-20 Thread Chris Lohfink
Somewhere along the line the sstabledump tool incorrectly got set up to use tool initialization; it's fixed in https://issues.apache.org/jira/browse/CASSANDRA-13683 Chris On Tue, Dec 19, 2017 at 5:45 PM, Mounika kale wrote: > Hi, > I'm getting below error for all sstable

Re: gc causes C* node hang

2017-11-30 Thread Chris Lohfink
Mail client may be changing the char if you're copying and pasting; it's - "hyphen", not the unicode en dash –. I would recommend adding it to jvm options like oleksandr pointed out Chris On Thu, Nov 30, 2017 at 1:50 AM, Oleksandr Shulgin < oleksandr.shul...@zalando.de> wrote: > On Thu, Nov

Re: What is OneMinuteRate in Write Latency?

2017-11-03 Thread Chris Lohfink
It's from the metrics library's Meter object, which tracks the exponentially weighted moving average of
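To illustrate what an exponentially weighted moving average of a rate looks like, here is a small sketch of the general technique. The 5-second tick and 60-second window are assumptions chosen for the example, and the class is hypothetical; it is not the metrics library's own code.

```python
import math


class OneMinuteEwma:
    """Exponentially weighted moving average of an event rate (events/second)."""

    def __init__(self, tick_seconds=5.0, window_seconds=60.0):
        self.tick_seconds = tick_seconds
        # Smoothing factor: the share of each new sample mixed into the average.
        self.alpha = 1.0 - math.exp(-tick_seconds / window_seconds)
        self.uncounted = 0
        self.rate = None  # no rate until the first tick

    def mark(self, n=1):
        """Record n events since the last tick."""
        self.uncounted += n

    def tick(self):
        """Fold the events seen since the last tick into the moving average."""
        instant_rate = self.uncounted / self.tick_seconds
        self.uncounted = 0
        if self.rate is None:
            self.rate = instant_rate
        else:
            self.rate += self.alpha * (instant_rate - self.rate)
        return self.rate
```

Because older samples are weighted down exponentially, the value reacts to recent load while still smoothing out short bursts.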

Re: Cassandra Compaction Metrics - CompletedTasks vs TotalCompactionCompleted

2017-10-31 Thread Chris Lohfink
CompactionMetrics is a combination of the compaction executor (sstable compactions, secondary index build, view building, relocate, garbagecollect, cleanup, scrub etc) and validation executor (repairs). Keep in mind not all jobs execute 1 task per operation, things that use the

Re: Inter Data Center Latency calculation of a Multi DC cluster running in AWS

2017-10-17 Thread Chris Lohfink
An alternative if using >3.8 you can use the org.apache.cassandra.metrics:type=Messaging,name=[DC]-Latency mbean where [DC] is the name of the DC and you can get the inter DC latency per node (to that node). This does not account for NTP drift though, just how long it takes messages (ie mutations)

Re: Cassandra and G1 Garbage collector stop the world event (STW)

2017-10-09 Thread Chris Lohfink
Can you share your schema and cfstats? This sounds kinda like a wide partition, backed up compactions, or tombstone issue for it to create so much and have issues like that so quickly with those settings. A heap dump would be most telling but they are rather large and hard to share. Chris On

Re: [EXTERNAL] Re: Increasing VNodes

2017-10-04 Thread Chris Lohfink
ith the docs) is probably more helpful to learn about how > reaper works: http://cassandra-reaper.io/ > <https://urldefense.proofpoint.com/v2/url?u=http-3A__cassandra-2Dreaper.io_=DwMFAg=djjh8EKwHtOepW4Bjau0lKhLlu-DxM1dlgP0rrLsOzY=O20_rcIS1QazTO3_J10I1cPIygxnuBZ4sUCz1TS16XE=nHN7toaSQUjfwSABx1KXlVHLYmla

Re: Increasing VNodes

2017-10-04 Thread Chris Lohfink
Increasing the number of tokens will make repairs worse, not better. You can just split the sub ranges into smaller chunks; you don't need to use vnodes to do that. Simple approach is to iterate through each host token range and split by N and repair them (ie
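The split-by-N step can be sketched as below. This is a minimal example that assumes a non-wrapping range with integer tokens (start < end); the function name is hypothetical and the wrap-around range at the end of the ring is not handled.

```python
def split_range(start, end, n):
    """Split the token range (start, end] into up to n contiguous sub-ranges.

    Assumes integer tokens and a non-wrapping range (start < end).
    Sub-ranges that would be empty are dropped.
    """
    if n < 1 or start >= end:
        raise ValueError("need n >= 1 and start < end")
    width = end - start
    bounds = [start + (width * i) // n for i in range(n + 1)]
    return [
        (bounds[i], bounds[i + 1])
        for i in range(n)
        if bounds[i] < bounds[i + 1]
    ]
```

Each resulting pair could then be repaired on its own, for example by passing it as the start and end token of a sub-range repair.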

Re: Read-/ Write Latency - Cassandra 2.1 .15 vs 3.10

2017-10-03 Thread Chris Lohfink
RecentReadLatency metrics have been deprecated for years (1.1 or 1.2) and were removed in 2.2. It was a very misleading metric. Instead pull from the Table's ReadLatency metrics from the org.apache.cassandra.metrics domain.

Re: Do not use Cassandra 3.11.0+ or Cassandra 3.0.12+

2017-09-12 Thread Chris Lohfink
Last I've seen of it, OpsCenter does not collect this metric. I don't think any monitoring tools do. Chris > On Sep 11, 2017, at 4:06 PM, CPC wrote: > > Hi, > > Is this bug fixed in dse 5.1.3? As I understand calling jmx getTombStoneRatio > triggers that bug. We are using

Re: Cassandra CF Level Metrics (Read, Write Count and Latency)

2017-09-01 Thread Chris Lohfink
To be future compatible should consider using `type=Table` instead of `type=ColumnFamily` depending on your version. > not matching with the total read requests the table level metrics for Read/Write latencies will not match the number of requests you've made. This metric is the amount of time

Re: Cassandra - Nodes can't restart due to java.lang.OutOfMemoryError: Direct buffer memory

2017-08-31 Thread Chris Lohfink
What version of java are you running? There is a "kinda leak" in jvm around this you may run into, can try with -Djdk.nio.maxCachedBufferSize=262144 if above 8u102. You can also try increasing the size allowed for direct byte buffers. It defaults to size of heap -XX:MaxDirectMemorySize=?G Some

Re: Nodetool tablehistograms

2017-07-19 Thread Chris Lohfink
It's the number of sstables that may have been read from. This includes sstables that had their bloom filters checked (which may hit disk). This changes a bit in https://issues.apache.org/jira/browse/CASSANDRA-13120 to be only the sstables that it's actually reading from. On Wed, Jul 19, 2017 at

Re: reduced num_token = improved performance ??

2017-07-12 Thread Chris Lohfink
Probably worth mentioning that some operational procedures like repairs, bootstrapping etc are helped massively by using fewer tokens. Incremental repairs are one of the things I would say are most impacted by it, since fewer tokens will mean fewer local ranges to iterate through and less anti

Re: Understanding of cassandra metrics

2017-07-07 Thread Chris Lohfink
The coordinator read/scan latency (Scan is just a different name for Range, so the coordinator view of RangeLatency) is the latency from the coordinator's perspective, so it includes network latency between replicas and such. This was actually added for speculative retry (why there is no

Re: what is MemtableReclaimMemory mean ??

2017-05-01 Thread Chris Lohfink
Question though, how many tables do you have? If you have more than a few hundreds it could be bottlenecking the flushing if it is flushing very frequently. On Mon, May 1, 2017 at 9:32 PM, Chris Lohfink <clohfin...@gmail.com> wrote: > Theres a read barrier to stop reclaiming a memt

Re: what is MemtableReclaimMemory mean ??

2017-05-01 Thread Chris Lohfink
There's a read barrier to stop reclaiming a memtable when there are requests actively reading it. The *MemtableReclaimMemory* pool offloads that wait instead of blocking the caller. It in itself is not going to use any cpu or increase load. It will however block the releasing of the memtable

Re: system_auth replication strategy

2017-04-01 Thread Chris Lohfink
You should use a network topology strategy with high RF in each DC or something like the everywhere strategy. You should never really use SimpleStrategy, especially if you have multiple DCs and are using LOCAL or EACH consistencies. It's more for test and dev setups than a prod environment.

Re: nodes are always out of sync

2017-04-01 Thread Chris Lohfink
Repairs do not have the ability to instantly build a perfect view of the data across your 3 nodes at an exact time. When a piece of data is written there is a delay between when it is applied on each of the nodes, even if it's just 500ms. So if a request to read the data and build the merkle tree of

Re: partition sizes reported by nodetool tablehistograms

2017-02-24 Thread Chris Lohfink
It's the decompressed size of the partitions. Each sstable has a stats component that contains histograms for the size and number of columns in the partitions (among other things; you can see it with the sstablemetadata tool). tablehistograms merges it for each sstable and gives the results. Chris On Fri, Feb

Re: Help

2017-01-09 Thread Chris Lohfink
Do you have any monitoring setup around garbage collections? A GC + network latency > write timeout will cause intermittent hints. On Sun, Jan 8, 2017 at 10:30 PM, Anshu Vajpayee wrote: > Gossip shows - all nodes are up. > > But when we perform writes , coordinator

Re: Java GC pauses, reality check

2016-11-25 Thread Chris Lohfink
No tuning will eliminate gcs. 20-30 seconds is horrific and out of the ordinary. Most likely implementing antipatterns and/or poorly configured. Sub 1s is realistic but with some workloads still may require some tuning to maintain. Some workloads are very unfriendly to GCs though (ie heavy

Re: Can a Select Count(*) Affect Writes in Cassandra?

2016-11-10 Thread Chris Lohfink
count(*) actually pages through all the data. So a select count(*) without a limit would be expected to cause a lot of load on the system. The hit is more than just IO load and CPU, it also creates a lot of garbage that can cause pauses slowing down the entire JVM. Some details here:

Re: metrics not resetting after running proxyhistograms or cfhistograms

2016-10-25 Thread Chris Lohfink
That behavior went away with 2.2. https://issues.apache.org/jira/browse/CASSANDRA-11752 adds decay to it to make it recent data, which is much better than just resetting on reads. Chris On Tue, Oct 25, 2016 at 2:06 PM, Andrew Bialecki < andrew.biale...@klaviyo.com> wrote: > We're running 3.6.

Re: system_distributed.repair_history table

2016-10-06 Thread Chris Lohfink
It makes sense to periodically truncate as it is > only for debugging purposes > > Naidu Saladi > > > On Wednesday, October 5, 2016 8:03 PM, Chris Lohfink <clohfin...@gmail.com> > wrote: > > > The only current solution is to truncate it periodically. I o

Re: system_distributed.repair_history table

2016-10-05 Thread Chris Lohfink
The only current solution is to truncate it periodically. I opened https://issues.apache.org/jira/browse/CASSANDRA-12701 about it if interested in following On Wed, Oct 5, 2016 at 4:23 PM, Saladi Naidu wrote: > We are seeing following warnings in system.log, As >

Re: repair_history maintenance

2016-09-23 Thread Chris Lohfink
Probably should just periodically truncate/clear snapshots when gets too big (will probably take months before noticeable). I opened https://issues.apache.org/jira/browse/CASSANDRA-12701 for discussion on if it should use TTLs Chris On Thu, Sep 22, 2016 at 1:28 PM, sfesc...@gmail.com

Re: How to get information of each read/write request?

2016-08-30 Thread Chris Lohfink
Running a query with trace (`TRACING ON` in cqlsh) can give you a lot of the information for an individual request. There has been a ticket to track time in queue (https://issues.apache.org/jira/browse/CASSANDRA-8398) but no one's worked on it yet. Chris On Tue, Aug 30, 2016 at 12:20 PM, Jun Wu

Re: Hintedhandoff mutation

2016-08-17 Thread Chris Lohfink
Probably a question better suited for the dev@ list. But afaik the answer is there is no way to tell the difference, but it's probably safe to look at the created time; HHs tend to be older. Chris On Wed, Aug 17, 2016 at 5:02 AM, Stone Fang wrote: > Hi All, > > I want to

Re: a solution of getting cassandra cross-datacenter latency at a certain time

2016-08-08 Thread Chris Lohfink
bins range during the period. Also can wait for CASSANDRA-11752 <https://issues.apache.org/jira/browse/CASSANDRA-11752> for a "recent" histogram (although would need to apply it to this histogram as well). Chris Lohfink On Mon, Aug 8, 2016 at 8:50 AM, Ryan Svihla <r...

Re: Approximate row count

2016-07-27 Thread Chris Lohfink
The number of keys is the number of *partition keys*, not row keys. You have ~39434 partitions, ranging from 311 bytes to 386mb. Looks like you have some wide partitions that contain many of your rows. Chris Lohfink On Wed, Jul 27, 2016 at 1:44 PM, Luke Jolly <l...@getadmiral.com> wrote

Re: sstabledump failing for system keyspace tables

2016-06-11 Thread Chris Lohfink
related to https://issues.apache.org/jira/browse/CASSANDRA-11330, most of the system tables will work but batches are kinda special cased and uses the localpartitioner (see:

Re: Latency overhead on Cassandra cluster deployed on multiple AZs (AWS)

2016-04-11 Thread Chris Lohfink
Where do you get the ~1ms latency between AZs? Comparing a short term average to a 99th percentile isn't very fair. "Over the last month, the median is 2.09 ms, 90th percentile is 20ms, 99th percentile is 47ms." - per

Re: CRT

2016-02-23 Thread Chris Lohfink
Check out http://www.datastax.com/dev/blog/testing-apache-cassandra-with-jepsen. You can run it yourself to test as well. Chris On Tue, Feb 23, 2016 at 7:02 PM, Rakesh Kumar wrote: > https://www.aphyr.com/posts/294-jepsen-cassandra > > How much of this is still valid in ver

Re: opscenter doesn't work with cassandra 3.0

2016-01-26 Thread Chris Lohfink
DataStax has a free program for startups http://www.datastax.com/datastax-enterprise-for-startups On Tue, Jan 26, 2016 at 9:42 AM, Otis Gospodnetić < otis.gospodne...@gmail.com> wrote: > Hi Duyhai, > > SPM is not free, but there is a free plan, plus we have special pricing > for startups,

Re: Estimated key count from nodetool tablestats

2016-01-24 Thread Chris Lohfink
index and could be off by a lot in wide rows/updated/many sstable use cases. --- Chris Lohfink On Sun, Jan 24, 2016 at 6:32 PM, Jack Krupansky <jack.krupan...@gmail.com> wrote: > Does the nodetool tablestats output line for "Number of keys (estimate)" > indicate partition

Re: Infinite loop in SliceQueryFilter

2015-12-04 Thread Chris Lohfink
May just be going over a lot of data. Does output of 'nodetool cfstats' show large partitions? (partition maximum bytes). "collecting 1 of 2147483647" is suspicious. Are your queries using ALLOW FILTERING or have very high limits? If trying to read 2 billion entries in 1 query you will have memory

Re: Error Code

2015-10-29 Thread Chris Lohfink
It means a response (opcode 8) message couldn't be decoded. What driver are you using? What version? What version of C*? Chris On Thu, Oct 29, 2015 at 9:19 AM, Eduardo Alfaia wrote: > yes, but what does it mean? > > On 29 Oct 2015, at 15:18, Kai Wang

Re: confusion about nodetool cfstats

2015-09-10 Thread Chris Lohfink
DSE you can use the performance service to get some of the metrics (including aggregates across dc, keyspace, cluster etc) from CQL. Chris Lohfink On Thu, Sep 10, 2015 at 9:38 PM, Shuo Chen <chenatu2...@gmail.com> wrote: > Sorry to send the previous message. > > I want to monit

Re: Last two metrics of cfstats

2015-09-02 Thread Chris Lohfink
It's the number of cells and tombstones seen on the partitions during reads. Just ignore the "last five minutes" part though, since that's incorrect. It being zero probably means there have been no actual reads off of disk on that node. Might want to check if "Local read count" is non-zero which

Re: cfstats ERROR

2015-06-20 Thread Chris Lohfink
Issue here: https://issues.apache.org/jira/browse/CASSANDRA-9580 Fixed in 2.1.7. Chris On Sat, Jun 20, 2015 at 1:40 PM, 曹志富 cao.zh...@gmail.com wrote: error: /home/ant/apache-cassandra-2.1.6/bin/../data/data/blogger/edgestore/blogger-edgestore-tmplink-ka-146100-Data.db -- StackTrace --

Re: Really high read latency

2015-03-23 Thread Chris Lohfink
Compacted partition maximum bytes: 36904729268 that's huge... 36gb rows are gonna cause a lot of problems. Even when you specify a precise cell under this, it is still going to have an enormous column index to deserialize on every read for the partition. As mentioned above, you should include

Re: Out of Memory Error While Opening SSTables on Startup

2015-02-10 Thread Chris Lohfink
Your cluster is probably having issues with compactions (with STCS you should never have this many). I would probably punt with OpsCenter/rollups60. Turn the node off and move all of the sstables off to a different directory for backup (or just rm if you really don't care about 1 minute metrics),

Re: nodetool status shows large numbers of up nodes are down

2015-02-10 Thread Chris Lohfink
Are you hitting long GCs on your nodes? Can check gc log or look at cassandra log for GCInspector. Chris On Tue, Feb 10, 2015 at 1:28 PM, Cheng Ren cheng@bloomreach.com wrote: Hi Carlos, Thanks for your suggestion. We did check the NTP setting and clock, and they are all working

Re: Out of Memory Error While Opening SSTables on Startup

2015-02-10 Thread Chris Lohfink
nodetool compact. If that goes successfully, then would it be safe to chalk the lack of compaction on this table in the past up to 2.1.2 problems? ~ Paul Nickerson On Tue, Feb 10, 2015 at 3:34 PM, Chris Lohfink clohfin...@gmail.com wrote: Your cluster is probably having issues

Re: High GC activity on node with 4TB on data

2015-02-09 Thread Chris Lohfink
- number of tombstones - how can I reliably find it out? https://github.com/spotify/cassandra-opstools https://github.com/cloudian/support-tools If not getting much compression it may be worth trying to disable it; it may contribute, but it's very unlikely that it's the cause of the gc pressure

Re: How to remove obsolete error message in Datastax Opscenter?

2015-02-09 Thread Chris Lohfink
Restarting opscenter service will get rid of it. Chris On Mon, Feb 9, 2015 at 3:01 AM, Björn Hachmann bjoern.hachm...@metrigo.de wrote: Good morning, unfortunately my last rolling restart of our Cassandra cluster issued from OpsCenter (5.0.2) failed. No big deal, but since then OpsCenter is

Re: data distribution along column family partitions

2015-02-04 Thread Chris Lohfink
What about 15 gb? not ok :) don't let a single partition get to 1gb, 100's of mb should be when flares are going up. The main reasoning is compactions would be horrifically slow and there will be a lot of gc pain. Bringing the time bucket to be by day will probably be sufficient. It would take
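As an illustration of bucketing by day, the sketch below derives a day bucket from a timestamp. The idea is that the bucket becomes part of the partition key, so each partition only holds one day of data; the function name and the use of UTC calendar days are assumptions for the example.

```python
from datetime import datetime, timezone


def day_bucket(ts):
    """Return the UTC calendar day (YYYY-MM-DD) a timezone-aware timestamp falls in."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d")
```

A write would then use something like (series id, day_bucket(ts)) as its partition key, and a read over a time span would query one partition per day in that span.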

Re: Cassandra Doesn't Get Linear Performance Increment in Stress Test on Amazon EC2

2014-12-08 Thread Chris Lohfink
instead of own stuff if you want to test cassandra and not your code. === Chris Lohfink On Mon, Dec 8, 2014 at 4:57 AM, 孔嘉林 kongjiali...@gmail.com wrote: Thanks Chris. I run a *client on a separate* AWS *instance from* the Cassandra cluster servers. At the client side, I create 40 or 50

Re: Cassandra Doesn't Get Linear Performance Increment in Stress Test on Amazon EC2

2014-12-07 Thread Chris Lohfink
statements, running cql over thrift is far from optimal. I would recommend using the cassandra-stress tool if you want to stress test Cassandra (and not your code) http://www.datastax.com/dev/blog/improved-cassandra-2-1-stress-tool-benchmark-any-schema === Chris Lohfink On Sun, Dec 7, 2014 at 9:48 PM, 孔

Re: Programmatic Cassandra version detection/extraction

2014-11-13 Thread Chris Lohfink
There is a ReleaseVersion attribute in the org.apache.cassandra.db:type=StorageService bean --- Chris Lohfink On Wed, Nov 12, 2014 at 5:57 PM, Michael Shuler mich...@pbandjelly.org wrote: On 11/12/2014 04:58 PM, Michael Shuler wrote: On 11/12/2014 04:44 PM, Otis Gospodnetic wrote

Re: What actually causing java.lang.OutOfMemoryError: unable to create new native thread

2014-11-10 Thread Chris Lohfink
If you're using 64 bit, check the output of: cat /proc/{cassandra pid}/limits Some older linux kernels won't work with the above, so if it doesn't exist check the ulimit -a output for the cassandra user. Max processes per user may be your issue as well. --- Chris Lohfink On Mon, Nov 10, 2014 at 11:21 AM

Re: query tracing

2014-11-07 Thread Chris Lohfink
It saves a lot of information for each request that's traced so there is significant overhead. If you start at a low probability and move it up based on the load impact it will provide a lot of insight and you can control the cost. --- Chris Lohfink On Fri, Nov 7, 2014 at 11:35 AM, Jimmy Lin

Re: Multiple SSD disks per sever? Ideal config?

2014-11-06 Thread Chris Lohfink
(if have network for it) and compaction throughput if you end up with IO to spare. I generally would not recommend putting multiple C* instances on a single box. --- Chris Lohfink On Thu, Nov 6, 2014 at 5:13 PM, Kevin Burton bur...@spinn3r.com wrote: I’m curious what people are doing

Re: tuning concurrent_reads param

2014-10-29 Thread Chris Lohfink
thread pool (nodetool tpstats) you can see if they are actually all busy or not. If it's near 32 (or whatever you set it at) all the time it may be a bottleneck. --- Chris Lohfink On Wed, Oct 29, 2014 at 10:41 PM, Jimmy Lin y2klyf+w...@gmail.com wrote: Hi, looking at the docs, the default value

Re: Exploring Simply Queueing

2014-10-05 Thread Chris Lohfink
consumers from reading same message off of a queue? You mention in docs you will address it at a later point in time but its kinda a biggy. Big lock batch reads like astyanax recipe? --- Chris Lohfink On Oct 5, 2014, at 6:03 PM, Jan Algermissen jan.algermis...@nordsc.com wrote: Hi, I have

Re: CPU consumption of Cassandra

2014-09-23 Thread Chris Lohfink
want to get more out of these systems can do some tuning probably, enable trace to see what's actually the bottleneck. Collections will very likely hurt more than help. --- Chris Lohfink On Sep 23, 2014, at 9:39 AM, Leleu Eric eric.le...@worldline.com wrote: I tried to run “cassandra-stress

Re: CPU consumption of Cassandra

2014-09-23 Thread Chris Lohfink
with yourkit) can give more exposure to the bottleneck. I'd run the test from a separate system first. --- Chris Lohfink On Sep 23, 2014, at 12:48 PM, Leleu Eric eric.le...@worldline.com wrote: First of all, Thanks for your help ! :) Here is some details : With RF=N=2 your essentially testing

Re: CPU consumption of Cassandra

2014-09-22 Thread Chris Lohfink
on the select/read is marked as RUNNABLE but it's really more of a wait state that may throw some profilers off; it may be a red herring. --- Chris Lohfink On Sep 22, 2014, at 11:39 AM, Leleu Eric eric.le...@worldline.com wrote: Hi, I’m currently testing Cassandra 2.0.9 (and since the last

Re: High Compactions Pending

2014-09-22 Thread Chris Lohfink
35 isn't really that high in some scenarios (ie, there are a lot of column families). Is it continuing to climb or does it drop down shortly after? --- Chris Lohfink On Sep 22, 2014, at 7:57 PM, arun sirimalla arunsi...@gmail.com wrote: I have a 6 (i2.2xlarge) node cluster on AWS with 4.5 DSE

Re: High Compactions Pending

2014-09-22 Thread Chris Lohfink
What's the output of 'nodetool compactionstats'? Is concurrent_compactors not set in your cassandra.yaml? Any Exceptions or Errors in the system.log or output.log? --- Chris Lohfink On Sep 22, 2014, at 9:50 PM, Arun arunsi...@gmail.com wrote: Its constant since 4 hours. Remaining nodes
