Re: ssl certificate hot reloading test - cassandra 4.1

2024-04-15 Thread Jeff Jirsa
It seems like if folks really want the life of a connection to be finite (either client/server or server/server), adding in an option to quietly drain and recycle a connection on some period isn’t that difficult. That type of requirement shows up in a number of environments, usually on

Re: Datacenter decommissioning on Cassandra 4.1.4

2024-04-08 Thread Jeff Jirsa
To Jon’s point, if you remove from replication after step 1 or step 2 (probably step 2 if your goal is to be strictly correct), the nodetool decommission phase becomes almost a no-op. If you use the order below, the last nodes to decommission will cause those surviving machines to run out of

Re: Schema inconsistency in mixed-version cluster

2023-12-12 Thread Jeff Jirsa
A static collection is probably atypical, and again, would encourage you to open a JIRA. This seems like a case we should be able to find in a simulator. On Tue, Dec 12, 2023 at 2:05 PM Sebastian Marsching wrote: > I assume these are column names of a non-system table. > > This is correct. It

Re: Schema inconsistency in mixed-version cluster

2023-12-12 Thread Jeff Jirsa
This deserves a JIRA On Tue, Dec 12, 2023 at 8:30 AM Sebastian Marsching wrote: > Hi, > > while upgrading our production cluster from C* 3.11.14 to 4.1.3, we > experienced the issue that some SELECT queries failed due to supposedly no > replica being available. The system logs on the C* nodes

Re: Remove folders of deleted tables

2023-12-05 Thread Jeff Jirsa
The last time you mentioned this: On Tue, Dec 5, 2023 at 11:57 AM Sébastien Rebecchi wrote: > Hi Bowen, > > Thanks for your answer. > > I was thinking of extreme use cases, but as far as I am concerned I can > deal with creation and deletion of 2 tables every 6 hours for a keyspace. > So it

Re: Migrating to incremental repair in C* 4.x

2023-11-27 Thread Jeff Jirsa
I don’t work for datastax, that’s not my blog, and I’m on a phone and potentially missing nuance, but I’d never try to convert a cluster to IR by disabling auto compaction. It sounds very much out of date, or it’s optimized for fixing one node in a cluster somehow. It didn’t make sense in the 4.0

Re: Upgrade from C* 3 to C* 4 per datacenter

2023-10-26 Thread Jeff Jirsa
> On Oct 26, 2023, at 12:32 AM, Michalis Kotsiouros (EXT) via user > wrote: > >  > Hello Cassandra community, > We are trying to upgrade our systems from Cassandra 3 to Cassandra 4. We plan > to do this per data center. > During the upgrade, a cluster with mixed SW levels is expected. At

Re: Resources to understand rebalancing

2023-10-25 Thread Jeff Jirsa
Data ownership is defined by the token ring concept. Hosts in the cluster may have tokens - let's oversimplify to 5 hosts, each with 1 token: A=0, B=1000, C=2000, D=3000, E=4000. The partition key is hashed to calculate the token, and the next 3 hosts in the ring are the "owners" of that data - a
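The ring walk described above can be sketched in a few lines (host names and token values come from the example; the hashing step is omitted since the partitioner itself isn't shown):

```python
from bisect import bisect_left

# The oversimplified ring from the example: 5 hosts, 1 token each
RING = {0: "A", 1000: "B", 2000: "C", 3000: "D", 4000: "E"}

def owners(token, rf=3):
    """The rf owners of a hashed token: the next rf hosts at/above it,
    wrapping around the ring."""
    tokens = sorted(RING)
    i = bisect_left(tokens, token)  # first ring token >= the hashed token
    return [RING[tokens[(i + k) % len(tokens)]] for k in range(rf)]

# a partition key hashing to token 570 lands on B, C, D
print(owners(570))  # → ['B', 'C', 'D']
```

Note this only illustrates the "walk up the ring" idea; real Cassandra assigns the token range (previous token, this token] to the node holding this token, and replica placement is further shaped by the snitch and replication strategy.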

Re: Cassandra 4.0.6 token mismatch issue in production environment

2023-10-23 Thread Jeff Jirsa
en? > > Jaydeep > > On Sat, Oct 21, 2023 at 10:25 AM Jaydeep Chovatia < > chovatia.jayd...@gmail.com> wrote: > >> Thanks, Jeff! >> I will keep this thread updated on our findings. >> >> Jaydeep >> >> On Sat, Oct 21, 2023 at 9:37 AM Jeff Jirs

Re: Cassandra 4.0.6 token mismatch issue in production environment

2023-10-21 Thread Jeff Jirsa
That code path was added to protect against invalid gossip states. For this message to be logged, the coordinator receiving the query must identify a set of replicas holding the data to serve the read, and one of the selected replicas must disagree that it’s a replica based on its view of the

Re: java driver with cassandra proxies (option: -Dcassandra.join_ring=false)

2023-10-12 Thread Jeff Jirsa
Just to be clear: - How many of the proxy nodes are you providing as contact points? One of them or all of them? It sounds like you're saying you're passing all of them, and only one is connecting, and the driver is declining to connect to the rest because they're not in system.peers. I'm not

Re: Startup errors - 4.1.3

2023-08-30 Thread Jeff Jirsa
There are at least two bugs in the compaction lifecycle transaction log - one that can drop an ABORT / ADD in the wrong order (and prevent startup), and one that allows for invalid timestamps in the log file (and, again, prevent startup). I believe it's safe to work around the former by removing

Re: Big Data Question

2023-08-21 Thread Jeff Jirsa
ing aka faster streaming also works for > STCS. > > Dinesh > > On Aug 21, 2023, at 8:01 AM, Jeff Jirsa wrote: > >  > There's a lot of questionable advice scattered in this thread. Set aside > most of the guidance like 2TB/node, it's old and super nuanced. > >

Re: Big Data Question

2023-08-21 Thread Jeff Jirsa
There's a lot of questionable advice scattered in this thread. Set aside most of the guidance like 2TB/node, it's old and super nuanced. If you're bare metal, do what your organization is good at. If you have millions of dollars in SAN equipment and you know how SANs work and fail and get backed

Re: Big Data Question

2023-08-16 Thread Jeff Jirsa
A lot of things depend on actual cluster config - compaction settings (LCS vs STCS vs TWCS) and token allocation (single token, vnodes, etc) matter a ton. With 4.0 and LCS, streaming for replacement is MUCH faster, so much so that most people should be fine with 4-8TB/node, because the rebuild

Re: Cassandra p95 latencies

2023-08-11 Thread Jeff Jirsa
You’re going to have to help us help you. 4.0 is pretty widely deployed. I’m not aware of a perf regression. Can you give us a schema (anonymized) and queries, and show us a trace? On Aug 10, 2023, at 10:18 PM, Shaurya Gupta wrote: The queries are rightly designed as I already explained. 40 ms is

Re: write on ONE node vs replication factor

2023-07-15 Thread Jeff Jirsa
Consistency level controls when queries acknowledge/succeed. Replication factor is where data lives / how many copies. If you write at consistency ONE and replication factor 3, the query finishes successfully when the write is durable on one of the 3 copies. It will get sent to all 3, but it’ll
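The distinction above can be sketched as a tiny ack-counting helper (a simplified single-DC model with a hypothetical function name, not driver or server API): the coordinator sends the write to all RF replicas but only blocks for the consistency level's required ack count.

```python
def required_acks(consistency_level, replication_factor):
    """How many replica acks the coordinator waits for before the write
    succeeds (simplified: single DC, SimpleStrategy-style counting)."""
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        return replication_factor // 2 + 1  # majority of replicas
    if consistency_level == "ALL":
        return replication_factor
    raise ValueError(f"unhandled consistency level: {consistency_level}")

# CL=ONE, RF=3: the write still goes to all 3 replicas,
# but the query returns once it is durable on any 1 of them
print(required_acks("ONE", 3), required_acks("QUORUM", 3), required_acks("ALL", 3))  # → 1 2 3
```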

Re: Replacing node without shutting down the old node

2023-05-16 Thread Jeff Jirsa
On Mon, May 8, 2023 at 8:58 PM Jeff Jirsa <jji...@gmail.com> wrote:You can't have two nodes with the same token (in the current metadata implementation) - it causes problems counting things like how many replicas ACK a write, and what happens if the one you're replacing ACKs a write but the

Re: Replacing node without shutting down the old node

2023-05-08 Thread Jeff Jirsa
You can't have two nodes with the same token (in the current metadata implementation) - it causes problems counting things like how many replicas ACK a write, and what happens if the one you're replacing ACKs a write but the joining host doesn't? It's harder than it seems to maintain consistency

Re: Is cleanup is required if cluster topology changes

2023-05-05 Thread Jeff Jirsa
(node *addition* >>> followed by node *decommission*, which of course changes the topology), >>> and we have a cluster of size 100 nodes with 300GB per node. If we have to >>> run cleanup on 100 nodes after every replacement, then it could take >>> forever.

Re: Is cleanup is required if cluster topology changes

2023-05-04 Thread Jeff Jirsa
Jeff Jirsa <jji...@gmail.com> wrote:Cleanup is fast and cheap and basically a no-op if you haven’t changed the ring After cassandra has transactional cluster metadata to make ring changes strongly consistent, cassandra should do this in every compaction. But until then it’s left for operators

Re: Is cleanup is required if cluster topology changes

2023-05-04 Thread Jeff Jirsa
Cleanup is fast and cheap and basically a no-op if you haven’t changed the ring After cassandra has transactional cluster metadata to make ring changes strongly consistent, cassandra should do this in every compaction. But until then it’s left for operators to run when they’re sure the state of

Re: CAS operation result is unknown - proposal accepted by 1 but not a quorum

2023-04-12 Thread Jeff Jirsa
Are you always inserting into the same partition (with contention) or different ? Which version are you using ? The short tldr is that the failure modes of the existing paxos implementation (under contention, under latency, under cluster strain) can cause undefined states. I believe that a

Re: When are sstables that were compacted deleted?

2023-04-04 Thread Jeff Jirsa
ves fighting with compaction to make sure we don't run out > of space. > Will open a ticket, thanks. > > > On Wed, Apr 5, 2023 at 12:10 AM Jeff Jirsa wrote: > >> If you restart the node, it'll process/purge those compaction logs on >> startup, but you want them to purge/pro

Re: When are sstables that were compacted deleted?

2023-04-04 Thread Jeff Jirsa
If you restart the node, it'll process/purge those compaction logs on startup, but you want them to purge/process now. I genuinely don't know when the tidier runs, but it's likely the case that you're too busy compacting to purge (though I don't know what exactly "too busy" means). Since you're

Re: Reads not returning data after adding node

2023-04-03 Thread Jeff Jirsa
detool decommission on the node instead. On 03/04/2023 16:32, Jeff Jirsa wrote: FWIW, `nodetool decommission` is strongly preferred. `nodetool removenode` is designed to be run when a host is offline. Only decommission is guar

Re: Reads not returning data after adding node

2023-04-03 Thread Jeff Jirsa
FWIW, `nodetool decommission` is strongly preferred. `nodetool removenode` is designed to be run when a host is offline. Only decommission is guaranteed to maintain consistency / correctness, and removenode probably streams a lot more data around than decommission. On Mon, Apr 3, 2023 at 6:47 AM

Re: Understanding rack in cassandra-rackdc.properties

2023-04-03 Thread Jeff Jirsa
As long as the number of racks is already at/above the replication factor, it's gonna be fine. Where it tends to surprise people is if you have RF=3 and either 1 or 2 racks, and then you add a third, that third rack gets one copy of "all" of the data, so you often run out of

Re: Reads not returning data after adding node

2023-04-02 Thread Jeff Jirsa
Just run nodetool rebuild on the new node. If you assassinate it now you violate consistency for your most recent writes. On Apr 2, 2023, at 10:22 PM, Carlos Diaz wrote: That's what nodetool assassinate will do. On Sun, Apr 2, 2023 at 10:19 PM David Tinker wrote: Is it

Re: Reads not returning data after adding node

2023-04-02 Thread Jeff Jirsa
Looks like it joined with no data. Did you set auto_bootstrap to false? Or does the node think it’s a seed? You want to use “nodetool rebuild” to stream data to that host. You can potentially end the production outage / incident by taking the host offline, or making it less likely to be

Re: Nodetool command to pre-load the chunk cache

2023-03-21 Thread Jeff Jirsa
We serialize the other caches to disk to avoid cold-start problems, I don't see why we couldn't also serialize the chunk cache? Seems worth a JIRA to me. Until then, you can probably use the dynamic snitch (badness + severity) to route around newly started hosts. I'm actually pretty surprised

Re: Cassandra in Kubernetes: IP switch decommission issue

2023-03-09 Thread Jeff Jirsa
I described something roughly similar to this a few years ago on the list. The specific chain you're describing isn't one I've thought about before, but if you open a JIRA for tracking and attribution, I'll ask some folks to take a peek at it. On Thu, Mar 9, 2023 at 10:57 AM Inès Potier wrote:

Re: Replacing node w/o bootstrapping (streaming)?

2023-02-09 Thread Jeff Jirsa
You don’t have to do anything else. Just use smart rsync flags (including delete). It’ll work fine just the way you described. No special start args. No replacement flag. Be sure you rsync the commitlog directory too. Flush and drain to be extra safe > On Feb 9, 2023, at 6:42 PM, Max

Re: Deletions getting omitted

2023-02-04 Thread Jeff Jirsa
While you'd expect only_purge_repaired_tombstones:true to be sufficient, your gc_grace_seconds of 1 hour is making you unusually susceptible to resurrecting data. (To be clear, you should be safe to do this, but if there is a bug hiding in there somewhere, your low gc_grace_seconds will make it

Re: removenode stuck - cassandra 4.1.0

2023-01-23 Thread Jeff Jirsa
Those hosts are likely sending streams. If you do `nodetool netstats` on the replicas of the node you're removing, you should see byte counters and file counters - they should all be incrementing. If one of them isn't incrementing, that one is probably stuck. There's at least one bug in 4.1 that

Re: Failed disks - correct procedure

2023-01-16 Thread Jeff Jirsa
Prior to cassandra-6696 you’d have to treat one missing disk as a failed machine, wipe all the data and re-stream it, as a tombstone for a given value may be on one disk and data on another (effectively resurrecting data) So the answer has to be version dependent, too - which version were you

Re: Cassandra 4.0.7 - issue - service not starting

2022-12-08 Thread Jeff Jirsa
What version of java are you using? On Thu, Dec 8, 2022 at 8:07 AM Amit Patel via user < user@cassandra.apache.org> wrote: > Hi, > > > > I have installed cassandra-4.0.7-1.noarch - repo ( baseurl= > https://redhat.cassandra.apache.org/40x/noboolean/) on Redhat 7.9. > > > > We have configured

Re: Unable to gossip with peers when starting cluster

2022-11-09 Thread Jeff Jirsa
When you say you configured them to talk to .0.31 as a seed, did you do that by changing the yaml? Was 0.9 ever a seed before? I expect if you start 0.7 and 0.9 at the same time, it all works. This looks like a logic/state bug that needs to be fixed, though. (If you're going to upgrade, usually

Re: concurrent sstable read

2022-10-25 Thread Jeff Jirsa
Sequentially, and yes - for some definition of "directly" - but not just because it's sequential, but also because each sstable has cost in reading (e.g. JVM garbage created when you open/seek that has to be collected after the read) On Tue, Oct 25, 2022 at 8:27 AM Grzegorz Pietrusza wrote: >

Re: Doubts on multiple filter design in cassandra

2022-10-16 Thread Jeff Jirsa
The limit only bounds what you return, not what you scan. On Oct 3, 2022, at 10:56 AM, Regis Le Bretonnic wrote: Hi... We do the same (even if a lot of people will say it's bad and that you shouldn't...) with a "allow filtering" BUT ALWAYS WITHIN A PARTITION AND WITH A LIMIT CLAUSE TO AVOID A FULL

Re: TWCS recommendation on number of windows

2022-09-28 Thread Jeff Jirsa
So when I wrote TWCS, I wrote it for a use case that had 24h TTLs and 30 days of retention. In that application, we had tested 12h windows, 24h windows, and 7 day windows, and eventually settled on 24h windows because that balanced factors like sstable size, sstables-per-read, and expired data
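The balancing act described here is mostly retention divided by window size: that quotient is roughly how many windows (and hence how many non-overlapping sstable sets) stay live at once. A quick sketch of the arithmetic for the 30-day-retention case, using the window sizes the message says were tested:

```python
import math

def live_windows(retention_hours, window_hours):
    """Approximate count of TWCS windows retained at any moment."""
    return math.ceil(retention_hours / window_hours)

retention = 30 * 24  # 30 days of retention, as in the original use case
for window in (12, 24, 7 * 24):  # the 12h, 24h, and 7-day windows tested
    print(f"{window}h windows -> {live_windows(retention, window)} live windows")
```

Fewer, larger windows mean bigger sstables but fewer sstables-per-read; more, smaller windows expire data sooner but multiply sstable count. 24h windows (30 live windows) were the middle ground that balanced those factors.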

Re: Cassandra GC tuning

2022-09-20 Thread Jeff Jirsa
news.   Thanks a lot for your valuable input.   BR MK From: Jeff Jirsa Sent: Monday, September 19, 2022 20:06 To: user@cassandra.apache.org; Michail Kotsiouros Subject: Re: Cassandra GC tuning   https://issues.apache.org/jira/browse/CASSANDRA-13019 is in 4.0, you may find that tuning those

Re: Cassandra GC tuning

2022-09-19 Thread Jeff Jirsa
https://issues.apache.org/jira/browse/CASSANDRA-13019 is in 4.0, you may find that tuning those thresholds On Mon, Sep 19, 2022 at 9:50 AM Jeff Jirsa wrote: > Snapshots are probably actually caused by a spike in disk IO and disk > latency, not GC (you'll see longer STW pauses as y

Re: Cassandra GC tuning

2022-09-19 Thread Jeff Jirsa
Snapshots are probably actually caused by a spike in disk IO and disk latency, not GC (you'll see longer STW pauses as you get to a safepoint if that disk is hanging). This is especially problematic on SATA SSDs, or nVME SSDs with poor IO scheduler tuning. There's a patch somewhere to throttle

Re: Local reads metric

2022-09-18 Thread Jeff Jirsa
Yes > On Sep 17, 2022, at 10:46 PM, Gil Ganz wrote: > >  > Hey > Do reads that come from a read repair are somehow counted as part of the > local read metric? > i.e > org.apache.cassandra.metrics.Table... : > ReadLatency.1m_rate > > Version is 4.0.4 > > Gil

Re: TimeWindowCompactionStrategy Operational Concerns

2022-09-15 Thread Jeff Jirsa
If you were able to generate old data offline, using something like the CQLSSTableWriter class, you can add that to the cluster (either via streaming or nodetool import), that would maintain the TWCS invariant. That said, with https://issues.apache.org/jira/browse/CASSANDRA-13418 , IF you're

Re: Bootstrap data streaming order

2022-09-12 Thread Jeff Jirsa
per host). If you're using rack aware (or in AWS, AZ-aware) snitches, it's also influenced by the number of hosts in the rack. On Mon, Sep 12, 2022 at 7:16 AM Jeff Jirsa wrote: > A new node joining will receive (replication factor) streams for each > token it has. If you use single

Re: Bootstrap data streaming order

2022-09-12 Thread Jeff Jirsa
A new node joining will receive (replication factor) streams for each token it has. If you use single token and RF=3, three hosts will send data at the same time (the data sent is the “losing” replica of the data based on the next/new topology that will exist after the node finishes

Re: Adding nodes

2022-07-12 Thread Jeff Jirsa
not a practical response as any business is > unlikely to be spending speculative money. > > > > *From:* Jeff Jirsa > *Sent:* Tuesday, July 12, 2022 4:43 PM > *To:* cassandra > *Cc:* Bowen Song > *Subject:* Re: Adding nodes > > > > EXTERNAL &g

Re: Adding nodes

2022-07-12 Thread Jeff Jirsa
On Tue, Jul 12, 2022 at 7:27 AM Marc Hoppins wrote: > > I was asking the questions but no one cared to answer. > This is probably a combination of "it is really hard to answer a question with insufficient data" and your tone. Nobody here gets paid to help you solve your company's problems

Re: Adding nodes

2022-07-12 Thread Jeff Jirsa
shown on the joining node > will remain as joining unless the streaming process has failed. > > The node state is propagated between nodes via gossip, and there may be a > delay before all existing nodes agree on the fact that the joining node is > no longer in the cluster. Within that d

Re: Adding nodes

2022-07-08 Thread Jeff Jirsa
Having a node UJ but not sending/receiving other streams is an invalid state (unless 4.0 moved the streaming data out of netstats? I'm not 100% sure, but I'm 99% sure it should be there). It likely stopped the bootstrap process long ago with an error (which you may not have seen), and is running

Re: Adding nodes

2022-07-07 Thread Jeff Jirsa
What version are you using? When you run `nodetool netstats` on the joining node, what is the output? How much data is there per node (presumably more than 86G)? On Thu, Jul 7, 2022 at 7:49 AM Marc Hoppins wrote: > Hi all, > > Cluster of 2 DC and 24 nodes > > DC1 (RF3) = 12 nodes, 16 tokens

Re: Query around Data Modelling -2

2022-06-30 Thread Jeff Jirsa
How are you running repair? -pr? Or -st/-et? 4.0 gives you real incremental repair which helps. Splitting the table won’t make reads faster. It will increase the potential parallelization of compaction. > On Jun 30, 2022, at 7:04 AM, MyWorld wrote: > >  > Hi all, > > Another query around

Re: Query around Data Modelling

2022-06-22 Thread Jeff Jirsa
This is assuming each row is like … I dunno 10-1000 bytes. If you’re storing like a huge 1mb blob use two tables for sure. > On Jun 22, 2022, at 9:06 PM, Jeff Jirsa wrote: > >  > > Ok so here’s how I would think about this > > The writes don’t matter. (There’s a ti

Re: Query around Data Modelling

2022-06-22 Thread Jeff Jirsa
rows per partition. But what if I started storing > 2-4k rows per partition. However total partition size is still under 100 MB > >> On Thu, Jun 23, 2022, 7:18 AM Jeff Jirsa wrote: >> How many rows per partition in each model? >> >> >> > On Jun 22, 2

Re: Query around Data Modelling

2022-06-22 Thread Jeff Jirsa
How many rows per partition in each model? > On Jun 22, 2022, at 6:38 PM, MyWorld wrote: > >  > Hi all, > > Just a small query around data Modelling. > Suppose we have to design the data model for 2 different use cases which will > query the data on same set of (partion+clustering key). So

Re: Configuration for new(expanding) cluster and new admins.

2022-06-20 Thread Jeff Jirsa
One of the advantages of faster streaming in 4.0+ is that it’s now very much viable to do this entirely with bootstraps and decoms in the same DC, when you have use cases where you can’t just change DC names Vnodes will cause more compaction than single token, but you can just add in all the

Re: Configuration for new(expanding) cluster and new admins.

2022-06-15 Thread Jeff Jirsa
You shouldn't need to change num_tokens at all. num_tokens helps you pretend your cluster is bigger than it is and randomly selects tokens for you so that your data is approximately evenly distributed. As you add more hosts, it should balance out automatically. The alternative to num_tokens is
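The "approximately evenly distributed" claim can be illustrated with a small seeded simulation (illustrative only, not Cassandra code): with many random tokens per host, per-host ownership of the ring evens out on its own.

```python
import random

def ownership_shares(num_hosts, tokens_per_host, seed=1):
    """Fraction of a unit token ring each host owns when its vnode
    tokens are picked uniformly at random."""
    random.seed(seed)
    ring = sorted((random.random(), host)
                  for host in range(num_hosts)
                  for _ in range(tokens_per_host))
    shares = [0.0] * num_hosts
    prev = 0.0
    for token, host in ring:
        shares[host] += token - prev  # host at `token` owns (prev, token]
        prev = token
    shares[ring[0][1]] += 1.0 - prev  # wraparound segment -> first token's host
    return shares

for tokens in (1, 256):
    shares = ownership_shares(5, tokens)
    print(f"{tokens} token(s)/host: ownership spread "
          f"{max(shares) - min(shares):.3f}")
```

With a single token per host the spread between the most- and least-loaded host is large; with 256 random tokens per host it shrinks dramatically, which is the balancing effect num_tokens buys.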

Re: Cassandra 3.0 upgrade

2022-06-13 Thread Jeff Jirsa
The versions with caveats should all be enumerated in https://github.com/apache/cassandra/blob/cassandra-3.0/NEWS.txt The biggest caveat was 3.0.14 (which had the fix for cassandra-13004), which you're already on. Personally, I'd qualify exactly one upgrade, and rather than doing 3 different

Re: Gossip issues after upgrading to 4.0.4

2022-06-07 Thread Jeff Jirsa
This deserves a JIRA ticket please. (I assume the sending host is randomly choosing the bad IP and blocking on it for some period of time, causing other tasks to pile up, but it should be investigated as a regression). On Tue, Jun 7, 2022 at 7:52 AM Gil Ganz wrote: > Yes, I know the issue

Re: Malformed IPV6 address

2022-04-26 Thread Jeff Jirsa
Oof. From which version did you upgrade? I would try: > export _JAVA_OPTIONS="-Djava.net.preferIPv4Stack=true" There's a chance that fixes it (for an unpleasant reason). Did you get a specific stack trace / log message at all? or just that error? On Tue, Apr 26, 2022 at 1:47 PM Joe

Re: about the performance of select * from tbl

2022-04-26 Thread Jeff Jirsa
Yes, you CAN change the fetch size to adjust how many rows each page of results returns. But, if you have a million rows, you may still do hundreds or thousands of queries, one after the next. Even if each is 1ms, it's going to take a long time. What Dor suggested is generating a number of SELECT
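The suggestion being endorsed here can be sketched like this (table and key names are made up): split the full Murmur3 token range into subranges and issue one SELECT per subrange, so the scan can run in parallel rather than paging serially.

```python
MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1  # Murmur3Partitioner token range

def token_range_selects(table, partition_key, splits):
    """One SELECT per token subrange; each can be dispatched concurrently.
    (The exclusive lower bound skips the single lowest token; real tools
    special-case that edge, a sketch needn't.)"""
    span = (MAX_TOKEN - MIN_TOKEN) // splits
    queries = []
    for i in range(splits):
        lo = MIN_TOKEN + i * span
        hi = MAX_TOKEN if i == splits - 1 else lo + span
        queries.append(f"SELECT * FROM {table} "
                       f"WHERE token({partition_key}) > {lo} "
                       f"AND token({partition_key}) <= {hi}")
    return queries

for q in token_range_selects("ks.tbl", "pk", 4):
    print(q)
```

Each subrange query hits a bounded slice of the ring, so a driver can fan them out across coordinators instead of walking one paged result set end to end.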

Re: sstables changing in snapshots

2022-03-22 Thread Jeff Jirsa
ar 18, 2022, at 12:15 PM, James Brown wrote: >> >> This in 4.0.3 after running nodetool snapshot that we're seeing sstables >> change, yes. >> >> James Brown >> Infrastructure Architect @ easypost.com >> >> >> On 2022-03-18 at 12:06:00, Jeff

Re: sstables changing in snapshots

2022-03-18 Thread Jeff Jirsa
This is nodetool snapshot yes? 3.11 or 4.0? In versions prior to 3.0, sstables would be written with -tmp- in the name, then renamed when complete, so an sstable definitely never changed once it had the final file name. With the new transaction log mechanism, we use one name and a transaction log

Re: Gossips pending task increasing, nodes are DOWN

2022-03-17 Thread Jeff Jirsa
This release is from Sep 2016 (5.5 years ago) and has no fixes applied to it since. There are likely MANY issues with that version. On Thu, Mar 17, 2022 at 9:07 AM Jean Carlo wrote: > Hello, > > After some restart, we got a list of nodes unreachable. These nodes are > being seen as DOWN for the

Re: Cassandra Management tools?

2022-03-01 Thread Jeff Jirsa
Most teams are either using things like ansible/python scripts, or have bespoke infrastructure. Some of what you're describing is included in the intent of the `cassandra-sidecar` project: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652224 Goals We target two main

Re: [RELEASE] Apache Cassandra 4.0.2 released

2022-02-11 Thread Jeff Jirsa
We don't HAVE TO remove the Config.java entry - we can mark it as deprecated and ignored and remove it in a future version (and you could update Config.java to log a message about having a deprecated config option). It's a much better operator experience: log for a major version, then remove in

Re: [RELEASE] Apache Cassandra 4.0.2 released

2022-02-11 Thread Jeff Jirsa
Accidentally dropped dev@, so adding back in the dev list, with the hopes that someone on the dev list helps address this. On Fri, Feb 11, 2022 at 2:22 PM Jeff Jirsa wrote: > That looks like https://issues.apache.org/jira/browse/CASSANDRA-17132 + > https://github.com/apache/cassandra/

Re: [RELEASE] Apache Cassandra 4.0.2 released

2022-02-11 Thread Jeff Jirsa
That looks like https://issues.apache.org/jira/browse/CASSANDRA-17132 + https://github.com/apache/cassandra/commit/b6f61e850c8cfb1f0763e0f15721cde8893814b5 I suspect this needs to be reverted, at least in 4.0.x, and it definitely deserved a NEWS.txt entry (and ideally some period of

Re: Running enablefullquerylog crashes cassandra

2022-02-06 Thread Jeff Jirsa
That looks like a nodetool stack - can you check the Cassandra log for corresponding error? > On Feb 6, 2022, at 12:52 AM, Gil Ganz wrote: > >  > Hey > I'm trying to enable full query log on cassandra 4.0.1 node and it's causing > cassandra to shutdown > > nodetool enablefullquerylog --path

Re: about memory problem in write heavy system..

2022-01-07 Thread Jeff Jirsa
3.11.4 is a very old release, with lots of known bugs. It's possible the memory is related to that. If you bounce one of the old nodes, where does the memory end up? On Thu, Jan 6, 2022 at 3:44 PM Eunsu Kim wrote: > > Looking at the memory usage chart, it seems that the physical memory usage

Re: Node failed after drive failed

2021-12-11 Thread Jeff Jirsa
Likely lost (enough of) the system keyspace on that disk, so the data files indicating the host was in the cluster are missing and the host tried to rebootstrap > On Dec 11, 2021, at 12:47 PM, Bowen Song wrote: > >  > Hi Joss, > > To unsubscribe from this mailing list, please send an

Re: Which source replica does rebuild stream from?

2021-11-25 Thread Jeff Jirsa
ding > topologies are near-identical (DC1: A/B/C, DC2: A/B/C), and reads are > performed at LOCAL_QUORUM, while writes are done at EACH_QUORUM or ALL. > > Thanks, > Sam > >> On Thu, Nov 25, 2021 at 9:38 AM Jeff Jirsa wrote: >> The risk is not negligible if you expect stri

Re: Which source replica does rebuild stream from?

2021-11-25 Thread Jeff Jirsa
The risk is not negligible if you expect strictly correct responses The only way to do this correctly is very, very labor intensive at the moment, and it requires repair between rebuilds and incrementally adding replicas such that you don’t violate consistency If you give me the starting

Re: Cross DC replication failing

2021-11-13 Thread Jeff Jirsa
> On Nov 13, 2021, at 10:25 AM, Inquistive allen wrote: > >  > Hello team, > Greetings. > > Simple question > > Using Cassandra 3.0.8 > Writing to DC-A using local_quorum > Reading the same data from a DC-B using local quorum. > > It succeeds for a table and fails for other. > Data

Re: One big giant cluster or several smaller ones?

2021-11-12 Thread Jeff Jirsa
t see any major issue. > > > On Fri, Nov 12, 2021 at 11:46 AM Jeff Jirsa wrote: > >> Most people are better served building multiple clusters and spending >> their engineering time optimizing for maintaining multiple clusters, vs >> spending their engineering time learning ho

Re: One big giant cluster or several smaller ones?

2021-11-12 Thread Jeff Jirsa
Most people are better served building multiple clusters and spending their engineering time optimizing for maintaining multiple clusters, vs spending their engineering time learning how to work around the sharp edges that make large shared clusters hard. Large multi-tenant clusters give you less

Re: Cassandra Delete Query Doubt

2021-11-10 Thread Jeff Jirsa
This type of delete - which doesn't supply a user_id, so it's deleting a range of rows - creates what is known as a range tombstone. It's not tied to any given cell, as it covers a range of cells, and supersedes/shadows them when merged (either in the read path or compaction path). On Wed, Nov

Re: How does a node decide where each of its vnodes will be replicated to?

2021-11-08 Thread Jeff Jirsa
read: If each node does not >>>replicate all its vnodes to the same 2 nodes (assume RF=2), then how does >>>it decide where each of its vnode will be replicated to? >>> >>> Maybe the answer to #2 is apparent in #1 answer. >>> But I would reall

Re: How does a node decide where each of its vnodes will be replicated to?

2021-11-08 Thread Jeff Jirsa
ode does not >replicate all its vnodes to the same 2 nodes (assume RF=2), then how does >it decide where each of its vnode will be replicated to? > > Maybe the answer to #2 is apparent in #1 answer. > But I would really appreciate if someone can help me understand the above. &

Re: How does a node decide where each of its vnodes will be replicated to?

2021-11-08 Thread Jeff Jirsa
Vnodes are implemented by giving a single process multiple tokens. Tokens ultimately determine which data lives on which node. When you hash a partition key, it gives you a token (let's say 570). The 3 processes that own token 570 are the next 3 tokens in the ring ABOVE 570, so if you had A = 0 B
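Extending that walk to vnodes, a sketch (token values invented for illustration, since the example here is truncated): each process appears at several points on the ring, and the replica walk skips processes it has already picked.

```python
from bisect import bisect_left

# 3 processes, 2 tokens (vnodes) each: token -> process
RING = {0: "A", 900: "B", 1800: "C", 2700: "A", 3600: "B", 4500: "C"}

def replicas(token, rf=3):
    """Walk up the ring from the first token >= the hashed token,
    collecting rf *distinct* processes (assumes rf <= distinct processes)."""
    tokens = sorted(RING)
    out, i = [], bisect_left(tokens, token)
    while len(out) < rf:
        process = RING[tokens[i % len(tokens)]]
        if process not in out:  # vnodes: skip a process already chosen
            out.append(process)
        i += 1
    return out

# token 570: the next distinct processes above it on the ring
print(replicas(570))  # → ['B', 'C', 'A']
```

The distinct-process check is the piece vnodes add over the single-token case: consecutive ring positions can belong to the same process, so the walk keeps going until it has rf different ones.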

Re: 4.0.1 - adding a node

2021-10-28 Thread Jeff Jirsa
I think you started at 4931 and ended at 5461, a difference of 530 (which is the new host). If you run `nodetool cleanup` on every other node in the cluster, you likely drop back down close to 4931 again. On Thu, Oct 28, 2021 at 12:04 PM Joe Obernberger < joseph.obernber...@gmail.com> wrote: > I

Re: Tombstones? 4.0.1

2021-10-25 Thread Jeff Jirsa
unusable table. I'm using Cassandra > to de-duplicate data and that's not a good use case for it. > > -Joe > On 10/25/2021 6:51 PM, Jeff Jirsa wrote: > > The tombstone threshold is "how many tombstones are encountered within a > single read command", and the defau

Re: Tombstones? 4.0.1

2021-10-25 Thread Jeff Jirsa
The tombstone threshold is "how many tombstones are encountered within a single read command", and the default is something like 100,000 ( https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L1293-L1294 ) Deletes are not forbidden, but you have to read in such a way that you touch
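The "tombstones encountered within a single read command" counter can be sketched as a toy model (not Cassandra's code; `None` stands in for a tombstone, and the threshold value is the default quoted above):

```python
# Hypothetical constant mirroring cassandra.yaml's tombstone failure threshold
FAILURE_THRESHOLD = 100_000  # "something like 100,000" per the message above

def scan_partition(cells, failure_threshold=FAILURE_THRESHOLD):
    """Simulate one read command: return live cells, count tombstones
    seen along the way, and abort once the failure threshold is crossed."""
    live, tombstones = [], 0
    for cell in cells:
        if cell is None:  # None stands in for a tombstone
            tombstones += 1
            if tombstones > failure_threshold:
                raise RuntimeError(
                    "too many tombstones in one read "
                    "(sketch of TombstoneOverwhelmingException)")
        else:
            live.append(cell)
    return live, tombstones

# a read touching 2 tombstones, far below the threshold, succeeds
print(scan_partition(["a", None, None, "b"]))  # → (['a', 'b'], 2)
```

This is why the advice is to read "in such a way" that a single command doesn't sweep through huge runs of deleted data: the counter is per read command, not per table.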

Re: How to find traffic profile per client on a Cassandra server?

2021-10-24 Thread Jeff Jirsa
Table level metrics ? > On Oct 24, 2021, at 8:54 PM, S G wrote: > >  > Hello, > > We recently faced an issue recently where the read traffic on a big Cassandra > cluster shot up several times (think more than 20 times). > > However, the client team denies sending any huge load and they

Re: Single node slowing down queries in a large cluster

2021-10-17 Thread Jeff Jirsa
(non-percentile based) if it also > mandates the selection of a different server in the retry. > > Is any kind of speculative retry turned on by default ? > > > >> On Wed, Oct 13, 2021 at 2:33 PM Jeff Jirsa wrote: >> Some random notes, not necessarily going to h

Re: Schema collision results in multiple data directories per table

2021-10-15 Thread Jeff Jirsa
in-memory should be in sync? > > On Fri, Oct 15, 2021 at 3:38 PM Jeff Jirsa wrote: > >> Heap dumps + filesystem inspection + SELECT from schema tables. >> >> >> On Fri, Oct 15, 2021 at 3:28 PM Tom Offermann >> wrote: >> >>> Interesting! >&

Re: Schema collision results in multiple data directories per table

2021-10-15 Thread Jeff Jirsa
ould it help to > force a sync before running an `ALTER KEYSPACE` schema change? > > On Fri, Oct 15, 2021 at 3:08 PM Jeff Jirsa wrote: > >> I would not expect an ALTER KEYSPACE to introduce a divergent CFID, that >> usually happens during a CREATE TABLE. With no other evidence o

Re: Schema collision results in multiple data directories per table

2021-10-15 Thread Jeff Jirsa
es`. Why do these table IDs > normally remain unchanged? What caused new ones to be generated in the > error case I described? > > --Tom > > On Wed, Oct 13, 2021 at 10:35 AM Jeff Jirsa wrote: > >> I've described this race a few times on the list. It is very very >> dangero

Re: Single node slowing down queries in a large cluster

2021-10-13 Thread Jeff Jirsa
Some random notes, not necessarily going to help you, but: - You probably have vnodes enabled, which means one bad node is PROBABLY a replica of almost every other node, so the fanout here is worse than it should be, and - You probably have speculative retry on the table set to a percentile. As the
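The percentile interaction hinted at above can be sketched. Illustrative Python only, with made-up names (`speculative_threshold`, `should_speculate`): if a consistently slow replica's responses dominate the table's recent latency history, the computed percentile climbs with them, and the speculative retry that would route around the bad node fires late.

```python
# Illustrative sketch, not Cassandra source: percentile-based speculative
# retry. If the slow node's latencies dominate the recent history, the
# percentile threshold rises too, delaying the retry that would mask it.

def speculative_threshold(recent_latencies_ms, percentile=99):
    """Latency above which a speculative request to another replica fires."""
    ordered = sorted(recent_latencies_ms)
    idx = min(len(ordered) - 1, int(len(ordered) * percentile / 100))
    return ordered[idx]

def should_speculate(elapsed_ms, recent_latencies_ms, percentile=99):
    return elapsed_ms > speculative_threshold(recent_latencies_ms, percentile)
```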

Re: Stop long running queries in Cassandra 3.11.x or Cassandra 4.x

2021-10-13 Thread Jeff Jirsa
deed true. > Thanks for the help, > > On Wed, Oct 13, 2021 at 11:26 AM Jeff Jirsa wrote: > >> The default is true: >> >> https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L1000 >> >> There is no equivalent to `alter system kill session`,

Re: Stop long running queries in Cassandra 3.11.x or Cassandra 4.x

2021-10-13 Thread Jeff Jirsa
t; all the replica nodes have stopped processing that specific query too? >>> Or is it just the coordinator node that has stopped waiting for the >>> replicas to return response? >>> >>> On Tue, Oct 12, 2021 at 10:12 AM Jeff Jirsa wrote: >>> >>>

Re: Schema collision results in multiple data directories per table

2021-10-13 Thread Jeff Jirsa
I've described this race a few times on the list. It is very very dangerous to do concurrent table creation in cassandra with non-deterministic CFIDs. I'll try to describe it quickly right now: - Imagine you have 3 hosts: A, B, and C. You connect to A and issue a "CREATE TABLE ... IF NOT EXISTS". A
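The core of the race is easy to see in miniature. A toy illustration (not Cassandra's schema code; both function names are made up): two coordinators that each run `CREATE TABLE ... IF NOT EXISTS` concurrently mint two different random CFIDs for the same table, whereas a name-derived UUIDv5 would give both the same ID.

```python
# Toy illustration (not Cassandra's schema code) of why non-deterministic
# CFIDs are dangerous under concurrent CREATE TABLE: two coordinators mint
# two different random IDs for the same table, while a name-derived UUIDv5
# would collide to the same value.

import uuid

def random_cfid(keyspace, table):
    # what each coordinator effectively does: a fresh random ID per CREATE
    return uuid.uuid4()

def deterministic_cfid(keyspace, table):
    # name-derived alternative: same inputs always yield the same ID
    return uuid.uuid5(uuid.NAMESPACE_DNS, f"{keyspace}.{table}")
```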

Re: Stop long running queries in Cassandra 3.11.x or Cassandra 4.x

2021-10-12 Thread Jeff Jirsa
t; Cassandra servers (co-ordinator, replicas etc) ? > > On Tue, Oct 12, 2021 at 10:00 AM Jeff Jirsa wrote: > >> The read and write timeout values do this today. >> >> >> https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L920-L943 >&g

Re: Stop long running queries in Cassandra 3.11.x or Cassandra 4.x

2021-10-12 Thread Jeff Jirsa
The read and write timeout values do this today. https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L920-L943 On Tue, Oct 12, 2021 at 9:53 AM S G wrote: > Hello, > > Is there a way to stop long running queries in Cassandra (versions 3.11.x > or 4.x) ? > The use-case is to have
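The yaml settings linked above bound how long the coordinator waits server-side. As a client-side analogy only (nothing Cassandra-specific; `run_with_deadline` is a made-up helper, not how Cassandra implements its timeouts), the same deadline idea looks like:

```python
# Client-side analogy: cassandra.yaml's read_request_timeout_in_ms and
# write_request_timeout_in_ms make the coordinator abandon a slow request;
# here a thread-pool future enforces a comparable per-call deadline.

from concurrent.futures import ThreadPoolExecutor, TimeoutError
import time

def run_with_deadline(fn, deadline_s):
    """Run fn, raising TimeoutError if it exceeds deadline_s seconds."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fn)
        try:
            return future.result(timeout=deadline_s)
        except TimeoutError:
            future.cancel()  # best effort; a running call cannot be cancelled
            raise
```

Note the caveat that also applies server-side: timing out the wait does not necessarily stop the underlying work, which is the point discussed later in this thread.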

Re: Trouble After Changing Replication Factor

2021-10-12 Thread Jeff Jirsa
The most likely explanation is that repair failed and you didn't notice, or that you didn't actually repair every host / every range. Which version are you using? How did you run repair? On Tue, Oct 12, 2021 at 4:33 AM Isaeed Mohanna wrote: > Hi > > Yes I am sacrificing consistency to gain

Re: TTL and disk space releasing

2021-10-06 Thread Jeff Jirsa
I think this is a bit extreme. If you know that 100% of all queries that write to the table include a TTL, not having a TTL on the table is just fine. You just need to ensure that you always write correctly. On Wed, Oct 6, 2021 at 8:57 AM Bowen Song wrote: > TWCS without a table TTL is unlikely
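The discipline described above ("you always write correctly") can be enforced at the application edge. A hypothetical sketch; the builder function, keyspace, and column names are all made up, and only the `USING TTL` clause is standard CQL:

```python
# Sketch of the discipline above: with no table-level default_time_to_live,
# every writer must attach its own TTL. A small statement builder that
# rejects TTL-less inserts keeps the invariant from being broken by accident.

def insert_with_ttl(keyspace, table, row, ttl_seconds):
    """Build a CQL INSERT that always carries USING TTL."""
    if not ttl_seconds or ttl_seconds <= 0:
        raise ValueError("every write to this table must carry a TTL")
    cols = ", ".join(row)
    binds = ", ".join("%s" for _ in row)
    cql = (f"INSERT INTO {keyspace}.{table} ({cols}) "
           f"VALUES ({binds}) USING TTL {ttl_seconds}")
    return cql, list(row.values())
```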

Re: ShortReadPartitionsProtection

2021-09-17 Thread Jeff Jirsa
Short read protection is a feature added in 3.0 to work around a possible situation in 2.1 where we could fail to return all rows in a result. The basic premise is that when you read, we ask for the same number of rows from all of the replicas involved in the query. It’s possible, with the
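The short read and its fix can be simulated end to end. A toy model, not Cassandra code (replicas as dicts, a `"DEAD"` marker as the tombstone): asking each replica for `limit` rows can merge to fewer than `limit` live rows when a tombstone on one replica shadows live data on another, so the protected read re-requests past the last merged key until the limit is satisfied or the data is exhausted.

```python
# Toy model (not Cassandra code) of the short read and its protection.
# Each replica serves its first `limit` rows; a tombstone ("DEAD") on one
# replica shadows live data on another during merge, so fewer than `limit`
# live rows can survive one round trip.

def read_page(replica, start, limit):
    keys = sorted(k for k in replica if k >= start)[:limit]
    return [(k, replica[k]) for k in keys]

def read_with_protection(replicas, limit):
    result, start = [], 0
    while len(result) < limit:
        pages = [read_page(r, start, limit - len(result)) for r in replicas]
        if all(not page for page in pages):
            break  # data genuinely exhausted
        merged = {}
        for page in pages:
            for key, cell in page:
                if merged.get(key) != "DEAD":
                    merged[key] = cell  # tombstones shadow live cells
        for key in sorted(merged):
            if merged[key] != "DEAD" and len(result) < limit:
                result.append((key, merged[key]))
        start = max(key for page in pages for key, _ in page) + 1
    return result
```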

Re: TWCS on Non TTL Data

2021-09-14 Thread Jeff Jirsa
now. > > > When changing the compaction strategy via JMX, do I need to issue the > alter table command at the end so it will be reflected in the schema or is > it taking care of automatically? (I am using cassandra 3.11.11) > > > At the end, yes. > Thanks a lot for your
