Re: Compaction task priority

2022-09-02 Thread Jim Shaw
If capacity allows, increase compaction_throughput_mb_per_sec as the first
tuning step, and if compactions still fall behind, increase concurrent_compactors as the second.
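As a rough illustration of those two knobs (the values below are placeholders, not recommendations):

    # raise the compaction throttle at runtime, per node
    nodetool setcompactionthroughput 64    # MB/s; 0 disables the throttle
    nodetool getcompactionthroughput       # verify

    # cassandra.yaml
    concurrent_compactors: 4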

Regards,

Jim

On Fri, Sep 2, 2022 at 3:05 AM onmstester onmstester via user <
user@cassandra.apache.org> wrote:

> Another thing that comes to my mind: increase the minimum sstable count to
> compact from 4 to 32 for the big table that won't be read much, although you
> should watch out for the sstable count growing too high. A sketch of that
> change is shown just below.
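A minimal sketch of that change, assuming SizeTieredCompactionStrategy (keyspace and table names are placeholders):

    ALTER TABLE my_ks.big_table
      WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'min_threshold': 32};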
>
> Sent using Zoho Mail 
>
>
>
>  On Fri, 02 Sep 2022 11:29:59 +0430 *onmstester onmstester via user
> >* wrote ---
>
> I was there too, and found nothing to work around it except stopping
> big/unnecessary compactions manually (using nodetool stop) whenever they
> appear, via some shell scripts run from crontab.
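A crude sketch of such a cron job (the pending-task threshold is arbitrary; you could instead target specific compaction ids from nodetool compactionstats):

    #!/bin/sh
    # stop currently running compactions when the queue gets too deep;
    # Cassandra will reschedule them later
    pending=$(nodetool compactionstats | awk '/pending tasks:/ {print $3}')
    if [ "${pending:-0}" -gt 20 ]; then
        nodetool stop COMPACTION
    fi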
>
> Sent using Zoho Mail 
>
>
>
>  On Fri, 02 Sep 2022 10:59:22 +0430 *Gil Ganz  >* wrote ---
>
>
>
>
> Hey
> When deciding which sstables to compact together, how is the priority
> determined between tasks, and can I do something about it?
>
> In some cases (mostly after removing a node), it takes a while for
> compactions to keep up with the data that came from the removed node. I see
> the node busy on huge compaction tasks while, in the meantime, a lot of small
> sstables pile up (new data coming from the application), so read performance
> suffers: the new data is scattered across many sstables, and combining the
> big sstables probably won't help reduce that fragmentation much (I think).
>
> Another thing that comes to mind: I have a table that is very big but not
> read very much; it would be nice to give the other tables higher compaction
> priority (to help in a case like the one described above).
>
> Version is 4.0.4
>
> Gil
>
>
>
>


Re: Hints not being sent from 3.0 to 4.0?

2022-08-23 Thread Jim Shaw
Is it past the max hint window? If it is, it is better to run a full repair.
Also check whether hints are still pending on the 3.0 nodes (since 3.0 hints
are stored as files under hints_directory rather than in the old system.hints
table) -- do you still see hint files there?

As I remember, during an upgrade writes destined for the nodes being upgraded
are stored as hints until those nodes are back, so for safety, raise the
default 3-hour hint window to a much longer value just before starting the upgrade.
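For example (illustrative value only), on the nodes that will hold the hints:

    # cassandra.yaml, applied before starting the upgrade
    max_hint_window_in_ms: 172800000    # 48 hours instead of the 3-hour default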

Regards,

Jim


On Tue, Aug 23, 2022 at 8:33 AM Morten A. Iversen via user <
user@cassandra.apache.org> wrote:

> Hi,
>
> We are currently in the process of upgrading our environment from
> 3.0.27 to 4.0.4. However I see some issues with hints not being sent
> from v3 nodes to v4 nodes.
>
> We have a test environment with 2DCs, we are currently writing to DC1
> and DC2 have been upgraded from version 3.0.27 -> 4.0.4
>
> After the upgrade I see that all the v3 nodes in DC1 have hint files
> stored for all the upgraded nodes in DC2. These files were created
> while the nodes in DC2 were being upgraded, however they do not disappear
> and there is nothing in the logs mentioning anything about these files.
>
> Does this mean that data written while a node is being upgraded is not
> replicated to that node?
>
> Hint related configs:
>    hinted_handoff_enabled: true
>    max_hint_window_in_ms: 172800000 # 48 hours
>    hinted_handoff_throttle_in_kb: 10240
>    max_hints_delivery_threads: 8
>    hints_directory: /var/lib/cassandra/hints
>    hints_flush_period_in_ms: 1
>    max_hints_file_size_in_mb: 12
>
> Regards
> Morten
>


Re: Cassandra 4.0 upgrade - Upgradesstables

2022-08-21 Thread Jim Shaw
Though it is not required to run upgradesstables, upgradesstables -a will
rewrite the files and purge tombstones; with SizeTieredCompactionStrategy, the
largest files may wait a long time for the next compaction that would purge
their tombstones.
So whether to run it really depends. Upgrades usually have a change window in
which the application carries little or no load, so why not take the chance
to run it.
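For example (keyspace name and job count are illustrative):

    # rewrite all sstables, not just old-format ones, two at a time
    nodetool upgradesstables -a -j 2 my_keyspace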

Regards,

Jim

On Tue, Aug 16, 2022 at 3:17 PM Jai Bheemsen Rao Dhanwada <
jaibheem...@gmail.com> wrote:

> Thank you
>
> On Tue, Aug 16, 2022 at 11:48 AM C. Scott Andreas 
> wrote:
>
>> No downside at all for 3.x -> 4.x (however, Cassandra 3.x reading 2.1
>> SSTables incurred a performance hit).
>>
>> Many users of Cassandra don't run upgradesstables after 3.x -> 4.x
>> upgrades at all. It's not necessary to run until a hypothetical future time
>> if/when support for reading Cassandra 3.x SSTables is removed from
>> Cassandra. One of the most common reasons to avoid running upgradesstables
>> is because doing so causes 100% churn of the data files, meaning your
>> backup processes will need to upload a full copy of the data. Allowing
>> SSTables to organically churn into the new version via compaction avoids
>> this.
>>
>> If you're upgrading from 3.x to 4.x, don't feel like you have to - but it
>> does avoid the need to run upgradesstables in a hypothetical distant future.
>>
>> – Scott
>>
>> On Aug 16, 2022, at 6:32 AM, Jai Bheemsen Rao Dhanwada <
>> jaibheem...@gmail.com> wrote:
>>
>>
>> Thank you Erick,
>>
>> > it is going to be single-threaded by default so it will take a while to
>> get through all the sstables on dense nodes
>> Is there any downside if the upgradesstables take longer (example 1-2
>> days), other than I/O?
>>
>> Also, when does the sstable upgrade get triggered? After every node is
>> upgraded, or does it kick in only once all the nodes in the cluster are
>> upgraded to 4.0.x?
>>
>> On Tue, Aug 16, 2022 at 2:12 AM Erick Ramirez 
>> wrote:
>>
>>> As convenient as it is, there are a few caveats and it isn't a silver
>>> bullet. The automatic feature will only kick in if there are no other
>>> compactions scheduled. Also, it is going to be single-threaded by default
>>> so it will take a while to get through all the sstables on dense nodes.
>>>
>>> In contrast, you'll have a bit more control if you manually upgrade the
>>> sstables. For example, you can schedule the upgrade during low traffic
>>> periods so reads are not competing with compactions for IO. Cheers!
>>>




Re: cell vs row timestamp tie resolution

2022-08-21 Thread Jim Shaw
Andrey:
In Cassandra every cell has its own timestamp; select writetime(...) shows it.
Cells are merged during compaction and at read time, and each cell is resolved
independently by timestamp, with the cell value itself used as the tie-breaker
when the timestamps are equal. For your example, if you prepend the writetime
to each column value (writetime + cell value) and take the largest, you get
exactly the result you are seeing now.
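A quick way to see that per-cell resolution, using the same table as in the repro below:

    SELECT body, writetime(body) AS body_ts,
           size, writetime(size) AS size_ts
    FROM the_test.case WHERE id = 1 AND sort = 2;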

Regards,

Jim


On Tue, Aug 16, 2022 at 10:25 AM Andrey Zapariy 
wrote:

> Hello Cassandra users!
>
> I'm dealing with the unexpected behavior of the tie resolution for the
> same timestamp inserts. At least, unexpected for me.
> The following simple repro under Cassandra 3.11.4 illustrates the question:
>
> CREATE KEYSPACE the_test WITH replication = {'class': 'SimpleStrategy',
> 'replication_factor': '2'}  AND durable_writes = true;
> CREATE TABLE the_test.case (id int, sort int, body text, size int, PRIMARY
> KEY (id, sort)) WITH CLUSTERING ORDER BY (sort ASC);
> INSERT INTO the_test.case (id, sort, body, size) VALUES (1, 2, 'foo foo',
> 7) USING TIMESTAMP 1660596312240;
> INSERT INTO the_test.case (id, sort, body, size) VALUES (1, 2, 'flap
> flap', 9) USING TIMESTAMP 1660596312240;
>
> After these two inserts I expect that either the combination <'foo foo',7>
> or the combination <'flap flap',9> would survive.
> But the select
> select id, sort, body, size from the_test.case where id=1 and sort=2;
> is giving rather uncomfortable result:
>  id | sort | body    | size
> ----+------+---------+------
>   1 |    2 | foo foo |    9
> Essentially, showing that timestamp tie resolution is performed on per
> cell basis, and not on row basis, as I was expecting.
>
> My questions are:
> Am I right about the way Cassandra does resolve timestamp ties?
> Or is there a way to configure Cassandra to perform per row resolution?
>
> Flushing data to sstables and dumping them, suggests that these inserts
> are stored as rows. And, naively thinking, I hope there is a way to make
> the whole row insert to survive.
>
>
>


Re: Understanding multi region read query and latency

2022-08-09 Thread Jim Shaw
Raphael:
   Have you found the root cause? If not, here are a few tips based on what I
have experienced before; your case may differ, but I hope they are helpful.
1) The app side called the wrong code module.

Get the CQL from system.prepared_statements.

The CQL statement text helps developers search their code and find the
offending part. In my case a feature was supposed to be disabled but actually
was not; once the developers saw the CQL statement, they realized it.
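For instance, listing just the statement text:

    SELECT query_string FROM system.prepared_statements;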

2) The app side queries immediately after a write.

From the trace you have the read time; get the row's write time with

select writetime ("any non-key column here") from "table_name_here" where
...;

If the read time is too close to the write time, ask the developers to add a
sleep in the code.

In the early phases of projects using Cassandra, developers are often still
used to the RDBMS style and forget that Cassandra is a distributed database
(i.e. for 10 CQL statements written in a logical order in the code, they
assume they will be executed in that order, but in a distributed system there
is no such guarantee; the last line in the code may take effect first in the
Cassandra cluster).

3) Reproduce the case.
Use copies of the tables and test data; by comparing the traces you can
reproduce the case and confirm whether your debugging direction is right.


Regards,


Jim

On Sun, Aug 7, 2022 at 5:14 PM Stéphane Alleaume 
wrote:

> You're right too, this option is not new, sorry.
>
> Can this option be useful?
>
>
> Le dim. 7 août 2022, 22:18, Bowen Song via user 
> a écrit :
>
>> Do you mean "nodetool settraceprobability"? This is not exactly new, I
>> remember it was available on Cassandra 2.x.
>> On 07/08/2022 20:43, Stéphane Alleaume wrote:
>>
>> I think perhaps you already know, but I read that you can now trace only a
>> percentage of all queries; I will look up the name of this functionality (in
>> a recent Cassandra release).
>>
>> Hope it will help
>> Kind regards
>> Stéphane
>>
>>
>> Le dim. 7 août 2022, 20:26, Raphael Mazelier  a
>> écrit :
>>
>>> > "Read repair is in the blocking read path for the query, yep"
>>>
>>> OK interesting. This is not what I understood from the documentation.
>>> And I use localOne level consistency.
>>>
>>> I enabled tracing (see the attachment of my first msg), but I didn't see
>>> read repair in the trace (and btw I tried to completely disable it on my
>>> table by setting both read_repair_chance and dclocal_read_repair_chance to
>>> 0).
>>>
>>> The problem when enabling tracing in cqlsh is that I only get slow results;
>>> to catch a fast answer I need to iterate faster on my queries.
>>>
>>> I can provide traces again for analysis. I got something more readable in
>>> Python.
>>>
>>> Best,
>>>
>>> --
>>>
>>> Raphael
>>>
>>>
>>> On 07/08/2022 19:30, C. Scott Andreas wrote:
>>>
>>> > but still as I understand the documentation the read repair should
>>> not be in the blocking path of a query ?
>>>
>>> Read repair is in the blocking read path for the query, yep. At quorum
>>> consistency levels, the read repair must complete before returning a result
>>> to the client to ensure the data returned would be visible on subsequent
>>> reads that address the remainder of the quorum.
>>>
>>> If you enable tracing - either for a single CQL statement that is
>>> expected to be slow, or probabilistic from the server side to catch a slow
>>> query in the act - that will help identify what’s happening.
>>>
>>> - Scott
>>>
>>> On Aug 7, 2022, at 10:25 AM, Raphael Mazelier 
>>>  wrote:
>>>
>>> 
>>>
>>> Nope. And what really puzzles me is that the trace really shows the
>>> difference between queries: the fast queries only request a read from one
>>> replica, while the slow queries request from multiple replicas (and not
>>> only replicas local to the DC).
>>> On 07/08/2022 14:02, Stéphane Alleaume wrote:
>>>
>>> Hi
>>>
>>> Is there some GC which could affect the coordinator node?
>>>
>>> Kind regards
>>> Stéphane
>>>
>>> Le dim. 7 août 2022, 13:41, Raphael Mazelier  a
>>> écrit :
>>>
 Thanks for the answer but I was well aware of this. I use localOne as
 consistency level.

 My client connects to a local seed, then chooses a local coordinator (as
 far as I can understand from the trace log).

 Then for a batch of requests I get approximately 98% of requests handled
 in 2-3 ms in the local DC with a single read request, and 2% handled by many
 nodes (according to the trace) and taking way longer (250 ms).

 ?
 On 06/08/2022 14:30, Bowen Song via user wrote:

 See the diagram below. Your problem almost certainly arises from step
 4, in which an incorrect consistency level set by the client caused the
 coordinator node to send the READ command to nodes in other DCs.

 The load balancing policy only affects step 2 and 3, not step 1 or 4.

 You should change the consistency level to LOCAL_ONE/LOCAL_QUORUM/etc.
 to fix the problem.

 On 05/08/2022 22:54, Bowen Song wrote:

 The DCAwareRoundRobinPolicy/TokenAwareHostPolicy controls which
 Cassandra coordinator node the client sends queries to, not the nodes the
 coordinator reads the data from.

Re: Understanding multi region read query and latency

2022-08-05 Thread Jim Shaw
I remember gocql.DataCentreHostFilter being used. Try adding it to see whether
reads stay in the local DC in your case.
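A minimal gocql sketch of that idea (the seed addresses and the local DC name "eu-west-1" are placeholders to adapt to your cluster):

    package main

    import "github.com/gocql/gocql"

    func newLocalSession() (*gocql.Session, error) {
        cluster := gocql.NewCluster("10.0.0.1", "10.0.0.2") // local seeds only
        cluster.Consistency = gocql.LocalOne
        // Prefer local-DC coordinators, token-aware where possible.
        cluster.PoolConfig.HostSelectionPolicy =
            gocql.TokenAwareHostPolicy(gocql.DCAwareRoundRobinPolicy("eu-west-1"))
        // Never connect to hosts outside the local DC at all.
        cluster.HostFilter = gocql.DataCentreHostFilter("eu-west-1")
        return cluster.CreateSession()
    }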

Thanks,

James

On Fri, Aug 5, 2022 at 2:40 PM Raphael Mazelier  wrote:

> Hi Cassandra Users,
>
> I'm relatively new to Cassandra and first I have to say I'm really
> impressed by the technology.
>
> Good design and a lot to understand under the hood (the O'Reilly book helps
> a lot, as do the Last Pickle blog posts).
>
> I have a multi-datacenter C* cluster (US, Europe, Singapore) with eight
> nodes in each region (two seeds per region), two racks in EU and Singapore,
> 3 in US. Everything is deployed in AWS.
>
> We have a keyspace configured with network topology and two replicas on
> every region like this: {'class': 'NetworkTopologyStrategy',
> 'ap-southeast-1': '2', 'eu-west-1': '2', 'us-east-1': '2'}
>
>
> Investigating some performance issue I noticed strange things in my
> experiment:
>
> What we expect is very low latency, 3-5 ms max, for this specific select
> query. So we want every read to be local to each datacenter.
>
> We configure DCAwareRoundRobinPolicy(local_dc=DC) in python, and the same
> in Go gocql.TokenAwareHostPolicy(gocql.DCAwareRoundRobinPolicy("DC"))
>
> Testing a bit with two short programs (I can provide them) in Go and Python,
> I noticed very strange results. Basically I run the same query over and over
> with a very limited set of ids.
>
> The first results were surprising because the very first queries always took
> more than 250 ms, and after stressing C* a bit (playing with the sleep
> between queries) I could achieve a good ratio of queries at 3-4 ms (what I
> expected).
>
> My guess was that the long queries were somehow executed not locally (or at
> least involved multi-datacenter reads) and the short ones were not.
>
> Activating tracing in my program (like enabling tracing in cqlsh) kind of
> confirmed my suspicion.
>
> (I will provide trace in attachment).
>
> My question is: why does C* sometimes read non-locally? How can we disable
> that? What is the criterion for it?
>
> (btw I'm really not a fan of this multi-region design, precisely because of
> these very specific kinds of issues...)
>
> Also, a side question: why is C* so slow to connect? It seems like it is
> trying to reach every node in each DC (we only provide local seeds, however).
> Sometimes it takes more than 20 s...
>
> Any help appreciated.
>
> Best,
>
> --
>
> Raphael Mazelier
>


Re: Wrong Consistency level seems to be used

2022-07-21 Thread Jim Shaw
My experience debugging this kind of issue is to turn on tracing. The nice
thing in Cassandra is that you can turn tracing on for only one node and with
a small probability, i.e.
nodetool settraceprobability 0.05   --- run on 1 node only.
Hope it helps.
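Once a few traced requests have been captured, the sessions can be inspected with plain CQL, e.g. (columns are from the standard system_traces tables; the session id is a placeholder):

    SELECT session_id, coordinator, request, duration
    FROM system_traces.sessions LIMIT 20;

    SELECT source, source_elapsed, activity
    FROM system_traces.events WHERE session_id = <session_id from above>;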

Regards,

James


On Thu, Jul 21, 2022 at 2:50 PM Tolbert, Andy  wrote:

> I'd bet the JIRA that Paul is pointing to is likely what's happening
> here.  I'd look for read repair errors in your system logs or in your
> metrics (if you have easy access to them).
>
> There are operations during the course of a query's execution that may
> happen at different CLs; atomic batch log timeouts (CL TWO, I think?) and
> read repair come to mind (especially for CL ALL), and they can make the
> timeout/unavailable exceptions report a different CL. I also remember some
> DSE features causing this as well (RBAC, auditing, graph and Solr stuff). In
> newer versions of C* the errors may be more specific, or a warning may come
> along with them, depending on what is failing.
>
> Thanks,
> Andy
>


Re: Streaming failure Issue

2021-10-07 Thread Jim Shaw
The error message indicates the source-side stream session failed. If the
source side's load is heavy, consider reducing it, e.g. stop repairs. If every
attempt fails at the same file, check that file too. Hope it helps.

On Wed, Oct 6, 2021 at 11:20 PM MyWorld  wrote:

> Hi Jim,
> It's 600 megabits, not megabytes, so around 600/8 = 75 MB/s. Also,
> streaming happens from only 3 nodes at a time.
> However, we have also tried the default streaming throughput of 200
> megabits per second (25 MB/s), but hit the same issue. The heap is set to
> 8 GB on GCP and looks pretty normal.
>
> On Thu, Oct 7, 2021, 4:05 AM Jim Shaw  wrote:
>
>> I met a similar issue before. What I did was reduce the heap size for the
>> rebuild and reduce the stream throughput.
>> But it depends on the version and your environment; it may not be your
>> case, I just hope it is helpful.
>>
>> With ps -ef | grep java you will see a new java process for the rebuild;
>> check what heap size it uses. If it uses the default it may be too much;
>> export MAX_HEAP_SIZE before running nodetool rebuild to limit the heap size.
>>
>> With the stream throughput at 600 (Mb/s), if you look via nodetool, at the
>> OS-level files, or in the log, you will see it pulls files from all nodes
>> --- that is 5 in your case --- so the aggregate can approach 3 Gb/s, and
>> the on-premise side may not handle that due to firewall settings.
>>
>> Regards,
>> Jim
>>
>> On Tue, Oct 5, 2021 at 8:43 AM MyWorld  wrote:
>>
>>> Logged "nodetool failuredetector" every 5 sec. It doesn't seem to be an
>>> issue with the phi_convict_threshold value.
>>>
>>> On Tue, Oct 5, 2021 at 4:35 PM Surbhi Gupta 
>>> wrote:
>>>
>>>> Hi ,
>>>>
>>>> Try to adjust phi_convict_threshold and see if that helps.
>>>> When we did a migration from on-prem to AWS, this was one of the factors
>>>> to consider.
>>>>
>>>> Thanks
>>>>
>>>>
>>>> On Tue, Oct 5, 2021 at 4:00 AM MyWorld  wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Need urgent help.
>>>>> We have one Physical Data Center of 5 nodes with 1 TB data on each
>>>>> (Location: Dallas). Currently we are using Cassandra ver 3.0.9. Now we are
>>>>> Adding one more Data Center of 5 nodes(Location GCP-US) and have joined it
>>>>> to the existing one.
>>>>>
>>>>> While running nodetool rebuild command, we are getting following error
>>>>> :
>>>>> On GCP node (where we ran rebuild command) :
>>>>>
>>>>>> ERROR [STREAM-IN-/192.x.x.x] 2021-10-05 15:56:52,246
>>>>>> StreamSession.java:639 - [Stream #66646d30-25a2-11ec-903b-774f88efe725]
>>>>>> Remote peer 192.x.x.x failed stream session.
>>>>>> INFO  [STREAM-IN-/192.x.x.x] 2021-10-05 15:56:52,266
>>>>>> StreamResultFuture.java:183 - [Stream
>>>>>> #66646d30-25a2-11ec-903b-774f88efe725] Session with /192.x.x.x is 
>>>>>> complete
>>>>>
>>>>>
>>>>> On DL source node :
>>>>>
>>>>>> INFO  [STREAM-IN-/34.x.x.x] 2021-10-05 15:55:53,785
>>>>>> StreamResultFuture.java:183 - [Stream
>>>>>> #66646d30-25a2-11ec-903b-774f88efe725] Session with /34.x.x.x is complete
>>>>>> ERROR [STREAM-OUT-/34.x.x.x] 2021-10-05 15:55:53,785
>>>>>> StreamSession.java:534 - [Stream #66646d30-25a2-11ec-903b-774f88efe725]
>>>>>> Streaming error occurred
>>>>>> java.lang.RuntimeException: Transfer of file
>>>>>> /var/lib/cassandra/data/clickstream/glusr_usr_paid_url_mv-3c49c392b35511e9bd0a8f42dfb09617/mc-45676-big-Data.db
>>>>>> already completed or aborted (perhaps session failed?).
>>>>>> at
>>>>>> org.apache.cassandra.streaming.messages.OutgoingFileMessage.startTransfer(OutgoingFileMessage.java:120)
>>>>>> ~[apache-cassandra-3.0.9.jar:3.0.9]
>>>>>> at
>>>>>> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:50)
>>>>>> ~[apache-cassandra-3.0.9.jar:3.0.9]
>>>>>> at
>>>>>> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:42)
>>>>>> ~[apache-cassandra-3.0.9.jar:3.0.9]
>>>>>> at
>>>>>> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:48)

Re: Streaming failure Issue

2021-10-06 Thread Jim Shaw
I met a similar issue before. What I did was reduce the heap size for the
rebuild and reduce the stream throughput.
But it depends on the version and your environment; it may not be your case,
I just hope it is helpful.

With ps -ef | grep java you will see a new java process for the rebuild; check
what heap size it uses. If it uses the default it may be too much; export
MAX_HEAP_SIZE before running nodetool rebuild to limit the heap size.

With the stream throughput at 600 (Mb/s), if you look via nodetool, at the
OS-level files, or in the log, you will see it pulls files from all nodes ---
that is 5 in your case --- so the aggregate can approach 3 Gb/s, and the
on-premise side may not handle that due to firewall settings.
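A rough sketch of those commands (values are illustrative, not recommendations; the bin/nodetool launcher script reads MAX_HEAP_SIZE for its own JVM, and the DC name is a placeholder):

    export MAX_HEAP_SIZE=512M
    nodetool setstreamthroughput 200          # Mb/s per node
    nodetool setinterdcstreamthroughput 200   # Mb/s per node, cross-DC streams
    nodetool rebuild -- <existing-dc-name>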

Regards,
Jim

On Tue, Oct 5, 2021 at 8:43 AM MyWorld  wrote:

> Logged "nodetool failuredetector" every 5 sec. It doesn't seem to be an
> issue with the phi_convict_threshold value.
>
> On Tue, Oct 5, 2021 at 4:35 PM Surbhi Gupta 
> wrote:
>
>> Hi ,
>>
>> Try to adjust phi_convict_threshold and see if that helps.
>> When we did a migration from on-prem to AWS, this was one of the factors to
>> consider.
>>
>> Thanks
>>
>>
>> On Tue, Oct 5, 2021 at 4:00 AM MyWorld  wrote:
>>
>>> Hi all,
>>>
>>> Need urgent help.
>>> We have one Physical Data Center of 5 nodes with 1 TB data on each
>>> (Location: Dallas). Currently we are using Cassandra ver 3.0.9. Now we are
>>> Adding one more Data Center of 5 nodes(Location GCP-US) and have joined it
>>> to the existing one.
>>>
>>> While running nodetool rebuild command, we are getting following error :
>>> On GCP node (where we ran rebuild command) :
>>>
 ERROR [STREAM-IN-/192.x.x.x] 2021-10-05 15:56:52,246
 StreamSession.java:639 - [Stream #66646d30-25a2-11ec-903b-774f88efe725]
 Remote peer 192.x.x.x failed stream session.
 INFO  [STREAM-IN-/192.x.x.x] 2021-10-05 15:56:52,266
 StreamResultFuture.java:183 - [Stream
 #66646d30-25a2-11ec-903b-774f88efe725] Session with /192.x.x.x is complete
>>>
>>>
>>> On DL source node :
>>>
 INFO  [STREAM-IN-/34.x.x.x] 2021-10-05 15:55:53,785
 StreamResultFuture.java:183 - [Stream
 #66646d30-25a2-11ec-903b-774f88efe725] Session with /34.x.x.x is complete
 ERROR [STREAM-OUT-/34.x.x.x] 2021-10-05 15:55:53,785
 StreamSession.java:534 - [Stream #66646d30-25a2-11ec-903b-774f88efe725]
 Streaming error occurred
 java.lang.RuntimeException: Transfer of file
 /var/lib/cassandra/data/clickstream/glusr_usr_paid_url_mv-3c49c392b35511e9bd0a8f42dfb09617/mc-45676-big-Data.db
 already completed or aborted (perhaps session failed?).
 at
 org.apache.cassandra.streaming.messages.OutgoingFileMessage.startTransfer(OutgoingFileMessage.java:120)
 ~[apache-cassandra-3.0.9.jar:3.0.9]
 at
 org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:50)
 ~[apache-cassandra-3.0.9.jar:3.0.9]
 at
 org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:42)
 ~[apache-cassandra-3.0.9.jar:3.0.9]
 at
 org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:48)
 ~[apache-cassandra-3.0.9.jar:3.0.9]
 at
 org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:387)
 ~[apache-cassandra-3.0.9.jar:3.0.9]
 at
 org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:367)
 ~[apache-cassandra-3.0.9.jar:3.0.9]
 at java.lang.Thread.run(Thread.java:748) [na:1.8.0_192]
 WARN  [STREAM-IN-/34.x.x.x] 2021-10-05 15:55:53,786
 StreamResultFuture.java:210 - [Stream
 #66646d30-25a2-11ec-903b-774f88efe725] Stream failed
>>>
>>>
>>> Before starting this rebuild, we have made the following changes:
>>> 1. Set setstreamthroughput to 600 Mb/sec
>>> 2. Set setinterdcstreamthroughput to 600 Mb/sec
>>> 3. streaming_socket_timeout_in_ms is 24 hrs
>>> 4. Disabled autocompaction on GCP node as this was heavily utilising CPU
>>> resource
>>>
>>> FYI, GCP rebuild process starts with data streaming from 3 nodes, and
>>> all fails one by one after streaming for a few hours.
>>> Please help out how to correct this issue.
>>> Is there any other way to rebuild such big data.
>>> We have a few tables with 200 - 400GB of data and some smaller tables.
>>> Also, we have Mviews in our environment
>>>
>>> Regards,
>>> Ashish Gupta
>>>
>>>
>>


Re: Change of Cassandra TTL

2021-10-01 Thread Jim Shaw
If the data size is not big, you may try copying the primary key values to a
file, then copying them back into the table, then running a compaction.
Both the copy and the compaction can be throttled. If the size is not too big,
you may try getting the partition key values first, then looping over the
partition key values to extract all primary key values to the file.


On Tue, Sep 14, 2021 at 6:10 AM raman gugnani 
wrote:

> Hi all,
>
> 1. I have a table with default_time_to_live = 31536000 (1 year). We want to
> reduce the value to 7884000 (3 months).
> If we alter the table, is there a way to update the existing data?
>
> 2. I have a table without TTL; we want to add TTL = 7884000 (3 months)
> on the table.
> If we alter the table, is there a way to update the existing data?
>
>
> --
> Raman Gugnani
>


Re: TWCS on Non TTL Data

2021-09-15 Thread Jim Shaw
You may try rolling up the data, i.e. keep a table with only one month of
data, and roll the old data up into another table that keeps a year of data.

Thanks,
Jim

On Wed, Sep 15, 2021 at 1:26 AM Isaeed Mohanna  wrote:

> My clustering column is the time series timestamp: basically sourceId and
> metric type form the partition key, timestamp is the clustering key, and the
> rest of the fields are just values outside of the primary key. Our read
> requests are simply "give me the values for a time range of a specific
> sourceId,metric combination". So I guess that during a read the sstables
> that contain the partition key will be found and, out of those, the ones
> that are outside the range will be excluded, correct?
>
> In practice our queries cover up to a month by default; only rarely do we
> fetch more, e.g. when someone is exporting the data.
>
>
>
> In reality we also get old data: a source may send its information late,
> i.e. instead of sending it in real time it sends all of last
> month's/week's/day's data at once. In that case I guess the data will end up
> in the current bucket; will that affect performance?
>
>
>
> Assuming I start with a 1-week bucket, I could later change the time
> window, right?
>
>
>
> Thanks
>
>
>
>
>
> *From:* Jeff Jirsa 
> *Sent:* Tuesday, September 14, 2021 10:35 PM
> *To:* cassandra 
> *Subject:* Re: TWCS on Non TTL Data
>
>
>
> Inline
>
>
>
> On Tue, Sep 14, 2021 at 11:47 AM Isaeed Mohanna  wrote:
>
> Hi Jeff
>
> My data is partitioned by sourceId and metric. A source is usually active
> for up to a year, after which there are no additional writes for the
> partition and reads become scarce, so although there is no explicit time
> component, it is time based; will that suffice?
>
>
>
> I guess it means that a single read may touch a year of sstables. Not
> great, but perhaps not fatal. Hopefully your reads avoid that in practice.
> We'd need the full schema to be very sure (does clustering column include
> month/day? if so, there are cases where that can help exclude sstables)
>
>
>
>
>
> If I use a week bucket we will be able to serve the last few days' reads
> from one file and the last month from ~5, which covers the most common
> queries. Do you think a one-month bucket is a good idea? That would allow
> reading from one file most of the time, but each SSTable would be ~5 times
> bigger.
>
>
>
> It'll be 1-4 for most common (up to 4 for same bucket reads because STCS
> in the first bucket is triggered at min_threshold=4), and 5 max, seems
> reasonable. Way better than the 200 or so you're doing now.
>
>
>
>
>
> When changing the compaction strategy via JMX, do I need to issue the ALTER
> TABLE command at the end so it is reflected in the schema, or is that taken
> care of automatically? (I am using Cassandra 3.11.11)
>
>
>
>
>
> At the end, yes.
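For reference, a minimal sketch of that final schema change (the table name and the one-week window below are illustrative):

    ALTER TABLE my_ks.my_table WITH compaction = {
      'class': 'TimeWindowCompactionStrategy',
      'compaction_window_unit': 'DAYS',
      'compaction_window_size': '7'};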
>
>
>
> Thanks a lot for your help.
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> *From:* Jeff Jirsa 
> *Sent:* Tuesday, September 14, 2021 4:51 PM
> *To:* cassandra 
> *Subject:* Re: TWCS on Non TTL Data
>
>
>
>
>
>
>
> On Tue, Sep 14, 2021 at 5:42 AM Isaeed Mohanna  wrote:
>
> Hi
>
> I have a table that stores time series data. The data is not TTLed since we
> want to retain it for the foreseeable future, and there are no updates or
> deletes (deletes could happen in case some scrambled data reaches the table,
> but that is extremely rare).
>
> We write incoming data to the table constantly, ~5 million rows a day,
> mostly newly generated data from the past week, but we also get old data
> that got stuck somewhere, though not that often. Usually our reads are for
> the most recent data, the last one to three months, but we do fetch older
> data for specific time periods in the past as well.
>
> Lately we have been facing performance trouble with this table, see the
> histogram below. When compaction is working on the table, latency even
> drops to 10-20 seconds!!
>
> Percentile  SSTables   Write Latency   Read Latency   Partition Size   Cell Count
>                        (micros)        (micros)       (bytes)
> 50%         215.00     17.08           89970.66       1916             149
> 75%         446.00     24.60           223875.79      2759             215
> 95%         535.00     35.43           464228.84      8239             642
> 98%         642.00     51.01           668489.53      24601            1916
> 99%         642.00     73.46           962624.93      42510            3311
> Min         0.00       2.30            10090.81       43               0
> Max         770.00     1358.10         2395318.86     5839588          454826
>
>
>
> As you can see we are scanning hundreds of sstables. It turns out we are
> using DTCS (min: 4, max: 32); the table folder contains ~33K files totalling
> ~130 GB per node (cleanup pending after expanding the cluster), and
> compaction takes a very long time to complete.
>
> As I understood it, DTCS is deprecated, so my questions:
>
>1. should we switch to TWCS even though our data is 

Re: Service Failed but cassandra runs

2021-08-18 Thread Jim Shaw
You start C* with a docker command, right? Check the docker logs; you may see
some helpful info there.

On Wed, Aug 18, 2021 at 8:58 AM FERON Matthieu  wrote:

> Hello you all,
>
>
> I'm trying to set up Cassandra in a CentOS 7 docker container.
>
> When I start the service it says Failed, but I see the process in memory.
>
> When I look in /var/run/cassandra for cassandra.pid, it's not there.
>
> I've looked on the web and tried all the fixes I found, but none works.
>
> It's the 3.11.6.1 version (it's mandatory, I don't have a choice).
>
> The JVM is provided by java-1.8.0-openjdk-1.8.0.262.b10-1.
>
> I've activated debug level logging but don't find any ERROR line.
>
>
> Here are the status logs
>
> [root@NYTHIVED01 cassandra]# service cassandra status
> ● cassandra.service - LSB: distributed storage system for structured data
>Loaded: loaded (/etc/rc.d/init.d/cassandra; bad; vendor preset:
> disabled)
>Active: failed (Result: protocol) since Wed 2021-08-18 10:33:32 UTC;
> 29s ago
>  Docs: man:systemd-sysv-generator(8)
>   Process: 8047 ExecStart=/etc/rc.d/init.d/cassandra start (code=exited,
> status=0/SUCCESS)
>
> Aug 18 10:33:31 NYTHIVED01 systemd[1]: Starting LSB: distributed storage
> system for structured data...
> Aug 18 10:33:31 NYTHIVED01 su[8057]: (to cassandra) root on none
> Aug 18 10:33:32 NYTHIVED01 cassandra[8047]: Starting Cassandra: OK
> Aug 18 10:33:32 NYTHIVED01 systemd[1]: Failed to parse PID from file
> /var/run/cassandra/cassandra.pid: Numerical result out of range
> Aug 18 10:33:32 NYTHIVED01 systemd[1]: Failed to start LSB: distributed
> storage system for structured data.
> Aug 18 10:33:32 NYTHIVED01 systemd[1]: Unit cassandra.service entered
> failed state.
> Aug 18 10:33:32 NYTHIVED01 systemd[1]: cassandra.service failed.
>
> Here the process line
> cassand+  7773 1  1 10:28 ?00:00:14
> /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.262.b10-1.el7.x86_64/jre/bin/java
> -Xloggc:/var/log/cassandra/gc.log -ea -XX:+UseThreadPriorities
> -XX:ThreadPriorityPolicy=42 -XX:+HeapDumpOnOutOfMemoryError ...
>
> Thank you for your help
>
>


Re: WARN on free space across data volumes (false positive)

2021-08-06 Thread Jim Shaw
I would cd into every directory that holds data files for this table, then run
pwd               (to get the path)
df -h "that path"

to see whether all of them really have 2.7 TB free.
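A small loop along those lines (the grep/awk pattern and yaml path assume a stock install layout; adjust for yours):

    for d in $(grep -A5 '^data_file_directories:' /etc/cassandra/cassandra.yaml \
               | awk '/^ *- / {print $2}'); do
        df -h "$d"
    done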


Thanks,
Jim

On Fri, Aug 6, 2021 at 6:31 PM Kian Mohageri 
wrote:

> When running a "nodetool scrub" to repair a table, the following warning
> appears:
>
> ---
>
> WARN 22:23:44,257 Only 40.420GiB free across all data volumes. Consider
> adding more capacity to your cluster or removing obsolete snapshots
> ---
>
> However, our storagedir has >2TB free space.
>
> ---
> Filesystem  Size  Used Avail Use% Mounted on
> overlay  47G  5.7G   39G  13% /
> /dev/nvme0n1p9   47G  5.7G   39G  13% /alloc
> /dev/nvme1n14.0T  1.3T  2.7T  34% /srv/var
> ---
>
> I verified that cassandra is running with only a single
> "-Dcassandra.storagedir=/srv/var/data" option, so I am not sure why the
> warning is appearing.
>
> Any help would be appreciated.  Thank you.
>
> Kian
>


Re: Long GC pauses during repair

2021-08-03 Thread Jim Shaw
A CMS heap that is too large will give long GC pauses. You may try reducing
the heap on one node to see, or switch to G1 if that is the easier route.
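For the G1 route, a minimal sketch of the jvm.options change on a 3.11 node (option names follow the stock jvm.options file; the pause target is illustrative):

    ### CMS Settings -- comment these out (and the rest of the CMS block)
    #-XX:+UseParNewGC
    #-XX:+UseConcMarkSweepGC
    #-XX:+CMSParallelRemarkEnabled

    ### G1 Settings -- uncomment these
    -XX:+UseG1GC
    -XX:MaxGCPauseMillis=500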

Thanks,
Jim

On Tue, Aug 3, 2021 at 3:33 AM manish khandelwal <
manishkhandelwa...@gmail.com> wrote:

> Long GC pauses (1-2 seconds) are seen during repair on the coordinator. We
> are running a full repair with the partition range option. The GC collector
> is CMS and the heap is 14G. The cluster is 7+7 and the Cassandra version is
> 3.11.2. There is not much traffic when the repair is running. What could be
> the probable cause of the long GC pauses? What should I look into?
>
> Regards
> Manish
>


Re: High memory usage during nodetool repair

2021-08-03 Thread Jim Shaw
I think Erick posted https://community.datastax.com/questions/6947/, which
explains it very clearly.

We hit the same issue, only on a huge table during an upgrade, and we changed
the setting back after it was done.
My understanding is that which option to choose depends on your use case: if
you are chasing high performance on a big table, stick with the default and
add memory capacity instead; hardware is cheaper nowadays.
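The setting being discussed is disk_access_mode in cassandra.yaml (implicitly "auto" unless overridden); limiting mmap to the index files looks like:

    # cassandra.yaml
    disk_access_mode: mmap_index_only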

Thanks,
Jim

On Mon, Aug 2, 2021 at 7:12 PM Amandeep Srivastava <
amandeep.srivastava1...@gmail.com> wrote:

> Can anyone please help with the above questions? To summarise:
>
> 1) What is the impact of using mmap only for indices besides a degradation
> in read performance?
> 2) Why does the off heap consumed during Cassandra full repair remains
> occupied 12+ hours after the repair completion and is there a
> manual/configuration driven way to clear that earlier?
>
> Thanks,
> Aman
>
> On Thu, 29 Jul, 2021, 6:47 pm Amandeep Srivastava, <
> amandeep.srivastava1...@gmail.com> wrote:
>
>> Hi Erick,
>>
>> Limiting mmap to index only seems to have resolved the issue. The max ram
>> usage remained at 60% this time. Could you please point me to the
>> limitations for setting this param? - For starters, I can see read
>> performance getting reduced up to 30% (CASSANDRA-8464
>> )
>>
>> Also if you could please shed light on extended questions in my earlier
>> email.
>>
>> Thanks a lot.
>>
>> Regards,
>> Aman
>>
>> On Thu, Jul 29, 2021 at 12:52 PM Amandeep Srivastava <
>> amandeep.srivastava1...@gmail.com> wrote:
>>
>>> Thanks, Bowen, don't think that's an issue - but yes I can try upgrading
>>> to 3.11.5 and limit the merkle tree size to bring down the memory
>>> utilization.
>>>
>>> Thanks, Erick, let me try that.
>>>
>>> Can someone please share documentation relating to internal functioning
>>> of full repairs - if there exists one? Wanted to understand the role of the
>>> heap and off-heap memory separately during the process.
>>>
>>> Also, for my case, once the nodes reach the 95% memory usage, it stays
>>> there for almost 10-12 hours after the repair is complete, before falling
>>> back to 65%. Any pointers on what might be consuming off-heap for so long
>>> and can something be done to clear it earlier?
>>>
>>> Thanks,
>>> Aman
>>>
>>>
>>>
>>
>> --
>> Regards,
>> Aman
>>
>


Re: Number of DCs in Cassandra

2021-07-14 Thread Jim Shaw
Shaurya:
What is the purpose of having that many data centers?
With RF=3 within a data center, you have 3 copies of the data.
If you have 3 DCs, that means 9 copies of the data.
Think about the space and network bandwidth consumed by that many copies.
BTW, ours is just 2 DCs, for regional DR.
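For example, a two-DC layout like the one mentioned above (keyspace and DC names are placeholders and must match what nodetool status reports):

    ALTER KEYSPACE my_ks WITH replication = {
      'class': 'NetworkTopologyStrategy', 'dc_east': 3, 'dc_west': 3};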

Thanks,
Jim

On Wed, Jul 14, 2021 at 2:27 AM Shaurya Gupta 
wrote:

> Hi
>
> Does anyone have suggestions on the maximum number of data centers a
> NetworkTopologyStrategy keyspace can have, not only technically but also
> considering performance?
> In each data center the RF is 3.
>
> Thanks!
> --
> Shaurya Gupta
>
>
>