Re: Who wants a free Cassandra t-shirt?

2023-07-21 Thread guo Maxwell
It seems I've never had one…

On Sat, Jul 22, 2023 at 8:48 AM Patrick McFadin wrote:

> We have about another week left on the user survey I posted last week. The
> response has been slow, so it's time to get things in gear.
>
> I found a box of Cassandra t-shirts that will make an excellent thank you
> for anyone filling out the survey. Once the survey window closes, I'll pick
> a random group of emails to receive a shirt. Given the tepid response so
> far, your chances are decent to receive a shirt!
>
> 5-10 minutes. That's all it takes. Promote to your networks and let's get
> some opinions known!
>
> https://forms.gle/KVNd7UmUfcBuoNvF7
>
> Thanks again,
>
> Patrick
>
-- 
you are the apple of my eye !


Re: Query on Token range

2023-06-09 Thread guo Maxwell
I think nodetool info with --tokens may be of some help.
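
A quick sketch of both routes (x.y.z.w is the placeholder address from your
example; the trailing space in the grep pattern keeps x.y.z.w from also
matching x.y.z.w1):

    # nodetool ring prints one line per token, so counting a node's lines
    # counts the token ranges it owns
    nodetool ring | grep -c 'x.y.z.w '

    # or ask the node for its own tokens over JMX and count them
    nodetool -h x.y.z.w info --tokens | grep -c 'Token'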

On Fri, Jun 9, 2023 at 15:09 ranju goel wrote:

> Hi everyone,
>
> Is there any faster way to calculate the number of token ranges allocated
> to a node
> (x.y.z.w)?
>
> I used the manual way of subtracting the start token from the last token
> shown in nodetool ring, but it is time-consuming.
>
>
>
> x.y.z.w RAC1   UpNormal 88 GiB  100.00%
> -5972602825521846313
> x.y.z.w1   RAC1   UpNormal 87 GiB  100.00%
> -5956172717199559280
>
> Best Regards
> Ranju Jain
>


-- 
you are the apple of my eye !


Re: Is cleanup is required if cluster topology changes

2023-05-04 Thread guo Maxwell
Compaction will just merge duplicate data and remove deleted data on this
node. If you add or remove a node from the cluster, I think cleanup is
needed. If cleanup failed, I think we should look into the reason.
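
A minimal sketch (the keyspace name is a placeholder; omit it to clean all
keyspaces):

    # run on each node that may have lost ownership of ranges;
    # cleanup is a local compaction, so re-running it after a failure is safe
    nodetool cleanup my_keyspace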

On Fri, May 5, 2023 at 06:37 Runtian Liu wrote:

> Hi all,
>
> Is cleanup the sole method to remove data that does not belong to a
> specific node? In a cluster, where nodes are added or decommissioned from
> time to time, failure to run cleanup may lead to data resurrection issues,
> as deleted data may remain on the node that lost ownership of certain
> partitions. Or is it true that normal compactions can also handle data
> removal for nodes that no longer have ownership of certain data?
>
> Thanks,
> Runtian
>


-- 
you are the apple of my eye !


Re: Apache Cassandra Marketing Meeting - 5/26

2022-05-24 Thread guo Maxwell
A suggestion: could you take Asian users into consideration in the future? ☺


Re: kill session in cassandra cluster

2021-01-07 Thread guo Maxwell
When the query is being processed on the coordinator node, you may just kill
that node, but if the query is being processed on the replica nodes ~~~
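
Since there is no way to interrupt a running query (as Jeff notes below),
the practical guard is the coordinator-side timeouts in cassandra.yaml. A
sketch with the 3.x-style names and their defaults (verify against your own
yaml before changing anything):

    read_request_timeout_in_ms: 5000
    range_request_timeout_in_ms: 10000
    write_request_timeout_in_ms: 2000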

On Thu, Jan 7, 2021 at 3:24 PM Jeff Jirsa wrote:

> There have been a few bugs that lets some really bad queries run for 30+
> minutes in pathological cases, but generally you’re right, and there is not
> a way to interrupt a running query in any existing releases
>
>
> On Jan 6, 2021, at 11:20 PM, Elliott Sims  wrote:
>
> 
> At least by default, Cassandra has pretty short timeouts.  I don't know of
> a way to kill an in-flight query, but by the time you did it would have
> timed out anyways.  I don't know of any way to stop it from repeating other
> than tracking down the source and stopping it.
>
> On Wed, Jan 6, 2021, 5:41 PM David Ni  wrote:
>
>> Hello, Experts!
>>  I want to know if there is a way to kill a session in the
>> cassandra cluster. For example, I get a session_id from
>> system_traces.sessions: 4c9049a0-4fed-11eb-a60d-7f98ffdaf6cd. The session
>> is running a very bad cql which is causing bad performance. I need to kill
>> it ASAP. Could anyone help? Thanks very much!
>>
>>
>>
>>
>

-- 
you are the apple of my eye !


Re: CDC Tools

2020-05-26 Thread guo Maxwell
I have found some projects that can parse the commitlog (CDC), such as
https://github.com/rustyrazorblade/commitlog-viz (this seems to be written
by Jon Haddad, but it does not work yet) and the commitlog extract tool
https://github.com/carloscm/cassandra-commitlog-extract (but it also outputs
to other software). As for Debezium, I have not looked at that feature yet.
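
For the built-in CDC hook itself, the knobs are a yaml flag plus a per-table
property; a minimal sketch (the path and table name are placeholders):

    # cassandra.yaml
    cdc_enabled: true
    cdc_raw_directory: /var/lib/cassandra/cdc_raw

    -- then mark the table (CQL):
    ALTER TABLE my_keyspace.my_table WITH cdc = true;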


On Tue, May 26, 2020 at 8:59 PM Ahmed Eljami wrote:

> Hi guys,
> I'm looking for a tool that helps me to parse CommitLog (CDC).
>
> I found Debezium https://debezium.io/documentation/reference/1.2/ and I
> want to know  if someone has used it or if you could advise me other
> solutions?
>
> Cheers.
>


-- 
you are the apple of my eye !


Re: Unable to start cassandra on GCP (Google cloud) using public email address

2019-12-18 Thread guo Maxwell
I think the problem may be that the public address cannot be bound on your
instance. Check whether a network interface actually carries that address.
I think you should ask GCP for the details.
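
On most clouds the public IP is NATed and never appears on the VM's network
interface, so binding it directly fails. A sketch of the usual workaround
(addresses are placeholders; verify against your own cassandra.yaml):

    rpc_address: 0.0.0.0            # or the VM's private IP
    broadcast_rpc_address: 34.0.0.1 # the public IP clients should connect to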

On Wed, Dec 18, 2019 at 5:25 PM Manu Chadha wrote:

> Hi
>
>
>
> Apologies if this isn’t the right group to ask questions. If it isn’t,
> please let me know where I should send such messages.
>
>
>
> I am trying to run cassandra on google cloud and want to use the external
> IP to run cassandra. I specified the external address in rpc_address but
> got an error when trying to start cassandra: INFO [main] 2019-12-17
> 19:00:37,251 Server.java:159 - Starting listening for CQL clients on
> /xx.xx.xxx.xx:9042 (unencrypted)... Exception
> (java.lang.IllegalStateException) encountered during startup: Failed to
> bind port 9042 on xx.xx.x.xx. java.lang.IllegalStateException: Failed to
> bind port 9042 on x.x.x.x.x..
>
>
>
>
>
> Would you know what I might be doing wrong?
>
>
>
> Thanks
>
> Manu
>
>
>
> Sent from Mail  for
> Windows 10
>
>
>


-- 
you are the apple of my eye !


Re: Optimal backup strategy

2019-11-28 Thread guo Maxwell
Same topology means the restore node should get the same tokens as the
backup node;
ex : backup
   node1(1/2/3/4/5) node2(6/7/8/9/10)
restore :
  nodea(1/2/3/4/5) nodeb(6/7/8/9/10)
so node1's commitlog can be replayed on nodea.
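
If you ever need to rebuild a node with the same tokens, a minimal sketch
(the token values are placeholders; take the real ones from 'nodetool info
--tokens' or 'nodetool ring' while the original node is still available):

    # cassandra.yaml on the replacement node, set before its first start
    num_tokens: 5
    initial_token: 1,2,3,4,5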

On Fri, Nov 29, 2019 at 2:03 PM Adarsh Kumar wrote:

> Thanks Ahu and Hussein,
>
> So my understanding is:
>
>1. Commit log backup is not documented for Apache Cassandra, hence not
>standard. But it can be used for a restore on the same machine (taking the
>backup from commit_log_dir). If used on other machine(s), they have to be
>in the same topology. Can it be used for a replacement node?
>2. For periodic backup, Snapshot+Incremental backup is the best option
>
>
> Thanks,
> Adarsh Kumar
>
> On Fri, Nov 29, 2019 at 7:28 AM guo Maxwell  wrote:
>
>> Hossein is right. But in our case, we restore to the same cassandra
>> topology, so it is usable to do replay. When restoring to the
>> same machine it is also usable.
>> Using sstableloader costs too much time and more storage (though it will
>> shrink after the restore)
>>
>> Hossein Ghiyasi Mehr  于2019年11月28日周四 下午7:40写道:
>>
>>> Commitlog backup isn't usable on another machine.
>>> The backup solution depends on what you want to do: a periodic backup, or
>>> a backup to restore on another machine?
>>> A periodic backup is a combination of snapshots and incremental backups;
>>> remove the incremental backups after each new snapshot.
>>> To take a backup to restore on another machine, you can use a snapshot
>>> after flushing the memtables, or use sstableloader.
>>>
>>>
>>> 
>>> VafaTech.com - A Total Solution for Data Gathering & Analysis
>>>
>>> On Thu, Nov 28, 2019 at 6:05 AM guo Maxwell 
>>> wrote:
>>>
>>>> In the Cassandra and DataStax documentation, commitlog backup is not
>>>> mentioned; only snapshots and incremental backups are described.
>>>>
>>>> Though commitlog archiving per keyspace/table is not supported,
>>>> commitlog replay (though you must put the logs into the commitlog_dir
>>>> and restart the process)
>>>> supports filtering the replay by keyspace/table (using
>>>> -Dcassandra.replayList with the keyspace1.table1,keyspace1.table2 format
>>>> to replay only the specified keyspaces/tables)
>>>>
>>>> Snapshots do affect storage; we take a snapshot once a week during the
>>>> low business peak, and snapshot creation is throttled. You may want to
>>>> look at
>>>> the issue (https://issues.apache.org/jira/browse/CASSANDRA-13019)
>>>>
>>>>
>>>>
>>>> On Thu, Nov 28, 2019 at 1:00 AM Adarsh Kumar wrote:
>>>>
>>>>> Thanks Guo and Eric for replying,
>>>>>
>>>>> I have some confusions about commit log backup:
>>>>>
>>>>>1. The commit log archival technique (
>>>>>
>>>>> https://support.datastax.com/hc/en-us/articles/115001593706-Manual-Backup-and-Restore-with-Point-in-time-and-table-level-restore-
>>>>>) is as good as an incremental backup, as it also captures commit logs
>>>>>after a memtable flush.
>>>>>2. If we go for "Snapshot + Incremental bk + Commit log", here we
>>>>>have to take the commit logs from the commit log directory (is there any
>>>>>SOP for this?). As commit logs are not per table or keyspace, we will
>>>>>have a challenge restoring selective tables.
>>>>>3. Snapshot based backups are easy to manage and operate due to
>>>>>their simplicity. But they are heavy on storage. Any views on this?
>>>>>4. Please share any successful strategy that someone is using for
>>>>>production. We are still in the design phase and want to implement the
>>>>>best solution.
>>>>>
>>>>> Thanks Eric for sharing link for medusa.
>>>>>
>>>>> Regards,
>>>>> Adarsh Kumar
>>>>>
>>>>> On Wed, Nov 27, 2019 at 5:16 PM guo Maxwell 
>>>>> wrote:
>>>>>
>>>>>> For me, I think the last one:
>>>>>>  Snapshot + Incremental + commitlog
>>>>>> is the most meaningful way to do backup and restore, when you back
>>>>>> the data up somewhere else like AWS S3.
>>>>>>
>>>>>>- Snapshot based backup // for incremental d

Re: Optimal backup strategy

2019-11-28 Thread guo Maxwell
Hossein is right. But in our case, we restore to the same cassandra
topology, so it is usable to do replay. When restoring to the
same machine it is also usable.
Using sstableloader costs too much time and more storage (though it will
shrink after the restore)
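
For reference, a minimal sstableloader sketch (hosts and path are
placeholders); it streams sstables to whichever nodes currently own the
data, which is why it works across different topologies but costs time and
temporary extra storage:

    # the directory must be laid out as .../keyspace_name/table_name/
    sstableloader -d 10.0.0.1,10.0.0.2 /backups/my_keyspace/my_table/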

On Thu, Nov 28, 2019 at 7:40 PM Hossein Ghiyasi Mehr wrote:

> Commitlog backup isn't usable on another machine.
> The backup solution depends on what you want to do: a periodic backup, or
> a backup to restore on another machine?
> A periodic backup is a combination of snapshots and incremental backups;
> remove the incremental backups after each new snapshot.
> To take a backup to restore on another machine, you can use a snapshot
> after flushing the memtables, or use sstableloader.
>
>
> 
> VafaTech.com - A Total Solution for Data Gathering & Analysis
>
> On Thu, Nov 28, 2019 at 6:05 AM guo Maxwell  wrote:
>
>> In the Cassandra and DataStax documentation, commitlog backup is not
>> mentioned; only snapshots and incremental backups are described.
>>
>> Though commitlog archiving per keyspace/table is not supported, commitlog
>> replay (though you must put the logs into the commitlog_dir and restart
>> the process)
>> supports filtering the replay by keyspace/table (using
>> -Dcassandra.replayList with the keyspace1.table1,keyspace1.table2 format
>> to replay only the specified keyspaces/tables)
>>
>> Snapshots do affect storage; we take a snapshot once a week during the
>> low business peak, and snapshot creation is throttled. You may want to
>> look at
>> the issue (https://issues.apache.org/jira/browse/CASSANDRA-13019)
>>
>>
>>
>> On Thu, Nov 28, 2019 at 1:00 AM Adarsh Kumar wrote:
>>
>>> Thanks Guo and Eric for replying,
>>>
>>> I have some confusions about commit log backup:
>>>
>>>1. The commit log archival technique (
>>>
>>> https://support.datastax.com/hc/en-us/articles/115001593706-Manual-Backup-and-Restore-with-Point-in-time-and-table-level-restore-
>>>) is as good as an incremental backup, as it also captures commit logs
>>>after a memtable flush.
>>>2. If we go for "Snapshot + Incremental bk + Commit log", here we
>>>have to take the commit logs from the commit log directory (is there any
>>>SOP for this?). As commit logs are not per table or keyspace, we will have
>>>a challenge restoring selective tables.
>>>3. Snapshot based backups are easy to manage and operate due to their
>>>simplicity. But they are heavy on storage. Any views on this?
>>>4. Please share any successful strategy that someone is using for
>>>production. We are still in the design phase and want to implement the
>>>best solution.
>>>
>>> Thanks Eric for sharing link for medusa.
>>>
>>> Regards,
>>> Adarsh Kumar
>>>
>>> On Wed, Nov 27, 2019 at 5:16 PM guo Maxwell 
>>> wrote:
>>>
>>>> For me, I think the last one:
>>>>  Snapshot + Incremental + commitlog
>>>> is the most meaningful way to do backup and restore, when you back the
>>>> data up somewhere else like AWS S3.
>>>>
>>>>- Snapshot based backup // incremental data will not be backed up,
>>>>and you may lose data when restoring to a point in time later than the
>>>>snapshot;
>>>>- Incremental backups // better than snapshot backup, but with
>>>>insufficient data accuracy: data remaining in the memtable will be
>>>>lost;
>>>>- Snapshot + incremental
>>>>- Snapshot + commitlog archival // better data precision than
>>>>incremental backup, but data in the non-archived commitlog (not archived
>>>>and the commitlog segment not yet closed) will not be restored and will
>>>>be lost. Also, when there are too many logs, log replay will cost a lot
>>>>of time
>>>>
>>>> As for us, we use snapshot + incremental + commitlog archive. We read
>>>> the snapshot data and incremental data. The logs are backed up too, but
>>>> we do not back up the
>>>> logs whose data has already been flushed to sstables, because that data
>>>> is covered by the incremental backup.
>>>>
>>>> This way, the data will exist in sstable format through the snapshot
>>>> backup and incremental backup. The number of logs will be very small,
>>>> and log replay will not cost much time.
>>>>
>>>>
>>>>
>>>> On Wed, Nov 27, 2019 at 4:13 PM Eric LELEU wrote:
>>>>

Re: Optimal backup strategy

2019-11-27 Thread guo Maxwell
In the Cassandra and DataStax documentation, commitlog backup is not
mentioned; only snapshots and incremental backups are described.

Though commitlog archiving per keyspace/table is not supported, commitlog
replay (though you must put the logs into the commitlog_dir and restart the
process) supports filtering the replay by keyspace/table (using
-Dcassandra.replayList with the keyspace1.table1,keyspace1.table2 format to
replay only the specified keyspaces/tables)

Snapshots do affect storage; we take a snapshot once a week during the low
business peak, and snapshot creation is throttled. You may want to look at
the issue (https://issues.apache.org/jira/browse/CASSANDRA-13019)
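
A sketch of the replay filter mentioned above (the keyspace/table names are
placeholders); it is a JVM startup property, so the node replays only those
tables' mutations from the commitlog on boot:

    # cassandra-env.sh (or appended to the startup command line)
    JVM_OPTS="$JVM_OPTS -Dcassandra.replayList=keyspace1.table1,keyspace1.table2"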



On Thu, Nov 28, 2019 at 1:00 AM Adarsh Kumar wrote:

> Thanks Guo and Eric for replying,
>
> I have some confusions about commit log backup:
>
>1. The commit log archival technique (
>
> https://support.datastax.com/hc/en-us/articles/115001593706-Manual-Backup-and-Restore-with-Point-in-time-and-table-level-restore-
>) is as good as an incremental backup, as it also captures commit logs
>after a memtable flush.
>2. If we go for "Snapshot + Incremental bk + Commit log", here we have
>to take the commit logs from the commit log directory (is there any SOP for
>this?). As commit logs are not per table or keyspace, we will have a
>challenge restoring selective tables.
>3. Snapshot based backups are easy to manage and operate due to their
>simplicity. But they are heavy on storage. Any views on this?
>4. Please share any successful strategy that someone is using for
>production. We are still in the design phase and want to implement the best
>solution.
>
> Thanks Eric for sharing link for medusa.
>
> Regards,
> Adarsh Kumar
>
> On Wed, Nov 27, 2019 at 5:16 PM guo Maxwell  wrote:
>
>> For me, I think the last one:
>>  Snapshot + Incremental + commitlog
>> is the most meaningful way to do backup and restore, when you back the
>> data up somewhere else like AWS S3.
>>
>>- Snapshot based backup // incremental data will not be backed up,
>>and you may lose data when restoring to a point in time later than the
>>snapshot;
>>- Incremental backups // better than snapshot backup, but with
>>insufficient data accuracy: data remaining in the memtable will be
>>lost;
>>- Snapshot + incremental
>>- Snapshot + commitlog archival // better data precision than
>>incremental backup, but data in the non-archived commitlog (not archived
>>and the commitlog segment not yet closed) will not be restored and will
>>be lost. Also, when there are too many logs, log replay will cost a lot
>>of time
>>
>> As for us, we use snapshot + incremental + commitlog archive. We read
>> the snapshot data and incremental data. The logs are backed up too, but
>> we do not back up the
>> logs whose data has already been flushed to sstables, because that data
>> is covered by the incremental backup.
>>
>> This way, the data will exist in sstable format through the snapshot
>> backup and incremental backup. The number of logs will be very small, and
>> log replay will not cost much time.
>>
>>
>>
>> On Wed, Nov 27, 2019 at 4:13 PM Eric LELEU wrote:
>>
>>> Hi,
>>> TheLastPickle & Spotify have released Medusa as a Cassandra backup tool.
>>>
>>> See :
>>> https://thelastpickle.com/blog/2019/11/05/cassandra-medusa-backup-tool-is-open-source.html
>>>
>>> Hope this link will help you.
>>>
>>> Eric
>>>
>>>
>>> On 27/11/2019 at 08:10, Adarsh Kumar wrote:
>>>
>>> Hi,
>>>
>>> I was looking for the backup strategies of Cassandra. After some study I
>>> came to know that there are the following options:
>>>
>>>- Snapshot based backup
>>>- Incremental backups
>>>- Snapshot + incremental
>>>- Snapshot + commitlog archival
>>>- Snapshot + Incremental + commitlog
>>>
>>> Which is the most suitable and feasible approach? Also, which of these is
>>> used most?
>>> Please let me know if there is any other option or tool available.
>>>
>>> Thanks in advance.
>>>
>>> Regards,
>>> Adarsh Kumar
>>>
>>>
>>
>> --
>> you are the apple of my eye !
>>
>

-- 
you are the apple of my eye !


Re: Optimal backup strategy

2019-11-27 Thread guo Maxwell
For me, I think the last one:
 Snapshot + Incremental + commitlog
is the most meaningful way to do backup and restore, when you back the data
up somewhere else like AWS S3.

   - Snapshot based backup // incremental data will not be backed up, and
   you may lose data when restoring to a point in time later than the
   snapshot;
   - Incremental backups // better than snapshot backup, but with
   insufficient data accuracy: data remaining in the memtable will be
   lost;
   - Snapshot + incremental
   - Snapshot + commitlog archival // better data precision than
   incremental backup, but data in the non-archived commitlog (not archived
   and the commitlog segment not yet closed) will not be restored and will
   be lost. Also, when there are too many logs, log replay will cost a lot
   of time

As for us, we use snapshot + incremental + commitlog archive. We read the
snapshot data and incremental data. The logs are backed up too, but we do
not back up the
logs whose data has already been flushed to sstables, because that data is
covered by the incremental backup.

This way, the data will exist in sstable format through the snapshot
backup and incremental backup. The number of logs will be very small, and
log replay will not cost much time.
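
For concreteness, a sketch of the knobs involved (the tag and keyspace names
are placeholders; check your own configuration before relying on it):

    # cassandra.yaml -- sstables are hard-linked into backups/ on flush
    incremental_backups: true

    # take a tagged snapshot of one keyspace
    nodetool snapshot -t weekly_backup my_keyspace

    # commitlog archiving lives in conf/commitlog_archiving.properties,
    # via the archive_command / restore_command settings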



On Wed, Nov 27, 2019 at 4:13 PM Eric LELEU wrote:

> Hi,
> TheLastPickle & Spotify have released Medusa as a Cassandra backup tool.
>
> See :
> https://thelastpickle.com/blog/2019/11/05/cassandra-medusa-backup-tool-is-open-source.html
>
> Hope this link will help you.
>
> Eric
>
>
> On 27/11/2019 at 08:10, Adarsh Kumar wrote:
>
> Hi,
>
> I was looking for the backup strategies of Cassandra. After some study I
> came to know that there are the following options:
>
>- Snapshot based backup
>- Incremental backups
>- Snapshot + incremental
>- Snapshot + commitlog archival
>- Snapshot + Incremental + commitlog
>
> Which is the most suitable and feasible approach? Also, which of these is
> used most?
> Please let me know if there is any other option or tool available.
>
> Thanks in advance.
>
> Regards,
> Adarsh Kumar
>
>

-- 
you are the apple of my eye !


Re: Curiosity in adding nodes

2019-10-21 Thread guo Maxwell
1. The node added to the ring will calculate the token ranges it owns, then
get the data for those ranges from the nodes that originally owned it.
2. Then the sstables to stream, and the ranges within those sstables, are
estimated.
3. Then streaming begins. Secondary indexes will be built after the sstables
are streamed successfully.
4. When all data is transferred, the node's status will change from joining
to normal, and the node's status will be inserted into the system keyspace.
5. While the data is streaming, the added node can accept writes but does
not serve reads.
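
You can watch step 3 from either side; a small sketch (run against the
joining node or any existing one):

    # active streams, with files/bytes remaining per session
    nodetool netstats

    # the new node shows as UJ (Up/Joining) until bootstrap completes
    nodetool status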

On Tue, Oct 22, 2019 at 9:54 AM Eunsu Kim wrote:

> Hi experts,
>
> When a new node is added, how can the coordinator find data that has not
> yet been streamed?
>
> Or are new nodes not used until all data is streamed?
>
> Thanks in advance
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>

-- 
you are the apple of my eye !


Re: Nodetool snapshot

2019-09-19 Thread guo Maxwell
Yes, you need to restore each node's own snapshot, on every node.
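
To see what is eating the space and to reclaim it, a sketch (the tag is a
placeholder):

    # list existing snapshots and the true size they pin on disk
    nodetool listsnapshots

    # remove one snapshot by tag, or omit -t to remove all of them (3.11)
    nodetool clearsnapshot -t my_old_tag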

On Fri, Sep 20, 2019 at 2:08 AM Abdul Patel wrote:

> Thanks, I guess I have both.
> So can we have either or?
> If I keep auto_snapshot, can I remove nodetool snapshot?
> Worst case scenario, if I wish to restore a snapshot, which one will be the
> best option?
> Also, if we restore a snapshot, do we need to have snapshots on all nodes?
>
>
> On Thursday, September 19, 2019, Jeff Jirsa  wrote:
>
>> You probably have auto_snapshot enabled, which takes snapshots when you
>> do certain things. You can disable that if you don't need it, but it
>> protects you against things like accidentally dropping / truncating a table.
>>
>> You may also be doing snapshots manually - if you do this, you can
>> 'nodetool clearsnapshot' to free up space.
>>
>>
>> On Thu, Sep 19, 2019 at 10:54 AM Abdul Patel  wrote:
>>
>>> Hey All,
>>>
>>> I found recently that the nodetool snapshot folder is creating almost
>>> 120GB of files when my actual keyspace folder has only 20GB.
>>> Do we need to change any parameter to avoid this?
>>> Is this normal?
>>> I have version 3.11.4
>>>
>> --
you are the apple of my eye !


Re: Is it possible to build multi cloud cluster for Cassandra

2019-09-05 Thread guo Maxwell
You can build Cassandra in a multi-cloud environment, but the networks must
be able to connect to each other. ☺
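
A sketch of how the two clouds are usually labeled as two datacenters in one
cluster (the dc/rack names are placeholders), using
GossipingPropertyFileSnitch:

    # cassandra.yaml on every node
    endpoint_snitch: GossipingPropertyFileSnitch

    # conf/cassandra-rackdc.properties on an AWS node
    dc=aws-east
    rack=rack1

    # conf/cassandra-rackdc.properties on an Azure node
    dc=azure-west
    rack=rack1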

On Fri, Sep 6, 2019 at 12:36 AM Goutham reddy wrote:

> Hello,
> Is it wise and advisable to build a multi-cloud environment for Cassandra
> for High Availability?
> AWS as one datacenter and Azure as another datacenter.
> If yes, are there any challenges involved?
>
> Thanks and regards,
> Goutham.
>


-- 
you are the apple of my eye !


Re: Reminder: ApacheCon NA next week

2019-09-05 Thread guo Maxwell
Thank you very much!

On Thu, Sep 5, 2019 at 11:16 PM Jeff Jirsa wrote:

>
> ApacheCon NA 2019 is next week in Las Vegas. There’s a Cassandra track,
> with 3 days of just-about-cassandra talks. If you haven’t signed up, it’s
> not too late (but travel / hotels get harder as time gets short):
>
> Register here: https://www.apachecon.com/acna19/register.html
> Schedule here: https://www.apachecon.com/acna19/s/#/schedule
>
>
> --
you are the apple of my eye !


Re: about remaining data after adding a node

2019-09-05 Thread guo Maxwell
His data has a TTL, so just wait, if he does not want to do cleanup.
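
For the TWCS suggestion quoted below, a minimal sketch (the table name and
window size are placeholders):

    ALTER TABLE my_keyspace.my_table WITH compaction = {
      'class': 'TimeWindowCompactionStrategy',
      'compaction_window_unit': 'DAYS',
      'compaction_window_size': '1'
    };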

On Thu, Sep 5, 2019 at 5:48 PM Oleksandr Shulgin wrote:

> On Thu, Sep 5, 2019 at 11:19 AM Federico Razzoli <
> federico.razzoli@gmail.com> wrote:
>
>>
>> Are you using DateTieredCompactionStrategy? It optimises the deletion of
>> expired data from disks.
>> If minor compactions are not solving the problem, I suggest to run
>> nodetool compact.
>>
>
> Sorry, but both of the suggestions above are of dubious quality IMO.
>
> Don't use DTCS, use TWCS instead, if your use case is a good fit
> (immutable, in-order, TTLd inserts).
> Don't trigger a major compaction without a very good reason.
>
> In general, compaction can help to get rid of expired data, but it doesn't
> remove copies of data for which the node is not responsible anymore due to
> topology change.  Use the cleanup command for that.
>
> --
> Alex
>
> --
you are the apple of my eye !