Re: Using CDC Feature to Stream C* to Kafka (Design Proposal)

2018-09-11 Thread Joy Gao
Re Rahul:  "Although DSE advanced replication does one way, those are use
cases with limited value to me because ultimately it’s still a master slave
design."
Completely agree. I'm not familiar with the Calvin protocol, but that sounds
interesting (reading time...).

On Tue, Sep 11, 2018 at 8:38 PM Joy Gao  wrote:

> Thank you all for the feedback so far.
>
> The immediate use case for us is setting up a real-time streaming data
> pipeline from C* to our Data Warehouse (BigQuery), where other teams can
> access the data for reporting/analytics/ad-hoc query. We already do this
> with MySQL, where we stream the MySQL Binlog via Debezium's MySQL Connector
> to Kafka, and then use a BigQuery Sink Connector to stream data to BigQuery.
>
> Re Jon's comment about why not write to Kafka first: in some cases that
> may be ideal, but one potential concern we have with writing to Kafka first
> is not having "read-after-write" consistency. The data could be written to
> Kafka, but not yet consumed by C*. If the web service issues a (quorum)
> read immediately after the (quorum) write, the data that is being returned
> could still be outdated if the consumer did not catch up. Having the web
> service interact with C* directly solves this problem for us (we could add
> a cache before writing to Kafka, but that adds additional operational
> complexity to the architecture; alternatively, we could write to Kafka and
> C* transactionally, but distributed transactions are slow).
>
> Having the ability to stream its data to other systems could make C* more
> flexible and more easily integrated into a larger data ecosystem. As Dinesh
> has mentioned, implementing this in the database layer means there is a
> standard approach to getting a change notification stream (unlike triggers,
> which are ad hoc and customized). Aside from replication, the change events
> could be used for updating Elasticsearch, generating derived views (e.g.
> for reporting), sending to an audit service, sending to a notification
> service, and in our case, streaming to our data warehouse for analytics.
> (One article that goes over database streaming is Martin Kleppmann's Turning
> the Database Inside Out with Apache Samza, which seems relevant here.) For
> reference, turning the database into a stream of change events is pretty
> common in SQL databases (e.g. the MySQL binlog, the Postgres WAL) and in
> NoSQL databases that have a primary-replica setup (e.g. the MongoDB oplog).
> Recently CockroachDB introduced a CDC feature as well (and they have
> masterless replication too).
>
> Hope that answers the question. That said, dedupe/ordering/getting full
> row of data via C* CDC is a hard problem, but may be worth solving for
> reasons mentioned above. Our proposal is a user-side approach to solving these
> problems. Maybe the more sensible thing to do is to build it as part of C*
> itself, but that's a much bigger discussion. If anyone is building a
> streaming pipeline for C*, we'd be interested in hearing their approaches
> as well.
>
>
> On Tue, Sep 11, 2018 at 7:01 AM Rahul Singh 
> wrote:
>
>> You know what they say: Go big or go home.
>>
>> Right now the candidates are: Cassandra itself, but embedded or on the side,
>> not on the actual data clusters; ZooKeeper (yuck); Kafka (which needs
>> ZooKeeper, yuck); S3 (outside service dependency, so no go).
>>
>> Jeff, those are great patterns, esp. the second one. Have used it several
>> times. Cassandra is a great place to store data in transport.
>>
>>
>> Rahul
>> On Sep 10, 2018, 5:21 PM -0400, DuyHai Doan ,
>> wrote:
>>
>> Also using Calvin means having to implement a distributed monotonic
>> sequence as a primitive, not trivial at all ...
>>
>> On Mon, Sep 10, 2018 at 3:08 PM, Rahul Singh <
>> rahul.xavier.si...@gmail.com> wrote:
>>
>>> In response to mimicking Advanced Replication in DSE: I understand the
>>> goal. Although DSE Advanced Replication does one-way replication, those are
>>> use cases with limited value to me because ultimately it's still a
>>> master-slave design.
>>>
>>> I'm working on a prototype for this for two-way replication between
>>> clusters or databases regardless of DB tech - and every variation I can get
>>> to comes down to some implementation of the Calvin protocol, which basically
>>> verifies the change in either cluster, sequences it according to its impact
>>> on the underlying data, and then schedules the mutation in a predictable
>>> manner on both clusters/DBs.
>>>
>>> All that means is that I need to sequence the change before it happens
>>> so I can predictably ensure it's scheduled for write/mutation. So I'm
>>> back to square one: having a definitive queue/ledger separate from the
>>> individual commit log of the cluster.
>>>
>>>
>>> Rahul Singh
>>> Chief Executive Officer
>>> m 202.905.2818
>>>
>>> Anant 

Re: Using CDC Feature to Stream C* to Kafka (Design Proposal)

2018-09-11 Thread Joy Gao
Thank you all for the feedback so far.

The immediate use case for us is setting up a real-time streaming data
pipeline from C* to our Data Warehouse (BigQuery), where other teams can
access the data for reporting/analytics/ad-hoc query. We already do this
with MySQL, where we stream the MySQL Binlog via Debezium's MySQL Connector
to Kafka, and then use a BigQuery Sink Connector to stream data to BigQuery.

Re Jon's comment about why not write to Kafka first: in some cases that may
be ideal, but one potential concern we have with writing to Kafka first is
not having "read-after-write" consistency. The data could be written to
Kafka, but not yet consumed by C*. If the web service issues a (quorum)
read immediately after the (quorum) write, the data that is being returned
could still be outdated if the consumer did not catch up. Having the web
service interact with C* directly solves this problem for us (we could add
a cache before writing to Kafka, but that adds additional operational
complexity to the architecture; alternatively, we could write to Kafka and
C* transactionally, but distributed transactions are slow).
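
To make that concrete, a minimal sketch (keyspace, table and column names are
made up) of the quorum write followed by an immediate quorum read that the web
service relies on when talking to C* directly:

  cqlsh -e "
    CONSISTENCY QUORUM;
    INSERT INTO app.user_profile (user_id, email) VALUES (42, 'a@example.com');
    -- with R + W > RF, this read is guaranteed to observe the write above
    SELECT email FROM app.user_profile WHERE user_id = 42;"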

Having the ability to stream its data to other systems could make C* more
flexible and more easily integrated into a larger data ecosystem. As Dinesh
has mentioned, implementing this in the database layer means there is a
standard approach to getting a change notification stream (unlike triggers,
which are ad hoc and customized). Aside from replication, the change events
could be used for updating Elasticsearch, generating derived views (e.g.
for reporting), sending to an audit service, sending to a notification
service, and in our case, streaming to our data warehouse for analytics.
(One article that goes over database streaming is Martin Kleppmann's Turning
the Database Inside Out with Apache Samza, which seems relevant here.) For
reference, turning the database into a stream of change events is pretty
common in SQL databases (e.g. the MySQL binlog, the Postgres WAL) and in
NoSQL databases that have a primary-replica setup (e.g. the MongoDB oplog).
Recently CockroachDB introduced a CDC feature as well (and they have
masterless replication too).
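
For reference, a minimal sketch of what enabling the CDC feature looks like on
the C* side (the directory path is an assumption); commit log segments for
CDC-enabled tables are then made available under cdc_raw for a consumer like
ours to process:

  # cassandra.yaml, on every node:
  #   cdc_enabled: true
  #   cdc_raw_directory: /var/lib/cassandra/cdc_raw   # path is an assumption

  # enable CDC per table:
  cqlsh -e "ALTER TABLE my_ks.my_table WITH cdc = true;"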

Hope that answers the question. That said, dedupe/ordering/getting full row
of data via C* CDC is a hard problem, but may be worth solving for reasons
mentioned above. Our proposal is a user-side approach to solving these problems.
Maybe the more sensible thing to do is to build it as part of C* itself,
but that's a much bigger discussion. If anyone is building a streaming
pipeline for C*, we'd be interested in hearing their approaches as well.


On Tue, Sep 11, 2018 at 7:01 AM Rahul Singh 
wrote:

> You know what they say: Go big or go home.
>
> Right now the candidates are: Cassandra itself, but embedded or on the side,
> not on the actual data clusters; ZooKeeper (yuck); Kafka (which needs
> ZooKeeper, yuck); S3 (outside service dependency, so no go).
>
> Jeff, those are great patterns, esp. the second one. Have used it several
> times. Cassandra is a great place to store data in transport.
>
>
> Rahul
> On Sep 10, 2018, 5:21 PM -0400, DuyHai Doan , wrote:
>
> Also using Calvin means having to implement a distributed monotonic
> sequence as a primitive, not trivial at all ...
>
> On Mon, Sep 10, 2018 at 3:08 PM, Rahul Singh  > wrote:
>
>> In response to mimicking Advanced Replication in DSE: I understand the
>> goal. Although DSE Advanced Replication does one-way replication, those are
>> use cases with limited value to me because ultimately it's still a
>> master-slave design.
>>
>> I'm working on a prototype for this for two-way replication between
>> clusters or databases regardless of DB tech - and every variation I can get
>> to comes down to some implementation of the Calvin protocol, which basically
>> verifies the change in either cluster, sequences it according to its impact
>> on the underlying data, and then schedules the mutation in a predictable
>> manner on both clusters/DBs.
>>
>> All that means is that I need to sequence the change before it happens so
>> I can predictably ensure it's scheduled for write/mutation. So I'm
>> back to square one: having a definitive queue/ledger separate from the
>> individual commit log of the cluster.
>>
>>
>> Rahul Singh
>> Chief Executive Officer
>> m 202.905.2818
>>
>> Anant Corporation
>> 1010 Wisconsin Ave NW, Suite 250
>> 
>> Washington, D.C. 20007
>>
>> We build and manage digital business technology platforms.
>> On Sep 10, 2018, 3:58 AM -0400, Dinesh Joshi 
>> ,
>> wrote:
>>
>> On Sep 9, 2018, at 6:08 AM, Jonathan Haddad  wrote:
>>
>> There may be some use cases for it, but I'm not sure what they are.  It
>> might help if you shared the use cases 

Re: Scrub a single SSTable only?

2018-09-11 Thread Jeff Jirsa
Doing this can resurrect deleted data and violate consistency - if that’s a 
problem for you, it may be easier to treat the whole host as failed, run 
repairs and replace it.

-- 
Jeff Jirsa


> On Sep 11, 2018, at 2:41 PM, Rahul Singh  wrote:
> 
> What's the RF for that data? If you can manage downtime on one node, I'd 
> recommend just bringing it down, and then repairing after you delete the bad 
> file and bring it back up.
> 
> Rahul Singh
> Chief Executive Officer
> m 202.905.2818
> 
> Anant Corporation
> 1010 Wisconsin Ave NW, Suite 250
> Washington, D.C. 20007
> 
> We build and manage digital business technology platforms.
>> On Sep 11, 2018, 2:55 AM -0400, Steinmaurer, Thomas 
>> , wrote:
>> Hello,
>> 
>>  
>> 
>> is there a way to Online scrub a particular SSTable file only and not the 
>> entire column family?
>> 
>>  
>> 
>> According to the Cassandra logs we have a corrupted SSTable smallish 
>> compared to the entire data volume of the column family in question.
>> 
>>  
>> 
>> To my understanding, both, nodetool scrub and sstablescrub operate on the 
>> entire column family and can’t work on a single SSTable, right?
>> 
>>  
>> 
>> There is still the way to shutdown Cassandra and remove the file from disk, 
>> but ideally I want to have that as an online operation.
>> 
>>  
>> 
>> Perhaps there is something JMX based?
>> 
>>  
>> 
>> Thanks,
>> 
>> Thomas
>> 
>>  
>> 
>> The contents of this e-mail are intended for the named addressee only. It 
>> contains information that may be confidential. Unless you are the named 
>> addressee or an authorized designee, you may not copy or use it, or disclose 
>> it to anyone else. If you received it in error please notify us immediately 
>> and then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) 
>> is a company registered in Linz whose registered office is at 4040 Linz, 
>> Austria, Freistädterstraße 313


Re: Scrub a single SSTable only?

2018-09-11 Thread Rahul Singh
What's the RF for that data? If you can manage downtime on one node, I'd recommend 
just bringing it down, and then repairing after you delete the bad file and 
bring it back up.
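
A minimal sketch of that procedure, assuming the corrupted file reported in
the logs is mc-1234-big-Data.db (file name and paths are assumptions):

  nodetool drain && sudo service cassandra stop
  # move every component of the corrupted SSTable out of the data directory
  mv /var/lib/cassandra/data/my_ks/my_table-*/mc-1234-big-* /var/tmp/corrupt/
  sudo service cassandra start
  nodetool repair my_ks my_table    # re-sync the node from the other replicas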

Rahul Singh
Chief Executive Officer
m 202.905.2818

Anant Corporation
1010 Wisconsin Ave NW, Suite 250
Washington, D.C. 20007

We build and manage digital business technology platforms.
On Sep 11, 2018, 2:55 AM -0400, Steinmaurer, Thomas 
, wrote:
> Hello,
>
> is there a way to Online scrub a particular SSTable file only and not the 
> entire column family?
>
> According to the Cassandra logs we have a corrupted SSTable smallish compared 
> to the entire data volume of the column family in question.
>
> To my understanding, both, nodetool scrub and sstablescrub operate on the 
> entire column family and can’t work on a single SSTable, right?
>
> There is still the way to shutdown Cassandra and remove the file from disk, 
> but ideally I want to have that as an online operation.
>
> Perhaps there is something JMX based?
>
> Thanks,
> Thomas
>
> The contents of this e-mail are intended for the named addressee only. It 
> contains information that may be confidential. Unless you are the named 
> addressee or an authorized designee, you may not copy or use it, or disclose 
> it to anyone else. If you received it in error please notify us immediately 
> and then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) 
> is a company registered in Linz whose registered office is at 4040 Linz, 
> Austria, Freistädterstraße 313


Re: Fresh SSTable files (due to repair?) in a static table (was Re: Drop TTLd rows: upgradesstables -a or scrub?)

2018-09-11 Thread Oleksandr Shulgin
On Tue, 11 Sep 2018, 19:26 Jeff Jirsa,  wrote:

> Repair or read-repair
>

Jeff,

Could you be more specific please?

Why would any data be streamed in if there are no (as far as I can see)
possibilities for the nodes to be inconsistent?

--
Alex

On Tue, Sep 11, 2018 at 12:58 AM Oleksandr Shulgin <
> oleksandr.shul...@zalando.de> wrote:
>
>> On Tue, Sep 11, 2018 at 9:47 AM Oleksandr Shulgin <
>> oleksandr.shul...@zalando.de> wrote:
>>
>>> On Tue, Sep 11, 2018 at 9:31 AM Steinmaurer, Thomas <
>>> thomas.steinmau...@dynatrace.com> wrote:
>>>
 As far as I remember, in newer Cassandra versions, with STCS, nodetool
 compact offers a ‘-s’ command-line option to split the output into files
 with 50%, 25% … in size, thus in this case, not a single largish SSTable
 anymore. By default, without -s, it is a single SSTable though.

>>>
>>> Thanks Thomas, I've also spotted the option while testing this
>>> approach.  I understand that doing major compactions is generally not
>>> recommended, but do you see any real drawback of having a single SSTable
>>> file in case we stopped writing new data to the table?
>>>
>>
>> A related question is: given that we are not writing new data to these
>> tables, it would make sense to exclude them from the routine repair
>> regardless of the option we use in the end to remove the tombstones.
>>
>> However, I've just checked the timestamps of the SSTable files on one of
>> the nodes and to my surprise I can find some files written only a few weeks
>> ago (most of the files are half a year ago, which is expected because it
>> was the time we were adding this DC).  But we've stopped writing to the
>> tables about a year ago and we repair the cluster every week.
>>
>> What could explain that we suddenly see these new SSTable files?  They
>> shouldn't be there even due to overstreaming, because one would need to
>> find some differences in the Merkle tree in the first place, but I don't
>> see how that could actually happen in our case.
>>
>> Any ideas?
>>
>> Thanks,
>> --
>> Alex
>>
>>


Re: impact/incompatibility of patch backport on Cassandra 3.11.2

2018-09-11 Thread Jeff Jirsa
https://issues.apache.org/jira/browse/CASSANDRA-14672 is almost certainly
due to pre-existing corruption. That the user is seeing 14672 is due to
extra guards added in 3.11.3, but 14672 isn't likely going to hit you
unless you're subject to
https://issues.apache.org/jira/browse/CASSANDRA-14515 , which is a much
more important bug (that is: 3.11.2 has a data loss bug, 3.11.3 just breaks
the read)

On Tue, Sep 11, 2018 at 6:21 AM Ahmed Eljami  wrote:

> Any opinion, please?
>
> On Thu, Sep 6, 2018 at 22:18, Ahmed Eljami  wrote:
>
>> Hi,
>>
>> We are testing Cassandra 3.11.2 and we saw that it contains a critical
>> bug which was fixed in 3.11.3 (
>> https://issues.apache.org/jira/browse/CASSANDRA-13929).
>>
>> After about 1 month of testing, we haven't encountered this bug in our
>> environment, but to be sure before going to production, we would like to
>> know if it's possible to backport this patch to version 3.11.2.
>>
>> Do you think that this patch could be backported without
>> impact/incompatibility on 3.11.2? Or would it be safer to migrate to
>> Cassandra 3.11.3?
>>
>> Our fear is that 3.11.3 is also not stable, due to this bug:
>> https://issues.apache.org/jira/browse/CASSANDRA-14672
>>
>> Best regards.
>>
>>
>
> --
> Regards,
>
> Ahmed ELJAMI
>


Re: Fresh SSTable files (due to repair?) in a static table (was Re: Drop TTLd rows: upgradesstables -a or scrub?)

2018-09-11 Thread Jeff Jirsa
Repair or read-repair


On Tue, Sep 11, 2018 at 12:58 AM Oleksandr Shulgin <
oleksandr.shul...@zalando.de> wrote:

> On Tue, Sep 11, 2018 at 9:47 AM Oleksandr Shulgin <
> oleksandr.shul...@zalando.de> wrote:
>
>> On Tue, Sep 11, 2018 at 9:31 AM Steinmaurer, Thomas <
>> thomas.steinmau...@dynatrace.com> wrote:
>>
>>> As far as I remember, in newer Cassandra versions, with STCS, nodetool
>>> compact offers a ‘-s’ command-line option to split the output into files
>>> with 50%, 25% … in size, thus in this case, not a single largish SSTable
>>> anymore. By default, without -s, it is a single SSTable though.
>>>
>>
>> Thanks Thomas, I've also spotted the option while testing this approach.
>> I understand that doing major compactions is generally not recommended, but
>> do you see any real drawback of having a single SSTable file in case we
>> stopped writing new data to the table?
>>
>
> A related question is: given that we are not writing new data to these
> tables, it would make sense to exclude them from the routine repair
> regardless of the option we use in the end to remove the tombstones.
>
> However, I've just checked the timestamps of the SSTable files on one of
> the nodes and to my surprise I can find some files written only a few weeks
> ago (most of the files are half a year ago, which is expected because it
> was the time we were adding this DC).  But we've stopped writing to the
> tables about a year ago and we repair the cluster every week.
>
> What could explain that we suddenly see these new SSTable files?  They
> shouldn't be there even due to overstreaming, because one would need to
> find some differences in the Merkle tree in the first place, but I don't
> see how that could actually happen in our case.
>
> Any ideas?
>
> Thanks,
> --
> Alex
>
>


Re: High IO and poor read performance on 3.11.2 cassandra cluster

2018-09-11 Thread Elliott Sims
A few reasons I can think of offhand why your test setup might not see
problems from large readahead:
- Your sstables are <4MB or your reads are typically <4MB from the end of the
  file
- Your queries tend to use the 4MB of data anyways
- Your dataset is small enough that most of it fits in the VM cache, and it
  rarely goes to disk
- Load is low enough that the read I/O amplification doesn't hurt performance
Less likely but still possible is that there's a subtle difference in the
way that 2.1 does reads vs 3.x that's affecting it.  The less subtle
explanation is that 3.x has smaller rows and a smaller readahead is
therefore probably optimal, but that would only decrease your performance
benefit and not cause a regression from 2.1->3.x.


On Mon, Sep 10, 2018 at 1:27 AM, Laxmikant Upadhyay  wrote:

> Thank you so much Alexander !
>
> Your doubt was right. It was due to the very high value of readahead only
> (4 mb).
>
> Although we had set the readahead value to 8 KB in our /etc/rc.local, somehow
> this was not working. We are keeping the value at 64 KB, as this is giving
> better performance than 8 KB. Now we are able to meet our SLA.
>
> One interesting observation is that we have a setup on Cassandra 2.1.16
> also, and on that system the readahead value is 4 MB only, but we are not
> observing any performance dip there. I am not sure why.
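>
> For reference, a minimal sketch of how we check and set it (device name is an
> assumption; the setting must be re-applied on boot, and Cassandra restarted to
> pick it up):
>
>   sudo blockdev --report /dev/sdb1       # current readahead (RA), in 512-byte sectors
>   sudo blockdev --setra 128 /dev/sdb1    # 128 sectors = 64 KB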
>
>
> On Wed, Sep 5, 2018 at 11:31 AM Alexander Dejanovski <
> a...@thelastpickle.com> wrote:
>
>> Don't forget to run "nodetool upgradesstables -a" after you ran the ALTER
>> statement so that all SSTables get re-written with the new compression
>> settings.
>>
>> Since you have a lot of tables in your cluster, be aware that lowering
>> the chunk length will grow the offheap memory usage of Cassandra.
>> You can get more information here:
>> http://thelastpickle.com/blog/2018/08/08/compression_performance.html
>>
>> You should also check your readahead settings as they may be set too high:
>> sudo blockdev --report
>> The default is usually 256 but Cassandra would rather favor low readahead
>> values to get more IOPS instead of more throughput (and readahead is
>> usually not that useful for Cassandra). A conservative setting is 64 (you
>> can go down to 8 and see how Cassandra performs then).
>> Do note that changing the readahead settings requires to restart
>> Cassandra as it is only read once by the JVM during startup.
>>
>> Cheers,
>>
>> On Wed, Sep 5, 2018 at 7:27 AM CPC  wrote:
>>
>>> Could you decrease chunk_length_in_kb to 16 or 8 and repeat the test.
>>>
>>> On Wed, Sep 5, 2018, 5:51 AM wxn...@zjqunshuo.com 
>>> wrote:
>>>
 How large is your row? You may be hitting the wide-row reading problem.

 -Simon

 *From:* Laxmikant Upadhyay 
 *Date:* 2018-09-05 01:01
 *To:* user 
 *Subject:* High IO and poor read performance on 3.11.2 cassandra
 cluster

 We have 3 node cassandra cluster (3.11.2) in single dc.

 We have written 450 million records on the table with LCS. The write
 latency is fine.  After write we perform read and update operations.

 When we run read+update operations on newly inserted 1 million records
 (on top of the 450 M records), the read latency and IO usage are under
 control. However, when we perform read+update on an old 1 million records
 which are part of the 450 million records, we observe high read latency (the
 performance goes down by 4 times in comparison to the 1st case).  We have not
 observed major GC pauses.

 *system information:*
 *cpu cores:*  24
 *disk type:* SSD. We are using RAID with the deadline scheduler.
 *disk space:*
 df -h:
 Filesystem   Size  Used  Avail  Use%  Mounted on
 /dev/sdb1    1.9T  393G   1.5T   22%  /var/lib/cassandra
 *memory:*
 free -g
               total   used   free   shared   buff/cache   available
 Mem:             62     30      0        0           32          31
 Swap:             8      0      8

 ==

 *schema*

 desc table ks.xyz;

 CREATE TABLE ks.xyz (
 key text,
 column1 text,
 value text,
 PRIMARY KEY (key, column1)
 ) WITH COMPACT STORAGE
 AND CLUSTERING ORDER BY (column1 ASC)
 AND bloom_filter_fp_chance = 0.1
 AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
 AND comment = ''
 AND compaction = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
 AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
 AND crc_check_chance = 1.0
 AND dclocal_read_repair_chance = 0.0
 AND default_time_to_live = 0
 AND gc_grace_seconds = 864000
 AND max_index_interval = 2048
 AND memtable_flush_period_in_ms = 0
 AND 

Speakers needed for Apache DC Roadshow

2018-09-11 Thread Rich Bowen
We need your help to make the Apache Washington DC Roadshow on Dec 4th a 
success.


What do we need most? Speakers!

We're bringing a unique DC flavor to this event by mixing Open Source 
Software with talks about Apache projects as well as OSS CyberSecurity, 
OSS in Government, and OSS Career advice.


Please take a look at: http://www.apachecon.com/usroadshow18/

(Note: You are receiving this message because you are subscribed to one 
or more mailing lists at The Apache Software Foundation.)


Rich, for the ApacheCon Planners

--
rbo...@apache.org
http://apachecon.com
@ApacheCon

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Using CDC Feature to Stream C* to Kafka (Design Proposal)

2018-09-11 Thread Rahul Singh
You know what they say: Go big or go home.

Right now the candidates are: Cassandra itself, but embedded or on the side, 
not on the actual data clusters; ZooKeeper (yuck); Kafka (which needs 
ZooKeeper, yuck); S3 (outside service dependency, so no go).

Jeff, those are great patterns, esp. the second one. Have used it several times. 
Cassandra is a great place to store data in transport.


Rahul
On Sep 10, 2018, 5:21 PM -0400, DuyHai Doan , wrote:
> Also using Calvin means having to implement a distributed monotonic sequence 
> as a primitive, not trivial at all ...
>
> > On Mon, Sep 10, 2018 at 3:08 PM, Rahul Singh  
> > wrote:
> > > In response to mimicking Advanced Replication in DSE: I understand the 
> > > goal. Although DSE Advanced Replication does one-way replication, those are 
> > > use cases with limited value to me because ultimately it's still a 
> > > master-slave design.
> > >
> > > I'm working on a prototype for this for two-way replication between 
> > > clusters or databases regardless of DB tech - and every variation I can 
> > > get to comes down to some implementation of the Calvin protocol, which 
> > > basically verifies the change in either cluster, sequences it according 
> > > to its impact on the underlying data, and then schedules the mutation in a 
> > > predictable manner on both clusters/DBs.
> > >
> > > All that means is that I need to sequence the change before it happens so 
> > > I can predictably ensure it's scheduled for write/mutation. So I'm
> > > back to square one: having a definitive queue/ledger separate from the 
> > > individual commit log of the cluster.
> > >
> > >
> > > Rahul Singh
> > > Chief Executive Officer
> > > m 202.905.2818
> > >
> > > Anant Corporation
> > > 1010 Wisconsin Ave NW, Suite 250
> > > Washington, D.C. 20007
> > >
> > > We build and manage digital business technology platforms.
> > > On Sep 10, 2018, 3:58 AM -0400, Dinesh Joshi 
> > > , wrote:
> > > > > On Sep 9, 2018, at 6:08 AM, Jonathan Haddad  
> > > > > wrote:
> > > > >
> > > > > There may be some use cases for it, but I'm not sure what they are.  
> > > > > It might help if you shared the use cases where the extra complexity 
> > > > > is required. When is writing to Cassandra, which then dedupes and 
> > > > > writes to Kafka, a preferred design to using Kafka and simply 
> > > > > writing to Cassandra?
> > > >
> > > > From my reading of the proposal, it seems to bring functionality similar 
> > > > to a MySQL binlog-to-Kafka connector. This is useful for many 
> > > > applications that want to be notified when certain (or any) rows change 
> > > > in the database, primarily for an event-driven application architecture.
> > > >
> > > > Implementing this in the database layer means there is a standard 
> > > > approach to getting a change notification stream. Downstream 
> > > > subscribers can then decide which notifications to act on.
> > > >
> > > > LinkedIn's databus is similar in functionality - 
> > > > https://github.com/linkedin/databus - however, it is for heterogeneous 
> > > > datastores.
> > > >
> > > > > > On Thu, Sep 6, 2018 at 1:53 PM Joy Gao  
> > > > > > wrote:
> > > > > > >
> > > > > > >
> > > > > > > We have a WIP design doc that goes over this idea in details.
> > > > > > >
> > > > > > > We haven't sorted out all the edge cases yet, but would love to get 
> > > > > > > some feedback from the community on the general feasibility of 
> > > > > > > this approach. Any ideas/concerns/questions would be helpful to 
> > > > > > > us. Thanks!
> > > > > > >
> > > >
> > > > Interesting idea. I did go over the proposal briefly. I concur with Jon 
> > > > about adding more use cases to clarify this feature's potential.
> > > >
> > > > Dinesh
>


Re: impact/incompatibility of patch backport on Cassandra 3.11.2

2018-09-11 Thread Ahmed Eljami
Any opinion, please?

On Thu, Sep 6, 2018 at 22:18, Ahmed Eljami  wrote:

> Hi,
>
> We are testing Cassandra 3.11.2 and we saw that it contains a critical
> bug which was fixed in 3.11.3 (
> https://issues.apache.org/jira/browse/CASSANDRA-13929).
>
> After about 1 month of testing, we haven't encountered this bug in our
> environment, but to be sure before going to production, we would like to
> know if it's possible to backport this patch to version 3.11.2.
>
> Do you think that this patch could be backported without
> impact/incompatibility on 3.11.2? Or would it be safer to migrate to
> Cassandra 3.11.3?
>
> Our fear is that 3.11.3 is also not stable, due to this bug:
> https://issues.apache.org/jira/browse/CASSANDRA-14672
>
> Best regards.
>
>

-- 
Regards,

Ahmed ELJAMI


Re: Drop TTLd rows: upgradesstables -a or scrub?

2018-09-11 Thread Oleksandr Shulgin
On Tue, Sep 11, 2018 at 10:04 AM Oleksandr Shulgin <
oleksandr.shul...@zalando.de> wrote:

>
> Yet another surprising aspect of using `nodetool compact` is that it
> triggers major compaction on *all* nodes in the cluster at the same time.
> I don't see where this is documented and this was contrary to my
> expectation.  Does this behavior make sense to anyone?  Is this a bug?  The
> version is 3.0.
>

Whoops, taking back this one.  It was me who triggered the compaction on
all nodes at the same time.  Trying to do too many things at the same time.
:(

--
Alex


Re: Default Single DataCenter -> Multi DataCenter

2018-09-11 Thread Eunsu Kim
Replying to my own question.

Step 3 is wrong.

Even though it was SimpleSnitch before, changing the DC information means 
CassandraDaemon will not start, and it logs the following error.

ERROR [main] 2018-09-11 18:36:30,272 CassandraDaemon.java:708 - Cannot start 
node if snitch's data center (pg1) differs from previous data center 
(datacenter1). Please fix the snitch configuration, decommission and 
rebootstrap this node or use the flag -Dcassandra.ignore_dc=true.
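
For reference, if keeping the new dc name were really intended, the flag from
the error message can be passed via cassandra-env.sh (a sketch; adjust to your
packaging):

  # conf/cassandra-env.sh
  JVM_OPTS="$JVM_OPTS -Dcassandra.ignore_dc=true"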


> On 11 Sep 2018, at 2:25 PM, Eunsu Kim  wrote:
> 
> Hello
> 
> Thank you for your responses.
> 
> I'll share my plan for adding a datacenter. If you see problems, please respond.
> 
> The sentences may be a little awkward because my English is poor and I am 
> being helped by a translator.
> 
> This is the article I've referred to most frequently 
> (https://medium.com/p/465e9bf28d99). Thank you for the clean write-up, Pradeep 
> Chhetri.
> 
> I will also upgrade the version, as per Alain Rodriguez's advice.
> 
> 
> 
> Step 1. Stop all existing clusters. (My service is paused.)
> 
> Step 2. Install Cassandra 3.11.3 and copy existing conf files.
> 
> Step 3. Modify cassandra-rackdc.properties for existing nodes. dc=mydc1 
> rack=myrack1
>  Q. I think this modification will not affect the existing data because 
> it was SimpleSnitch before, right?
> 
> Step 4. In the cassandra.yaml of existing nodes, endpoint_snitch is changed 
> to GossipingPropertyFileSnitch.
> 
> Step 5. Restart the Cassandra of the existing nodes. (My service is resumed.)
> 
> Step 6. Change the settings of all existing clients to DCAwareRoundRobinPolicy and 
> refer to mydc1. Consistency level is LOCAL_ONE. And rolling restart.
>   Q. Isn't it a problem that at this point, DCAwareRoundRobinPolicy and 
> RoundRobinPolicy coexist?
> 
> Step 7. Alter my keyspace and system keyspace(system_distributed, 
> system_traces) :  SimpleStrategy(RF=2) -> { 'class' : 
> 'NetworkTopologyStrategy', ‘mydc1’ : 2 }
> 
> Step 8. Install Cassandra in a new cluster, copying existing conf files, and 
> setting cassandra-rackdc.properties to dc=mydc2 rack=myrack2.
> 
> Step 9. Add a new seed node to the cassandra.yaml of the existing cluster 
> (mydc1) and restart.
>   Q1. Must I add the new seed nodes on all five existing nodes?
>   Q2. Don't I need to update the seed node settings of the new cluster 
> (mydc2)?
> 
> Step 10. Alter my keyspace and system keyspace(system_distributed, 
> system_traces) :  { 'class' : 'NetworkTopologyStrategy', ‘mydc1’ : 1, ‘mydc2’ 
> : 1 }
> 
> Step 11. Run 'nodetool rebuild -- mydc1' on each new node, in turn.
> 
> ———
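> 
> For steps 7, 10 and 11, a minimal sketch (keyspace name 'my_ks' is an assumption):
> 
>   # step 7 (before adding the new DC)
>   cqlsh -e "ALTER KEYSPACE my_ks WITH replication =
>     {'class': 'NetworkTopologyStrategy', 'mydc1': 2};"
>   # step 10 (once the mydc2 nodes have joined)
>   cqlsh -e "ALTER KEYSPACE my_ks WITH replication =
>     {'class': 'NetworkTopologyStrategy', 'mydc1': 1, 'mydc2': 1};"
>   # step 11 (on each new mydc2 node, one at a time)
>   nodetool rebuild -- mydc1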
> 
> 
> I'll run the procedure on the development environment and share it.
> 
> Thank you.
> 
> 
> 
> 
>> On 10 Sep 2018, at 10:26 PM, Pradeep Chhetri > > wrote:
>> 
>> Hello Eunsu, 
>> 
>> I am going through the same exercise at my job. I was making notes as I was 
>> testing the steps in my preproduction environment. Although I haven't tested 
>> end to end, hopefully this might help you: 
>> https://medium.com/p/465e9bf28d99 
>> 
>> Regards,
>> Pradeep
>> 
>> On Mon, Sep 10, 2018 at 5:59 PM, Alain RODRIGUEZ > > wrote:
>> Adding a data center for the first time is a bit tricky when you haven't 
>> been considering it from the start.
>> 
>> I operate 5 nodes cluster (3.11.0) in a single data center with 
>> SimpleSnitch, SimpleStrategy and all client policy RoundRobin.
>> 
>> You will need:
>> 
>> - To change clients, make them 'DCAware'. This depends on the client, but 
>> you should be able to find this in your Cassandra driver (client side).
>> - To change clients, make them use 'LOCAL_' consistency 
>> ('LOCAL_ONE'/'LOCAL_QUORUM' being the most common).
>> - To change 'SimpleSnitch' for 'EC2Snitch' or 'GossipingPropertyFileSnitch' 
>> for example, depending on your context/preference
>> - To change 'SimpleStrategy' for 'NetworkTopologyStrategy' for all the 
>> keyspaces, with the desired RF. I take the chance to say that switching to 1 
>> replica only is often a mistake: you can indeed have data loss (which you 
>> accept), but also the service going down any time you restart a node or a 
>> node goes down. If you are ok with RF=1, an RDBMS might be a better choice. 
>> It's an anti-pattern of some kind to run Cassandra with RF=1. Yet up to you, 
>> this is not our topic :). In the same kind of off-topic recommendations, I 
>> would not stick with C*3.11.0, but go to C*3.11.3 (if you do not perform 
>> slice delete, there is still a bug with this apparently)
>> 
>> So this all needs to be done before starting adding the new data center. 
>> Changing the snitch is tricky, make sure that the new snitch uses the racks 
>> and dc names currently in use in your cluster for the current cluster, if 
>> not the data could not be accessible after the configuration change.
>> 
>> Then the procedure to add a data center is probably described around. I know 
>> I did 

Re: Drop TTLd rows: upgradesstables -a or scrub?

2018-09-11 Thread Oleksandr Shulgin
On Tue, Sep 11, 2018 at 11:07 AM Steinmaurer, Thomas <
thomas.steinmau...@dynatrace.com> wrote:

>
> a single (largish) SSTable or any other SSTable for a table, which does
> not get any writes (with e.g. deletes) anymore, will most likely not be
> part of an automatic minor compaction anymore, thus may stay forever on
> disk, if I don’t miss anything crucial here.
>

I would also expect that, but that's totally fine for us.


> Might be different though, if you are entirely writing TTL-based, cause
> single SSTable based automatic tombstone compaction may kick in here, but
> I’m not really experienced with that.
>

Yes, we were writing with a TTL of 2 years to these tables, and in about 1
year from now 100% of the data in them will expire.  We would be able to
simply truncate them at that point.

Now that you mention single-SSTable tombstone compaction again, I don't
think this is happening in our case.  For example, on one of the nodes I
see estimated droppable tombstone ratios ranging from 0.24 to slightly over 1
(1.09).  Yet, no single-SSTable compaction was triggered apparently,
because the data files are all 6 months old now.  We are using all the
default settings for tombstone_threshold, tombstone_compaction_interval
and unchecked_tombstone_compaction.

Does this mean that all these SSTable files do indeed overlap, and because
we don't allow unchecked_tombstone_compaction, no actual compaction is
triggered?
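
A minimal sketch of what I am considering to check/try next (table and file
names are assumptions):

  # estimated droppable tombstones for a given data file
  sstablemetadata /var/lib/cassandra/data/my_ks/my_table-*/mc-42-big-Data.db | grep -i droppable

  # allow single-SSTable tombstone compactions even when sstables overlap
  cqlsh -e "ALTER TABLE my_ks.my_table WITH compaction =
    {'class': 'SizeTieredCompactionStrategy',
     'unchecked_tombstone_compaction': 'true',
     'tombstone_threshold': '0.2'};"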

> We had been suffering a lot with storing timeseries data with STCS and disk
> capacity to have the cluster working smoothly and automatic minor
> compactions kicking out aged timeseries data according to our retention
> policies in the business logic. TWCS is unfortunately not an option for us.
> So, we did run major compactions every X weeks to reclaim disk space, thus
> from an operational perspective, by far not nice. Thus, finally decided to
> change STCS min_threshold from default 4 to 2, to let minor compactions
> kick in more frequently. We can live with the additional IO/CPU this is
> causing, thus is our current approach to disk space and sizing issues we
> had in the past.
>

For our new generation of tables we have switched to use TWCS, that's the
reason we don't write anymore to those old tables which are still using
STCS.

Cheers,
--
Alex


RE: Drop TTLd rows: upgradesstables -a or scrub?

2018-09-11 Thread Steinmaurer, Thomas
Alex,

a single (largish) SSTable, or any other SSTable for a table which does not get 
any writes (e.g. deletes) anymore, will most likely not be part of an 
automatic minor compaction anymore, and thus may stay forever on disk, if I don't 
miss anything crucial here. It might be different though if you are writing 
entirely TTL-based, because single-SSTable automatic tombstone compaction 
may kick in here, but I'm not really experienced with that.

We had been suffering a lot with storing timeseries data with STCS, in terms of 
disk capacity, keeping the cluster working smoothly and having automatic minor 
compactions kick out aged timeseries data according to the retention policies in 
our business logic. TWCS is unfortunately not an option for us. So we did run 
major compactions every X weeks to reclaim disk space, which from an operational 
perspective is by far not nice. Thus we finally decided to change the STCS 
min_threshold from the default 4 to 2, to let minor compactions kick in more 
frequently. We can live with the additional IO/CPU this is causing, so this is our 
current approach to the disk space and sizing issues we had in the past.
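
For reference, the change itself is just a compaction sub-option (table name is
an assumption):

  cqlsh -e "ALTER TABLE my_ks.my_table WITH compaction =
    {'class': 'SizeTieredCompactionStrategy', 'min_threshold': '2'};"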

Thomas

From: Oleksandr Shulgin 
Sent: Dienstag, 11. September 2018 09:47
To: User 
Subject: Re: Drop TTLd rows: upgradesstables -a or scrub?

On Tue, Sep 11, 2018 at 9:31 AM Steinmaurer, Thomas 
<thomas.steinmau...@dynatrace.com> wrote:
As far as I remember, in newer Cassandra versions, with STCS, nodetool compact 
offers a ‘-s’ command-line option to split the output into files with 50%, 25% 
… in size, thus in this case, not a single largish SSTable anymore. By default, 
without -s, it is a single SSTable though.

Thanks Thomas, I've also spotted the option while testing this approach.  I 
understand that doing major compactions is generally not recommended, but do 
you see any real drawback of having a single SSTable file in case we stopped 
writing new data to the table?

--
Alex

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313


Re: Drop TTLd rows: upgradesstables -a or scrub?

2018-09-11 Thread Oleksandr Shulgin
On Tue, Sep 11, 2018 at 9:47 AM Oleksandr Shulgin <
oleksandr.shul...@zalando.de> wrote:

> On Tue, Sep 11, 2018 at 9:31 AM Steinmaurer, Thomas <
> thomas.steinmau...@dynatrace.com> wrote:
>
>> As far as I remember, in newer Cassandra versions, with STCS, nodetool
>> compact offers a ‘-s’ command-line option to split the output into files
>> with 50%, 25% … in size, thus in this case, not a single largish SSTable
>> anymore. By default, without -s, it is a single SSTable though.
>>
>
> Thanks Thomas, I've also spotted the option while testing this approach.
>

Yet another surprising aspect of using `nodetool compact` is that it
triggers major compaction on *all* nodes in the cluster at the same time.
I don't see where this is documented and this was contrary to my
expectation.  Does this behavior make sense to anyone?  Is this a bug?  The
version is 3.0.

--
Alex


Fresh SSTable files (due to repair?) in a static table (was Re: Drop TTLd rows: upgradesstables -a or scrub?)

2018-09-11 Thread Oleksandr Shulgin
On Tue, Sep 11, 2018 at 9:47 AM Oleksandr Shulgin <
oleksandr.shul...@zalando.de> wrote:

> On Tue, Sep 11, 2018 at 9:31 AM Steinmaurer, Thomas <
> thomas.steinmau...@dynatrace.com> wrote:
>
>> As far as I remember, in newer Cassandra versions, with STCS, nodetool
>> compact offers a ‘-s’ command-line option to split the output into files
>> with 50%, 25% … in size, thus in this case, not a single largish SSTable
>> anymore. By default, without -s, it is a single SSTable though.
>>
>
> Thanks Thomas, I've also spotted the option while testing this approach.
> I understand that doing major compactions is generally not recommended, but
> do you see any real drawback of having a single SSTable file in case we
> stopped writing new data to the table?
>

A related question is: given that we are not writing new data to these
tables, it would make sense to exclude them from the routine repair
regardless of the option we use in the end to remove the tombstones.

However, I've just checked the timestamps of the SSTable files on one of
the nodes and to my surprise I can find some files written only a few weeks
ago (most of the files are half a year ago, which is expected because it
was the time we were adding this DC).  But we've stopped writing to the
tables about a year ago and we repair the cluster every week.

What could explain that we suddenly see these new SSTable files?  They
shouldn't be there even due to overstreaming, because one would need to
find some differences in the Merkle tree in the first place, but I don't
see how that could actually happen in our case.

Any ideas?

Thanks,
--
Alex


Re: Drop TTLd rows: upgradesstables -a or scrub?

2018-09-11 Thread Oleksandr Shulgin
On Tue, Sep 11, 2018 at 9:31 AM Steinmaurer, Thomas <
thomas.steinmau...@dynatrace.com> wrote:

> As far as I remember, in newer Cassandra versions, with STCS, nodetool
> compact offers a ‘-s’ command-line option to split the output into files
> with 50%, 25% … in size, thus in this case, not a single largish SSTable
> anymore. By default, without -s, it is a single SSTable though.
>

Thanks Thomas, I've also spotted the option while testing this approach.  I
understand that doing major compactions is generally not recommended, but
do you see any real drawback of having a single SSTable file in case we
stopped writing new data to the table?

--
Alex


RE: Drop TTLd rows: upgradesstables -a or scrub?

2018-09-11 Thread Steinmaurer, Thomas
As far as I remember, in newer Cassandra versions, with STCS, nodetool compact 
offers a ‘-s’ command-line option to split the output into files with 50%, 25% 
… in size, thus in this case, not a single largish SSTable anymore. By default, 
without -s, it is a single SSTable though.
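
A minimal sketch (keyspace/table names are assumptions):

  nodetool compact -s my_ks my_table    # major compaction, splitting the output by size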

Thomas

From: Jeff Jirsa 
Sent: Montag, 10. September 2018 19:40
To: cassandra 
Subject: Re: Drop TTLd rows: upgradesstables -a or scrub?

I think it's important to describe exactly what's going on for people who just 
read the list but who don't have context. This blog does a really good job: 
http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html
 , but briefly:

- When a TTL expires, we treat it as a tombstone, because it may have been 
written ON TOP of another piece of live data, so we need to get that deletion 
marker to all hosts, just like a manual explicit delete
- Tombstones in sstable A may shadow data in sstable B, so doing anything on 
just one sstable MAY NOT remove the tombstone - we can't get rid of the 
tombstone if sstable A overlaps another sstable with the same partition (which 
we identify via bloom filter) that has any data with a lower timestamp (we 
don't check the sstable for a shadowed value, we just look at the minimum live 
timestamp of the table)

"nodetool garbagecollect" looks for sstables that overlap (partition keys) and 
combine them together, which makes tombstones past GCGS purgable and should 
remove them (and data shadowed by them).

If you're on a version without nodetool garbagecollection, you can approximate 
it using user defined compaction ( 
http://thelastpickle.com/blog/2016/10/18/user-defined-compaction.html
 ) - it's a JMX endpoint that lets you tell Cassandra to compact one or more 
sstables together based on parameters you choose. This is somewhat like 
upgradesstables or scrub, but you can combine sstables as well. If you choose 
candidates intelligently (notably, oldest sstables first, or sstables you know 
overlap), you can likely manually clean things up pretty quickly. At one point, 
I had a jar that would do single sstable at a time, oldest sstable first, and 
it pretty much worked for this purpose most of the time.
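
A minimal sketch of both approaches (keyspace, table, SSTable and jar names are
assumptions):

  # nodetool garbagecollect (available from Cassandra 3.10 on)
  nodetool garbagecollect my_ks my_table

  # user defined compaction over JMX, e.g. via jmxterm; the operation takes a
  # comma-separated list of -Data.db file names
  echo "run -b org.apache.cassandra.db:type=CompactionManager forceUserDefinedCompaction mc-100-big-Data.db,mc-101-big-Data.db" \
    | java -jar jmxterm-1.0-uber.jar -l localhost:7199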

If you have room, a "nodetool compact" on stcs will also work, but it'll give 
you one huge sstable, which will be unfortunate long term (probably less of a 
problem if you're no longer writing to this table).


On Mon, Sep 10, 2018 at 10:29 AM Charulata Sharma (charshar) 
<chars...@cisco.com.invalid> wrote:
Scrub takes a very long time and does not remove the tombstones. You should do 
garbage cleaning. It immediately removes the tombstones.

Thanks,
Charu

From: Oleksandr Shulgin <oleksandr.shul...@zalando.de>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, September 10, 2018 at 6:53 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Drop TTLd rows: upgradesstables -a or scrub?

Hello,

We have some tables with a significant amount of TTLd rows that have expired by 
now (and more than gc_grace_seconds have passed since the TTL).  We have stopped 
writing more data to these tables quite a while ago, so background compaction 
isn't running.  The compaction strategy is the default SizeTiered one.

Now we would like to get rid of all the droppable tombstones in these tables.  
What would be the approach that puts the least stress on the cluster?

We've considered a few, but the most promising ones seem to be these two: 
`nodetool scrub` or `nodetool upgradesstables -a`.  We are using Cassandra 
version 3.0.

Now, this docs page recommends to use upgradesstables wherever possible: 
https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsScrub.html
What is the reason behind it?

From the source code I can see that Scrubber is the class which is going to drop the 
tombstones (and report the total number in the logs): 


Re: Drop TTLd rows: upgradesstables -a or scrub?

2018-09-11 Thread Oleksandr Shulgin
On Mon, Sep 10, 2018 at 10:03 PM Jeff Jirsa  wrote:

> How much free space do you have, and how big is the table?
>

So there are 2 tables, one is around 120GB and the other is around 250GB on
every node.  On the node with the most free disk space we still have around
500GB available and on the node with the least free space: 300GB.

So if I understand it correctly, we could still do major compaction while
keeping STCS and we should not hit 100% disk space, if we first compact one
of the tables, and then the other (we expect quite some free space to
become available due to all those TTL tombstones being removed in the
process).

Is there any real drawback of having a single big SSTable in our case where
we are never going to append more data to the table?

Switching to LCS is another option.
>

Hm, this is interesting idea.  The expectation should be that even if we
don't remove 100% of the tombstones, we should be able to get rid of 90%
of them on the highest level, right?  And if we had less space
available, using LCS could make progress by re-organizing the partitions in
smaller increments, so we could still do it if we had less free space than
the smallest table?
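
For reference, the switch itself would be a one-liner (table name is an
assumption):

  cqlsh -e "ALTER TABLE my_ks.my_table WITH compaction = {'class': 'LeveledCompactionStrategy'};"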

Cheers,
--
Alex


Scrub a single SSTable only?

2018-09-11 Thread Steinmaurer, Thomas
Hello,

is there a way to Online scrub a particular SSTable file only and not the 
entire column family?

According to the Cassandra logs we have a corrupted SSTable smallish compared 
to the entire data volume of the column family in question.

To my understanding, both, nodetool scrub and sstablescrub operate on the 
entire column family and can't work on a single SSTable, right?

There is still the way to shutdown Cassandra and remove the file from disk, but 
ideally I want to have that as an online operation.

Perhaps there is something JMX based?

Thanks,
Thomas

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313