Maximum SSTable size

2018-06-27 Thread Lucas Benevides
Hello Community,

Is there a maximum SSTable Size?
If there is not, can an SSTable grow up to the operating system's maximum
file size?

Thanks in advance,
Lucas Benevides


Re: Academic paper about Cassandra database compaction

2018-05-14 Thread Lucas Benevides
Hello kooljava2,

There aren't many books about Cassandra, but one of the most famous is
"Cassandra: The Definitive Guide: Distributed Data at Web Scale", by Hewitt.
The problem is that Cassandra evolves very fast, so these books get out of
date quickly.
To understand concepts that exist in many different NoSQL databases (many of
them originating from the Distributed Systems area), there is the book
"NoSQL Distilled", by Martin Fowler.

Unfortunately, documentation is also not Cassandra's strongest point, which
is why this group is so important. But everyone can help improve it.

Lucas B. Dias



2018-05-14 13:56 GMT-03:00 kooljava2 <koolja...@yahoo.com.invalid>:

> Hello,
>
> Thank you Lucas for sharing.  I am still a beginner in Cassandra NoSQL
> world. Are there any other good books related to Performance tuning and
> Architecture overview?
>
> Thank you.
>
> On Monday, 14 May 2018, 07:57:38 GMT-7, Nitan Kainth <
> nitankai...@gmail.com> wrote:
>
>
> Hi Lucas,
>
> I am not able to download it. Can you share it as an attachment in email?
>
>
>
> Regards,
> Nitan K.
> Cassandra and Oracle Architect/SME
> Datastax Certified Cassandra expert
> Oracle 10g Certified
>
> On Mon, May 14, 2018 at 9:12 AM, Lucas Benevides <
> lu...@maurobenevides.com.br> wrote:
>
> Dear community,
>
> I want to tell you about my paper published in a conference in March. The
> title is "NoSQL Database Performance Tuning for IoT Data - Cassandra
> Case Study" and it is available (not for free) at
> http://www.scitepress.org/DigitalLibrary/Link.aspx?doi=10.5220/0006782702770284 .
>
> TWCS is used and compared with DTCS.
>
> I hope you can download it, unfortunately I cannot send copies as the
> publisher has its copyright.
>
> Lucas B. Dias
>
>
>
>


Re: Academic paper about Cassandra database compaction

2018-05-14 Thread Lucas Benevides
Thank you, Jeff Jirsa, for your comments.

How can we do this: "fix this by not scheduling the major compaction until
we know all of the sstables in the window are available to be compacted"?

About the column-family schema: I had to customize the cassandra-stress
tool so that it could create a reasonable number of rows per partition. By
default it keeps creating repeated clustering keys for each partition, so
most data gets updated instead of inserted.

Lucas B. Dias
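
About the default behavior described above: a cassandra-stress user profile
can often produce many distinct rows per partition without a code change, by
giving the clustering column its own `cluster` distribution. A hedged sketch
of such a profile fragment — keyspace/table/column names are invented for
illustration, not taken from the thread:

```yaml
# Hypothetical fragment for "cassandra-stress user profile=..."; all names invented.
table: sensor_data
columnspec:
  - name: device_id
    population: uniform(1..1000)    # how many distinct partitions to draw from
  - name: observation_time
    cluster: uniform(100..500)      # distinct clustering values per partition
insert:
  partitions: fixed(1)              # write one partition per operation
  select: fixed(1)/500              # fraction of each partition's rows per op
  batchtype: UNLOGGED
```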

2018-05-14 14:03 GMT-03:00 Jeff Jirsa <jji...@gmail.com>:

> Interesting!
>
> I suspect I know what causes the increased disk usage in TWCS, and it's a
> solvable problem. The problem is roughly something like this:
> - Window 1 has sstables 1, 2, 3, 4, 5, 6
> - We start compacting 1, 2, 3, 4 (using STCS-in-TWCS first window)
> - The TWCS window rolls over
> - We flush (sstable 7), and trigger the TWCS window major compaction,
> which starts compacting 5, 6, 7 + any other sstable from that window
> If the first compaction (1,2,3,4) has finished by the time sstable 7 is
> flushed, we'll include its result in that compaction; if it hasn't, we'll
> have to do the major compaction twice to guarantee we have exactly one
> sstable per window, which will temporarily increase disk space
>
> We can likely fix this by not scheduling the major compaction until we
> know all of the sstables in the window are available to be compacted.
>
> Also your data model is probably typical, but not well suited for time
> series cases - if you find my 2016 Cassandra Summit TWCS talk (it's on
> youtube), I mention aligning partition keys to TWCS windows, which involves
> adding a second component to the partition key. This is hugely important in
> terms of making sure TWCS data expires quickly and avoiding having to read
> from more than one TWCS window at a time.
>
>
> - Jeff
>
>
>
> On Mon, May 14, 2018 at 7:12 AM, Lucas Benevides <
> lu...@maurobenevides.com.br> wrote:
>
>> Dear community,
>>
>> I want to tell you about my paper published in a conference in March. The
>> title is "NoSQL Database Performance Tuning for IoT Data - Cassandra
>> Case Study" and it is available (not for free) at
>> http://www.scitepress.org/DigitalLibrary/Link.aspx?doi=10.5220/0006782702770284 .
>>
>> TWCS is used and compared with DTCS.
>>
>> I hope you can download it, unfortunately I cannot send copies as the
>> publisher has its copyright.
>>
>> Lucas B. Dias
>>
>>
>>
>
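
Jeff's suggestion of aligning partition keys to TWCS windows can be sketched
in a few lines. This is an illustrative computation only (not from the talk
itself); the 1-day window width is an assumption and should match the
table's compaction_window_unit/compaction_window_size:

```python
# Hedged sketch: compute a time-bucket component for the partition key so
# that each partition maps to exactly one TWCS window. Window width (1 day)
# is an assumption -- align it with the table's TWCS settings.
from datetime import datetime, timezone

WINDOW_SECONDS = 24 * 60 * 60  # assumed 1-day TWCS window

def twcs_bucket(ts: datetime) -> int:
    """Epoch-aligned bucket id shared by all rows in the same TWCS window."""
    epoch = int(ts.replace(tzinfo=timezone.utc).timestamp())
    return epoch - (epoch % WINDOW_SECONDS)
```

The table's primary key would then look like
PRIMARY KEY ((device_id, bucket), observation_time), so a read for one
window touches only one partition and expired windows drop cleanly.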


Academic paper about Cassandra database compaction

2018-05-14 Thread Lucas Benevides
Dear community,

I want to tell you about my paper, published at a conference in March. The
title is "NoSQL Database Performance Tuning for IoT Data - Cassandra Case
Study" and it is available (not for free) at
http://www.scitepress.org/DigitalLibrary/Link.aspx?doi=10.5220/0006782702770284 .

TWCS is used and compared with DTCS.

I hope you can download it; unfortunately I cannot send copies, as the
publisher holds the copyright.

Lucas B. Dias


Re: Does LOCAL_ONE still replicate data?

2018-05-08 Thread Lucas Benevides
Yes, but remember that there are Write Consistency and Read Consistency.
To prevent reads from reaching the other DC, you should set the Read
Consistency to LOCAL_ONE.
As Hannu Kröger said, LOCAL_ONE may be enough for you, but perhaps not if
you want to be sure that your data was also written in the other DC.

Lucas B. Dias
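
The write-side distinction can be modeled as a toy function. This is a
deliberate simplification for illustration only — not driver code and not
Cassandra's real replica accounting — assuming RF=1 per DC as in the
original question:

```python
# Toy model: which replicas must acknowledge a write for it to succeed at a
# given consistency level. Illustrative only; assumes RF=1 per DC.
def write_succeeds(consistency, online_replicas_by_dc, local_dc):
    """online_replicas_by_dc maps DC name -> number of online replicas."""
    total_online = sum(online_replicas_by_dc.values())
    if consistency == "LOCAL_ONE":
        # The write is still sent to every online replica in every DC, but
        # only an ack from the local DC counts toward success.
        return online_replicas_by_dc.get(local_dc, 0) >= 1
    if consistency == "ONE":
        return total_online >= 1
    if consistency == "ALL":
        # Every DC's replica must ack.
        return all(n >= 1 for n in online_replicas_by_dc.values())
    raise ValueError("unsupported level: " + consistency)
```

For example, with the local DC down, a LOCAL_ONE write fails even though the
remote DC received it — exactly Hannu's point in the quoted message below.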


2018-05-08 7:26 GMT-03:00 Hannu Kröger :

> Writes are always replicated to all nodes (if they are online).
>
> LOCAL_ONE in writes just means that the client will get an “OK” for the
> write only after at least one node in the local datacenter has acknowledged
> that the write is done.
>
> If all local replicas are offline, then the write will fail even if it
> gets written in your other DC.
>
> Hannu
>
>
> On 8 May 2018, at 13:24, Jakub Lida  wrote:
>
> Hi,
>
> I want to add a new DC to an existing cluster (RF=1 per DC).
> Will setting consistency to LOCAL_ONE on all machines make it still
> replicate write requests sent to online DCs to all DCs (including the new
> one being rebuilt) and only isolate read requests from reaching the new DC?
> That is basically what I want to accomplish.
>
> Thanks in advance, Jakub
>
>
>


Upgrade to 3.11.2 disabled JMX

2018-04-05 Thread Lucas Benevides
Dear community members,

I have just upgraded Cassandra from version 3.11.1 to 3.11.2, keeping my
previous configuration files (cassandra.yaml and cassandra-env.sh). However,
when I started the Cassandra service, I couldn't connect via JMX (I tried
with a Java program, with JConsole, and with a Prometheus client).

When I run netstat -na, port 7199 does not show as open.
I looked at the logs but didn't see anything relevant.

Can you figure out why this happened and point to a possible solution? The
config files enable JMX with authentication=false, but it doesn't work.

Thanks in advance,
Lucas Benevides
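
One thing worth ruling out — an assumption on my part, not confirmed by the
thread: the cassandra-env.sh shipped with newer versions treats JMX as
local-only unless told otherwise, which refuses remote JMX clients. A sketch
of the relevant fragment:

```shell
# cassandra-env.sh fragment (sketch). When LOCAL_JMX is unset, the script
# defaults to local-only JMX; remote JMX clients are then refused.
LOCAL_JMX=no                         # expose JMX beyond localhost
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.port=7199"
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.authenticate=false"

# After restarting, verify the listener:
#   netstat -an | grep 7199
```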


TWCS enabling tombstone compaction

2018-03-09 Thread Lucas Benevides
Dear community,

I have been using TWCS in my lab, with TTL'd data.
In the debug log there is always the message:
"TimeWindowCompactionStrategy.java:65 Disabling tombstone compactions for
TWCS". Indeed, the line is repeated constantly.

What does it actually mean? If my data expires, TWCS is already working and
purging the SSTables that become fully expired. It surely sounds strange to
me to disable tombstone compaction.

In the compaction subproperties there are only two TWCS-specific
subproperties, compaction_window_unit and compaction_window_size. Jeff
already told us that the STCS properties also apply to TWCS, although this
is not in the documentation.

Thanks in advance,
Lucas Benevides Dias
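
If I read the 3.11 source correctly (hedging: verify against your version),
that debug line means single-SSTable tombstone compactions stay off unless a
tombstone subproperty is passed explicitly — whole-SSTable expiration, which
TWCS relies on, is unaffected. A sketch with invented keyspace/table names:

```sql
-- Hedged sketch (keyspace/table names invented). Supplying tombstone_threshold
-- (or tombstone_compaction_interval) explicitly is what re-enables
-- single-SSTable tombstone compactions under TWCS:
ALTER TABLE iot.sensor_data
  WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': '1',
    'tombstone_threshold': '0.2'
  };
```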


Re: Tracing cql code being run through the driver

2018-02-22 Thread Lucas Benevides
I don't know if this will help you, but when the debug log is turned on, it
displays slow queries.
To decide what counts as slow, the read_request_timeout_in_ms parameter is
used. Maybe if you decrease it, you can monitor your queries with $tail -F
debug.log

Just an idea; I've never tried it. It should certainly be done in a
development environment.

Lucas B. Dias

2018-02-22 8:27 GMT-03:00 Jonathan Baynes :

> Hi Community,
>
>
>
> Can anyone help me understand which classes I need to set logging on if I
> want to capture the CQL commands being run through the driver, similar to
> how Profiler (MSSQL) works? I need to see what's being run, and whether the
> query is actually getting to Cassandra.
>
>
>
> Has anyone had any experience in doing this?
>
>
>
> Thanks in advance.
>
>
>
> J
>
>
>
> *Jonathan Baynes*
>
> DBA
> Tradeweb Europe Limited
>
> Moor Place  •  1 Fore Street Avenue  •  London EC2Y 9DT
> P +44 (0)20 7776 0988  •  F +44 (0)20 7776 3201  •  M +44 (0)7884 111546
>
> jonathan.bay...@tradeweb.com
>


Re: Cassandra cluster: could not reach linear scalability

2018-02-19 Thread Lucas Benevides
Why did you set the thread count to 1000?
Does it perform better than with threads=auto?

I have used the stress tool on a larger test bed (10 nodes), and my optimal
setup was 24 threads.
To check this, you must monitor the stress node, both CPU and I/O, and give
it a try with fewer threads.

Lucas Benevides
Ipea
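
One possible way to run that comparison — a command sketch only; the
operation count is illustrative, and the node addresses are the ones from
the thread. With no -rate option, cassandra-stress ramps the thread count
itself, which gives a baseline to compare fixed counts against:

```shell
# Sketch: baseline with auto-tuned threads, then a sweep of fixed counts.
# Watch CPU and I/O on the stress box (e.g. iostat/dstat) while each runs.
cassandra-stress write n=1000000 -node 192.168.1.1,192.168.1.2
for t in 16 24 32 64 128; do
  cassandra-stress write n=1000000 -rate threads=$t -node 192.168.1.1,192.168.1.2
done
```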

2018-02-18 8:29 GMT-03:00 onmstester onmstester <onmstes...@zoho.com>:

> I've configured a simple cluster using two PC with identical spec:
>
>   cpu core i5
>RAM: 8GB ddr3
>Disk: 1TB 5400rpm
>Network: 1 G (I've tested it with iperf, it really is!)
>
> using the common configs described in many sites including datastax itself:
>
> cluster_name: 'MyCassandraCluster'
> num_tokens: 256
> seed_provider:
>   - class_name: org.apache.cassandra.locator.SimpleSeedProvider
> parameters:
>  - seeds: "192.168.1.1,192.168.1.2"
> listen_address:
> rpc_address: 0.0.0.0
> endpoint_snitch: GossipingPropertyFileSnitch
>
> Running stress tool:
>
> cassandra-stress write n=100 -rate threads=1000 -mode native cql3 -node 
> 192.168.1.1,192.168.1.2
>
> Over each node it shows 39 K writes/seconds, but running the same stress
> tool command on cluster of both nodes shows 45 K writes/seconds. I've done
> all the tuning mentioned by apache and datastax. There are many use cases
> on the net proving Cassandra linear Scalability So what is wrong with my
> cluster?
>
> Sent using Zoho Mail <https://www.zoho.com/mail/>
>
>
>


Re: [announce] Release of Cassandra Prometheus metrics exporter

2018-02-06 Thread Lucas Benevides
Hello Romain,

I want to test the Criteo exporter but have some doubts. Graphite is not
good for me because data is stored in Whisper files, which are not accurate
enough for my scientific purposes.

Do I have to run the Java application (jar) on every node of my cluster?
Is the internal storage a round-robin database that aggregates the collected
values, or does it work as a regular persistent database?

Thanks a lot.
Lucas Benevides

2018-01-10 13:06 GMT-02:00 Romain Gerard <romain.ger...@erebe.eu>:

> Hello C*,
>
> A little mail to announce that we released today our internal tool at
> Criteo to monitor Cassandra nodes with Prometheus[1].
> https://github.com/criteo/cassandra_exporter
>
> The application is production ready as we use it internally to monitor
> our > 100 Cassandra nodes.
>
> I hope it can be useful to you too !
> Feel free to send feedbacks/contributions/questions.
>
> [1] https://prometheus.io/
>
> Regards,
> Romain Gérard
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Re: Heavy one-off writes best practices

2018-01-30 Thread Lucas Benevides
Hello Julien,

After reading the excellent post and video by Alain Rodriguez, maybe you
should read the paper "Performance Tuning of Big Data Platform: Cassandra
Case Study"
<http://www.diva-portal.org/smash/record.jsf?pid=diva2%3A948824>
by Sathvik Katam. In his results he sets new values for
memtable_cleanup_threshold and key cache size.
Although it is not proven that the same results will hold in different
environments, it is a good starting point.

Lucas Benevides
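
For reference, the two knobs mentioned live in cassandra.yaml. The values
below are placeholders to experiment with, not the paper's exact numbers:

```yaml
# cassandra.yaml fragment (sketch); values are illustrative starting points.
memtable_cleanup_threshold: 0.2    # default: 1 / (memtable_flush_writers + 1)
key_cache_size_in_mb: 200          # default: min(5% of heap, 100 MB)
```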

2018-01-30 6:12 GMT-02:00 Julien Moumne <jmou...@deezer.com>:

> Hello, I am looking for best practices for the following use case :
>
> Once a day, we insert at the same time 10 full tables (several 100GiB
> each) using Spark C* driver, without batching, with CL set to ALL.
>
> Whether skinny rows or wide rows, data for a partition key is always
> completely updated / overwritten, ie. every command is an insert.
>
> This imposes a great load on the cluster (huge CPU consumption), this load
> greatly impacts the constant reads we have. Read latency are fine the rest
> of the time.
>
> Is there any best practices we should follow to ease the load when
> importing data into C* except
>  - reducing the number of concurrent writes and throughput on the driver
> side
>  - reducing the number of compaction threads and throughput on the cluster
>
> In particular :
>  - is there any evidence that writing multiple tables at the same time
> produces more load than writing the tables one at a time when tables are
> completely written at once such as we do?
>  - because of the heavy writes, we use STC. Is it the best choice
> considering data is completely overwritten once a day? Tables contain
> collections and UDTs.
>
> (We manage data expiration with TTL set to several days.
> We use SSDs.)
>
> Thanks!
>


Re: Compaction: ThreadPool Metrics vs Compaction Metrics

2018-01-08 Thread Lucas Benevides
Hello Ahmed,

I asked about the compaction metrics on 27/10/2017.
You can see the conversation here:
<https://mail-archives.apache.org/mod_mbox/cassandra-user/201710.mbox/%3CCAOsmgAyr89MBk%2BRpbbmG6EkZN4r0fRmXEa9vdS2RxXmBLye%2B7A%40mail.gmail.com%3E>

Lucas Benevides

2018-01-05 14:13 GMT-02:00 Ahmed Eljami <ahmed.elj...@gmail.com>:

> ​Hello,
>
> ​​Could someone explain me the difference between the values of the two
> following metrics​:
>
> *​ThreadPool Metrics:​CompactionExecutor:CompletedTasks* vs *Compaction
> Metrics:CompletedTasks*
>
> I do not get the same value when I query JMX!
>
> Thanks
>
>
>


Re: about write performance

2017-12-11 Thread Lucas Benevides
Good answer, Oleksandr.

But I think the data is inserted into the Memtable already in the right
order; at least the DataStax Academy videos say so.
It shouldn't make any difference anyhow, though.

Kind regards,
Lucas Benevides
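
The point about ordering can be illustrated with a toy memtable model —
purely illustrative, not Cassandra's implementation: because rows are kept
sorted by clustering key as they arrive, the flushed output is identical
whichever order the writes came in.

```python
# Toy model of why insertion order doesn't matter: the "memtable" keeps rows
# sorted by clustering key at insert time, so the flush order is the same
# whether writes arrive in timestamp order or not.
import bisect

class ToyMemtable:
    def __init__(self):
        self.rows = []  # kept sorted by clustering key

    def insert(self, clustering_key, value):
        bisect.insort(self.rows, (clustering_key, value))

    def flush(self):
        # An "SSTable" is written in clustering order regardless of arrival order.
        return list(self.rows)
```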

2017-12-08 5:41 GMT-02:00 Oleksandr Shulgin <oleksandr.shul...@zalando.de>:

> On Fri, Dec 8, 2017 at 3:05 AM, Eunsu Kim <eunsu.bil...@gmail.com> wrote:
>
>> There is a table with a timestamp as a clustering key, sorted ASC on that
>> column.
>>
>> For insertion performance, is it better to insert data into this table in
>> time order? Or does it not matter?
>>
>
> The writes hit memory tables first, so from this perspective it shouldn't
> matter.
>
> Later the memory tables are sorted according to the partition and
> clustering key and are flushed to disk in this order, forming the SSTable
> files.  Any performance difference will show up when reading the data,
> depending on the compaction strategy you choose.  For time-series data
> with TTL there is good chance that TimeWindowCompactionStrategy is
> appropriate, given you mostly write with approx. monotonically increasing
> timestamps.  This helps organizing the data files for faster reads and
> really cheap removal of expired data: the whole file can be just dropped by
> compaction process once all records in it expire.
>
> Regards,
> --
> Oleksandr "Alex" Shulgin | Database Engineer | Zalando SE | Tel: +49 176
> 127-59-707 <+49%20176%2012759707>
>
>


Re: Cassandra stress tool - data generation

2017-11-01 Thread Lucas Benevides
Hi Varun,

I appreciate your answer, but this is not what is causing my problem.
Even if it is SEQ, as the excellent article by Ben Slater says, it will
always repeat the same sequence at each new operation (in my case one
operation equals one partition).

But in that issue I saw another one,
https://issues.apache.org/jira/browse/CASSANDRA-11138, which may be causing
the problem. I will apply that patch, test it, and report back later.

Thank you
Lucas Benevides

2017-11-01 14:59 GMT-02:00 Varun Barala <varunbaral...@gmail.com>:

> https://www.instaclustr.com/deep-diving-cassandra-stress-part-3-using-yaml-profiles/
> In this particular blog post, they mention your case.
>
> Changed uniform() distribution to seq() distribution
> https://issues.apache.org/jira/browse/CASSANDRA-12490
>
> Thanks!!
>
>
> On Thu, Nov 2, 2017 at 12:54 AM, Varun Barala <varunbaral...@gmail.com>
> wrote:
>
>> Hi,
>>
>> https://www.instaclustr.com/deep-diving-into-cassandra-stress-part-1/
>>
>> In the blog, They covered many things in detail.
>>
>> Thanks!!
>>
>> On Thu, Nov 2, 2017 at 12:38 AM, Lucas Benevides <
>> lu...@maurobenevides.com.br> wrote:
>>
>>> Dear community,
>>>
>>> I am using Cassandra Stress Tool and trying to simulate IoT generated
>>> data.
>>> So I created a column family with the device_id as the partition key.
>>>
>>> But in every different operation (the parameter received in the -n
>>> option) the generated values are the same. For instance, I have a column
>>> called observation_time which is supposed to be the time measured by the
>>> sensor. But in every partition the values are equal.
>>>
>>> Is there a way to make those values be randomly generated with
>>> different seeds? I need this so that if the same device_id occurs
>>> again, it makes an INSERT instead of an UPSERT.
>>>
>>> To clarify: What is happening now (fictional data):
>>>
>>> operation 1
>>> device 1
>>> ts1: 01/01/1970
>>> ts2: 02/01/1980
>>> ts3: 03/01/1990
>>>
>>> operation 2
>>> device 2
>>> ts1: 01/01/1970
>>> ts2: 02/01/1980
>>> ts3: 03/01/1990
>>>
>>> What I want:
>>> operation1
>>> device 1
>>> ts1: 01/01/1970
>>> ts2: 02/01/1980
>>> ts3: 03/01/1990
>>>
>>> operation2
>>> device 2
>>> ts1: 02/01/1971  #Different values here.
>>> ts2: 05/01/1982
>>> ts3: 08/01/1993
>>>
>>> Thanks in advance,
>>> Lucas Benevides
>>>
>>>
>>
>


Cassandra stress tool - data generation

2017-11-01 Thread Lucas Benevides
Dear community,

I am using Cassandra Stress Tool and trying to simulate IoT generated data.
So I created a column family with the device_id as the partition key.

But in every different operation (the parameter received in the -n option)
the generated values are the same. For instance, I have a column called
observation_time which is supposed to be the time measured by the sensor.
But in every partition the values are equal.

Is there a way to make those values be randomly generated with different
seeds? I need this so that if the same device_id occurs again, it makes an
INSERT instead of an UPSERT.

To clarify: What is happening now (fictional data):

operation 1
device 1
ts1: 01/01/1970
ts2: 02/01/1980
ts3: 03/01/1990

operation 2
device 2
ts1: 01/01/1970
ts2: 02/01/1980
ts3: 03/01/1990

What I want:
operation1
device 1
ts1: 01/01/1970
ts2: 02/01/1980
ts3: 03/01/1990

operation2
device 2
ts1: 02/01/1971  #Different values here.
ts2: 05/01/1982
ts3: 08/01/1993

Thanks in advance,
Lucas Benevides
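
The behavior can be reproduced with a toy generator — illustrative only, not
the stress tool's actual code: seeding every partition's generator with the
same fixed seed replays the same value sequence for every device, while
mixing the partition key into the seed gives each device its own values.

```python
# Toy reproduction of the symptom and one way around it (not the stress
# tool's real generator).
import random
import zlib

def column_values(seed, n=3):
    """n pseudo-random column values from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.randint(0, 10**6) for _ in range(n)]

# One fixed seed for every device: every partition gets identical values.
fixed = [column_values(42) for device in ("device1", "device2")]

# Mix the partition key into the seed: each device gets its own sequence.
salted = [column_values(zlib.crc32(d.encode()) ^ 42) for d in ("device1", "device2")]
```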


Re: Cassandra Compaction Metrics - CompletedTasks vs TotalCompactionCompleted

2017-11-01 Thread Lucas Benevides
Thanks a lot, Chris.

I had noticed that even the counter in TotalCompactionsCompleted is higher
than the number of SSTable compactions, which is what interests me most. I
measured the number of compactions by turning on log_all in the tables'
compaction settings and reading the compaction.log data (in JSON format).

The info you gave will be very useful to me. I hope it makes it into the
documentation.

Lucas Benevides

2017-10-31 14:56 GMT-02:00 Chris Lohfink <clohfin...@gmail.com>:

> CompactionMetrics is a combination of the compaction executor (sstable
> compactions, secondary index build, view building, relocate,
> garbagecollect, cleanup, scrub etc) and validation executor (repairs). Keep
> in mind not all jobs execute 1 task per operation, things that use the
> parallelAllSSTableOperation like cleanup will create 1 task per sstable.
>
> The "CompletedTasks" metric is a measure of how many tasks ran on these
> two executors combined.
> The "TotalCompactionsCompleted" metric is a measure of how many
> compactions issued from the compaction manager ran (normal compactions,
> cache writes, scrub, 2i and MVs).  So while they may be close, depending on
> whats happening on the system, theres no assurance that they will be within
> any bounds of each other.
>
> So I would suspect validation compactions from repairs would be one major
> difference. If you run other operational tasks there will likely be more.
>
>
> On Mon, Oct 30, 2017 at 12:22 PM, Lucas Benevides <
> lu...@maurobenevides.com.br> wrote:
>
>> Kurt,
>>
>> I appreciate your answer, but I don't believe CompletedTasks counts the
>> "validation compactions". Those are compactions that result from repair
>> operations. I am running tests on 10 cluster nodes in the same physical
>> rack, with the Cassandra Stress Tool, and I didn't issue any Repair
>> commands. The tables only last for seven hours, so it is not reasonable
>> that tens of thousands of these validation compactions occur per node.
>>
>> I tried to look at the code, and the CompletedTasks counter seems to be
>> populated by a method from the class
>> java.util.concurrent.ThreadPoolExecutor.
>> So I really don't know what it is, but it is surely not the number of
>> completed compaction tasks.
>>
>> Thank you
>> Lucas Benevides
>>
>>-
>>
>>
>> 2017-10-30 8:05 GMT-02:00 kurt greaves <k...@instaclustr.com>:
>>
>>> I believe (may be wrong) that CompletedTasks counts Validation
>>> compactions while TotalCompactionsCompleted does not. Considering a lot of
>>> validation compactions can be created every repair it might explain the
>>> difference. I'm not sure why they are named that way or work the way they
>>> do. There appears to be no documentation around this in the code (what a
>>> surprise) and looks like it was last touched in CASSANDRA-4009
>>> <https://issues.apache.org/jira/browse/CASSANDRA-4009>, which also has
>>> no useful info.
>>>
>>> On 27 October 2017 at 13:48, Lucas Benevides <
>>> lu...@maurobenevides.com.br> wrote:
>>>
>>>> Dear community,
>>>>
>>>> I am studying the behaviour of the Cassandra
>>>> TimeWindowCompactionStragegy. To do so I am watching some metrics. Two of
>>>> these metrics are important: Compaction.CompletedTasks, a gauge, and the
>>>> TotalCompactionsCompleted, a Meter.
>>>>
>>>> According to the documentation (http://cassandra.apache.org/d
>>>> oc/latest/operating/metrics.html#table-metrics):
>>>> Completed Tasks = Number of completed compactions since server [re]start.
>>>> TotalCompactionsCompleted = Throughput of completed compactions since
>>>> server [re]start.
>>>>
>>>> As I realized, the TotalCompactionsCompleted, in the Meter object, has
>>>> a counter, which I supposed would be numerically close to the
>>>> CompletedTasks gauge. But they are very different, with the Completed Tasks
>>>> being much higher than the TotalCompactions Completed.
>>>>
>>>> According to the code, in github (class metrics.CompactionMetrics.java
>>>> ):
>>>> Completed Tasks - Number of completed compactions since server [re]start
>>>> TotalCompactionsCompleted - Total number of compactions since server
>>>> [re]start
>>>>
>>>> Can you help me and explain the difference between these two metrics,
>>>> as they seem to have very distinct values, with the Completed Tasks being
>>>> around 1000 times the value of the counter in
>>>> TotalCompactionsCompleted.
>>>>
>>>> Thanks in Advance,
>>>> Lucas Benevides
>>>>
>>>>
>>>
>>
>


Re: Cassandra Compaction Metrics - CompletedTasks vs TotalCompactionCompleted

2017-10-30 Thread Lucas Benevides
Kurt,

I appreciate your answer, but I don't believe CompletedTasks counts the
"validation compactions". Those are compactions that result from repair
operations. I am running tests on 10 cluster nodes in the same physical
rack, with the Cassandra Stress Tool, and I didn't issue any Repair
commands. The tables only last for seven hours, so it is not reasonable
that tens of thousands of these validation compactions occur per node.

I tried to look at the code, and the CompletedTasks counter seems to be
populated by a method from the class
java.util.concurrent.ThreadPoolExecutor.
So I really don't know what it is, but it is surely not the number of
completed compaction tasks.

Thank you
Lucas Benevides



2017-10-30 8:05 GMT-02:00 kurt greaves <k...@instaclustr.com>:

> I believe (may be wrong) that CompletedTasks counts Validation compactions
> while TotalCompactionsCompleted does not. Considering a lot of validation
> compactions can be created every repair it might explain the difference.
> I'm not sure why they are named that way or work the way they do. There
> appears to be no documentation around this in the code (what a surprise)
> and looks like it was last touched in CASSANDRA-4009
> <https://issues.apache.org/jira/browse/CASSANDRA-4009>, which also has no
> useful info.
>
> On 27 October 2017 at 13:48, Lucas Benevides <lu...@maurobenevides.com.br>
> wrote:
>
>> Dear community,
>>
>> I am studying the behaviour of the Cassandra
>> TimeWindowCompactionStragegy. To do so I am watching some metrics. Two of
>> these metrics are important: Compaction.CompletedTasks, a gauge, and the
>> TotalCompactionsCompleted, a Meter.
>>
>> According to the documentation (http://cassandra.apache.org/d
>> oc/latest/operating/metrics.html#table-metrics):
>> Completed Tasks = Number of completed compactions since server [re]start.
>> TotalCompactionsCompleted = Throughput of completed compactions since
>> server [re]start.
>>
>> As I realized, the TotalCompactionsCompleted, in the Meter object, has a
>> counter, which I supposed would be numerically close to the CompletedTasks
>> gauge. But they are very different, with the Completed Tasks being much
>> higher than the TotalCompactions Completed.
>>
>> According to the code, in github (class metrics.CompactionMetrics.java):
>> Completed Tasks - Number of completed compactions since server [re]start
>> TotalCompactionsCompleted - Total number of compactions since server
>> [re]start
>>
>> Can you help me and explain the difference between these two metrics, as
>> they seem to have very distinct values, with the Completed Tasks being
>> around 1000 times the value of the counter in TotalCompactionsCompleted.
>>
>> Thanks in Advance,
>> Lucas Benevides
>>
>>
>


Cassandra Compaction Metrics - CompletedTasks vs TotalCompactionCompleted

2017-10-27 Thread Lucas Benevides
Dear community,

I am studying the behaviour of the Cassandra TimeWindowCompactionStragegy.
To do so I am watching some metrics. Two of these metrics are important:
Compaction.CompletedTasks, a gauge, and the TotalCompactionsCompleted, a
Meter.

According to the documentation (
http://cassandra.apache.org/doc/latest/operating/metrics.html#table-metrics
):
Completed Tasks = Number of completed compactions since server [re]start.
TotalCompactionsCompleted = Throughput of completed compactions since
server [re]start.

As I realized, the TotalCompactionsCompleted Meter object has a counter,
which I supposed would be numerically close to the CompletedTasks gauge. But
they are very different, with CompletedTasks being much higher than
TotalCompactionsCompleted.

According to the code on GitHub (class metrics.CompactionMetrics.java):
Completed Tasks - Number of completed compactions since server [re]start
TotalCompactionsCompleted - Total number of compactions since server
[re]start

Can you help me explain the difference between these two metrics? They seem
to have very distinct values, with CompletedTasks being around 1000 times
the value of the counter in TotalCompactionsCompleted.

Thanks in Advance,
Lucas Benevides


Re: [RELEASE] Apache Cassandra 3.11.1 released

2017-10-11 Thread Lucas Benevides
Hello Michael Shuler,

When will this version become available for upgrade via apt-get? I visited
http://www.apache.org/dist/cassandra/debian and there was no version 3.11.1.

Upgrading the nodes this way is easier for me, as I am in a lab, not at a
production site.

Thanks in advance,
Lucas Benevides


2017-10-10 18:14 GMT-03:00 Michael Shuler <mich...@pbandjelly.org>:

> The Cassandra team is pleased to announce the release of Apache
> Cassandra version 3.11.1.
>
> Apache Cassandra is a fully distributed database. It is the right choice
> when you need scalability and high availability without compromising
> performance.
>
>  http://cassandra.apache.org/
>
> Downloads of source and binary distributions are listed in our download
> section:
>
>  http://cassandra.apache.org/download/
>
> This version is a bug fix release[1] on the 3.11 series. As always,
> please pay attention to the release notes[2] and Let us know[3] if you
> were to encounter any problem.
>
> Enjoy!
>
> [1]: (CHANGES.txt) https://goo.gl/QFBuPn
> [2]: (NEWS.txt) https://goo.gl/vHd41x
> [3]: https://issues.apache.org/jira/browse/CASSANDRA
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>


Re: Connection refused

2017-08-29 Thread Lucas Benevides
Hello Amir,

You should check the log. If Cassandra was installed via apt-get, it should
be in /var/log/cassandra/system.log.
This can occur when the schema of the node you are trying to connect to is
out of sync with the cluster.
How many nodes are there in your cluster?
What is the output of "nodetool describecluster"?

Best regards,
Lucas Benevides
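
A hedged guess at the mismatch, based only on the symptoms described in the
quoted message: cqlsh connects to 127.0.0.1 by default, so with rpc_address
set to the machine's IP, nothing answers on loopback port 9042. Two ways
out, sketched as a cassandra.yaml fragment (my_IP is the thread's own
placeholder):

```yaml
# Either connect cqlsh to the configured address:
#     cqlsh my_IP 9042
# or listen on all interfaces (broadcast_rpc_address is then required):
rpc_address: 0.0.0.0
broadcast_rpc_address: my_IP
```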

2017-08-28 19:45 GMT-03:00 Amir Shahinpour <a...@holisticlabs.net>:

> Hi,
>
> I am getting an error connecting to cqlsh. I am getting the following
> error.
>
> Connection error: ('Unable to connect to any servers', {'127.0.0.1':
> error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error:
> Connection refused")})
>
> I changed the cassandra.yaml settings: rpc_address to my IP address and
> listen_address to localhost.
>
>
> listen_address: localhost
> rpc_address: my_IP
>
> I also tried changing cassandra-env.sh to add my IP address, but I still
> get the same error.
>
> JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=my_IP"
>
> Any suggestion?
>
>
>
>


TWCS Parameters - min/max threshold

2017-08-14 Thread Lucas Benevides
Hello community,

I am testing the Time Window Compaction Strategy (TWCS) in Cassandra
version 3.11, with 10 nodes. The documentation (
https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsConfigureCompaction.html)
says it has only two parameters, which relate to the same setting: the size
of the time window.

However, I tried to set two other parameters, min_threshold and
max_threshold, and they work. In his blog post (
http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html), Alex Dejanovski
says that for the first compaction within a window, TWCS uses the STCS
parameters.

Does TWCS use only these two STCS parameters (min and max threshold), or
does it use all the STCS parameters?

Lucas Benevides
Brasilia, Brazil.
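
Putting the two documented TWCS options together with the STCS-inherited
thresholds in one statement — keyspace/table names are invented for
illustration:

```sql
-- Sketch: TWCS window options plus the STCS-style thresholds under test.
ALTER TABLE iot.sensor_data
  WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'HOURS',
    'compaction_window_size': '1',
    'min_threshold': '4',
    'max_threshold': '32'
  };
```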