Re: High disk io read load

2017-02-15 Thread Benjamin Roth
Erm, sorry, I forgot to mention: in this case "cas10" is Node A with 512
tokens and "cas9" is Node B with 256 tokens.

2017-02-16 6:38 GMT+01:00 Benjamin Roth :

> It doesn't really look like that:
> https://cl.ly/2c3Z1u2k0u2I
>
> That's the ReadLatency.count metric aggregated by host, which represents
> the actual read operations, correct?
>
> 2017-02-15 23:01 GMT+01:00 Edward Capriolo :
>
>> I think it has more than double the load. It is double the data. More
>> read repair chances. More load can swing its way during node failures, etc.
>>
>> On Wednesday, February 15, 2017, Benjamin Roth 
>> wrote:
>>
>>> Hi there,
>>>
>>> The following situation occurs in a cluster with 10 nodes:
>>> Node A's disk read IO is ~20 times higher than the read load of node B.
>>> The nodes are exactly the same except:
>>> - Node A has 512 tokens and Node B 256. So it has double the load (data).
>>> - Node A also has 2 SSDs, Node B only 1 SSD (according to load)
>>>
>>> Node A has roughly 460GB, Node B 260GB total disk usage.
>>> Both nodes have 128GB RAM and 40 cores.
>>>
>>> Of course I assumed that Node A does more reads because the cache/load
>>> ratio is worse, but a factor of 20 makes me very sceptical.
>>>
>>> Of course Node A has a much higher and less predictable latency due to
>>> the wait states.
>>>
>>> Has anybody experienced similar situations?
>>> Any hints on how to analyze or optimize this? I mean, 128GB of cache for
>>> 460GB of payload is not that little. I am pretty sure that not the whole
>>> 460GB dataset is "hot".
>>>
>>> --
>>> Benjamin Roth
>>> Prokurist
>>>
>>> Jaumo GmbH · www.jaumo.com
>>> Wehrstraße 46 · 73035 Göppingen · Germany
>>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>>
>>
>>
>> --
>> Sorry this was sent from mobile. Will do less grammar and spell check
>> than usual.
>>
>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: High disk io read load

2017-02-15 Thread Benjamin Roth
It doesn't really look like that:
https://cl.ly/2c3Z1u2k0u2I

That's the ReadLatency.count metric aggregated by host, which represents the
actual read operations, correct?
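
For cross-checking what the dashboard aggregates, the same counter can be read
per node straight from JMX. Below is a minimal sketch, assuming JMX is enabled
on the default port 7199 without authentication, and assuming the dashboard
charts the coordinator-level ClientRequest Read Latency metric (a table-level
ReadLatency MBean exists as well under type=Table if that is what is graphed):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ReadCountProbe {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // Coordinator-level read request timer; Count is the total number of
            // reads this node has coordinated since startup.
            ObjectName readLatency = new ObjectName(
                    "org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency");
            Object count = mbs.getAttribute(readLatency, "Count");
            System.out.println(host + " ClientRequest.Read.Latency.Count = " + count);
        }
    }
}

Sampling Count twice and taking the difference gives read operations per
interval, which is what the per-host graph above should be showing.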

2017-02-15 23:01 GMT+01:00 Edward Capriolo :

> I think it has more than double the load. It is double the data. More read
> repair chances. More load can swing its way during node failures, etc.
>
> On Wednesday, February 15, 2017, Benjamin Roth 
> wrote:
>
>> Hi there,
>>
>> The following situation occurs in a cluster with 10 nodes:
>> Node A's disk read IO is ~20 times higher than the read load of node B.
>> The nodes are exactly the same except:
>> - Node A has 512 tokens and Node B 256. So it has double the load (data).
>> - Node A also has 2 SSDs, Node B only 1 SSD (according to load)
>>
>> Node A has roughly 460GB, Node B 260GB total disk usage.
>> Both nodes have 128GB RAM and 40 cores.
>>
>> Of course I assumed that Node A does more reads because the cache/load
>> ratio is worse, but a factor of 20 makes me very sceptical.
>>
>> Of course Node A has a much higher and less predictable latency due to
>> the wait states.
>>
>> Has anybody experienced similar situations?
>> Any hints on how to analyze or optimize this? I mean, 128GB of cache for
>> 460GB of payload is not that little. I am pretty sure that not the whole
>> 460GB dataset is "hot".
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>
>
> --
> Sorry this was sent from mobile. Will do less grammar and spell check than
> usual.
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Problems with large partitions and compaction

2017-02-15 Thread Dan Kinder
What Cassandra version? CMS or G1? What are your timeouts set to?

"GC activity"  - Even if there isn't a lot of activity per se maybe there
is a single long pause happening. I have seen large partitions cause lots
of allocation fast.

Looking at SSTable levels in nodetool cfstats can help; look at it for all of
your tables.

I don't recommend switching to STCS until you know more. You end up with
massive compactions that take a long time to settle down.

On Tue, Feb 14, 2017 at 5:50 PM, John Sanda  wrote:

> I have a table that uses LCS and has wound up with partitions upwards of
> 700 MB. I am seeing lots of the large partition warnings. Client requests
> are subsequently failing. The driver is not reporting timeout exceptions,
> just NoHostAvailableExceptions (in the logs I have reviewed so far). I know
> that I need to redesign the table to avoid such large partitions. What
> specifically goes wrong that results in the instability I am seeing? Or put
> another way, what issues will compacting really large partitions cause?
> Initially I thought that there was high GC activity, but after closer
> inspection that does not really seem to be happening. And most of the
> failures I am seeing are on reads, but for an entirely different table.
> Lastly, has anyone had success switching to STCS in this situation as a
> workaround?
>
> Thanks
>
> - John
>



-- 
Dan Kinder
Principal Software Engineer
Turnitin – www.turnitin.com
dkin...@turnitin.com


Re: High disk io read load

2017-02-15 Thread Edward Capriolo
I think it has more than double the load. It is double the data. More read
repair chances. More load can swing its way during node failures, etc.

On Wednesday, February 15, 2017, Benjamin Roth 
wrote:

> Hi there,
>
> The following situation occurs in a cluster with 10 nodes:
> Node A's disk read IO is ~20 times higher than the read load of node B.
> The nodes are exactly the same except:
> - Node A has 512 tokens and Node B 256. So it has double the load (data).
> - Node A also has 2 SSDs, Node B only 1 SSD (according to load)
>
> Node A has roughly 460GB, Node B 260GB total disk usage.
> Both nodes have 128GB RAM and 40 cores.
>
> Of course I assumed that Node A does more reads because the cache/load ratio
> is worse, but a factor of 20 makes me very sceptical.
>
> Of course Node A has a much higher and less predictable latency due to the
> wait states.
>
> Has anybody experienced similar situations?
> Any hints on how to analyze or optimize this? I mean, 128GB of cache for
> 460GB of payload is not that little. I am pretty sure that not the whole
> 460GB dataset is "hot".
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>


-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


High disk io read load

2017-02-15 Thread Benjamin Roth
Hi there,

The following situation occurs in a cluster with 10 nodes:
Node A's disk read IO is ~20 times higher than the read load of node B.
The nodes are exactly the same except:
- Node A has 512 tokens and Node B 256. So it has double the load (data).
- Node A also has 2 SSDs, Node B only 1 SSD (according to load)

Node A has roughly 460GB, Node B 260GB total disk usage.
Both nodes have 128GB RAM and 40 cores.

Of course I assumed that Node A does more reads because the cache/load ratio
is worse, but a factor of 20 makes me very sceptical.

Of course Node A has a much higher and less predictable latency due to the
wait states.

Has anybody experienced similar situations?
Any hints on how to analyze or optimize this? I mean, 128GB of cache for
460GB of payload is not that little. I am pretty sure that not the whole
460GB dataset is "hot".

-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


RE: Current data density limits with Open Source Cassandra

2017-02-15 Thread SEAN_R_DURITY
I request 1-2 TB of disk per node, depending on how large the data is estimated
to be (for larger data, 2 TB). I have some dense nodes (4+ TB of disk
available). They are harder to manage for repairs, bootstrapping, compaction,
etc., because it takes so long to stream the data. For the actual application,
I have not seen a great impact based on the size of disk available.


Sean Durity

From: daemeon reiydelle [mailto:daeme...@gmail.com]
Sent: Wednesday, February 08, 2017 10:56 PM
To: user@cassandra.apache.org
Subject: Re: Current data density limits with Open Source Cassandra

Your mileage may vary. Think of that storage limit as fairly reasonable for
active data that is likely to tombstone. Add more for older/historic data. Then
think about the time to recover a node.


...

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Wed, Feb 8, 2017 at 2:14 PM, Ben Slater wrote:
The major issue we’ve seen with very high density (we generally say <2TB per
node is best) is manageability - if you need to replace or add a node, then
re-streaming the data takes a *long* time and there is a fairly high chance of
a glitch in the universe meaning you have to start again before it’s done.

Also, if you’re using STCS you can end up with gigantic compactions which also
take a long time and can cause issues.

Heap limitations are mainly related to partition size rather than node density 
in my experience.

Cheers
Ben

On Thu, 9 Feb 2017 at 08:20 Hannu Kröger wrote:
Hello,

Back in the day it was recommended that the maximum disk density per node for
Cassandra 1.2 was around 3-5TB of uncompressed data.

IIRC it was mostly because of heap memory limitations? Now that off-heap
support is there for certain data and 3.x has a different data storage format,
is that 3-5TB still a valid limit?

Does anyone have experience running Cassandra with 3-5TB of compressed data?

Cheers,
Hannu
--

Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798


Re: Discrete events table - Partition Question

2017-02-15 Thread Ahmed Eljami
Hello,

I don't see any impact in your case (a table without a composite key).

But it can be less flexible for your query patterns: in this case you can't
return events by date, for example. But if you're sure that you will only
query by id_event, then there is no problem.
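
To make the trade-off concrete, here is a minimal sketch of the two table
shapes (the keyspace, table and column names are made up for illustration, and
the DDL is wrapped in Java driver calls purely so it is runnable as-is): the
one-partition-per-event design from the question, and a time-bucketed
alternative that keeps date-range queries possible.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class EventSchemas {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("demo")) {

            // Design from the question: one tiny partition per event.
            // Point lookups by id_event only; no ordering and no range scans.
            session.execute("CREATE TABLE IF NOT EXISTS events_by_id ("
                    + " id_event uuid PRIMARY KEY,"
                    + " occurred_at timestamp,"
                    + " payload text)");

            // Alternative: bucket events into time windows so a date-ranged query
            // is possible. The bucket granularity must keep partitions a manageable
            // size; at ~1000 events/s a one-minute bucket is roughly 60k rows.
            session.execute("CREATE TABLE IF NOT EXISTS events_by_bucket ("
                    + " bucket text,"          // e.g. "2017-02-15T23:42"
                    + " id_event timeuuid,"
                    + " payload text,"
                    + " PRIMARY KEY (bucket, id_event))");
        }
    }
}

Bloom filters, index summaries and the partition index all carry per-partition
entries, so the number of partitions is the main lever on those structures.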


Discrete events table - Partition Question

2017-02-15 Thread Imran
Hello everyone -

I have a modeling challenge where we are recording events at about 1,000 per
second in a Cassandra table. The event id is unique and is being used as the
partition key with no clustering columns. I understand this is an anti-pattern
and will result in a huge number of discrete partitions.
The question I have is the impact of this design on the node, in particular:
- heap
- system memory
- disk usage
- in-memory structures (memtables / index summaries / bloom filters)
- on-disk structures (index files)
- compaction

Any feedback is greatly appreciated.

Thanks
Imran


Re: Determining if data will be created on Cassandra Write Exceptions

2017-02-15 Thread Nicolas Guyomar
Hi Rouble,

I usually have to read the javadoc in the Java driver to get my ideas straight
regarding exception handling.

You can find information by reading:
http://docs.datastax.com/en/drivers/java/3.1/com/datastax/driver/core/policies/RetryPolicy.html
and, for instance,
http://docs.datastax.com/en/drivers/java/3.1/com/datastax/driver/core/policies/DowngradingConsistencyRetryPolicy.html
with the onWriteTimeout method, which differentiates between several cases of error.

As Edward stated, you can find out how many replicas acknowledged the write
from the Cassandra response.

Keep in mind that retrying usually means your write query is idempotent or
you don't care about having duplicate entries.
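
To make that concrete, here is a minimal sketch against the 3.1 Java driver
(the contact point, keyspace and table are made up for illustration): the
statement is marked idempotent, and on a WriteTimeoutException the
received/required acknowledgements from the response are inspected before a
blind retry.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.exceptions.UnavailableException;
import com.datastax.driver.core.exceptions.WriteTimeoutException;

public class WriteWithRetry {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("demo")) {

            SimpleStatement insert = new SimpleStatement(
                    "INSERT INTO users (username, email) VALUES (?, ?)",
                    "alice", "alice@example.org");
            insert.setConsistencyLevel(ConsistencyLevel.QUORUM);
            // Marking the statement idempotent is what makes a blind retry safe:
            // replaying it cannot produce a different outcome.
            insert.setIdempotent(true);

            try {
                session.execute(insert);
            } catch (UnavailableException e) {
                // Not enough replicas were alive: the coordinator never attempted the write.
                System.err.println("Write not attempted: " + e.getMessage());
            } catch (WriteTimeoutException e) {
                // The write was attempted; some replicas may have applied it.
                System.err.printf("Timed out: %d of %d acks received, write type %s%n",
                        e.getReceivedAcknowledgements(),
                        e.getRequiredAcknowledgements(),
                        e.getWriteType());
                // Safe to retry only because the statement is idempotent.
                session.execute(insert);
            }
        }
    }
}

The same received/required counts are what a custom RetryPolicy sees in its
onWriteTimeout callback, so the decision can also be centralised there instead
of around every execute call.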


On 14 February 2017 at 21:49, Edward Capriolo  wrote:

>
>
> On Tue, Feb 14, 2017 at 2:30 PM, rouble  wrote:
>
>> Cassandra Gurus,
>>
>> I have a question open on Stack Overflow about how to determine if data is
>> actually created when a write exception is thrown:
>> http://stackoverflow.com/questions/42231140/determining-if-data-will-be-created-on-cassandra-write-exceptions
>>
>> From my discussion on the question, it seems that for *any* Cassandra
>> write, *any* exception means the data may or may not have been written,
>> with the exception of InvalidQueryException.
>>
>> I find this unsettling. Maybe I need time to adjust from my RDBMS
>> background, but how is Cassandra supposed to be used for systems that need
>> user feedback? Or is it?
>>
>> Let me use the simple example of account creation. A user tries to create
>> an account, and we need to indicate one way or the other whether the
>> account was created. Let's say a WriteTimeoutException is thrown while
>> trying to add the user. The user may or may not have been written, so what
>> do we tell them? Should we just roll back the change and tell the user that
>> it failed? This seems like the only thing we can do deterministically (and
>> I noticed someone doing just that here:
>> http://stackoverflow.com/a/34860495/215120).
>>
>> How are people handling WriteTimeoutExceptions or UnavailableExceptions?
>> Rolling back in every case does not seem practical.
>>
>> tia,
>> Rouble
>>
>
> There is a difference between WriteTimeoutException and
> UnavailableException.
>
> UnavailableException indicates the write was never even attempted.
>
> WriteTimeoutException means the write was attempted. I believe you can
> interrogate the exception to determine if the operation was successful on
> any of the natural endpoints.
>
> The way to "cope" is idempotent writes and retries. If that model does not
> fit, it is a square-peg-round-hole discussion.
>
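
For the account-creation scenario quoted above, one possible pattern (a sketch,
not the only option, and it assumes a hypothetical users table keyed by
username) is to make the write observable with a lightweight transaction:
INSERT ... IF NOT EXISTS is safe to re-run after a WriteTimeoutException, and
wasApplied() on the re-executed statement tells you whether the account now
exists and whose data it holds.

import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.exceptions.WriteTimeoutException;

public class AccountCreation {

    /** Returns true if this call (or a retried attempt of it) created the account. */
    static boolean createAccount(Session session, String username, String email) {
        SimpleStatement insert = new SimpleStatement(
                "INSERT INTO users (username, email) VALUES (?, ?) IF NOT EXISTS",
                username, email);
        try {
            return session.execute(insert).wasApplied();
        } catch (WriteTimeoutException e) {
            // The Paxos round or the commit may or may not have gone through.
            // Re-running the LWT is safe: either it applies now, or it reports
            // the row that already exists under this username.
            ResultSet rs = session.execute(insert);
            if (rs.wasApplied()) {
                return true;
            }
            // Not applied: check whether the existing row is the one we tried to
            // write (our earlier attempt succeeded) or belongs to somebody else.
            return email.equals(rs.one().getString("email"));
        }
    }
}

The trade-off is that every account creation pays for a Paxos round, which is
usually acceptable for a low-frequency, user-facing operation like sign-up.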