RE: Is it possible to have a column which can hold any data type (for inserting as json)

2017-02-01 Thread Benjamin Roth
This has to be done in your app. You can store your data as JSON in a text
column, using your favourite serializer. You can cast floats to strings. You
can even build a custom type, or store the value serialized as a blob. But
there is no all-purpose field that stores data of any type in some magic way.

On 02.02.2017 05:30, "Rajeswari Menon" <rajeswar...@thinkpalm.com> wrote:

> Yes. Is there any way to define value to accept any data type as the json
> value data may vary? Or is there any way to do the same without defining a
> schema?
>
>
>
> Regards,
>
> Rajeswari
>
>
>
> *From:* Benjamin Roth [mailto:benjamin.r...@jaumo.com]
> *Sent:* 01 February 2017 15:36
> *To:* user@cassandra.apache.org
> *Subject:* RE: Is it possible to have a column which can hold any data
> type (for inserting as json)
>
>
>
> Value is defined as text column and you try to insert a double. That's
> simply not allowed
>
>
>
> On 01.02.2017 09:02, "Rajeswari Menon" <rajeswar...@thinkpalm.com> wrote:
>
> Given below is the sql query I executed.
>
>
>
> *insert* *into* data JSON'{
>
>   "id": 1,
>
>"address":"",
>
>"datatype":"DOUBLE",
>
>"name":"Longitude",
>
>"attributes":{
>
>   "ID":"1"
>
>},
>
>"category":"REAL",
>
>"value":1.390692,
>
>"timestamp":1485923271718,
>
>"quality":"GOOD"
>
> }';
>
>
>
> Regards,
>
> Rajeswari
>
>
>
> *From:* Benjamin Roth [mailto:benjamin.r...@jaumo.com]
> *Sent:* 01 February 2017 12:35
> *To:* user@cassandra.apache.org
> *Subject:* Re: Is it possible to have a column which can hold any data
> type (for inserting as json)
>
>
>
> You should post the whole CQL query you try to execute! Why don't you use
> a native JSON type for your JSON data?
>
>
>
> 2017-02-01 7:51 GMT+01:00 Rajeswari Menon <rajeswar...@thinkpalm.com>:
>
> Hi,
>
>
>
> I have a json data as shown below.
>
>
>
> {
>
> "address":"127.0.0.1",
>
> "datatype":"DOUBLE",
>
> "name":"Longitude",
>
>  "attributes":{
>
> "Id":"1"
>
> },
>
> "category":"REAL",
>
> "value":1.390692,
>
> "timestamp":1485923271718,
>
> "quality":"GOOD"
>
> }
>
>
>
> To store the above json to Cassandra, I defined a table as shown below
>
>
>
> *create* *table* data
>
> (
>
>   id *int* *primary* *key*,
>
>   address text,
>
>   datatype text,
>
>   name text,
>
>   *attributes* *map* < text, text >,
>
>   category text,
>
>   value text,
>
>   "timestamp" *timestamp*,
>
>   quality text
>
> );
>
>
>
> When I try to insert the data as JSON I got the error : *Error decoding
> JSON value for value: Expected a UTF-8 string, but got a Double: 1.390692*.
> The message is clear that a double value cannot be inserted to text column.
> The real issue is that the value can be of any data type, so the schema
> cannot be predefined. Is there a way to create a column which can hold
> value of any data type. (I don’t want to hold the entire json as string. My
> preferred way is to define a schema.)
>
>
>
> Regards,
>
> Rajeswari
>
>
>
>
>
> --
>
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>
>


Re: Is it possible to have a column which can hold any data type (for inserting as json)

2017-01-31 Thread Benjamin Roth
You should post the whole CQL query you are trying to execute! Why don't you
use a native JSON type for your JSON data?

2017-02-01 7:51 GMT+01:00 Rajeswari Menon <rajeswar...@thinkpalm.com>:

> Hi,
>
>
>
> I have a json data as shown below.
>
>
>
> {
>
> "address":"127.0.0.1",
>
> "datatype":"DOUBLE",
>
> "name":"Longitude",
>
>  "attributes":{
>
> "Id":"1"
>
> },
>
> "category":"REAL",
>
> "value":1.390692,
>
> "timestamp":1485923271718,
>
> "quality":"GOOD"
>
> }
>
>
>
> To store the above json to Cassandra, I defined a table as shown below
>
>
>
> *create* *table* data
>
> (
>
>   id *int* *primary* *key*,
>
>   address text,
>
>   datatype text,
>
>   name text,
>
>   *attributes* *map* < text, text >,
>
>   category text,
>
>   value text,
>
>   "timestamp" *timestamp*,
>
>   quality text
>
> );
>
>
>
> When I try to insert the data as JSON I got the error : *Error decoding
> JSON value for value: Expected a UTF-8 string, but got a Double: 1.390692*.
> The message is clear that a double value cannot be inserted to text column.
> The real issue is that the value can be of any data type, so the schema
> cannot be predefined. Is there a way to create a column which can hold
> value of any data type. (I don’t want to hold the entire json as string. My
> preferred way is to define a schema.)
>
>
>
> Regards,
>
> Rajeswari
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


RE: Is it possible to have a column which can hold any data type (for inserting as json)

2017-02-01 Thread Benjamin Roth
Value is defined as a text column and you try to insert a double. That's
simply not allowed.

On 01.02.2017 09:02, "Rajeswari Menon" <rajeswar...@thinkpalm.com> wrote:

> Given below is the sql query I executed.
>
>
>
> *insert* *into* data JSON'{
>
>   "id": 1,
>
>"address":"",
>
>"datatype":"DOUBLE",
>
>"name":"Longitude",
>
>"attributes":{
>
>   "ID":"1"
>
>},
>
>"category":"REAL",
>
>"value":1.390692,
>
>"timestamp":1485923271718,
>
>"quality":"GOOD"
>
> }';
>
>
>
> Regards,
>
> Rajeswari
>
>
>
> *From:* Benjamin Roth [mailto:benjamin.r...@jaumo.com]
> *Sent:* 01 February 2017 12:35
> *To:* user@cassandra.apache.org
> *Subject:* Re: Is it possible to have a column which can hold any data
> type (for inserting as json)
>
>
>
> You should post the whole CQL query you try to execute! Why don't you use
> a native JSON type for your JSON data?
>
>
>
> 2017-02-01 7:51 GMT+01:00 Rajeswari Menon <rajeswar...@thinkpalm.com>:
>
> Hi,
>
>
>
> I have a json data as shown below.
>
>
>
> {
>
> "address":"127.0.0.1",
>
> "datatype":"DOUBLE",
>
> "name":"Longitude",
>
>  "attributes":{
>
> "Id":"1"
>
> },
>
> "category":"REAL",
>
> "value":1.390692,
>
> "timestamp":1485923271718,
>
> "quality":"GOOD"
>
> }
>
>
>
> To store the above json to Cassandra, I defined a table as shown below
>
>
>
> *create* *table* data
>
> (
>
>   id *int* *primary* *key*,
>
>   address text,
>
>   datatype text,
>
>   name text,
>
>   *attributes* *map* < text, text >,
>
>   category text,
>
>   value text,
>
>   "timestamp" *timestamp*,
>
>   quality text
>
> );
>
>
>
> When I try to insert the data as JSON I got the error : *Error decoding
> JSON value for value: Expected a UTF-8 string, but got a Double: 1.390692*.
> The message is clear that a double value cannot be inserted to text column.
> The real issue is that the value can be of any data type, so the schema
> cannot be predefined. Is there a way to create a column which can hold
> value of any data type. (I don’t want to hold the entire json as string. My
> preferred way is to define a schema.)
>
>
>
> Regards,
>
> Rajeswari
>
>
>
>
>
> --
>
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>


Re: CS process killed by kernel OOM

2017-02-06 Thread Benjamin Roth
Thanks for the reply. We got rid of the OOMs by increasing
vm.min_free_kbytes; its default of approx. 90 MB is maybe a bit low for
systems with 128 GB.
I guess the OOM happens because the kernel could not reclaim enough paged
memory instantly.
I can't tell whether this really is a kernel bug or not. That was also my
first thought, but in the end the main thing is that it works again, and it
does with a higher min_free_kbytes.
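For reference, a sketch of the change (the value below is an example, not a
recommendation - tune it to your machines):

sysctl -w vm.min_free_kbytes=1048576   # raise the kernel's reserve at runtime (here: 1 GB)
echo "vm.min_free_kbytes = 1048576" >> /etc/sysctl.conf   # persist across reboots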

2017-02-06 11:53 GMT+01:00 Avi Kivity <a...@scylladb.com>:

>
> On 01/26/2017 07:36 AM, Benjamin Roth wrote:
>
> Hi there,
>
> We installed 2 new nodes these days. They run on ubuntu (Ubuntu 16.04.1
> LTS) with kernel 4.4.0-59-generic. On these nodes (and only on these) CS
> gets killed by the kernel due to OOM. It seems very strange to me because,
> CS only takes roughly 20GB (out of 128GB), most of RAM is allocated to page
> cache.
>
> Top looks typically like this:
> KiB Mem : 13191691+total,  1974964 free, 20278184 used, 10966376+buff/cache
> KiB Swap:0 total,0 free,0 used. 11051503+avail Mem
>
> This is what kern.log says:
> https://gist.github.com/brstgt/0f1aa6afb558a56d1cadce958db46cf9
>
> Has anyone encountered sth like this before?
>
>
> 2017-01-26T03:10:45.679458+00:00 cas10 kernel: [52226.449989] Node 0
> Normal: 33850*4kB (UMEH) 8*8kB (UMH) 1*16kB (H) 0*32kB 0*64kB 0*128kB
> 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 135480kB
> 2017-01-26T03:10:45.679460+00:00 cas10 kernel: [52226.449995] Node 1
> Normal: 34213*4kB (UME) 176*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB
> 0*512kB 0*1024kB 0*2048kB 0*4096kB = 138260kB
>
>
> There is plenty of free memory left (33850+34213)*4kB = 270 MB, but it is
> fragmented into 4k and 8k blocks, while the kernel is trying to allocate
> 16kB.  Still, the kernel could have evicted some page cache or swapped out
> anonymous memory.  You should report this to lkml, it is a kernel bug.
>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>
>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Why does CockroachDB github website say Cassandra has no Availability on datacenter failure?

2017-02-07 Thread Benjamin Roth
Ask for forgiveness not for permission if you do marketing ;)

On 07.02.2017 13:11, "Kant Kodali" wrote:

> lol. But seriously are they even allowed to say something that is not true
> about another product ?
>
> On Tue, Feb 7, 2017 at 4:05 AM, kurt greaves  wrote:
>
>> Marketing never lies. Ever
>>
>
>


Re: CS process killed by kernel OOM

2017-02-06 Thread Benjamin Roth
Alright. Thanks a lot for that information!

2017-02-06 14:35 GMT+01:00 Avi Kivity <a...@scylladb.com>:

> It is a bug.  In some contexts, the kernel needs to be able to reclaim
> memory instantly, but this is not one of them.  Here, the java process is
> creating a new thread, and the kernel is allocating 16kB for its kernel
> stack; that is a regular allocation, not atomic. If you decode the gfp_mask
> value you'll see that the kernel is allowed to initiate I/O and perform
> filesystem operations to satisfy the allocation, which it apparently did
> not.
>
>
> I do recommend reporting it, it will help others avoid encountering the
> same problem if it gets fixed.
>
> On 02/06/2017 03:07 PM, Benjamin Roth wrote:
>
> Thanks for the reply. We got rid of the OOMs by increasing
> vm.min_free_kbytes, it's default of approx 90mb is maybe a bit low for
> systems with 128GB.
> I guess the OOM happens because the kernel could not reclaim enough paged
> memory instantly.
> I can't tell if this is really a kernel bug or not. It also was my first
> thought but in the end the main thing is, it works again and it does with
> more mibn_free_kbytes
>
> 2017-02-06 11:53 GMT+01:00 Avi Kivity <a...@scylladb.com>:
>
>>
>> On 01/26/2017 07:36 AM, Benjamin Roth wrote:
>>
>> Hi there,
>>
>> We installed 2 new nodes these days. They run on ubuntu (Ubuntu 16.04.1
>> LTS) with kernel 4.4.0-59-generic. On these nodes (and only on these) CS
>> gets killed by the kernel due to OOM. It seems very strange to me because,
>> CS only takes roughly 20GB (out of 128GB), most of RAM is allocated to page
>> cache.
>>
>> Top looks typically like this:
>> KiB Mem : 13191691+total,  1974964 free, 20278184 used,
>> 10966376+buff/cache
>> KiB Swap:0 total,0 free,0 used. 11051503+avail Mem
>>
>> This is what kern.log says:
>> https://gist.github.com/brstgt/0f1aa6afb558a56d1cadce958db46cf9
>>
>> Has anyone encountered sth like this before?
>>
>>
>> 2017-01-26T03:10:45.679458+00:00 cas10 kernel: [52226.449989] Node 0
>> Normal: 33850*4kB (UMEH) 8*8kB (UMH) 1*16kB (H) 0*32kB 0*64kB 0*128kB
>> 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 135480kB
>> 2017-01-26T03:10:45.679460+00:00 cas10 kernel: [52226.449995] Node 1
>> Normal: 34213*4kB (UME) 176*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB
>> 0*512kB 0*1024kB 0*2048kB 0*4096kB = 138260kB
>>
>>
>> There is plenty of free memory left (33850+34213)*4kB = 270 MB, but it is
>> fragmented into 4k and 8k blocks, while the kernel is trying to allocate
>> 16kB.  Still, the kernel could have evicted some page cache or swapped out
>> anonymous memory.  You should report this to lkml, it is a kernel bug.
>>
>>
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>>
>>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>
>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Huge size of system.batches table after dropping an incomplete Materialized View

2017-01-22 Thread Benjamin Roth
I cannot tell you where these errors like "Attempting to mutate ..." come
from, but under certain circumstances all view mutations are stored in
batches, so the batchlog can grow insanely large. I don't see why a repair
should help you in this situation. I guess what you want is to recreate the
table.

1. You should not repair MVs directly. The current design is to repair only
the base table - though it's not properly documented. Repairing MVs can
create inconsistent states; repairing only the base tables won't.
2. A repair does only repair data and won't fix schema-issues
3. A repair of a base table that contains an MV is incredibly slow if the
state is very inconsistent (which is probably the case in your situation)

What to do?
- If you don't care about the data of the MV, you of course can delete all
SSTables (when CS is stopped) and all data will be gone. But I don't know
if it helps.
- If you are 100% sure that no other batches are in flight, you could also
truncate system.batches (see the sketch below); otherwise your log may be
flooded with "non-existant table" messages when the batch log is replayed.
That is annoying but should not harm anyone.

=> Start over, try to drop and create the MV. Watch out for logs referring
to schema changes and errors
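A minimal CQL sketch of that recovery path (keyspace, view and column names
are placeholders; truncate system.batches only if you are really sure no
other batches are in flight):

TRUNCATE system.batches;
DROP MATERIALIZED VIEW IF EXISTS keyspace_name.view_name;
CREATE MATERIALIZED VIEW keyspace_name.view_name AS
    SELECT id, name FROM keyspace_name.base_table
    WHERE id IS NOT NULL AND name IS NOT NULL
    PRIMARY KEY (name, id);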

Side note:
I'd recommend not to use MVs (yet) if you don't have an "inside"
understanding of them or "know what you are doing". They can have a very
big impact on your cluster performance in some situations and are not
generally considered as stable yet.

2017-01-22 18:42 GMT+01:00 Vinci <vi...@protonmail.com>:

> Hi there,
>
> Version :- Cassandra 3.0.7
>
> I attempted to create a Materialized View on a certain table and it failed
> with never-ending WARN message "Mutation of  bytes is too large for
> the maximum size of ".
>
> "nodetool stop VIEW_BUILD" also did not help.
>
> That seems to be a result of https://issues.apache.org/
> jira/browse/CASSANDRA-11670 which is fixed in newer versions.
>
> So I tried dropping the view and that generated error messages like
> following :-
>
> ERROR [CompactionExecutor:632] [Timestamp] Keyspace.java:475 - Attempting
> to mutate non-existant table 7c2e1c40-b82b-11e6-9d20-4b0190661423
> (keyspace_name.view_name)
>
> I performed an incremental repair of the table on which view was created
> and a rolling restart to stop these errors.
>
> Now I see huge size of system.batches table on one of the nodes. It seems
> related to issues mentioned above since last modification timestamps of the
> sstable files inside system/batches is same as when I tried to drop the MV.
>
> Some insight and suggestions regarding it will be very helpful. I will
> like to know if i can safely truncate the table, rm the files or any other
> approach to clean it up?
>
> Thanks.
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Getting Error while Writing in Multi DC mode when Remote Dc is Down.

2017-01-23 Thread Benjamin Roth
The query used QUORUM, not LOCAL_QUORUM, so 3 of 5 nodes are required. Maybe
1 node in DRPOCcluster was also temporarily unavailable during that query?
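As a sketch with the DataStax Java driver 3.x (table and the existing
Session are placeholders), you can also pin the consistency level per
statement, so a stray per-query override cannot silently switch back to
QUORUM:

import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

// hypothetical table; LOCAL_QUORUM counts only replicas in the local DC
Statement stmt = new SimpleStatement(
        "INSERT INTO wls.demo (id, val) VALUES (1, 'x')")
        .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
session.execute(stmt);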

2017-01-23 12:16 GMT+01:00 Abhishek Kumar Maheshwari <
abhishek.maheshw...@timesinternet.in>:

> Hi All,
>
>
>
> I have Cassandra stack with 2 Dc
>
>
>
> Datacenter: DRPOCcluster
>
> 
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/Moving
>
> --  AddressLoad   Tokens   OwnsHost
> ID   Rack
>
> UN  172.29.xx.xxx  88.88 GB   256  ?   
> b6b8cbb9-1fed-471f-aea9-6a657e7ac80a
> 01
>
> UN  172.29.xx.xxx  73.95 GB   256  ?   
> 604abbf5-8639-4104-8f60-fd6573fb2e17
> 03
>
> UN  172.29. xx.xxx  66.42 GB   256  ?
> 32fa79ee-93c6-4e5b-a910-f27a1e9d66c1  02
>
> Datacenter: dc_india
>
> 
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/Moving
>
> --  AddressLoad   Tokens   OwnsHost
> ID   Rack
>
> DN  172.26. .xx.xxx  78.97 GB   256  ?
> 3e8133ed-98b5-418d-96b5-690a1450cd30  RACK1
>
> DN  172.26. .xx.xxx  79.18 GB   256  ?
> 7d3f5b25-88f9-4be7-b0f5-746619153543  RACK2
>
>
>
>
>
> I am using below code to connect with java driver:
>
>
>
> cluster = Cluster.*builder*().addContactPoints(hostAddresses
> ).withRetryPolicy(DefaultRetryPolicy.*INSTANCE*)
>
>.withReconnectionPolicy(*new*
> ConstantReconnectionPolicy(3L))
>
>.withLoadBalancingPolicy(*new*
> TokenAwarePolicy(*new* DCAwareRoundRobinPolicy.Builder().withLocalDc("
> DRPOCcluster").withUsedHostsPerRemoteDc(2).build())).build();
>
> cluster.getConfiguration().getQueryOptions().setConsistencyLevel(
> ConsistencyLevel.LOCAL_QUORUM);
>
>
>
> hostAddresses is 172.29.xx.xxx  . when Dc with IP 172.26. .xx.xxx   is
> down, we are getting below exception :
>
>
>
>
>
> Exception in thread "main" 
> com.datastax.driver.core.exceptions.UnavailableException:
> Not enough replicas available for query at consistency QUORUM (3 required
> but only 2 alive)
>
>at com.datastax.driver.core.exceptions.UnavailableException.copy(
> UnavailableException.java:109)
>
>at com.datastax.driver.core.exceptions.UnavailableException.copy(
> UnavailableException.java:27)
>
>at com.datastax.driver.core.DriverThrowables.propagateCause(
> DriverThrowables.java:37)
>
>at com.datastax.driver.core.DefaultResultSetFuture.
> getUninterruptibly(DefaultResultSetFuture.java:245)
>
>
>
> Cassandra version : 3.0.9
>
> Datastax Java Driver Version:
>
>
>
> <dependency>
>   <groupId>com.datastax.cassandra</groupId>
>   <artifactId>cassandra-driver-core</artifactId>
>   <version>3.1.2</version>
> </dependency>
>
>
>
>
>
> *Thanks & Regards,*
> *Abhishek Kumar Maheshwari*
> *+91- 805591 (Mobile)*
>
> Times Internet Ltd. | A Times of India Group Company
>
> FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
>
> *P** Please do not print this email unless it is absolutely necessary.
> Spread environmental awareness.*
>
>
> We the soldiers of our new economy, pledge to stop doubting and start
> spending, to enable others to go digital, to use less cash. We pledge to
> #RemonetiseIndia. Join the Times Network ‘Remonetise India’ movement today.
> To pledge for growth, give a missed call on +91 9223515515. Visit
> www.remonetiseindia.com
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Getting Error while Writing in Multi DC mode when Remote Dc is Down.

2017-01-23 Thread Benjamin Roth
Sorry for the short answer, I am on the go:
I guess your hints expired. The default setting is 3h. If a node is down for
a longer time, no hints will be written.
Only a repair will help then.
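The relevant cassandra.yaml setting, shown with its default of 3 hours.
Raising it only affects future outages; hints that were already skipped are
gone, and only a repair brings that data back:

# cassandra.yaml
max_hint_window_in_ms: 10800000   # 3h; stop writing hints once a node has been down this long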

2017-01-23 12:47 GMT+01:00 Abhishek Kumar Maheshwari <
abhishek.maheshw...@timesinternet.in>:

> Hi Benjamin,
>
>
>
> I find the issue. while I was making query, I was overriding LOCAL_QUORUM
> to QUORUM.
>
>
>
> Also, one more Question,
>
>
>
> I was able insert data in DRPOCcluster. But when I bring up dc_india DC,
> data doesn’t seem in dc_india keyspace and column family (I wait near about
> 30 min)?
>
>
>
>
>
>
>
>
>
> *Thanks & Regards,*
> *Abhishek Kumar Maheshwari*
> *+91- 805591 (Mobile)*
>
> Times Internet Ltd. | A Times of India Group Company
>
> FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
>
> *P** Please do not print this email unless it is absolutely necessary.
> Spread environmental awareness.*
>
>
>
> *From:* Benjamin Roth [mailto:benjamin.r...@jaumo.com]
> *Sent:* Monday, January 23, 2017 5:05 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Getting Error while Writing in Multi DC mode when Remote
> Dc is Down.
>
>
>
> The query has QUORUM not LOCAL_QUORUM. So 3 of 5 nodes are required. Maybe
> 1 node in DRPOCcluster also was temporarily unavailable during that query?
>
>
>
> 2017-01-23 12:16 GMT+01:00 Abhishek Kumar Maheshwari <Abhishek.Maheshwari@
> timesinternet.in>:
>
> Hi All,
>
>
>
> I have Cassandra stack with 2 Dc
>
>
>
> Datacenter: DRPOCcluster
>
> 
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/Moving
>
> --  AddressLoad   Tokens   OwnsHost
> ID   Rack
>
> UN  172.29.xx.xxx  88.88 GB   256  ?   
> b6b8cbb9-1fed-471f-aea9-6a657e7ac80a
> 01
>
> UN  172.29.xx.xxx  73.95 GB   256  ?   
> 604abbf5-8639-4104-8f60-fd6573fb2e17
> 03
>
> UN  172.29. xx.xxx  66.42 GB   256  ?
> 32fa79ee-93c6-4e5b-a910-f27a1e9d66c1  02
>
> Datacenter: dc_india
>
> 
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/Moving
>
> --  AddressLoad   Tokens   OwnsHost
> ID   Rack
>
> DN  172.26. .xx.xxx  78.97 GB   256  ?
> 3e8133ed-98b5-418d-96b5-690a1450cd30  RACK1
>
> DN  172.26. .xx.xxx  79.18 GB   256  ?
> 7d3f5b25-88f9-4be7-b0f5-746619153543  RACK2
>
>
>
>
>
> I am using below code to connect with java driver:
>
>
>
> cluster = Cluster.*builder*().addContactPoints(hostAddresses
> ).withRetryPolicy(DefaultRetryPolicy.*INSTANCE*)
>
>.withReconnectionPolicy(*new*
> ConstantReconnectionPolicy(3L))
>
>.withLoadBalancingPolicy(*new*
> TokenAwarePolicy(*new* DCAwareRoundRobinPolicy.Builder().withLocalDc("
> DRPOCcluster").withUsedHostsPerRemoteDc(2).build())).build();
>
> cluster.getConfiguration().getQueryOptions().setConsistencyLevel(
> ConsistencyLevel.LOCAL_QUORUM);
>
>
>
> hostAddresses is 172.29.xx.xxx  . when Dc with IP 172.26. .xx.xxx   is
> down, we are getting below exception :
>
>
>
>
>
> Exception in thread "main" 
> com.datastax.driver.core.exceptions.UnavailableException:
> Not enough replicas available for query at consistency QUORUM (3 required
> but only 2 alive)
>
>at com.datastax.driver.core.exceptions.UnavailableException.copy(
> UnavailableException.java:109)
>
>at com.datastax.driver.core.exceptions.UnavailableException.copy(
> UnavailableException.java:27)
>
>at com.datastax.driver.core.DriverThrowables.propagateCause(
> DriverThrowables.java:37)
>
>at com.datastax.driver.core.DefaultResultSetFuture.
> getUninterruptibly(DefaultResultSetFuture.java:245)
>
>
>
> Cassandra version : 3.0.9
>
> Datastax Java Driver Version:
>
>
>
> <dependency>
>   <groupId>com.datastax.cassandra</groupId>
>   <artifactId>cassandra-driver-core</artifactId>
>   <version>3.1.2</version>
> </dependency>
>
>
>
>
>
> *Thanks & Regards,*
> *Abhishek Kumar Maheshwari*
> *+91- 805591 (Mobile)*
>
> Times Internet Ltd. | A Times of India Group Company
>
> FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
>
> *P** Please do not print this email unless it is absolutely necessary.
> Spread environmental awareness.*
>
>
>
> We the soldiers of 

Re: Huge size of system.batches table after dropping an incomplete Materialized View

2017-01-23 Thread Benjamin Roth
What exactly persists? I didn't really understand you; could you be more
specific?

2017-01-23 15:40 GMT+01:00 Vinci <vi...@protonmail.com>:

> Thanks for the response.
>
> After the MV failure and errors, MV was dropped and the table was
> truncated.
> Then I recreated the MV and Table from scratch which worked as expected.
>
> The huge sizes of sstables as I have mentioned are after that. Somehow it
> still persists with same last modification timestamps.
>
> Not sure if i can safely rm these sstables or truncate system.batches on
> that node.
>
>
>  Original Message 
> Subject: Re: Huge size of system.batches table after dropping an
> incomplete Materialized View
> Local Time: 22 January 2017 11:41 PM
> UTC Time: 22 January 2017 18:11
> From: benjamin.r...@jaumo.com
> To: user@cassandra.apache.org, Vinci <vi...@protonmail.com>
>
> I cannot tell you were these errors like "Attempting to mutate ..." come
> from but under certain circumstances all view mutations are stored in
> batches, so the batchlog can grow insanely large. I don't see why a repair
> should help you in this situation. I guess what you want is to recreate the
> table.
>
> 1. You should not repair MVs directly. The current design is to only
> repairs the base table - though it's not properly documented. Repairing MVs
> can create inconsistent states. Only repairing the base tables wont.
> 2. A repair does only repair data and won't fix schema-issues
> 3. A repair of a base table that contains an MV is incredibly slow if the
> state is very inconsistent (which is probably the case in your situation)
>
> What to do?
> - If you don't care about the data of the MV, you of course can delete all
> SSTables (when CS is stopped) and all data will be gone. But I don't know
> if it helps.
> - If you are 100% sure that no other batch logs are going on, you could
> also truncate the system.batches, otherwise your log may be flooded with
> "non-existant table" things if the batch log is replayed. It is annoying
> but should not harm anyone.
>
> => Start over, try to drop and create the MV. Watch out for logs referring
> to schema changes and errors
>
> Side note:
> I'd recommend not to use MVs (yet) if you don't have an "inside"
> understanding of them or "know what you are doing". They can have a very
> big impact on your cluster performance in some situations and are not
> generally considered as stable yet.
>
> 2017-01-22 18:42 GMT+01:00 Vinci <vi...@protonmail.com>:
>
>> Hi there,
>>
>> Version :- Cassandra 3.0.7
>>
>> I attempted to create a Materialized View on a certain table and it
>> failed with never-ending WARN message "Mutation of  bytes is too
>> large for the maximum size of ".
>>
>> "nodetool stop VIEW_BUILD" also did not help.
>>
>> That seems to be a result of https://issues.apache.org/j
>> ira/browse/CASSANDRA-11670 which is fixed in newer versions.
>>
>> So I tried dropping the view and that generated error messages like
>> following :-
>>
>> ERROR [CompactionExecutor:632] [Timestamp] Keyspace.java:475 - Attempting
>> to mutate non-existant table 7c2e1c40-b82b-11e6-9d20-4b0190661423
>> (keyspace_name.view_name)
>>
>> I performed an incremental repair of the table on which view was created
>> and a rolling restart to stop these errors.
>>
>> Now I see huge size of system.batches table on one of the nodes. It seems
>> related to issues mentioned above since last modification timestamps of the
>> sstable files inside system/batches is same as when I tried to drop the MV.
>>
>> Some insight and suggestions regarding it will be very helpful. I will
>> like to know if i can safely truncate the table, rm the files or any other
>> approach to clean it up?
>>
>> Thanks.
>>
>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>
>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: [Multi DC] Old Data Not syncing from Existing cluster to new Cluster

2017-01-24 Thread Benjamin Roth
Have you also altered RF of system_distributed as stated in the tutorial?
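For example (DC names and RF mirror the keyspace below; system_distributed
stores repair and MV build state, so it needs replicas in the new DC too):

ALTER KEYSPACE system_distributed WITH replication =
    {'class': 'NetworkTopologyStrategy', 'DRPOCcluster': '2', 'dc_india': '2'};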

2017-01-24 16:45 GMT+01:00 Abhishek Kumar Maheshwari <
abhishek.maheshw...@timesinternet.in>:

> My Mistake,
>
>
>
> Both clusters are up and running.
>
>
>
> Datacenter: DRPOCcluster
>
> 
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/Moving
>
> --  AddressLoad   Tokens   OwnsHost
> ID   Rack
>
> UN  172.29.XX.XX  1.65 GB   256  ?   
> badf985b-37da-4735-b468-8d3a058d4b60
> 01
>
> UN  172.29.XX.XX  1.64 GB   256  ?   
> 317061b2-c19f-44ba-a776-bcd91c70bbdd
> 03
>
> UN  172.29.XX.XX  1.64 GB   256  ?   
> 9bf0d1dc-6826-4f3b-9c56-cec0c9ce3b6c
> 02
>
> Datacenter: dc_india
>
> 
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/Moving
>
> --  AddressLoad   Tokens   OwnsHost
> ID   Rack
>
> UN  172.26.XX.XX   79.90 GB   256  ?   
> 3e8133ed-98b5-418d-96b5-690a1450cd30
> RACK1
>
> UN  172.26.XX.XX   80.21 GB   256  ?   
> 7d3f5b25-88f9-4be7-b0f5-746619153543
> RACK2
>
>
>
> *Thanks & Regards,*
> *Abhishek Kumar Maheshwari*
> *+91- 805591 (Mobile)*
>
> Times Internet Ltd. | A Times of India Group Company
>
> FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
>
> *P** Please do not print this email unless it is absolutely necessary.
> Spread environmental awareness.*
>
>
>
> *From:* Benjamin Roth [mailto:benjamin.r...@jaumo.com]
> *Sent:* Tuesday, January 24, 2017 9:11 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: [Multi DC] Old Data Not syncing from Existing cluster to
> new Cluster
>
>
>
> I am not an expert in bootstrapping new DCs but shouldn't the OLD nodes
> appear as UP to be used as a streaming source in rebuild?
>
>
>
> 2017-01-24 16:32 GMT+01:00 Abhishek Kumar Maheshwari <Abhishek.Maheshwari@
> timesinternet.in>:
>
> Yes, I take all steps. While I am inserting new data is replicating on
> both DC. But only old data is not replication in new cluster.
>
>
>
> *Thanks & Regards,*
> *Abhishek Kumar Maheshwari*
> *+91- 805591 (Mobile)*
>
> Times Internet Ltd. | A Times of India Group Company
>
> FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
>
> *P** Please do not print this email unless it is absolutely necessary.
> Spread environmental awareness.*
>
>
>
> *From:* Benjamin Roth [mailto:benjamin.r...@jaumo.com]
> *Sent:* Tuesday, January 24, 2017 8:55 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: [Multi DC] Old Data Not syncing from Existing cluster to
> new Cluster
>
>
>
> There is much more to it than just changing the RF in the keyspace!
>
>
>
> See here: https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/
> opsAddDCToCluster.html
>
>
>
> 2017-01-24 16:18 GMT+01:00 Abhishek Kumar Maheshwari <Abhishek.Maheshwari@
> timesinternet.in>:
>
> Hi All,
>
>
>
> I have Cassandra stack with 2 Dc
>
>
>
> Datacenter: DRPOCcluster
>
> 
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/Moving
>
> --  AddressLoad   Tokens   OwnsHost
> ID   Rack
>
> UN  172.29.xx.xxx  256  MB   256  ?   
> b6b8cbb9-1fed-471f-aea9-6a657e7ac80a
> 01
>
> UN  172.29.xx.xxx  240 MB   256  ?   
> 604abbf5-8639-4104-8f60-fd6573fb2e17
> 03
>
> UN  172.29. xx.xxx  240 MB   256  ?   
> 32fa79ee-93c6-4e5b-a910-f27a1e9d66c1
> 02
>
> Datacenter: dc_india
>
> 
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/Moving
>
> --  AddressLoad   Tokens   OwnsHost
> ID   Rack
>
> DN  172.26. .xx.xxx  78.97 GB   256  ?
> 3e8133ed-98b5-418d-96b5-690a1450cd30  RACK1
>
> DN  172.26. .xx.xxx  79.18 GB   256  ?
> 7d3f5b25-88f9-4be7-b0f5-746619153543  RACK2
>
>
>
> dc_india is old Dc which contains all data.
>
> I update keyspace as per below:
>
>
>
> alter KEYSPACE wls WITH replication = {'class': 'NetworkTopologyStrategy',
> 'DRPOCcluster': '2','dc_india':'2'}  AND durable_writes = true;
>
>
>
> but old data is not updating in DRPOCcluster(which is new). Also, while
> running nodetool rebuild getting below exception:
>
> Command: ./nodetool rebuild -dc dc_india
>
>
>
> Exception : nodetool: U

Re: Time series data model and tombstones

2017-01-28 Thread Benjamin Roth
Maybe trace your queries to see what's happening in detail.
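For example, in cqlsh (the table mirrors the metrics schema from the thread;
the id value and date are placeholders):

TRACING ON;
SELECT * FROM metrics
WHERE id = 'some-metric-id'
  AND time > minTimeuuid('2017-01-27 00:00:00')
LIMIT 100;

The trace output shows, per step, how many live and tombstoned cells were
actually read.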

On 28.01.2017 21:32, "John Sanda" wrote:

Thanks for the response. This version of the code is using STCS.
gc_grace_seconds was set to one day and then I changed it to zero since RF
= 1. I understand that expired data will still generate tombstones and that
STCS is not the best. More recent versions of the code use DTCS, and we'll
be switching over to TWCS shortly. The suggestions raised are excellent
ones, but I tend to think of them as optimizations that might not address
my issue which I think may be 1) a problem with my data model, 2) problem
with the queries used or 3) some misunderstanding of Cassandra performs
range scans.

I am doing append-only writes. There is no out of order data. There are no
deletes, just TTLs. Data is stored on disk in descending order, and queries
access recent data and never query past the TTL of seven days. Given this I
would not expect to be reading tombstones, certainly not the large numbers
that I am seeing.

On Sat, Jan 28, 2017 at 12:15 PM, Jonathan Haddad  wrote:

> Since you didn't specify a compaction strategy I'm guessing you're using
> STCS. Your TTL'ed data is becoming a tombstone. TWCS is a better strategy
> for this type of workload.
> On Sat, Jan 28, 2017 at 8:30 AM John Sanda  wrote:
>
>> I have a time series data model that is basically:
>>
>> CREATE TABLE metrics (
>> id text,
>> time timeuuid,
>> value double,
>> PRIMARY KEY (id, time)
>> ) WITH CLUSTERING ORDER BY (time DESC);
>>
>> I do append-only writes, no deletes, and use a TTL of seven days. Data
>> points are written every seconds. The UI queries data for the past hour,
>> two hours, day, or week. The UI refreshes and executes queries every 30
>> seconds. In one test environment I am seeing lots of tombstone threshold
>> warnings and Cassandra has even OOME'd. Since I am storing data in
>> descending order and always query for recent data, I do not understand why
>> I am running into this problem.
>>
>> I know that it is recommended to do some date partitioning in part to
>> ensure partitions do not grow too large. I already have some changes in
>> place to partition by day.. Before I make those changes I want to
>> understand why I am scanning so many tombstones so that I can be more
>> confident that the date partitioning changes will help.
>>
>> Thanks
>>
>> - John
>>
>


-- 

- John


CS process killed by kernel OOM

2017-01-25 Thread Benjamin Roth
Hi there,

We installed 2 new nodes these days. They run on Ubuntu (Ubuntu 16.04.1
LTS) with kernel 4.4.0-59-generic. On these nodes (and only on these) CS
gets killed by the kernel due to OOM. It seems very strange to me because
CS only takes roughly 20 GB (out of 128 GB); most of the RAM is allocated
to page cache.

Top looks typically like this:
KiB Mem : 13191691+total,  1974964 free, 20278184 used, 10966376+buff/cache
KiB Swap:0 total,0 free,0 used. 11051503+avail Mem

This is what kern.log says:
https://gist.github.com/brstgt/0f1aa6afb558a56d1cadce958db46cf9

Has anyone encountered something like this before?

-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Disc size for cluster

2017-01-26 Thread Benjamin Roth
Hi!

This is basically right, but:
1. How do you know the 3 TB of storage will be 3 TB in Cassandra? This
depends on how the data is serialized and compressed, how often it changes,
and on your compaction settings.
2. 50% free space with STCS is only required if you do a full compaction of
a single CF that takes all the space. Normally you need as much free space
as the target SSTable of a compaction will take. If you split your data
across more CFs, it's unlikely you will really hit this value.

Probably you should do some tests. But in the end it is always good to have
some headroom. I personally would scale out if free space is < 30%, but
that always depends on your model.
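To illustrate point 2 with a worked example (numbers are illustrative only):
3 TB usable at RF 3 means ~9 TB of live data, i.e. ~3 TB per node on a
3-node cluster. If the largest single CF is ~1 TB per node, the headroom for
its biggest STCS compaction is ~1 TB per node rather than a flat 100%, so
~4-4.5 TB of disk per node (~12-13.5 TB raw) may already be comfortable
instead of the worst-case 18 TB.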


2017-01-26 9:56 GMT+01:00 Raphael Vogel <raphael.vo...@web.de>:

> Hi
> Just want to validate my estimation for a C* cluster which should have
> around 3 TB of usable storage.
> Assuming a RF of 3 and SizeTiered Compaction Strategy.
> Is it correct, that SizeTiered Compaction Strategy needs (in the worst
> case) 50% free disc space during compaction?
>
> So this would then result in a cluster of 3TB x 3 x 2 == 18 TB of raw
> storage?
>
> Thanks and Regards
> Raphael Vogel
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Does C* coordinator writes to replicas in same order or different order?

2017-02-21 Thread Benjamin Roth
For eventual consistency, it does not matter if replication is sync or
async. LWW always works as long as clocks are synchronized.
That's a design pattern of CS, and of EC databases in general. Every write
has a timestamp, and no matter when it arrives, the write with the latest
timestamp wins - even if an "earlier" write arrives late due to network
latency or an unavailable server that receives a hint after an hour.
Doing replication synchronously would kill all the benefits of CS's design:
- low latency
- partition tolerance
- high availability

Doing sync replication would also not guarantee a state, as another client
could "interfere" with your write. So you still have no "linearizability";
only LWT gives you that.
You cannot rely on ordering in CS, no matter how replication works. You can
only rely on it "eventually"; there is never a point in time at which you
can tell your system is 100% consistent.
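A small CQL illustration of LWW (hypothetical table; timestamps set
explicitly for clarity):

CREATE TABLE ks.demo (id int PRIMARY KEY, val text);
INSERT INTO ks.demo (id, val) VALUES (1, 'new') USING TIMESTAMP 2000;
-- applied later, but carries the older timestamp, so it loses:
INSERT INTO ks.demo (id, val) VALUES (1, 'old') USING TIMESTAMP 1000;
SELECT val FROM ks.demo WHERE id = 1;  -- returns 'new'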

Maybe what you could do if you are talking of "ordering" and that pointer
thing you mentioned earlier: try something similar to what MVs do.
Create a trigger, operate on your local dataset, read the order based on PK
(locally) and update "the pointer" on every write (also locally). If you
then store your pointer with the last known timestamp of your base data,
you also have LWW on the pointer, so the last pointer also wins when
reading with > CL_ONE.
But that will probably harm your write performance.

2017-02-21 10:36 GMT+01:00 Kant Kodali <k...@peernova.com>:

> @Benjamin I am more looking for how C* replication works underneath. There
> are few things here that I would need some clarification.
>
> 1. Does C* uses sync replication or async replication? If it is async
> replication how can one get performance especially when there is an
> ordering constraint among requests to comply with LWW.  Also below is a
> statement from C* website so how can one choose between sync or async
> replication? any configuration parameter that needs to be passed in?
>
> "Choose between synchronous or asynchronous replication for each update."
>
> http://cassandra.apache.org/
>
> 2. Is it Guaranteed that C* coordinator writes data in the same order to
> all the replicas (either sync or async)?
>
> Thanks,
> kant
>
> On Tue, Feb 21, 2017 at 1:23 AM, Benjamin Roth <benjamin.r...@jaumo.com>
> wrote:
>
>> To me that sounds like a completely different design pattern and a
>> different use case.
>> CS was not designed to guarantee order. It was build to be linear
>> scalable, highly concurrent and eventual consistent.
>> To me it sounds like a ACID DB better serves what you are asking for.
>>
>> 2017-02-21 10:17 GMT+01:00 Kant Kodali <k...@peernova.com>:
>>
>>> Agreed that async performs better than sync in general but the catch
>>> here to me is the "order".
>>>
>>> The whole point of async is to do out of order processing by which I
>>> mean say if a request 1 comes in at time t1 and a request 2 comes in at
>>> time t2 where t1 < t2 and say now that t1 is taking longer to process than
>>> t2 in which case request 2 should get a response first and subsequently a
>>> response for request 1. This is where I would imagine all the benefits of
>>> async come in but the moment you introduce order by saying for Last Write
>>> Wins all the async requests should be processed in order I would imagine
>>> all the benefits of async are lost.
>>>
>>> Let's see if anyone can comment about how it works inside C*.
>>>
>>> Thanks!
>>>
>>>
>>>
>>> On Mon, Feb 20, 2017 at 10:54 PM, Dor Laor <d...@scylladb.com> wrote:
>>>
>>>> Could be. Let's stay tuned to see if someone else pick it up.
>>>> Anyway, if it's synchronous, you'll have a large penalty for latency.
>>>>
>>>> On Mon, Feb 20, 2017 at 10:11 PM, Kant Kodali <k...@peernova.com>
>>>> wrote:
>>>>
>>>>> Thanks again for the response! if they mean it between client and
>>>>> server I am not sure why they would use the word "replication" in the
>>>>> statement below since there is no replication between client and server(
>>>>> coordinator).
>>>>>
>>>>> "Choose between synchronous or asynchronous replication for each
>>>>>> update."
>>>>>>
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On Feb 20, 2017, at 5:30 PM, Dor Laor <d...@scylladb.com> wrote:
>>>>>
>>>>> I think they mean the client to server and not among the

Re: Does C* coordinator writes to replicas in same order or different order?

2017-02-21 Thread Benjamin Roth
>>>>> timestamp right?  What I am really looking for is that if I send write
>>>>> request concurrently for record 1 and record 2 are they guaranteed to be
>>>>> inserted in the same order across replicas? (Whatever order coordinator 
>>>>> may
>>>>> choose is fine but I want the same order across all replicas and with 
>>>>> async
>>>>> replication I am not sure how that is possible ? for example,  if a 
>>>>> request
>>>>> arrives with timestamp t1 and another request arrives with a timestamp t2
>>>>> where t1 < t2...with async replication what if one replica chooses to
>>>>> execute t2 first and then t1 simply because t1 is slow while another
>>>>> replica choose to execute t1 first and then t2..how would that work?  )*
>>>>>
>>>>>>
>>>>>> Note that C* each node can be a coordinator (one per request) and its
>>>>>> the desired case in order to load balance the incoming requests. Once
>>>>>> again,
>>>>>> timestamps determine the order among the requests.
>>>>>>
>>>>>> Cheers,
>>>>>> Dor
>>>>>>
>>>>>> On Mon, Feb 20, 2017 at 4:12 PM, Kant Kodali <k...@peernova.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> when C* coordinator writes to replicas does it write it in same
>>>>>>> order or
>>>>>>> different order? other words, Does the replication happen
>>>>>>> synchronously or
>>>>>>> asynchrnoulsy ? Also does this depend sync or async client? What
>>>>>>> happens in
>>>>>>> the case of concurrent writes to a coordinator ?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> kant
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: High disk io read load

2017-02-15 Thread Benjamin Roth
Erm, sorry, forgot to mention: in this case "cas10" is Node A with 512
tokens and "cas9" is Node B with 256 tokens.

2017-02-16 6:38 GMT+01:00 Benjamin Roth <benjamin.r...@jaumo.com>:

> It doesn't really look like that:
> https://cl.ly/2c3Z1u2k0u2I
>
> Thats the ReadLatency.count metric aggregated by host which represents the
> actual read operations, correct?
>
> 2017-02-15 23:01 GMT+01:00 Edward Capriolo <edlinuxg...@gmail.com>:
>
>> I think it has more than double the load. It is double the data. More
>> read repair chances. More load can swing it's way during node failures etc.
>>
>> On Wednesday, February 15, 2017, Benjamin Roth <benjamin.r...@jaumo.com>
>> wrote:
>>
>>> Hi there,
>>>
>>> Following situation in cluster with 10 nodes:
>>> Node A's disk read IO is ~20 times higher than the read load of node B.
>>> The nodes are exactly the same except:
>>> - Node A has 512 tokens and Node B 256. So it has double the load (data).
>>> - Node A also has 2 SSDs, Node B only 1 SSD (according to load)
>>>
>>> Node A has roughly 460GB, Node B 260GB total disk usage.
>>> Both nodes have 128GB RAM and 40 cores.
>>>
>>> Of course I assumed that Node A does more reads because cache / load
>>> ratio is worse but a factor of 20 makes me very sceptic.
>>>
>>> Of course Node A has a much higher and less predictable latency due to
>>> the wait states.
>>>
>>> Has anybody experienced similar situations?
>>> Any hints how to analyze or optimize this - I mean 128GB cache for 460GB
>>> payload is not that few. I am pretty sure that not the whole dataset of
>>> 460GB is "hot".
>>>
>>> --
>>> Benjamin Roth
>>> Prokurist
>>>
>>> Jaumo GmbH · www.jaumo.com
>>> Wehrstraße 46 · 73035 Göppingen · Germany
>>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>>
>>
>>
>> --
>> Sorry this was sent from mobile. Will do less grammar and spell check
>> than usual.
>>
>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: High disk io read load

2017-02-15 Thread Benjamin Roth
It doesn't really look like that:
https://cl.ly/2c3Z1u2k0u2I

That's the ReadLatency.count metric aggregated by host, which represents
the actual read operations, correct?

2017-02-15 23:01 GMT+01:00 Edward Capriolo <edlinuxg...@gmail.com>:

> I think it has more than double the load. It is double the data. More read
> repair chances. More load can swing it's way during node failures etc.
>
> On Wednesday, February 15, 2017, Benjamin Roth <benjamin.r...@jaumo.com>
> wrote:
>
>> Hi there,
>>
>> Following situation in cluster with 10 nodes:
>> Node A's disk read IO is ~20 times higher than the read load of node B.
>> The nodes are exactly the same except:
>> - Node A has 512 tokens and Node B 256. So it has double the load (data).
>> - Node A also has 2 SSDs, Node B only 1 SSD (according to load)
>>
>> Node A has roughly 460GB, Node B 260GB total disk usage.
>> Both nodes have 128GB RAM and 40 cores.
>>
>> Of course I assumed that Node A does more reads because cache / load
>> ratio is worse but a factor of 20 makes me very sceptic.
>>
>> Of course Node A has a much higher and less predictable latency due to
>> the wait states.
>>
>> Has anybody experienced similar situations?
>> Any hints how to analyze or optimize this - I mean 128GB cache for 460GB
>> payload is not that few. I am pretty sure that not the whole dataset of
>> 460GB is "hot".
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>
>
> --
> Sorry this was sent from mobile. Will do less grammar and spell check than
> usual.
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Cassandra blob vs base64 text

2017-02-20 Thread Benjamin Roth
You could save space by storing your data (base64-)decoded as blobs.
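A rough sketch of the difference (hypothetical tables): base64 inflates
payloads by about a third, so storing the raw bytes saves roughly a quarter
of the space.

CREATE TABLE ks.files_text (id text PRIMARY KEY, payload text);  -- base64 string
CREATE TABLE ks.files_blob (id text PRIMARY KEY, payload blob);  -- raw bytes
-- 'Hello' as raw bytes (5 bytes) vs. its base64 form 'SGVsbG8=' (8 bytes):
INSERT INTO ks.files_blob (id, payload) VALUES ('x', 0x48656c6c6f);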

2017-02-20 13:38 GMT+01:00 Oskar Kjellin :

> We currently have some cases where we store base64 as a text field instead
> of a blob (running version 2.0.17).
> I would like to move these to blob but wondering what benefits and
> optimizations there are? The possible ones I can think of is (but there's
> probably more):
>
> * blob is stored as off heap ByteBuffers?
> * blob won't be decompressed server side?
>
> Are there any other reasons to switch to blobs? Or are we not going to see
> any difference?
>
> Thanks!
>


Re: Count(*) is not working

2017-02-20 Thread Benjamin Roth
+1 I also encountered timeouts many, many times (using DS DevCenter).
Roughly, this occurred when count(*) > 1,000,000.

2017-02-20 14:42 GMT+01:00 Edward Capriolo <edlinuxg...@gmail.com>:

> Seems worth it to file a bug since some here are under the impression it
> almost always works and others are under the impression it almost never
> works.
>
> On Friday, February 17, 2017, kurt greaves <k...@instaclustr.com> wrote:
>
>> really... well that's good to know. it still almost never works though. i
>> guess every time I've seen it it must have timed out due to tombstones.
>>
>> On 17 Feb. 2017 22:06, "Sylvain Lebresne" <sylv...@datastax.com> wrote:
>>
>> On Fri, Feb 17, 2017 at 11:54 AM, kurt greaves <k...@instaclustr.com>
>> wrote:
>>
>>> if you want a reliable count, you should use spark. performing a count
>>> (*) will inevitably fail unless you make your server read timeouts and
>>> tombstone fail thresholds ridiculous
>>>
>>
>> That's just not true. count(*) is paged internally so while it is not
>> particular fast, it shouldn't require bumping neither the read timeout nor
>> the tombstone fail threshold in any way to work.
>>
>> In that case, it seems the partition does have many tombstones (more than
>> live rows) and so the tombstone threshold is doing its job of warning about
>> it.
>>
>>
>>>
>>> On 17 Feb. 2017 04:34, "Jan" <j...@dafuer.de> wrote:
>>>
>>>> Hi,
>>>>
>>>> could you post the output of nodetool cfstats for the table?
>>>>
>>>> Cheers,
>>>>
>>>> Jan
>>>>
>>>> On 16.02.2017 17:00, Selvam Raman wrote:
>>>>
>>>> I am not getting count as result. Where i keep on getting n number of
>>>> results below.
>>>>
>>>> Read 100 live rows and 1423 tombstone cells for query SELECT * FROM
>>>> keysace.table WHERE token(id) > token(test:ODP0144-0883E-022R-002/047-052)
>>>> LIMIT 100 (see tombstone_warn_threshold)
>>>>
>>>> On Thu, Feb 16, 2017 at 12:37 PM, Jan Kesten <j...@dafuer.de> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> do you got a result finally?
>>>>>
>>>>> Those messages are simply warnings telling you that c* had to read
>>>>> many tombstones while processing your query - rows that are deleted but 
>>>>> not
>>>>> garbage collected/compacted. This warning gives you some explanation why
>>>>> things might be much slower than expected because per 100 rows that count
>>>>> c* had to read about 15 times rows that were deleted already.
>>>>>
>>>>> Apart from that, count(*) is almost always slow - and there is a
>>>>> default limit of 10.000 rows in a result.
>>>>>
>>>>> Do you really need the actual live count? To get a idea you can always
>>>>> look at nodetool cfstats (but those numbers also contain deleted rows).
>>>>>
>>>>>
>>>>> On 16.02.2017 13:18, Selvam Raman wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I want to know the total records count in table.
>>>>>
>>>>> I fired the below query:
>>>>>select count(*) from tablename;
>>>>>
>>>>> and i have got the below output
>>>>>
>>>>> Read 100 live rows and 1423 tombstone cells for query SELECT * FROM
>>>>> keysace.table WHERE token(id) > token(test:ODP0144-0883E-022R-002/047-052)
>>>>> LIMIT 100 (see tombstone_warn_threshold)
>>>>>
>>>>> Read 100 live rows and 1435 tombstone cells for query SELECT * FROM
>>>>> keysace.table WHERE token(id) > token(test:2565-AMK-2) LIMIT 100 (see
>>>>> tombstone_warn_threshold)
>>>>>
>>>>> Read 96 live rows and 1385 tombstone cells for query SELECT * FROM
>>>>> keysace.table WHERE token(id) > token(test:-2220-UV033/04) LIMIT 100 (see
>>>>> tombstone_warn_threshold).
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Can you please help me to get the total count of the table.
>>>>>
>>>>> --
>>>>> Selvam Raman
>>>>> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Selvam Raman
>>>> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>>>>
>>>>
>>>>
>>
>>
>
> --
> Sorry this was sent from mobile. Will do less grammar and spell check than
> usual.
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: High disk io read load

2017-02-20 Thread Benjamin Roth
Hah! Found the problem!

After setting read_ahead to 0 and the compression chunk size to 4 kb on all
CFs, the situation was PERFECT (nearly; please see below)! I scrubbed some
CFs but not the whole dataset yet. I knew the problem was not a lack of RAM.
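For reference, a sketch of the two changes (device and table names are
placeholders; blockdev takes the read-ahead value in 512-byte sectors, and
existing SSTables keep their old chunk size until rewritten - hence the
scrub):

blockdev --setra 0 /dev/sda   # per data disk: disable read ahead

ALTER TABLE my_keyspace.my_table
    WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 4};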

Some stats:
- Latency of a quite large CF: https://cl.ly/1r3e0W0S393L
- Disk throughput: https://cl.ly/2a0Z250S1M3c
- Dstat: https://gist.github.com/brstgt/c92bbd46ab76283e534b853b88ad3b26
- This shows that the request distribution remained the same, so no
dyn-snitch magic: https://cl.ly/3E0t1T1z2c0J

Btw. I stumbled across this one:
https://groups.google.com/forum/#!topic/scylladb-dev/j_qXSP-6-gY
Maybe we should also think about lowering default chunk length.

*Unfortunately schema changes had a disturbing effect:*
- I changed the chunk size with a script, so there were a lot of schema
changes in a small period.
- After all tables were changed, one of the seed hosts (cas1) went TOTALLY
crazy.
- Latency on this host was 10x of all other hosts.
- There were more ParNew GCs.
- Load was very high (up to 80, 100% CPU)
- Whole system was unstable due to unpredictable latencies and
backpressures (https://cl.ly/1m022g2W1Q3d)
- Even SELECT * FROM system_schema.table etc appeared as slow query in the
logs
- It was the 1st server in the connect host list for the PHP client
- CS restart didn't help. Reboot did not help (the cold page cache probably
made it worse).
- All other nodes were totally ok.
- Stopping CS on cas1 helped to keep the system stable. Brought down
latency again, but was no solution.

=> Only replacing the node (with a newer, faster node) in the connect-host
list helped that situation.

Any ideas why changing schemas and/or chunk size could have such an effect?
For some time the situation was really critical.


2017-02-20 10:48 GMT+01:00 Bhuvan Rawal :

> Hi Benjamin,
>
> Yes, Read ahead of 8 would imply more IO count from disk but it should not
> cause more data read off the disk as is happening in your case.
>
> One probable reason for high disk io would be because the 512 vnode has
> less page to RAM ratio of 22% (100G buff /437G data) as compared to 46%
> (100G/237G). And as your avg record size is in bytes for every disk io you
> are fetching complete 64K block to get a row.
>
> Perhaps you can balance the node by adding equivalent RAM ?
>
> Regards,
> Bhuvan
>


Re: High disk io read load

2017-02-18 Thread Benjamin Roth
Just for the record, that's what dstat looks like while CS is starting:

root@cas10:~# dstat -lrnv 10
---load-avg--- --io/total- -net/total- ---procs--- --memory-usage-
---paging-- -dsk/total- ---system-- total-cpu-usage
 1m   5m  15m | read  writ| recv  send|run blk new| used  buff  cach  free|
 in   out | read  writ| int   csw |usr sys idl wai hiq siq
0.69 0.18 0.06| 228  24.3 |   0 0 |0.0   0  24|17.8G 3204k  458M  108G|
  0 0 |5257k  417k|  17k 3319 |  2   1  97   0   0   0
0.96 0.26 0.09| 591  27.9 | 522k  476k|4.1   0  69|18.3G 3204k  906M  107G|
  0 0 |  45M  287k|  22k 6943 |  7   1  92   0   0   0
13.2 2.83 0.92|2187  28.7 |1311k  839k|5.3  90  18|18.9G 3204k 9008M 98.1G|
  0 0 | 791M 8346k|  49k   25k| 17   1  36  46   0   0
30.6 6.91 2.27|2188  67.0 |4200k 3610k|8.8 106  27|19.5G 3204k 17.9G 88.4G|
  0 0 | 927M 8396k| 116k  119k| 24   2  17  57   0   0
43.6 10.5 3.49|2136  24.3 |4371k 3708k|6.3 108 1.0|19.5G 3204k 26.7G 79.6G|
  0 0 | 893M   13M| 117k  159k| 15   1  17  66   0   0
56.9 14.4 4.84|2152  32.5 |3937k 3767k| 11  83 5.0|19.5G 3204k 35.5G 70.7G|
  0 0 | 894M   14M| 126k  160k| 16   1  16  65   0   0
63.2 17.1 5.83|2135  44.1 |4601k 4185k|6.9  99  35|19.6G 3204k 44.3G 61.9G|
  0 0 | 879M   15M| 133k  168k| 19   2  19  60   0   0
64.6 18.9 6.54|2174  42.2 |4393k 3522k|8.4  93 2.2|20.0G 3204k 52.7G 53.0G|
  0 0 | 897M   14M| 138k  160k| 14   2  15  69   0   0

The IO shoots up (791M) as soon as CS has started up and accepts requests.
I also diffed sysctl on both machines. No significant differences; only
CPU-related settings, random values and some hashes differ.

2017-02-18 21:49 GMT+01:00 Benjamin Roth <benjamin.r...@jaumo.com>:

> 256 tokens:
>
> root@cas9:/sys/block/dm-0# blockdev --report
> RO    RA   SSZ   BSZ   StartSec            Size   Device
> rw   256   512  4096          0        67108864   /dev/ram0
> rw   256   512  4096          0        67108864   /dev/ram1
> rw   256   512  4096          0        67108864   /dev/ram2
> rw   256   512  4096          0        67108864   /dev/ram3
> rw   256   512  4096          0        67108864   /dev/ram4
> rw   256   512  4096          0        67108864   /dev/ram5
> rw   256   512  4096          0        67108864   /dev/ram6
> rw   256   512  4096          0        67108864   /dev/ram7
> rw   256   512  4096          0        67108864   /dev/ram8
> rw   256   512  4096          0        67108864   /dev/ram9
> rw   256   512  4096          0        67108864   /dev/ram10
> rw   256   512  4096          0        67108864   /dev/ram11
> rw   256   512  4096          0        67108864   /dev/ram12
> rw   256   512  4096          0        67108864   /dev/ram13
> rw   256   512  4096          0        67108864   /dev/ram14
> rw   256   512  4096          0        67108864   /dev/ram15
> rw    16   512  4096          0    800166076416   /dev/sda
> rw    16   512  4096       2048    800164151296   /dev/sda1
> rw    16   512  4096          0    644245094400   /dev/dm-0
> rw    16   512  4096          0      2046820352   /dev/dm-1
> rw    16   512  4096          0      1023410176   /dev/dm-2
> rw    16   512  4096          0    800166076416   /dev/sdb
>
> 512 tokens:
> root@cas10:/sys/block# blockdev --report
> RO    RA   SSZ   BSZ   StartSec            Size   Device
> rw   256   512  4096          0        67108864   /dev/ram0
> rw   256   512  4096          0        67108864   /dev/ram1
> rw   256   512  4096          0        67108864   /dev/ram2
> rw   256   512  4096          0        67108864   /dev/ram3
> rw   256   512  4096          0        67108864   /dev/ram4
> rw   256   512  4096          0        67108864   /dev/ram5
> rw   256   512  4096          0        67108864   /dev/ram6
> rw   256   512  4096          0        67108864   /dev/ram7
> rw   256   512  4096          0        67108864   /dev/ram8
> rw   256   512  4096          0        67108864   /dev/ram9
> rw   256   512  4096          0        67108864   /dev/ram10
> rw   256   512  4096          0        67108864   /dev/ram11
> rw   256   512  4096          0        67108864   /dev/ram12
> rw   256   512  4096          0        67108864   /dev/ram13
> rw   256   512  4096          0        67108864   /dev/ram14
> rw   256   512  4096          0        67108864   /dev/ram15
> rw    16   512  4096          0    800166076416   /dev/sda
> rw    16   512  4096       2048    800164151296   /dev/sda1
> rw    16   512  4096          0    800166076416   /dev/sdb
> rw    16   512  4096       2048    800165027840   /dev/sdb1
> rw    16   512  4096          0   1073741824000   /dev/dm-0
> rw    16   512  4096          0      2046820352   /dev/dm-1
> rw    16   512  4096          0      1023410176   /dev/dm-2

Re: High disk io read load

2017-02-19 Thread Benjamin Roth
This is the output of sar:
https://gist.github.com/anonymous/9545fb69fbb28a20dc99b2ea5e14f4cd

It seems to me that there is not enough page cache to handle all data in a
reasonable way.
As pointed out yesterday, the read rate with an empty page cache is ~800MB/s.
That's really (!!!) a lot for 4-5MB/s of network output.

I stumbled across the compression chunk size, which I have always left
untouched at the default of 64kb (https://cl.ly/2w0V3U1q1I1Y). I guess
setting a read ahead of 8kb is totally pointless if CS reads 64kb even when
it only has to fetch a single row, right? Are there recommendations for that
setting?
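
For reference, a quick sketch of checking and setting the read ahead (device
name assumed; the value is counted in 512-byte sectors):

    blockdev --getra /dev/sda      # 16 sectors = 8 KB read ahead
    blockdev --setra 16 /dev/sda   # set an 8 KB read ahead

With a 64kb compression chunk, CS still has to read and decompress roughly
one whole chunk per row it fetches, so the read ahead only adds on top of
that.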

2017-02-19 19:15 GMT+01:00 Bhuvan Rawal <bhu1ra...@gmail.com>:

> Hi Edward,
>
> This could have been a valid case here, but if hotspots indeed existed,
> then along with the really high disk IO the node should have been doing
> proportionately high network IO as well, i.e. serving more queries per
> second.
>
> But from the output shared by Benjamin that doesn't appear to be the case,
> and things look balanced.
>
> Regards,
>
> On Sun, Feb 19, 2017 at 7:47 PM, Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
>
>>
>>
>> On Sat, Feb 18, 2017 at 3:35 PM, Benjamin Roth <benjamin.r...@jaumo.com>
>> wrote:
>>
>>> We are talking about a read IO increase of over 2000% with 512 tokens
>>> compared to 256 tokens. 100% increase would be linear which would be
>>> perfect. 200% would even okay, taking the RAM/Load ratio for caching into
>>> account. But > 20x the read IO is really incredible.
>>> The nodes are configured with puppet, they share the same roles and no
>>> manual "optimizations" are applied. So I can't imagine, a different
>>> configuration is responsible for it.
>>>
>>> 2017-02-18 21:28 GMT+01:00 Benjamin Roth <benjamin.r...@jaumo.com>:
>>>
>>>> This is status of the largest KS of these both nodes:
>>>> UN  10.23.71.10  437.91 GiB  512  49.1%
>>>> 2679c3fa-347e-4845-bfc1-c4d0bc906576  RAC1
>>>> UN  10.23.71.9   246.99 GiB  256  28.3%
>>>> 2804ef8a-26c8-4d21-9e12-01e8b6644c2f  RAC1
>>>>
>>>> So roughly as expected.
>>>>
>>>> 2017-02-17 23:07 GMT+01:00 kurt greaves <k...@instaclustr.com>:
>>>>
>>>>> what's the Owns % for the relevant keyspace from nodetool status?
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Benjamin Roth
>>>> Prokurist
>>>>
>>>> Jaumo GmbH · www.jaumo.com
>>>> Wehrstraße 46 · 73035 Göppingen · Germany
>>>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>>>
>>>
>>>
>>>
>>> --
>>> Benjamin Roth
>>> Prokurist
>>>
>>> Jaumo GmbH · www.jaumo.com
>>> Wehrstraße 46 · 73035 Göppingen · Germany
>>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>>
>>
>> When I read articles like this:
>>
>> http://www.doanduyhai.com/blog/?p=1930
>>
>> And see the word hot-spot.
>>
>> "Another performance consideration worth mentioning is hot-spot. Similar
>> to manual denormalization, if your view partition key is chosen poorly,
>> you’ll end up with hot spots in your cluster. A simple example with our
>> *user* table is to create a materialized
>>
>> *view user_by_gender"It leads me to ask a question back: What can you say
>> about hotspots in your data? Even if your nodes had the identical number of
>> tokens this autho seems to suggesting that you still could have hotspots.
>> Maybe the issue is you have a hotspot 2x hotspots, or your application has
>> a hotspot that would be present even with perfect token balancing.*
>>
>>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: High disk io read load

2017-02-24 Thread Benjamin Roth
It was only the schema change.

2017-02-24 19:18 GMT+01:00 kurt greaves <k...@instaclustr.com>:

> How many CFs are we talking about here? Also, did the script also kick off
> the scrubs or was this purely from changing the schemas?
> ​
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: High disk io read load

2017-02-18 Thread Benjamin Roth
This is status of the largest KS of these both nodes:
UN  10.23.71.10  437.91 GiB  512  49.1%
2679c3fa-347e-4845-bfc1-c4d0bc906576  RAC1
UN  10.23.71.9   246.99 GiB  256  28.3%
2804ef8a-26c8-4d21-9e12-01e8b6644c2f  RAC1

So roughly as expected.

2017-02-17 23:07 GMT+01:00 kurt greaves <k...@instaclustr.com>:

> what's the Owns % for the relevant keyspace from nodetool status?
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: High disk io read load

2017-02-18 Thread Benjamin Roth
We are talking about a read IO increase of over 2000% with 512 tokens
compared to 256 tokens. A 100% increase would be linear, which would be
perfect. 200% would even be okay, taking the RAM/load ratio for caching into
account. But > 20x the read IO is really incredible.
The nodes are configured with Puppet, they share the same roles, and no
manual "optimizations" are applied. So I can't imagine that a different
configuration is responsible for it.

2017-02-18 21:28 GMT+01:00 Benjamin Roth <benjamin.r...@jaumo.com>:

> This is status of the largest KS of these both nodes:
> UN  10.23.71.10  437.91 GiB  512  49.1%
> 2679c3fa-347e-4845-bfc1-c4d0bc906576  RAC1
> UN  10.23.71.9   246.99 GiB  256  28.3%
> 2804ef8a-26c8-4d21-9e12-01e8b6644c2f  RAC1
>
> So roughly as expected.
>
> 2017-02-17 23:07 GMT+01:00 kurt greaves <k...@instaclustr.com>:
>
>> what's the Owns % for the relevant keyspace from nodetool status?
>>
>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: High disk io read load

2017-02-18 Thread Benjamin Roth
cat /sys/block/sda/queue/read_ahead_kb
=> 8

On all CS nodes. Is that what you mean?

2017-02-18 21:32 GMT+01:00 Bhuvan Rawal <bhu1ra...@gmail.com>:

> Hi Benjamin,
>
> What is the disk read ahead on both nodes?
>
> Regards,
> Bhuvan
>
> On Sun, Feb 19, 2017 at 1:58 AM, Benjamin Roth <benjamin.r...@jaumo.com>
> wrote:
>
>> This is status of the largest KS of these both nodes:
>> UN  10.23.71.10  437.91 GiB  512  49.1%
>> 2679c3fa-347e-4845-bfc1-c4d0bc906576  RAC1
>> UN  10.23.71.9   246.99 GiB  256  28.3%
>> 2804ef8a-26c8-4d21-9e12-01e8b6644c2f  RAC1
>>
>> So roughly as expected.
>>
>> 2017-02-17 23:07 GMT+01:00 kurt greaves <k...@instaclustr.com>:
>>
>>> what's the Owns % for the relevant keyspace from nodetool status?
>>>
>>
>>
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: High disk io read load

2017-02-18 Thread Benjamin Roth
256 tokens:

root@cas9:/sys/block/dm-0# blockdev --report
RO    RA   SSZ   BSZ   StartSec            Size   Device
rw   256   512  4096          0        67108864   /dev/ram0
rw   256   512  4096          0        67108864   /dev/ram1
rw   256   512  4096          0        67108864   /dev/ram2
rw   256   512  4096          0        67108864   /dev/ram3
rw   256   512  4096          0        67108864   /dev/ram4
rw   256   512  4096          0        67108864   /dev/ram5
rw   256   512  4096          0        67108864   /dev/ram6
rw   256   512  4096          0        67108864   /dev/ram7
rw   256   512  4096          0        67108864   /dev/ram8
rw   256   512  4096          0        67108864   /dev/ram9
rw   256   512  4096          0        67108864   /dev/ram10
rw   256   512  4096          0        67108864   /dev/ram11
rw   256   512  4096          0        67108864   /dev/ram12
rw   256   512  4096          0        67108864   /dev/ram13
rw   256   512  4096          0        67108864   /dev/ram14
rw   256   512  4096          0        67108864   /dev/ram15
rw    16   512  4096          0    800166076416   /dev/sda
rw    16   512  4096       2048    800164151296   /dev/sda1
rw    16   512  4096          0    644245094400   /dev/dm-0
rw    16   512  4096          0      2046820352   /dev/dm-1
rw    16   512  4096          0      1023410176   /dev/dm-2
rw    16   512  4096          0    800166076416   /dev/sdb

512 tokens:
root@cas10:/sys/block# blockdev --report
RO    RA   SSZ   BSZ   StartSec            Size   Device
rw   256   512  4096          0        67108864   /dev/ram0
rw   256   512  4096          0        67108864   /dev/ram1
rw   256   512  4096          0        67108864   /dev/ram2
rw   256   512  4096          0        67108864   /dev/ram3
rw   256   512  4096          0        67108864   /dev/ram4
rw   256   512  4096          0        67108864   /dev/ram5
rw   256   512  4096          0        67108864   /dev/ram6
rw   256   512  4096          0        67108864   /dev/ram7
rw   256   512  4096          0        67108864   /dev/ram8
rw   256   512  4096          0        67108864   /dev/ram9
rw   256   512  4096          0        67108864   /dev/ram10
rw   256   512  4096          0        67108864   /dev/ram11
rw   256   512  4096          0        67108864   /dev/ram12
rw   256   512  4096          0        67108864   /dev/ram13
rw   256   512  4096          0        67108864   /dev/ram14
rw   256   512  4096          0        67108864   /dev/ram15
rw    16   512  4096          0    800166076416   /dev/sda
rw    16   512  4096       2048    800164151296   /dev/sda1
rw    16   512  4096          0    800166076416   /dev/sdb
rw    16   512  4096       2048    800165027840   /dev/sdb1
rw    16   512  4096          0   1073741824000   /dev/dm-0
rw    16   512  4096          0      2046820352   /dev/dm-1
rw    16   512  4096          0      1023410176   /dev/dm-2

2017-02-18 21:41 GMT+01:00 Bhuvan Rawal <bhu1ra...@gmail.com>:

> Hi Ben,
>
> If it's the same on both machines then something else could be the issue.
> We faced high disk IO due to a misconfigured read ahead, which resulted in
> a high amount of disk IO for a comparatively insignificant network transfer.
>
> Can you post the output of blockdev --report for a normal node and the
> 512-token node?
>
> Regards,
>
> On Sun, Feb 19, 2017 at 2:07 AM, Benjamin Roth <benjamin.r...@jaumo.com>
> wrote:
>
>> cat /sys/block/sda/queue/read_ahead_kb
>> => 8
>>
>> On all CS nodes. Is that what you mean?
>>
>> 2017-02-18 21:32 GMT+01:00 Bhuvan Rawal <bhu1ra...@gmail.com>:
>>
>>> Hi Benjamin,
>>>
>>> What is the disk read ahead on both nodes?
>>>
>>> Regards,
>>> Bhuvan
>>>
>>> On Sun, Feb 19, 2017 at 1:58 AM, Benjamin Roth <benjamin.r...@jaumo.com>
>>> wrote:
>>>
>>>> This is status of the largest KS of these both nodes:
>>>> UN  10.23.71.10  437.91 GiB  512  49.1%
>>>> 2679c3fa-347e-4845-bfc1-c4d0bc906576  RAC1
>>>> UN  10.23.71.9   246.99 GiB  256  28.3%
>>>> 2804ef8a-26c8-4d21-9e12-01e8b6644c2f  RAC1
>>>>
>>>> So roughly as expected.
>>>>
>>>> 2017-02-17 23:07 GMT+01:00 kurt greaves <k...@instaclustr.com>:
>>>>
>>>>> what's the Owns % for the relevant keyspace from nodetool status?
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Benjamin Roth
>>>> Prokurist
>>>>
>>>> Jaumo GmbH · www.jaumo.com
>>>> Wehrstraße 46 · 73035 Göppingen · Germany
>>>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer

DELETE/SELECT with multi-column PK and IN

2017-02-09 Thread Benjamin Roth
Hi Guys,

CQL says this is not allowed:

DELETE FROM ks.cf WHERE (pk1, pk2) IN ((1, 2));

1. Is there a reason for it? There shouldn't be a performance penalty, it
is a PK lookup, the same thing works with a single pk column
2. Is there a known workaround for it?

It would be a great help for daily business; IMHO it's a waste of resources
to run multiple queries just to fetch a bunch of records by a PK.

Thanks in advance for any reply

-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: DELETE/SELECT with multi-column PK and IN

2017-02-09 Thread Benjamin Roth
This doesn't really belong to this topic but I also experienced what Ben
says.
I was migrating (and still am) tons of data from MySQL to CS. I measured
several approaches (async parallel, prepared stmts, sync with unlogged
batches) and it turned out that batches were really fast and produced fewer
problems with cluster overloading with MVs.
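
For illustration, such an unlogged batch looks like this in CQL (table and
values hypothetical):

    BEGIN UNLOGGED BATCH
      INSERT INTO ks.cf (pk1, pk2, v) VALUES (1, 2, 'a');
      INSERT INTO ks.cf (pk1, pk2, v) VALUES (1, 3, 'b');
    APPLY BATCH;

An unlogged batch is still coordinated by a single node, so it shines most
when the grouped statements target the same partition; for cross-partition
writes, individual async statements can route better, as discussed below.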

2017-02-09 11:28 GMT+01:00 Ben Slater <ben.sla...@instaclustr.com>:

> That’s a very good point from Sylvain that I forgot/missed. That said,
> we’ve seen plenty of scenarios where overall system throughput is improved
> through unlogged batches. One of my colleagues did quite a bit of
> benchmarking on this topic for his talk at last year’s C* summit:
> http://www.slideshare.net/DataStax/microbatching-
> highperformance-writes-adam-zegelin-instaclustr-cassandra-summit-2016
>
> On Thu, 9 Feb 2017 at 20:52 Benjamin Roth <benjamin.r...@jaumo.com> wrote:
>
>> Ok got it.
>>
>> But it's interesting that this is supported:
>> DELETE/SELECT FROM ks.cf WHERE (pk1) IN ((1), (2), (3));
>>
>> This is technically mostly the same (Token awareness,
>> coordination/routing, read performance, ...), right?
>>
>> 2017-02-09 10:43 GMT+01:00 Sylvain Lebresne <sylv...@datastax.com>:
>>
>> This is a statement on multiple partitions and there is really no
>> optimization the code internally does on that. In fact, I strongly advise
>> you to not use a batch but rather simply do a for loop client side and send
>> statement individually. That way, your driver will be able to use proper
>> token-awareness for each request (while if you send a batch, one
>> coordinator will be picked up and will have to forward most statement,
>> doing more network hops at the end of the day). The only case where using a
>> batch is indeed legit is if you care about all the statement being atomic,
>> but in that case it's a logged batch you want.
>>
>> That's btw more or less why we never bothered implementing that: it's
>> totally doable technically, but it's not really such a good idea
>> performance wise in practice most of the time, and you can easily work it
>> around with a batch if you need atomicity.
>>
>> Which is not saying it will never be and shouldn't be supported btw,
>> there is something to be said for the consistency of the CQL language in
>> general. But it's why no-one took time to do it so far.
>>
>> On Thu, Feb 9, 2017 at 10:36 AM, Benjamin Roth <benjamin.r...@jaumo.com>
>> wrote:
>>
>> Yes, thats the workaround - I'll try that.
>>
>> Would you agree it would be better for internal optimizations to process
>> this within a single statement?
>>
>> 2017-02-09 10:32 GMT+01:00 Ben Slater <ben.sla...@instaclustr.com>:
>>
>> Yep, that makes it clear. I think an unlogged batch of prepared
>> statements with one statement per PK tuple would be roughly equivalent? And
>> probably no more complex to generate in the client?
>>
>> On Thu, 9 Feb 2017 at 20:22 Benjamin Roth <benjamin.r...@jaumo.com>
>> wrote:
>>
>> Maybe that makes it clear:
>>
>> DELETE FROM ks.cf WHERE (partitionkey1, partitionkey2) IN ((1, 2), (1,
>> 3), (2, 3), (3, 4));
>>
>> If want to delete or select a bunch of records identified by their
>> multi-partitionkey tuples.
>>
>> 2017-02-09 10:18 GMT+01:00 Ben Slater <ben.sla...@instaclustr.com>:
>>
>> Are you looking this to be equivalent to (PK1=1 AND PK2=2) or are you
>> looking for (PK1 IN (1,2) AND PK2 IN (1,2)) or something else?
>>
>> Cheers
>> Ben
>>
>> On Thu, 9 Feb 2017 at 20:09 Benjamin Roth <benjamin.r...@jaumo.com>
>> wrote:
>>
>> Hi Guys,
>>
>> CQL says this is not allowed:
>>
>> DELETE FROM ks.cf WHERE (pk1, pk2) IN ((1, 2));
>>
>> 1. Is there a reason for it? There shouldn't be a performance penalty, it
>> is a PK lookup, the same thing works with a single pk column
>> 2. Is there a known workaround for it?
>>
>> It would be much of a help to have it for daily business, IMHO it's a
>> waste of resources to run multiple queries just to fetch a bunch of records
>> by a PK.
>>
>> Thanks in advance for any reply
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>> --
>> ————
>> Ben Slater
>> Chief Product Officer
>> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
>> +61 437 929 798

Re: DELETE/SELECT with multi-column PK and IN

2017-02-09 Thread Benjamin Roth
Yes, that's the workaround - I'll try that.

Would you agree it would be better for internal optimizations to process
this within a single statement?

2017-02-09 10:32 GMT+01:00 Ben Slater <ben.sla...@instaclustr.com>:

> Yep, that makes it clear. I think an unlogged batch of prepared statements
> with one statement per PK tuple would be roughly equivalent? And probably
> no more complex to generate in the client?
>
> On Thu, 9 Feb 2017 at 20:22 Benjamin Roth <benjamin.r...@jaumo.com> wrote:
>
>> Maybe that makes it clear:
>>
>> DELETE FROM ks.cf WHERE (partitionkey1, partitionkey2) IN ((1, 2), (1,
>> 3), (2, 3), (3, 4));
>>
>> If want to delete or select a bunch of records identified by their
>> multi-partitionkey tuples.
>>
>> 2017-02-09 10:18 GMT+01:00 Ben Slater <ben.sla...@instaclustr.com>:
>>
>> Are you looking this to be equivalent to (PK1=1 AND PK2=2) or are you
>> looking for (PK1 IN (1,2) AND PK2 IN (1,2)) or something else?
>>
>> Cheers
>> Ben
>>
>> On Thu, 9 Feb 2017 at 20:09 Benjamin Roth <benjamin.r...@jaumo.com>
>> wrote:
>>
>> Hi Guys,
>>
>> CQL says this is not allowed:
>>
>> DELETE FROM ks.cf WHERE (pk1, pk2) IN ((1, 2));
>>
>> 1. Is there a reason for it? There shouldn't be a performance penalty, it
>> is a PK lookup, the same thing works with a single pk column
>> 2. Is there a known workaround for it?
>>
>> It would be much of a help to have it for daily business, IMHO it's a
>> waste of resources to run multiple queries just to fetch a bunch of records
>> by a PK.
>>
>> Thanks in advance for any reply
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>> --
>> 
>> Ben Slater
>> Chief Product Officer
>> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
>> +61 437 929 798
>>
>>
>>
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
> --
> 
> Ben Slater
> Chief Product Officer
> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
> +61 437 929 798
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: DELETE/SELECT with multi-column PK and IN

2017-02-09 Thread Benjamin Roth
Ok now I REALLY got it :)
Thanks Sylvain!

2017-02-09 11:42 GMT+01:00 Sylvain Lebresne <sylv...@datastax.com>:

> On Thu, Feb 9, 2017 at 10:52 AM, Benjamin Roth <benjamin.r...@jaumo.com>
> wrote:
>
>> Ok got it.
>>
>> But it's interesting that this is supported:
>> DELETE/SELECT FROM ks.cf WHERE (pk1) IN ((1), (2), (3));
>>
>> This is technically mostly the same (Token awareness,
>> coordination/routing, read performance, ...), right?
>>
>
> It is. That's what I meant by "there is something to be said for the
> consistency of the CQL language in general". In other words, look for no
> externally logical reason for this being unsupported, it's unsupported
> simply due to how the CQL code evolved. But as I said, we didn't fix that
> inconsistency because we're all busy and it's not really that important in
> practice. The project of course welcome any contributions though :)
>
>
>>
>> 2017-02-09 10:43 GMT+01:00 Sylvain Lebresne <sylv...@datastax.com>:
>>
>>> This is a statement on multiple partitions and there is really no
>>> optimization the code internally does on that. In fact, I strongly advise
>>> you to not use a batch but rather simply do a for loop client side and send
>>> statement individually. That way, your driver will be able to use proper
>>> token-awareness for each request (while if you send a batch, one
>>> coordinator will be picked up and will have to forward most statement,
>>> doing more network hops at the end of the day). The only case where using a
>>> batch is indeed legit is if you care about all the statement being atomic,
>>> but in that case it's a logged batch you want.
>>>
>>> That's btw more or less why we never bothered implementing that: it's
>>> totally doable technically, but it's not really such a good idea
>>> performance wise in practice most of the time, and you can easily work it
>>> around with a batch if you need atomicity.
>>>
>>> Which is not saying it will never be and shouldn't be supported btw,
>>> there is something to be said for the consistency of the CQL language in
>>> general. But it's why no-one took time to do it so far.
>>>
>>> On Thu, Feb 9, 2017 at 10:36 AM, Benjamin Roth <benjamin.r...@jaumo.com>
>>> wrote:
>>>
>>>> Yes, thats the workaround - I'll try that.
>>>>
>>>> Would you agree it would be better for internal optimizations to
>>>> process this within a single statement?
>>>>
>>>> 2017-02-09 10:32 GMT+01:00 Ben Slater <ben.sla...@instaclustr.com>:
>>>>
>>>>> Yep, that makes it clear. I think an unlogged batch of prepared
>>>>> statements with one statement per PK tuple would be roughly equivalent? 
>>>>> And
>>>>> probably no more complex to generate in the client?
>>>>>
>>>>> On Thu, 9 Feb 2017 at 20:22 Benjamin Roth <benjamin.r...@jaumo.com>
>>>>> wrote:
>>>>>
>>>>>> Maybe that makes it clear:
>>>>>>
>>>>>> DELETE FROM ks.cf WHERE (partitionkey1, partitionkey2) IN ((1, 2),
>>>>>> (1, 3), (2, 3), (3, 4));
>>>>>>
>>>>>> If want to delete or select a bunch of records identified by their
>>>>>> multi-partitionkey tuples.
>>>>>>
>>>>>> 2017-02-09 10:18 GMT+01:00 Ben Slater <ben.sla...@instaclustr.com>:
>>>>>>
>>>>>> Are you looking this to be equivalent to (PK1=1 AND PK2=2) or are you
>>>>>> looking for (PK1 IN (1,2) AND PK2 IN (1,2)) or something else?
>>>>>>
>>>>>> Cheers
>>>>>> Ben
>>>>>>
>>>>>> On Thu, 9 Feb 2017 at 20:09 Benjamin Roth <benjamin.r...@jaumo.com>
>>>>>> wrote:
>>>>>>
>>>>>> Hi Guys,
>>>>>>
>>>>>> CQL says this is not allowed:
>>>>>>
>>>>>> DELETE FROM ks.cf WHERE (pk1, pk2) IN ((1, 2));
>>>>>>
>>>>>> 1. Is there a reason for it? There shouldn't be a performance
>>>>>> penalty, it is a PK lookup, the same thing works with a single pk column
>>>>>> 2. Is there a known workaround for it?
>>>>>>
>>>>>> It would be a great help for daily business; IMHO it's a waste of
>>>>>> resources to run multiple queries just to fetch a bunch of records by
>>>>>> a PK.

Re: DELETE/SELECT with multi-column PK and IN

2017-02-09 Thread Benjamin Roth
Maybe that makes it clear:

DELETE FROM ks.cf WHERE (partitionkey1, partitionkey2) IN ((1, 2), (1, 3),
(2, 3), (3, 4));

If you want to delete or select a bunch of records identified by their
multi-partition-key tuples.

2017-02-09 10:18 GMT+01:00 Ben Slater <ben.sla...@instaclustr.com>:

> Are you looking this to be equivalent to (PK1=1 AND PK2=2) or are you
> looking for (PK1 IN (1,2) AND PK2 IN (1,2)) or something else?
>
> Cheers
> Ben
>
> On Thu, 9 Feb 2017 at 20:09 Benjamin Roth <benjamin.r...@jaumo.com> wrote:
>
>> Hi Guys,
>>
>> CQL says this is not allowed:
>>
>> DELETE FROM ks.cf WHERE (pk1, pk2) IN ((1, 2));
>>
>> 1. Is there a reason for it? There shouldn't be a performance penalty, it
>> is a PK lookup, the same thing works with a single pk column
>> 2. Is there a known workaround for it?
>>
>> It would be much of a help to have it for daily business, IMHO it's a
>> waste of resources to run multiple queries just to fetch a bunch of records
>> by a PK.
>>
>> Thanks in advance for any reply
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
> --
> 
> Ben Slater
> Chief Product Officer
> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
> +61 437 929 798
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: cassandra user request log

2017-02-10 Thread Benjamin Roth
On a cluster with even just a little bit of load, that would produce
zillions of petabytes of logs (roughly speaking ;)). I don't think this is
viable.
There are many, many JMX metrics on an aggregated level, but none per
authenticated user.
What exactly do you want to find out? Is it for debugging purposes?


2017-02-10 9:42 GMT+01:00 vincent gromakowski <vincent.gromakow...@gmail.com
>:

> Hi all,
> Is there any way to trace user activity at the server level, to see which
> user is accessing which data? Do you think it would be simple to implement?
> Tx
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: cassandra user request log

2017-02-10 Thread Benjamin Roth
You could write a custom trigger that logs writes to specific CFs. But be
aware that this may have a big performance impact.
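
For illustration, a minimal sketch of such a trigger against the Cassandra
3.x trigger API (class name and logging choice are hypothetical, not a vetted
audit solution; note that triggers only fire on writes, not on reads):

    import java.util.Collection;
    import java.util.Collections;
    import org.apache.cassandra.db.Mutation;
    import org.apache.cassandra.db.partitions.Partition;
    import org.apache.cassandra.triggers.ITrigger;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class AuditTrigger implements ITrigger {
        private static final Logger logger = LoggerFactory.getLogger(AuditTrigger.class);

        // Called for every write to the table the trigger is attached to.
        public Collection<Mutation> augment(Partition update) {
            logger.info("write to {}.{}",
                        update.metadata().ksName, update.metadata().cfName);
            return Collections.emptyList(); // log only, add no extra mutations
        }
    }

The compiled jar goes into the triggers directory on each node; the trigger
is then attached with CREATE TRIGGER audit ON ks.cf USING 'AuditTrigger';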

2017-02-10 9:58 GMT+01:00 vincent gromakowski <vincent.gromakow...@gmail.com
>:

> GDPR compliancy...we need to trace user activity on personal data. Maybe
> there is another way ?
>
> 2017-02-10 9:46 GMT+01:00 Benjamin Roth <benjamin.r...@jaumo.com>:
>
>> On a cluster with just a little bit load, that would cause zillions of
>> petabytes of logs (just roughly ;)). I don't think this is viable.
>> There are many many JMX metrics on an aggregated level. But none per
>> authed used.
>> What exactly do you want to find out? Is it for debugging purposes?
>>
>>
>> 2017-02-10 9:42 GMT+01:00 vincent gromakowski <
>> vincent.gromakow...@gmail.com>:
>>
>>> Hi all,
>>> Is there any way to trace user activity at the server level to see which
>>> user is accessing which data ? Do you thin it would be simple to implement ?
>>> Tx
>>>
>>
>>
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: cassandra user request log

2017-02-10 Thread Benjamin Roth
If you want to audit write operations only, you could maybe use CDC; this is
a quite new feature in 3.x (it was introduced in 3.8).
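
For reference, CDC is enabled per table; cdc_enabled and cdc_raw_directory
must also be set in cassandra.yaml. A sketch with a hypothetical table:

    ALTER TABLE ks.cf WITH cdc = true;

Commit log segments containing CDC-flagged writes are then retained in the
CDC raw directory for a consumer process to read and clean up.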

2017-02-10 10:10 GMT+01:00 vincent gromakowski <
vincent.gromakow...@gmail.com>:

> tx
>
> 2017-02-10 10:01 GMT+01:00 Benjamin Roth <benjamin.r...@jaumo.com>:
>
>> you could write a custom trigger that logs access to specific CFs. But be
>> aware that this may have a big performance impact.
>>
>> 2017-02-10 9:58 GMT+01:00 vincent gromakowski <
>> vincent.gromakow...@gmail.com>:
>>
>>> GDPR compliancy...we need to trace user activity on personal data. Maybe
>>> there is another way ?
>>>
>>> 2017-02-10 9:46 GMT+01:00 Benjamin Roth <benjamin.r...@jaumo.com>:
>>>
>>>> On a cluster with just a little bit load, that would cause zillions of
>>>> petabytes of logs (just roughly ;)). I don't think this is viable.
>>>> There are many many JMX metrics on an aggregated level. But none per
>>>> authed used.
>>>> What exactly do you want to find out? Is it for debugging purposes?
>>>>
>>>>
>>>> 2017-02-10 9:42 GMT+01:00 vincent gromakowski <
>>>> vincent.gromakow...@gmail.com>:
>>>>
>>>>> Hi all,
>>>>> Is there any way to trace user activity at the server level to see
>>>>> which user is accessing which data ? Do you thin it would be simple to
>>>>> implement ?
>>>>> Tx
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Benjamin Roth
>>>> Prokurist
>>>>
>>>> Jaumo GmbH · www.jaumo.com
>>>> Wehrstraße 46 · 73035 Göppingen · Germany
>>>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>>>
>>>
>>>
>>
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: How does cassandra achieve Linearizability?

2017-02-10 Thread Benjamin Roth
ith a GPS modules
>>>> is not terribly complex. Low latency and jitter on servers you manage.
>>>> 140ms is a long way away network-wise, and I would suggest that was a
>>>> poor choice of upstream (probably stratum 2 or 3) source.
>>>>
>>>> As Jonathan mentioned, there's no guarantee from Cassandra, but if you
>>>> need as close as you can get, you'll probably need to do it yourself.
>>>>
>>>> (I run several stratum 2 ntpd servers for pool.ntp.org)
>>>>
>>>> --
>>>> Kind regards,
>>>> Michael
>>>>
>>>> On 02/09/2017 06:47 PM, Kant Kodali wrote:
>>>> > Hi Justin,
>>>> >
>>>> > There are bunch of issues w.r.t to synchronization of clocks when we
>>>> > used ntpd. Also the time it took to sync the clocks was approx 140ms
>>>> > (don't quote me on it though because it is reported by our devops :)
>>>> >
>>>> > we have multiple clients (for example bunch of micro services are
>>>> > reading from Cassandra) I am not sure how one can achieve
>>>> > Linearizability by setting timestamps on the clients ? since there is
>>>> no
>>>> > total ordering across multiple clients.
>>>> >
>>>> > Thanks!
>>>> >
>>>> >
>>>> > On Thu, Feb 9, 2017 at 4:16 PM, Justin Cameron <
>>>> jus...@instaclustr.com
>>>> > <mailto:jus...@instaclustr.com>> wrote:
>>>> >
>>>> > Hi Kant,
>>>> >
>>>> > Clock synchronization is important - you should ensure that ntpd is
>>>> > properly configured on all nodes. If your particular use case is
>>>> > especially sensitive to out-of-order mutations it is possible to set
>>>> > timestamps on the client side using the drivers.
>>>> > https://docs.datastax.com/en/developer/java-driver/3.1/manual/query_timestamps/
>>>> >
>>>> > We use our own NTP cluster to reduce clock drift as much as
>>>> > possible, but public NTP servers are good enough for most uses.
>>>> > https://www.instaclustr.com/blog/2015/11/05/apache-cassandra-synchronization/
>>>> >
>>>> > Cheers,
>>>> > Justin
>>>> >
>>>> > On Thu, 9 Feb 2017 at 16:09 Kant Kodali <k...@peernova.com
>>>> > <mailto:k...@peernova.com>> wrote:
>>>> >
>>>> > How does Cassandra achieve Linearizability with “Last write
>>>> > wins” (conflict resolution methods based on time-of-day
>>>> clocks) ?
>>>> >
>>>> > Relying on synchronized clocks are almost certainly
>>>> > non-linearizable, because clock timestamps cannot be
>>>> guaranteed
>>>> > to be consistent with actual event ordering due to clock skew.
>>>> > isn't it?
>>>> >
>>>> > Thanks!
>>>> >
>>>> > --
>>>> >
>>>> > Justin Cameron
>>>> >
>>>> > Senior Software Engineer | Instaclustr
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > This email has been sent on behalf of Instaclustr Pty Ltd
>>>> > (Australia) and Instaclustr Inc (USA).
>>>> >
>>>> > This email and any attachments may contain confidential and
>>>> legally
>>>> > privileged information.  If you are not the intended recipient, do
>>>> > not copy or disclose its content, but please reply to this email
>>>> > immediately and highlight the error to the sender and then
>>>> > immediately delete the message.
>>>> >
>>>> >
>>>>
>>>>
>>>
>>>
>>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: sasi index question (read timeout on many selects)

2017-02-16 Thread Benjamin Roth
No matter what has to be indexed here, the preferable way is most probably
denormalization instead of another index.

2017-02-16 15:09 GMT+01:00 DuyHai Doan <doanduy...@gmail.com>:

>
> On Thu, Feb 16, 2017 at 3:08 PM, Micha <mich...@fantasymail.de> wrote:
>
>>
>>
>> On 16.02.2017 14:30, DuyHai Doan wrote:
>> > Why indexing BLOB data ? It does not make any sense
>>
>> My partition key is a secure hash sum,  I don't index a blob.
>>
>>
>>
>>
>>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


High disk io read load

2017-02-15 Thread Benjamin Roth
Hi there,

Following situation in a cluster with 10 nodes:
Node A's disk read IO is ~20 times higher than the read load of node B.
The nodes are exactly the same except:
- Node A has 512 tokens and Node B 256. So it has double the load (data).
- Node A also has 2 SSDs, Node B only 1 SSD (according to load)

Node A has roughly 460GB, Node B 260GB total disk usage.
Both nodes have 128GB RAM and 40 cores.

Of course I assumed that Node A does more reads because its cache/load ratio
is worse, but a factor of 20 makes me very sceptical.

Of course Node A has a much higher and less predictable latency due to the
wait states.

Has anybody experienced similar situations?
Any hints on how to analyze or optimize this? I mean, 128GB of cache for
460GB of payload is not that little. I am pretty sure that not the whole
dataset of 460GB is "hot".
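
As a side note, one way to check how "hot" the data actually is would be to
sample page cache residency per data file, e.g. with vmtouch (assuming it is
installed; the path is the default data directory plus a hypothetical
keyspace):

    vmtouch -v /var/lib/cassandra/data/<keyspace>

It prints, per file, how many pages are currently resident in the page cache.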

-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: [Multi DC] Old Data Not syncing from Existing cluster to new Cluster

2017-01-24 Thread Benjamin Roth
There is much more to it than just changing the RF in the keyspace!

See here:
https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsAddDCToCluster.html

2017-01-24 16:18 GMT+01:00 Abhishek Kumar Maheshwari <
abhishek.maheshw...@timesinternet.in>:

> Hi All,
>
>
>
> I have a Cassandra stack with 2 DCs:
>
>
>
> Datacenter: DRPOCcluster
> ========================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load      Tokens  Owns  Host ID                               Rack
> UN  172.29.xx.xxx  256 MB    256     ?     b6b8cbb9-1fed-471f-aea9-6a657e7ac80a  01
> UN  172.29.xx.xxx  240 MB    256     ?     604abbf5-8639-4104-8f60-fd6573fb2e17  03
> UN  172.29.xx.xxx  240 MB    256     ?     32fa79ee-93c6-4e5b-a910-f27a1e9d66c1  02
>
> Datacenter: dc_india
> ====================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load      Tokens  Owns  Host ID                               Rack
> DN  172.26.xx.xxx  78.97 GB  256     ?     3e8133ed-98b5-418d-96b5-690a1450cd30  RACK1
> DN  172.26.xx.xxx  79.18 GB  256     ?     7d3f5b25-88f9-4be7-b0f5-746619153543  RACK2
>
>
>
>
> dc_india is the old DC, which contains all the data.
>
> I updated the keyspace as per below:
>
>
>
> alter KEYSPACE wls WITH replication = {'class': 'NetworkTopologyStrategy',
> 'DRPOCcluster': '2','dc_india':'2'}  AND durable_writes = true;
>
>
>
> but old data is not updating in DRPOCcluster (which is new). Also, while
> running nodetool rebuild I am getting the below exception:
>
> Command: ./nodetool rebuild -dc dc_india
>
>
>
> Exception : nodetool: Unable to find sufficient sources for streaming
> range (-875697427424852,-8755484427030035332] in keyspace
> system_distributed
>
>
>
> Cassandra version : 3.0.9
>
>
>
>
>
> *Thanks & Regards,*
> *Abhishek Kumar Maheshwari*
> *+91- 805591 (Mobile)*
>
> Times Internet Ltd. | A Times of India Group Company
>
> FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
>
> *P** Please do not print this email unless it is absolutely necessary.
> Spread environmental awareness.*
>
>
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: [Multi DC] Old Data Not syncing from Existing cluster to new Cluster

2017-01-24 Thread Benjamin Roth
I am not an expert in bootstrapping new DCs but shouldn't the OLD nodes
appear as UP to be used as a streaming source in rebuild?
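
As an aside, the "Unable to find sufficient sources" error on
system_distributed usually means that keyspace is still on its default
SimpleStrategy replication, so rebuild cannot map DC-local sources for it. A
common fix before rebuilding (a sketch with the DC names from this thread;
the RF values are examples):

    ALTER KEYSPACE system_distributed
    WITH replication = {'class': 'NetworkTopologyStrategy',
                        'DRPOCcluster': '3', 'dc_india': '3'};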

2017-01-24 16:32 GMT+01:00 Abhishek Kumar Maheshwari <
abhishek.maheshw...@timesinternet.in>:

> Yes, I took all the steps. While I am inserting, new data is replicating to
> both DCs. But the old data is not replicating to the new cluster.
>
>
>
> *Thanks & Regards,*
> *Abhishek Kumar Maheshwari*
> *+91- 805591 (Mobile)*
>
> Times Internet Ltd. | A Times of India Group Company
>
> FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
>
> *P** Please do not print this email unless it is absolutely necessary.
> Spread environmental awareness.*
>
>
>
> *From:* Benjamin Roth [mailto:benjamin.r...@jaumo.com]
> *Sent:* Tuesday, January 24, 2017 8:55 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: [Multi DC] Old Data Not syncing from Existing cluster to
> new Cluster
>
>
>
> There is much more to it than just changing the RF in the keyspace!
>
>
>
> See here: https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/
> opsAddDCToCluster.html
>
>
>
> 2017-01-24 16:18 GMT+01:00 Abhishek Kumar Maheshwari <Abhishek.Maheshwari@
> timesinternet.in>:
>
> Hi All,
>
>
>
> I have a Cassandra stack with 2 DCs:
>
>
>
> Datacenter: DRPOCcluster
> ========================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load      Tokens  Owns  Host ID                               Rack
> UN  172.29.xx.xxx  256 MB    256     ?     b6b8cbb9-1fed-471f-aea9-6a657e7ac80a  01
> UN  172.29.xx.xxx  240 MB    256     ?     604abbf5-8639-4104-8f60-fd6573fb2e17  03
> UN  172.29.xx.xxx  240 MB    256     ?     32fa79ee-93c6-4e5b-a910-f27a1e9d66c1  02
>
> Datacenter: dc_india
> ====================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load      Tokens  Owns  Host ID                               Rack
> DN  172.26.xx.xxx  78.97 GB  256     ?     3e8133ed-98b5-418d-96b5-690a1450cd30  RACK1
> DN  172.26.xx.xxx  79.18 GB  256     ?     7d3f5b25-88f9-4be7-b0f5-746619153543  RACK2
>
>
>
> dc_india is the old DC, which contains all the data.
>
> I updated the keyspace as per below:
>
>
>
> alter KEYSPACE wls WITH replication = {'class': 'NetworkTopologyStrategy',
> 'DRPOCcluster': '2','dc_india':'2'}  AND durable_writes = true;
>
>
>
> but old data is not updating in DRPOCcluster (which is new). Also, while
> running nodetool rebuild I am getting the below exception:
>
> Command: ./nodetool rebuild -dc dc_india
>
>
>
> Exception : nodetool: Unable to find sufficient sources for streaming
> range (-875697427424852,-8755484427030035332] in keyspace
> system_distributed
>
>
>
> Cassandra version : 3.0.9
>
>
>
>
>
> *Thanks & Regards,*
> *Abhishek Kumar Maheshwari*
> *+91- 805591 (Mobile)*
>
> Times Internet Ltd. | A Times of India Group Company
>
> FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
>
> *P** Please do not print this email unless it is absolutely necessary.
> Spread environmental awareness.*
>
>
>
>
>
>
>
> --
>
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: DELETE/SELECT with multi-column PK and IN

2017-02-09 Thread Benjamin Roth
Ok got it.

But it's interesting that this is supported:
DELETE/SELECT FROM ks.cf WHERE (pk1) IN ((1), (2), (3));

This is technically mostly the same (Token awareness, coordination/routing,
read performance, ...), right?
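
For reference, a minimal sketch of the client-side loop Sylvain describes
below, assuming the DataStax Java driver 3.x, an existing Session, and a
hypothetical table ks.cf (imports from com.datastax.driver.core and java.util
omitted):

    PreparedStatement ps =
        session.prepare("DELETE FROM ks.cf WHERE pk1 = ? AND pk2 = ?");
    List<ResultSetFuture> futures = new ArrayList<>();
    int[][] keys = { {1, 2}, {1, 3}, {2, 3}, {3, 4} };
    for (int[] key : keys) {
        // each statement is routed token-aware to a replica owning the key
        futures.add(session.executeAsync(ps.bind(key[0], key[1])));
    }
    for (ResultSetFuture f : futures) {
        f.getUninterruptibly(); // surface any per-statement failure
    }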

2017-02-09 10:43 GMT+01:00 Sylvain Lebresne <sylv...@datastax.com>:

> This is a statement on multiple partitions and there is really no
> optimization the code internally does on that. In fact, I strongly advise
> you to not use a batch but rather simply do a for loop client side and send
> statement individually. That way, your driver will be able to use proper
> token-awareness for each request (while if you send a batch, one
> coordinator will be picked up and will have to forward most statement,
> doing more network hops at the end of the day). The only case where using a
> batch is indeed legit is if you care about all the statement being atomic,
> but in that case it's a logged batch you want.
>
> That's btw more or less why we never bothered implementing that: it's
> totally doable technically, but it's not really such a good idea
> performance wise in practice most of the time, and you can easily work it
> around with a batch if you need atomicity.
>
> Which is not saying it will never be and shouldn't be supported btw, there
> is something to be said for the consistency of the CQL language in general.
> But it's why no-one took time to do it so far.
>
> On Thu, Feb 9, 2017 at 10:36 AM, Benjamin Roth <benjamin.r...@jaumo.com>
> wrote:
>
>> Yes, thats the workaround - I'll try that.
>>
>> Would you agree it would be better for internal optimizations to process
>> this within a single statement?
>>
>> 2017-02-09 10:32 GMT+01:00 Ben Slater <ben.sla...@instaclustr.com>:
>>
>>> Yep, that makes it clear. I think an unlogged batch of prepared
>>> statements with one statement per PK tuple would be roughly equivalent? And
>>> probably no more complex to generate in the client?
>>>
>>> On Thu, 9 Feb 2017 at 20:22 Benjamin Roth <benjamin.r...@jaumo.com>
>>> wrote:
>>>
>>>> Maybe that makes it clear:
>>>>
>>>> DELETE FROM ks.cf WHERE (partitionkey1, partitionkey2) IN ((1, 2), (1,
>>>> 3), (2, 3), (3, 4));
>>>>
>>>> If want to delete or select a bunch of records identified by their
>>>> multi-partitionkey tuples.
>>>>
>>>> 2017-02-09 10:18 GMT+01:00 Ben Slater <ben.sla...@instaclustr.com>:
>>>>
>>>> Are you looking this to be equivalent to (PK1=1 AND PK2=2) or are you
>>>> looking for (PK1 IN (1,2) AND PK2 IN (1,2)) or something else?
>>>>
>>>> Cheers
>>>> Ben
>>>>
>>>> On Thu, 9 Feb 2017 at 20:09 Benjamin Roth <benjamin.r...@jaumo.com>
>>>> wrote:
>>>>
>>>> Hi Guys,
>>>>
>>>> CQL says this is not allowed:
>>>>
>>>> DELETE FROM ks.cf WHERE (pk1, pk2) IN ((1, 2));
>>>>
>>>> 1. Is there a reason for it? There shouldn't be a performance penalty,
>>>> it is a PK lookup, the same thing works with a single pk column
>>>> 2. Is there a known workaround for it?
>>>>
>>>> It would be much of a help to have it for daily business, IMHO it's a
>>>> waste of resources to run multiple queries just to fetch a bunch of records
>>>> by a PK.
>>>>
>>>> Thanks in advance for any reply
>>>>
>>>> --
>>>> Benjamin Roth
>>>> Prokurist
>>>>
>>>> Jaumo GmbH · www.jaumo.com
>>>> Wehrstraße 46 · 73035 Göppingen · Germany
>>>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>>>
>>>> --
>>>> 
>>>> Ben Slater
>>>> Chief Product Officer
>>>> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
>>>> +61 437 929 798
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Benjamin Roth
>>>> Prokurist
>>>>
>>>> Jaumo GmbH · www.jaumo.com
>>>> Wehrstraße 46 · 73035 Göppingen · Germany
>>>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>>>
>>> --
>>> 
>>> Ben Slater
>>> Chief Product Officer
>>> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
>>> +61 437 929 798
>>>
>>
>>
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: High disk io read load

2017-02-17 Thread Benjamin Roth
Hi Nate,

See the dstat results here:
https://gist.github.com/brstgt/216c662b525a9c5b653bbcd8da5b3fcb
Network volume does not correspond to Disk IO, not even close.

@heterogeneous vnode count:
I did this to test how load behaves on a new server class we ordered for
CS. The new nodes had much faster CPUs than our older nodes. If not by
assigning more tokens to new nodes, what else would you recommend to give
more weight + load to newer and usually faster servers?
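
For context, a node's token count is set in cassandra.yaml before it first
bootstraps and cannot be changed afterwards without replacing the node (a
sketch):

    # cassandra.yaml on the new, faster hardware class (before first start)
    num_tokens: 512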

2017-02-16 23:21 GMT+01:00 Nate McCall <n...@thelastpickle.com>:

>
> - Node A has 512 tokens and Node B 256. So it has double the load (data).
>> - Node A also has 2 SSDs, Node B only 1 SSD (according to load)
>>
>
> I very rarely see heterogeneous vnode counts in the same cluster. I would
> almost guarantee you are the only one doing this with MVs as well.
>
> That said, since you have different IO hardware, are you sure the system
> configurations (eg. block size, read ahead, etc) are the same on both
> machines? Is dstat showing a similar order of magnitude of network traffic
> in vs. IO for what you would expect?
>
>
> --
> -
> Nate McCall
> Wellington, NZ
> @zznate
>
> CTO
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: sasi index question (read timeout on many selects)

2017-02-17 Thread Benjamin Roth
Btw:

> They break incremental repair if you use CDC:
> https://issues.apache.org/jira/browse/CASSANDRA-12888

Not only when using CDC! You shouldn't use incremental repairs with MVs.
Never (right now).

2017-02-16 17:42 GMT+01:00 Jonathan Haddad <j...@jonhaddad.com>:

> My advice to avoid them is based on the issues that have been filed in
> Jira.  Benjamin Roth is one of the only people talking about his MV usage,
> and has filed a few JIRAs discussing their problems when bootstrapping new
> nodes, as well as issues repairing.
>
> https://issues.apache.org/jira/browse/CASSANDRA-12730?jql=project%20%3D%20CASSANDRA%20and%20reporter%20%3D%20brstgt%20and%20text%20~%20%22materialized%22
>
> They also can't be altered: https://issues.apache.org/jira/browse/CASSANDRA-9736
>
> They may be less performant than managing the data yourself:
> https://issues.apache.org/jira/browse/CASSANDRA-10295,
> https://issues.apache.org/jira/browse/CASSANDRA-10307
>
> They're not as flexible as your own tables:
> https://issues.apache.org/jira/browse/CASSANDRA-9928,
> https://issues.apache.org/jira/browse/CASSANDRA-11194,
> https://issues.apache.org/jira/browse/CASSANDRA-12463
>
> They break incremental repair if you use CDC:
> https://issues.apache.org/jira/browse/CASSANDRA-12888
>
> I don't know why DataStax advises using them.  Perhaps ask them?
>
> Jon
>
> On Thu, Feb 16, 2017 at 7:57 AM Micha <mich...@fantasymail.de> wrote:
>
>>
>>
>> On 16.02.2017 16:33, Jonathan Haddad wrote:
>> >
>> > Regarding MVs, do not use the ones that shipped with 3.x.  They're not
>> > ready for production.  Manage it yourself by using a second table and
>> > inserting a second record there.
>> >
>>
>> Out of interest... there is a slight discrepance between the advice not
>> to use mv and the docu about the feature on the datastax side. Or do I
>> have to use another cassandra version (instead of 3.9)?
>>
>>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: parallel processing - splitting data

2017-01-19 Thread Benjamin Roth
If you have 4 nodes with RF 4, then all data is on every node. So you can
just slice the whole token range into 4 pieces and let each node process one
slice.
Determining local ranges also only helps if you read with CL_ONE.
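
A rough sketch of slicing the full Murmur3 range into 4 pieces (Java; table
and key names in the comment are hypothetical):

    import java.math.BigInteger;

    public class TokenSlices {
        public static void main(String[] args) {
            // Full Murmur3 token range is [-2^63, 2^63 - 1].
            BigInteger min = BigInteger.valueOf(Long.MIN_VALUE);
            BigInteger total = BigInteger.ONE.shiftLeft(64); // 2^64 tokens
            int slices = 4;
            for (int i = 0; i < slices; i++) {
                long start = min.add(total.multiply(BigInteger.valueOf(i))
                                          .divide(BigInteger.valueOf(slices)))
                                .longValueExact();
                long end = min.add(total.multiply(BigInteger.valueOf(i + 1))
                                        .divide(BigInteger.valueOf(slices)))
                              .subtract(BigInteger.ONE).longValueExact();
                // worker i then pages through, e.g.:
                //   SELECT ... FROM ks.cf WHERE token(pk) >= start AND token(pk) <= end
                System.out.printf("slice %d: [%d, %d]%n", i, start, end);
            }
        }
    }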

2017-01-19 13:05 GMT+01:00 Frank Hughes <frankhughes...@gmail.com>:

> Hello there,
>
> I'm running a 4 node cluster of Cassandra 3.9 with a replication factor of
> 4.
>
> I want to be able to run a java process on each node only selecting 25%
> of the data on each node,
> so I can process all of the data in parallel on each node.
>
> What is the best way to do this with the java driver ?
>
> I was assuming I could retrieve the token ranges for each node and page
> through the data using these ranges, but this includes the replicated data.
> I was hoping there was a way of only selecting the data that a node is
> responsible for and avoiding the replicated data.
>
> Many thanks for any help and guidance,
>
> Frank Hughes
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: parallel processing - splitting data

2017-01-19 Thread Benjamin Roth
I meant the global token range, which for the Murmur3 partitioner is -(2^63)
to (2^63 - 1).
I remember there are driver classes that already generate the right slices,
but I don't know by heart which one it was.

2017-01-19 13:29 GMT+01:00 Frank Hughes <frankhughes...@gmail.com>:

> I have tried to retrieve the token ranges and slice them in 4, but the
> response I get for the following code is different on each node:
>
> TokenRange[] tokenRanges = 
> unwrapTokenRanges(metadata.getTokenRanges(keyspaceName,
> localHost)).toArray(new TokenRange[0]);
>
> On each node, the 1024 token ranges are different, so Im not sure how to
> do the split.
>
> e.g. from node 1
>
> Token ranges - start:-5144720537407094184 end:-5129226025397315327
>
> This token range isn't returned by node 2, 3 or 4.
>
> Thanks again
>
> Frank
>
> On 19 January 2017 at 12:19, Benjamin Roth <benjamin.r...@jaumo.com>
> wrote:
>
>> If you have 4 Nodes with RF 4 then all data is on every node. So you can
>> just slice the whole token range into 4 pieces and let each node process 1
>> slice.
>> Determining local ranges also only helps if you read with CL_ONE.
>>
>> 2017-01-19 13:05 GMT+01:00 Frank Hughes <frankhughes...@gmail.com>:
>>
>>> Hello there,
>>>
>>> I'm running a 4 node cluster of Cassandra 3.9 with a replication factor
>>> of 4.
>>>
>>> I want to be able to run a java process on each node only selecting a
>>> 25% of the data on each node,
>>> so i can process all of the data in parallel on each node.
>>>
>>> What is the best way to do this with the java driver ?
>>>
>>> I was assuming I could retrieve the token ranges for each node and page
>>> through the data using these ranges, but this includes the replicated data.
>>> I was hoping there was away of only selecting the data that a node is
>>> responsible for and avoiding the replicated data.
>>>
>>> Many thanks for any help and guidance,
>>>
>>> Frank Hughes
>>>
>>
>>
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


unsubscribe

2017-02-28 Thread Benjamin Roth
-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Is periodic manual repair necessary?

2017-02-28 Thread benjamin roth
Hi Jayesh,

Your statements are mostly right, except:
Yes, compactions do purge tombstones but that *does not avoid resurrection*.
A resurrection takes place in this situation:

Node A:
Key A is written
Key A is deleted

Node B:
Key A is written
- Deletion never happens, for example because of a dropped mutation -

Then after gc_grace_seconds:
Node A:
Compaction removes both write and tombstone, so data is completely gone

Node B:
Still contains Key A

Then you do a repair
Node A:
Receives Key A from Node B

Got it?

But I was thinking a bit about your situation. If you NEVER do deletes and
have ONLY TTLs, this could change the game. The difference? If you have only
TTLs, the delete information and the write information always reside on
the same node and never exist alone, so the write-delete pair should
always be consistent. As far as I can see there will be no resurrections
then.
BUT: Please don't nail me down on it. *I have neither tested it nor read
the source code to prove it in theory.*
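
To make the distinction concrete, a minimal sketch with the Java driver
(the events table and all names are hypothetical). With a TTL the
expiration travels inside the written cell itself; an explicit DELETE is a
separate mutation that can be dropped on its own:

import com.datastax.driver.core.Session;
import java.util.UUID;

class TtlOnlyWrites {
    // The TTL is part of the write: whichever replica receives this cell
    // also receives its expiration, so write and expiry cannot diverge.
    static void recordEvent(Session session, UUID eventId, String payload) {
        session.execute(
                "INSERT INTO events (id, payload) VALUES (?, ?) USING TTL 2592000",
                eventId, payload);
    }

    // An explicit DELETE, by contrast, is an independent mutation (a
    // tombstone) that one replica can miss while the write survives there.
    static void deleteEvent(Session session, UUID eventId) {
        session.execute("DELETE FROM events WHERE id = ?", eventId);
    }
}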

Maybe some other guys have some more thoughts or information on this.

By the way:
CS itself is not fragile. Distributed systems are. It's like the old
saying: things that can go wrong will go wrong. Networks fail, hardware
fails, software fails. You can have timeouts, dropped messages (timeouts
help a cluster/node survive high-pressure situations), a crashed daemon.
Yes, things go wrong. All the time. Even on a 1-node system (like MySQL),
ensuring absolute consistency is not so easy and requires many safety nets
like unbuffered IO and battery-backed HD controllers, which can harm
performance a lot.

You could also create a perfectly consistent distributed system like CS but
it would be slow and not partition tolerant or not highly available.

2017-02-28 16:06 GMT+01:00 Thakrar, Jayesh <jthak...@conversantmedia.com>:

> Thanks - getting a better picture of things.
>
>
>
> So "entropy" is the tendency of a C* datastore to become inconsistent due to
> writes/updates not taking place across ALL nodes that carry a replica of a
> row (can happen if nodes are down for maintenance)
>
> It can also happen due to node crashes/restarts that can result in loss of
> uncommitted data.
>
> This can result in either stale data or ghost data (column/row
> re-appearing after a delete).
>
> So there are the "anti-entropy" processes in place to help with this
>
> - hinted handoff
>
> - read repair (can happen while performing a consistent read OR also async
> as driven/configured by *_read_repair_chance AFTER consistent read)
>
> - commit logs
>
> - explicit/manual repair via command
>
> - compaction (compaction is indirect mechanism to purge tombstone, thereby
> ensuring that stale data will NOT resurrect)
>
>
>
> So for an application where you have only timeseries data or where data is
> always inserted, I would like to know the need for manual repair?
>
>
>
> I see/hear advice that there should always be a periodic (mostly weekly)
> manual/explicit repair in a C* system - and that's what I am trying to
> understand.
>
> Repair is a real expensive process and would like to justify the need to
> expend resources (when and how much) for it.
>
>
>
> Among other things, this advice also gives an impression to people not
> familiar with C* (e.g. me) that it is too fragile and needs substantial
> manual intervention.
>
>
>
> Appreciate all the feedback and details that you have been sharing.
>
>
>
> *From: *Edward Capriolo <edlinuxg...@gmail.com>
> *Date: *Monday, February 27, 2017 at 8:00 PM
> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Cc: *Benjamin Roth <benjamin.r...@jaumo.com>
> *Subject: *Re: Is periodic manual repair necessary?
>
>
>
> There are 4 anti-entropy systems in Cassandra.
>
>
>
> Hinted handoff
>
> Read repair
>
> Commit logs
>
> Repair command
>
>
>
> All are basically best effort.
>
>
>
> Commit logs get corrupt and only flush periodically.
>
>
>
> Bits rot on disk and while crossing networks
>
>
>
> Read repair is async and only happens randomly
>
>
>
> Hinted handoff stops after some time and is not guaranteed.
> On Monday, February 27, 2017, Thakrar, Jayesh <
> jthak...@conversantmedia.com> wrote:
>
> Thanks Roth and Oskar for your quick responses.
>
>
>
> This is a single datacenter, multi-rack setup.
>
>
>
> > A TTL is technically similar to a delete - in the end both create
> tombstones.
>
> >If you want to eliminate the possibility of resurrected deleted data, you
> should run repairs.
>
> So why do I need to worry about data resurrection?
>
> Because, the TTL for the data is specified at the row

Rebuild / removenode with MV is inconsistent

2017-03-01 Thread benjamin roth
Hi there,

Today I come up with the following thesis:

A rebuild / removenode may break the base-table <> MV contract.
I'd even claim that a rebuild / removenode requires rebuilding all MVs to
guarantee MV consistency.

Reason:
A node can have base tables with MVs. This is no problem. If these are
streamed during rebuild/removenode, the underlying MVs are updated by the
write path and the consistency contract will be fulfilled.
BUT a node may also contain ranges for MVs whose base tables reside on a
different node. When these are streamed from another node, the MV on node A
suddenly holds data derived from the base-table replica of node B, and this
is not guaranteed to be consistent with node A's paired base replica any more.


Re: Understanding of proliferation of sstables during a repair

2017-02-26 Thread Benjamin Roth
Hi Seth,

Repairs can create a lot of tiny SSTables. I also encountered the creation
of so many sstables that the node died because of TMOF. At that time the
affected nodes were REALLY inconsistent.

One reason can be immense inconsistencies spread over many
partition(-ranges) with a lot of subrange repairs that trigger a lot of
independent streams. Each stream results in a single SSTable that can be
very small. No matter how small it is, it has to be compacted and can cause
a compaction impact that is a lot bigger than expected from a tiny little
table.

Also consider that there is a theoretical race condition that can cause
repairs even though data is not inconsistent, due to in-flight mutations
during merkle tree calculation.

2017-02-26 20:41 GMT+01:00 Seth Edwards <s...@pubnub.com>:

> Hello,
>
> We just ran a repair on a keyspace using TWCS and a mixture of TTLs. This
> caused a large proliferation of sstables and compactions. There is likely a
> lot of entropy in this keyspace. I am trying to better understand why this
> is.
>
> I've also read that you may not want to run repairs on short TTL data and
> rely upon other anti-entropy mechanisms to achieve consistency instead. Is
> this generally true?
>
>
> Thanks!
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Which compaction strategy when modeling a dumb set

2017-02-27 Thread Benjamin Roth
This is not a queue pattern and I'd recommend LCS for better read
performance on this read-heavy check.
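
As a sketch (Java driver; the table name is taken from the mail below, the
rest is hypothetical), the dumb set with LCS plus the check-then-insert
dedup step:

import com.datastax.driver.core.Session;
import java.util.UUID;

class DedupSet {
    static void createTable(Session session) {
        // LCS keeps most point reads down to a single sstable, which suits
        // the roughly 50% read / 50% write dedup workload described below.
        session.execute(
                "CREATE TABLE IF NOT EXISTS myset (id uuid PRIMARY KEY) " +
                "WITH compaction = {'class': 'LeveledCompactionStrategy'}");
    }

    // Returns true if the event is new and was marked as processed.
    static boolean markIfNew(Session session, UUID eventId) {
        if (session.execute("SELECT id FROM myset WHERE id = ?", eventId)
                .one() != null) {
            return false; // already processed
        }
        session.execute("INSERT INTO myset (id) VALUES (?)", eventId);
        return true;
    }
}

If two workers can race on the same id, INSERT ... IF NOT EXISTS would make
the check-and-mark atomic, at the cost of a Paxos round.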

2017-02-27 16:06 GMT+01:00 Rakesh Kumar <rakeshkumar...@outlook.com>:

> Do you update this table when an event is processed?  If yes, it is not
> considered a good practice for Cassandra.  I read somewhere that using
> Cassandra as a queuing table is an anti-pattern.
> 
> From: Vincent Rischmann <m...@vrischmann.me>
> Sent: Friday, February 24, 2017 06:24
> To: user@cassandra.apache.org
> Subject: Which compaction strategy when modeling a dumb set
>
> Hello,
>
> I'm using a table like this:
>
>CREATE TABLE myset (id uuid PRIMARY KEY)
>
> which is basically a set I use for deduplication, id is a unique id for an
> event, when I process the event I insert the id, and before processing I
> check if it has already been processed for deduplication.
>
> It works well enough, but I'm wondering which compaction strategy I should
> use. I expect maybe 1% or less of events will end up duplicated (thus not
> generating an insert), so the workload will probably be 50% writes 50% read.
>
> Is LCS a good strategy here or should I stick with STCS ?
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Is periodic manual repair necessary?

2017-02-27 Thread Benjamin Roth
A TTL is technically similar to a delete - in the end both create
tombstones.
If you want to eliminate the possibility of resurrected deleted data, you
should run repairs.

If you can guarantee 100% that data is read-repaired within
gc_grace_seconds after the data has been TTL'ed, you won't need an extra
repair.
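
To size that window, the effective gc_grace_seconds of a table can be read
from the schema tables (a sketch for Cassandra 3.x; the keyspace/table
arguments are whatever you use):

import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

class GcGraceWindow {
    // Repairs (or guaranteed read repairs) must complete within this many
    // seconds after a delete or TTL expiry to rule out resurrection.
    static int gcGraceSeconds(Session session, String keyspace, String table) {
        Row row = session.execute(
                "SELECT gc_grace_seconds FROM system_schema.tables " +
                "WHERE keyspace_name = ? AND table_name = ?",
                keyspace, table).one();
        return row.getInt("gc_grace_seconds");
    }
}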

2017-02-27 18:29 GMT+01:00 Oskar Kjellin <oskar.kjel...@gmail.com>:

> Are you running multi dc?
>
> Sent from my iPad
>
> On 27 Feb 2017 at 16:08, Thakrar, Jayesh <jthak...@conversantmedia.com> wrote:
>
> Suppose I have an application, where there are no deletes, only 5-10% of
> rows being occasionally updated (and that too only once) and a lot of reads.
>
>
>
> Furthermore, I have replication = 3 and both read and write are configured
> for local_quorum.
>
>
>
> Occasionally, servers do go into maintenance.
>
>
>
> I understand when the maintenance is longer than the period for
> hinted_handoffs to be preserved, they are lost and servers may have stale
> data.
>
> But I do expect it to be rectified on reads. If the stale data is not read
> again, I don’t care for it to be corrected as then the data will be
> automatically purged because of TTL.
>
>
>
> In such a situation, do I need to have a periodic (weekly?) manual/batch
> read_repair process?
>
>
>
> Thanks,
>
> Jayesh Thakrar
>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Understanding of proliferation of sstables during a repair

2017-02-26 Thread Benjamin Roth
Too many open files. The limit is 100k by default and we had >40k sstables.
Normally there are around 500-1000.

On 27.02.2017 02:40, "Seth Edwards" <s...@pubnub.com> wrote:

> This makes a lot more sense. What does TMOF stand for?
>
> On Sun, Feb 26, 2017 at 1:01 PM, Benjamin Roth <benjamin.r...@jaumo.com>
> wrote:
>
>> Hi Seth,
>>
>> Repairs can create a lot of tiny SSTables. I also encountered the
>> creation of so many sstables that the node died because of TMOF. At that
>> time the affected nodes were REALLY inconsistent.
>>
>> One reason can be immense inconsistencies spread over many
>> partition(-ranges) with a lot of subrange repairs that trigger a lot of
>> independent streams. Each stream results in a single SSTable that can be
>> very small. No matter how small it is, it has to be compacted and can cause
>> a compaction impact that is a lot bigger than expected from a tiny little
>> table.
>>
>> Also consider that there is a theoretical race condition that can cause
>> repairs even though data is not inconsistent, due to in-flight mutations
>> during merkle tree calculation.
>>
>> 2017-02-26 20:41 GMT+01:00 Seth Edwards <s...@pubnub.com>:
>>
>>> Hello,
>>>
>>> We just ran a repair on a keyspace using TWCS and a mixture of TTLs.
>>> This caused a large proliferation of sstables and compactions. There is
>>> likely a lot of entropy in this keyspace. I am trying to better understand
>>> why this is.
>>>
>>> I've also read that you may not want to run repairs on short TTL data
>>> and rely upon other anti-entropy mechanisms to achieve consistency instead.
>>> Is this generally true?
>>>
>>>
>>> Thanks!
>>>
>>
>>
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 <+49%207161%203048806> · Fax +49 7161 304880-1
>> <+49%207161%203048801>
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>
>


Re: dtests jolokia fails to attach

2016-10-06 Thread Benjamin Roth
Maybe additional information: this is the CS command line for ccm node1:

br   20376  3.2  8.6 2331136 708308 pts/5  Sl   06:10   0:30 java
-Xloggc:/home/br/.ccm/test/node1/logs/gc.log -ea -XX:+UseThreadPriorities
-XX:ThreadPriorityPolicy=42 -XX:+HeapDumpOnOutOfMemoryError -Xss256k
-XX:StringTableSize=103 -XX:+AlwaysPreTouch -XX:-UseBiasedLocking
-XX:+UseTLAB -XX:+ResizeTLAB -XX:+UseNUMA -XX:+PerfDisableSharedMem
-Djava.net.preferIPv4Stack=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly -XX:CMSWaitDuration=1
-XX:+CMSParallelInitialMarkEnabled -XX:+CMSEdenChunksRecordAlways
-XX:+CMSClassUnloadingEnabled -XX:+PrintGCDetails -XX:+PrintGCDateStamps
-XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime -XX:+PrintPromotionFailure
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M
-Xms500M -Xmx500M -Xmn50M -XX:+UseCondCardMark
-XX:CompileCommandFile=/home/br/.ccm/test/node1/conf/hotspot_compiler
-javaagent:/home/br/repos/cassandra/lib/jamm-0.3.0.jar
-Dcassandra.jmx.local.port=7100
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password
-Djava.library.path=/home/br/repos/cassandra/lib/sigar-bin
-Dcassandra.migration_task_wait_in_seconds=6
-Dcassandra.libjemalloc=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1
-Dlogback.configurationFile=logback.xml
-Dcassandra.logdir=/var/log/cassandra
-Dcassandra.storagedir=/home/br/repos/cassandra/data
-Dcassandra-pidfile=/home/br/.ccm/test/node1/cassandra.pid -cp
/home/br/.ccm/test/node1/conf:/home/br/repos/cassandra/build/classes/main:/home/br/repos/cassandra/build/classes/thrift:/home/br/repos/cassandra/lib/HdrHistogram-2.1.9.jar:/home/br/repos/cassandra/lib/ST4-4.0.8.jar:/home/br/repos/cassandra/lib/airline-0.6.jar:/home/br/repos/cassandra/lib/antlr-runtime-3.5.2.jar:/home/br/repos/cassandra/lib/asm-5.0.4.jar:/home/br/repos/cassandra/lib/caffeine-2.2.6.jar:/home/br/repos/cassandra/lib/cassandra-driver-core-3.0.1-shaded.jar:/home/br/repos/cassandra/lib/commons-cli-1.1.jar:/home/br/repos/cassandra/lib/commons-codec-1.2.jar:/home/br/repos/cassandra/lib/commons-lang3-3.1.jar:/home/br/repos/cassandra/lib/commons-math3-3.2.jar:/home/br/repos/cassandra/lib/compress-lzf-0.8.4.jar:/home/br/repos/cassandra/lib/concurrent-trees-2.4.0.jar:/home/br/repos/cassandra/lib/concurrentlinkedhashmap-lru-1.4.jar:/home/br/repos/cassandra/lib/disruptor-3.0.1.jar:/home/br/repos/cassandra/lib/ecj-4.4.2.jar:/home/br/repos/cassandra/lib/guava-18.0.jar:/home/br/repos/cassandra/lib/high-scale-lib-1.0.6.jar:/home/br/repos/cassandra/lib/hppc-0.5.4.jar:/home/br/repos/cassandra/lib/jackson-core-asl-1.9.2.jar:/home/br/repos/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/home/br/repos/cassandra/lib/jamm-0.3.0.jar:/home/br/repos/cassandra/lib/javax.inject.jar:/home/br/repos/cassandra/lib/jbcrypt-0.3m.jar:/home/br/repos/cassandra/lib/jcl-over-slf4j-1.7.7.jar:/home/br/repos/cassandra/lib/jctools-core-1.2.1.jar:/home/br/repos/cassandra/lib/jflex-1.6.0.jar:/home/br/repos/cassandra/lib/jna-4.0.0.jar:/home/br/repos/cassandra/lib/joda-time-2.4.jar:/home/br/repos/cassandra/lib/json-simple-1.1.jar:/home/br/repos/cassandra/lib/libthrift-0.9.2.jar:/home/br/repos/cassandra/lib/log4j-over-slf4j-1.7.7.jar:/home/br/repos/cassandra/lib/logback-classic-1.1.3.jar:/home/br/repos/cassandra/lib/logback-core-1.1.3.jar:/home/br/repos/cassandra/lib/lz4-1.3.0.jar:/home/br/repos/cassandra/lib/metrics-core-3.1.0.jar:/home/br/repos/cassandra/lib/metrics-jvm-3.1.0.jar:/home/br/repos/cassandra/lib/metrics-logback-3.1.0.jar:/home/br/repos/cassandra/lib/netty-all-4.0.39.Final.jar:/home/br/repos/cassandra/lib/ohc-core-0.4.4.jar:/home/br/repos/cassandra/lib/ohc-core-j8-0.4.4.jar:/home/br/repos/cassandra/lib/primitive-1.0.jar:/home/br/repos/cassandra/lib/reporter-config-base-3.0.0.jar:/home/br/repos/cassandra/lib/reporter-config3-3.0.0.jar:/home/br/repos/cassandra/lib/sigar-1.6.4.jar:/home/br/repos/cassandra/lib/slf4j-api-1.7.7.jar:/home/br/repos/cassandra/lib/snakeyaml-1.11.jar:/home/br/repos/cassandra/lib/snappy-java-1.1.1.7.jar:/home/br/repos/cassandra/lib/snowball-stemmer-1.3.0.581.1.jar:/home/br/repos/cassandra/lib/stream-2.5.2.jar:/home/br/repos/cassandra/lib/thrift-server-0.3.7.jar:/home/br/repos/cassandra/lib/jsr223/*/*.jar
-Dcassandra.join_ring=True -Dcassandra.logdir=/home/br/.ccm/test/node1/logs
-Dcassandra.boot_without_jna=true
org.apache.cassandra.service.CassandraDaemon

Java version:
br@dev1:~/repos/cassandra-dtest$ java -version
java version "1.8.0_102"
Java(TM) SE Runtime Environment (build 1.8.0_102-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.102-b14, mixed mode)


dtests jolokia fails to attach

2016-10-06 Thread Benjamin Roth
For some days now I have had the problem that I cannot run a dtest, as jolokia
fails to attach to the process.
It worked a few days ago. I tried both on MacOS + Linux with dtest master
and pip'ed requirements.

dtest output is:
Failed to start jolokia agent (command was:
/usr/lib/jvm/oracle-java8-jdk-amd64/bin/java -cp
/usr/lib/jvm/oracle-java8-jdk-amd64/lib/tools.jar:lib/jolokia-jvm-1.2.3-agent.jar
org.jolokia.jvmagent.client.AgentLauncher --host 127.0.0.1 start 19810):
Command '('/usr/lib/jvm/oracle-java8-jdk-amd64/bin/java', '-cp',
'/usr/lib/jvm/oracle-java8-jdk-amd64/lib/tools.jar:lib/jolokia-jvm-1.2.3-agent.jar',
'org.jolokia.jvmagent.client.AgentLauncher', '--host', '127.0.0.1',
'start', '19810')' returned non-zero exit status 1
Exit status was: 1
Output was: Illegal Argument (command: start) : Cannot attach to process-ID
19810.
See --help for possible reasons.

When I try it manually by starting a ccm instance, nodes come up, JMX port
is open, PID matches, user is the same:

br@dev1:~$ lsof -i:9042
COMMAND   PID USER   FD   TYPEDEVICE SIZE/OFF NODE NAME
java20376   br  133u  IPv4 124151851  0t0  TCP localhost:9042
(LISTEN)
java20385   br  124u  IPv4 124152856  0t0  TCP 127.0.0.3:9042
(LISTEN)
java20394   br  124u  IPv4 124151896  0t0  TCP 127.0.0.2:9042
(LISTEN)
br@dev1:~/repos/cassandra-dtest$
/usr/lib/jvm/oracle-java8-jdk-amd64/bin/java -cp
/usr/lib/jvm/oracle-java8-jdk-amd64/lib/tools.jar:lib/jolokia-jvm-1.2.3-agent.jar
org.jolokia.jvmagent.client.AgentLauncher --host 127.0.0.1 start 20376
Illegal Argument (command: start) : Cannot attach to process-ID 20376.
See --help for possible reasons.
br@dev1:~/repos/cassandra-dtest$ lsof -i:7100
COMMAND   PID USER   FD   TYPEDEVICE SIZE/OFF NODE NAME
java20376   br   72u  IPv4 124151830  0t0  TCP *:font-service
(LISTEN)

Any ideas? I just want to be able to close CASSANDRA-12689

-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Cassandra listen port

2016-10-04 Thread Benjamin Roth
Of course, just add aliases to your interfaces (like eth0:0, eth0:1, ...).
For example CCM (https://github.com/pcmanus/ccm) uses 127.0.0.[1-255] to
set up multiple CS instances on a single server.

2016-10-04 20:49 GMT+02:00 Mehdi Bada <mehdi.b...@dbi-services.com>:

> Are virtual addresses also possible?
>
> Thanks Benjamin
>
> Mehdi Bada | Consultant
> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15
> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
> mehdi.b...@dbi-services.com www.dbi-services.com
>
> ⇒ dbi services is recruiting Oracle & SQL Server experts ! – Join the team
>
>
> - Original Message -
> From: Benjamin Roth <benjamin.r...@jaumo.com>
> To: user@cassandra.apache.org
> Sent: Tue, 04 Oct 2016 20:36:49 +0200 (CEST)
> Subject: Re: Cassandra listen port
>
> As far as I can see, these ports are also used for outgoing connections, so
> a node expects all other peers to use that port as well. Therefore the answer
> is no. Use multiple IP addresses instead.
>
> 2016-10-04 20:03 GMT+02:00 Mehdi Bada <mehdi.b...@dbi-services.com>:
>
> > Thanks Vladimir.
> > It means if I want to run Cassandra in a multi-instance environment I only
> > have to change the listen address of each instance and the 9042 CQL
> port??
> >
> >
> > ---
> > Mehdi Bada | Consultant
> > Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96
> 15
> > dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
> > mehdi.b...@dbi-services.com www.dbi-services.com
> >
> > ⇒ dbi services is recruiting Oracle & SQL Server experts ! – Join the
> team
> >
> >
> > - Original Message -
> > From: Vladimir Yudovin <vla...@winguzone.com>
> > To: user@cassandra.apache.org
> > Sent: Tue, 04 Oct 2016 18:18:19 +0200 (CEST)
> > Subject: Re: Cassandra listen port
> >
> Actually the main port is 9042 - for client (CQL) connections - and 7000
> (7001 if SSL enabled) for inter-node communication.
> >
> > Best regards, Vladimir Yudovin,
> > Winguzone Inc - Hosted Cloud Cassandra on Azure and SoftLayer.
> > Launch your cluster in minutes.
> >
> >
> >
> >
>  On Tue, 04 Oct 2016 11:36:04 -0400 Benjamin Roth <benjamin.r...@jaumo.com> wrote 
> >
> > There are several ports for several services. They are all set in
> > cassandra.yaml
> >
> > See here for complete documentation:
> https://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html
> >
> >
> >
> > 2016-10-04 16:54 GMT+02:00 Mehdi Bada mehdi.b...@dbi-services.com
> :
> > Hi all,
> >
> >
> >
> > What is the listen port parameter for Apache Cassandra? Does it exist?
> >
> > In comparison with MongoDB, in mongo it's possible to set the listen port
> > in the mongod.conf (configuration file)
> >
> >
> >
> > Regards
> >
> > Mehdi
> >
> >
> >
> > Mehdi Bada | Consultant
> > Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96
> 15
> > dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
> > mehdi.b...@dbi-services.com
> > www.dbi-services.com
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > ⇒ dbi services is recruiting Oracle & SQL Server experts ! – Join the
> > team
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > --
> > Benjamin Roth
> > Prokurist
> >
> > Jaumo GmbH · www.jaumo.com
> > Wehrstraße 46 · 73035 Göppingen · Germany
> > Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> > AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
> >
> >
> >
> >
> >
> >
> >
> >
> >
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Efficient model for a sorting

2016-10-04 Thread Benjamin Roth
I started off with 3.0.6 and for my personal use case(s) it had the same
bugs as tick-tock.

2016-10-04 19:03 GMT+02:00 Jonathan Haddad <j...@jonhaddad.com>:

> I strongly recommend avoiding tick tock. You'll be one of the only people
> putting it in prod and will likely hit a number of weird issues nobody will
> be able to help you with.
> On Tue, Oct 4, 2016 at 12:40 PM Benjamin Roth <benjamin.r...@jaumo.com>
> wrote:
>
>> I have the impression that not tick-tock is the real problem, but that MVs
>> are not really battle-tested yet.
>> Depending on the model, they put much more complexity on a cluster and
>> its behaviour under heavy load. Especially if you are going to create an
>> MV with a different partition key than the base table, this might be a shot
>> in the head.
>> At least I was able to bring my cluster down many times just by throwing
>> a few queries too many at it or by running some big repairs with reaper.
>> Only for some days now have things seemed to go smoothly, after having
>> struggled for about 2 months with very different kinds of issues.
>>
>> We'll see ... most probably I will stick with the latest version. After
>> all it seems to work ok, I gained a lot of experience in running and
>> troubleshooting and dealing with bugs, and maybe this way I am able to
>> contribute a bit to further development.
>>
>> 2016-10-04 18:26 GMT+02:00 Vladimir Yudovin <vla...@winguzone.com>:
>>
>> >Would you consider 3.0.x to be more stable than 3.x?
>> I guess yes, but there are some discussions on this list:
>>
>> (C)* stable version after 3.5
>> <https://lists.apache.org/thread.html/4e4e67175efd1207965eb528e098f35dd268fba0f66632924d8bd0a2@%3Cuser.cassandra.apache.org%3E>
>> Upgrade from 3.0.6 to 3.7.
>> <https://lists.apache.org/thread.html/14e383fabe1ea1750a3e3eec78a1490ae27484ed2d3424d9c3aeb9e2@%3Cuser.cassandra.apache.org%3E>
>>
>> It seems to be an eternal topic till the tick-tock approach stabilizes.
>>
>>
>> Best regards, Vladimir Yudovin,
>>
>>
>> *Winguzone Inc <https://winguzone.com?from=list> - Hosted Cloud Cassandra
>> on Azure and SoftLayer.Launch your cluster in minutes.*
>>
>>
>>  On Tue, 04 Oct 2016 12:19:13 -0400 *Benjamin Roth <benjamin.r...@jaumo.com>* wrote 
>>
>> I use the self-compiled master (3.10, ticktock). I had to fix a severe
>> bug on my own and decided to go with the latest code.
>> Would you consider 3.0.x to be more stable than 3.x?
>>
>> 2016-10-04 18:14 GMT+02:00 Vladimir Yudovin <vla...@winguzone.com>:
>>
>> Hi Benjamin!
>>
>> >we now use CS 3.x and have been advised that 3.x is still not considered
>> really production ready.
>>
>> Did you consider using 3.0.9? Actually it's 3.0 with almost a year of
>> fixes.
>>
>>
>> Best regards, Vladimir Yudovin,
>>
>>
>> *Winguzone Inc <https://winguzone.com?from=list> - Hosted Cloud Cassandra
>> on Azure and SoftLayer.Launch your cluster in minutes.*
>>
>>
>>  On Tue, 04 Oct 2016 07:27:54 -0400 *Benjamin Roth
>> <benjamin.r...@jaumo.com>* wrote 
>>
>> Hi!
>>
>> I have a frequently used pattern which seems to be quite costly in CS.
>> The pattern is always the same: I have a unique key and a sorting by a
>> different field.
>>
>> To give an example, here a real life example from our model:
>> CREATE TABLE visits.visits_in (
>> user_id int,
>> user_id_visitor int,
>> created timestamp,
>> PRIMARY KEY (user_id, user_id_visitor)
>> ) WITH CLUSTERING ORDER BY (user_id_visitor ASC)
>>
>> CREATE MATERIALIZED VIEW visits.visits_in_sorted_mv AS
>> SELECT user_id, created, user_id_visitor
>> FROM visits.visits_in
>> WHERE user_id IS NOT NULL AND created IS NOT NULL AND user_id_visitor
>> IS NOT NULL
>> PRIMARY KEY (user_id, created, user_id_visitor)
>> WITH CLUSTERING ORDER BY (created DESC, user_id_visitor DESC)
>>
>> This simply represents people, that visited my profile sorted by date
>> desc but only one entry per visitor.
>> Other examples with the same pattern could be a whats-app-like inbox
>> where the last message of each sender is shown by date desc. There are lots
>> of examples for that pattern.
>>
>> E.g. in redis I'd just use a sorted set, where the key could be like
>> "visits_${user_id}", set key would be user_id_visitor and score
>> the created timestamp.

Re: Cassandra listen port

2016-10-04 Thread Benjamin Roth
As far as I can see, these ports are also used for outgoing connections, so
a node expects all other peers to use that port as well. Therefore the answer
is no. Use multiple IP addresses instead.

2016-10-04 20:03 GMT+02:00 Mehdi Bada <mehdi.b...@dbi-services.com>:

> Thanks Vladimir.
> It means if I want to run Cassandra in a multi-instance environment I only
> have to change the listen address of each instance and the 9042 CQL port??
>
>
> ---
> Mehdi Bada | Consultant
> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15
> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
> mehdi.b...@dbi-services.com www.dbi-services.com
>
> ⇒ dbi services is recruiting Oracle & SQL Server experts ! – Join the team
>
>
> - Original Message -
> From: Vladimir Yudovin <vla...@winguzone.com>
> To: user@cassandra.apache.org
> Sent: Tue, 04 Oct 2016 18:18:19 +0200 (CEST)
> Subject: Re: Cassandra listen port
>
> Actually the main port is 9042 - for client (CQL) connections - and 7000
> (7001 if SSL enabled) for inter-node communication.
>
> Best regards, Vladimir Yudovin,
> Winguzone Inc - Hosted Cloud Cassandra on Azure and SoftLayer.
> Launch your cluster in minutes.
>
>
>
>
>  On Tue, 04 Oct 2016 11:36:04 -0400 Benjamin Roth <benjamin.r...@jaumo.com> wrote 
>
> There are several ports for several services. They are all set in
> cassandra.yaml
>
> See here for complete documentation:
> https://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html
>
>
>
> 2016-10-04 16:54 GMT+02:00 Mehdi Bada <mehdi.b...@dbi-services.com>:
> Hi all,
>
>
>
> What is the listen port parameter for Apache Cassandra? Does it exist?
>
> In comparison with MongoDB, in mongo it's possible to set the listen port
> in the mongod.conf (configuration file)
>
>
>
> Regards
>
> Mehdi
>
>
>
> Mehdi Bada | Consultant
> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15
> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
> mehdi.b...@dbi-services.com
> www.dbi-services.com
>
>
>
>
>
>
>
>
>
> ⇒ dbi services is recruiting Oracle & SQL Server experts ! – Join the
> team
>
>
>
>
>
>
>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>
>
>
>
>
>
>
>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Efficient model for a sorting

2016-10-04 Thread Benjamin Roth
Thanks guys!

Good to know, that my approach is basically right, but I will check that
lucene indices by time.

2016-10-04 14:22 GMT+02:00 DuyHai Doan <doanduy...@gmail.com>:

> "What scatter/gather? "
>
> http://www.slideshare.net/doanduyhai/sasi-cassandra-on-
> the-full-text-search-ride-voxxed-daybelgrade-2016/23
>
> "If you partition your data by user_id then you query only 1 shard to get
> sorted by time visitors for a user"
>
> Exact, but in this case, you're using a 2nd index only for sorting, right?
> For SASI it's not even possible. Maybe it can work with the Stratio Lucene impl
>
> On Tue, Oct 4, 2016 at 2:15 PM, Dorian Hoxha <dorian.ho...@gmail.com>
> wrote:
>
>> @DuyHai
>>
>> What scatter/gather? If you partition your data by user_id then you query
>> only 1 shard to get sorted by time visitors for a user.
>>
>> On Tue, Oct 4, 2016 at 2:09 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>>
>>> MV is right now your best choice for this kind of sorting behavior.
>>>
>>> Secondary index (whatever the impl, SASI or Lucene) has a cost of
>>> scatter-gather if your cluster scale out. With MV you're at least
>>> guaranteed to hit a single node everytime
>>>
>>> On Tue, Oct 4, 2016 at 1:56 PM, Dorian Hoxha <dorian.ho...@gmail.com>
>>> wrote:
>>>
>>>> Can you use the lucene index
>>>> https://github.com/Stratio/cassandra-lucene-index ?
>>>>
>>>> On Tue, Oct 4, 2016 at 1:27 PM, Benjamin Roth <benjamin.r...@jaumo.com>
>>>> wrote:
>>>>
>>>>> Hi!
>>>>>
>>>>> I have a frequently used pattern which seems to be quite costly in CS.
>>>>> The pattern is always the same: I have a unique key and a sorting by a
>>>>> different field.
>>>>>
>>>>> To give an example, here a real life example from our model:
>>>>> CREATE TABLE visits.visits_in (
>>>>> user_id int,
>>>>> user_id_visitor int,
>>>>> created timestamp,
>>>>> PRIMARY KEY (user_id, user_id_visitor)
>>>>> ) WITH CLUSTERING ORDER BY (user_id_visitor ASC)
>>>>>
>>>>> CREATE MATERIALIZED VIEW visits.visits_in_sorted_mv AS
>>>>> SELECT user_id, created, user_id_visitor
>>>>> FROM visits.visits_in
>>>>> WHERE user_id IS NOT NULL AND created IS NOT NULL AND
>>>>> user_id_visitor IS NOT NULL
>>>>> PRIMARY KEY (user_id, created, user_id_visitor)
>>>>> WITH CLUSTERING ORDER BY (created DESC, user_id_visitor DESC)
>>>>>
>>>>> This simply represents people, that visited my profile sorted by date
>>>>> desc but only one entry per visitor.
>>>>> Other examples with the same pattern could be a whats-app-like inbox
>>>>> where the last message of each sender is shown by date desc. There are 
>>>>> lots
>>>>> of examples for that pattern.
>>>>>
>>>>> E.g. in redis I'd just use a sorted set, where the key could be like
>>>>> "visits_${user_id}", set key would be user_id_visitor and score
>>>>> the created timestamp.
>>>>> In MySQL I'd create the table with PK on user_id + user_id_visitor and
>>>>> create an index on user_id + created
>>>>> In C* i use an MV.
>>>>>
>>>>> Is this the most efficient approach?
>>>>> I also could have done this without an MV but then the situation in
>>>>> our app would be far more complex.
>>>>> I know that denormalization is a common pattern in C* and I don't
>>>>> hesitate to use it but in this case, it is not as simple as it's not an
>>>>> append-only case but updates have to be handled correctly.
>>>>> If it is the first visit of a user, it's that simple, just 2 inserts
>>>>> in base table + denormalized table. But on a 2nd or 3rd visit, the 1st or
>>>>> 2nd visit has to be deleted from the denormalized table before. Otherwise
>>>>> the visit would not be unique any more.
>>>>> Handling this case without an MV requires a lot more effort, I guess
>>>>> even more effort than just using an MV.
>>>>> 1. You need kind of app-side locking to deal with race conditions
>>>>> 2. Read before write is required to determine if an old record has to
>>>>> be deleted
>>>

Efficient model for a sorting

2016-10-04 Thread Benjamin Roth
Hi!

I have a frequently used pattern which seems to be quite costly in CS. The
pattern is always the same: I have a unique key and a sorting by a
different field.

To give an example, here a real life example from our model:
CREATE TABLE visits.visits_in (
user_id int,
user_id_visitor int,
created timestamp,
PRIMARY KEY (user_id, user_id_visitor)
) WITH CLUSTERING ORDER BY (user_id_visitor ASC)

CREATE MATERIALIZED VIEW visits.visits_in_sorted_mv AS
SELECT user_id, created, user_id_visitor
FROM visits.visits_in
WHERE user_id IS NOT NULL AND created IS NOT NULL AND user_id_visitor
IS NOT NULL
PRIMARY KEY (user_id, created, user_id_visitor)
WITH CLUSTERING ORDER BY (created DESC, user_id_visitor DESC)

This simply represents people that visited my profile, sorted by date desc,
but with only one entry per visitor.
Other examples with the same pattern could be a whats-app-like inbox where
the last message of each sender is shown by date desc. There are lots of
examples for that pattern.

E.g. in redis I'd just use a sorted set, where the key could be like
"visits_${user_id}", set key would be user_id_visitor and score
the created timestamp.
In MySQL I'd create the table with PK on user_id + user_id_visitor and
create an index on user_id + created
In C* i use an MV.

Is this the most efficient approach?
I also could have done this without an MV but then the situation in our app
would be far more complex.
I know that denormalization is a common pattern in C* and I don't hesitate
to use it, but this case is not that simple: it is not append-only, and
updates have to be handled correctly.
If it is the first visit of a user, it's simple: just 2 inserts, into the
base table + the denormalized table. But on a 2nd or 3rd visit, the 1st or
2nd visit has to be deleted from the denormalized table first. Otherwise the
visit would not be unique any more.
Handling this case without an MV requires a lot more effort, I guess even
more effort than just using an MV.
1. You need kind of app-side locking to deal with race conditions
2. Read before write is required to determine if an old record has to be
deleted
3. At least CL_QUORUM is required to make sure that read before write is
always consistent
4. Old record has to be deleted on update (see the sketch below)
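
For comparison, a minimal sketch of that manual path with the Java driver.
A plain visits_in_sorted table stands in for the MV here (that table name
and the variables are hypothetical), and the app-side locking from point 1
is still needed on top:

import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import java.util.Date;

class ManualVisitUpsert {
    static void recordVisit(Session session, int userId, int visitorId) {
        Date now = new Date();

        // Steps 2 + 3: read the previous visit at QUORUM to find the stale row.
        Row old = session.execute(new SimpleStatement(
                "SELECT created FROM visits_in WHERE user_id = ? AND user_id_visitor = ?",
                userId, visitorId).setConsistencyLevel(ConsistencyLevel.QUORUM)).one();

        BatchStatement batch = new BatchStatement(BatchStatement.Type.LOGGED);
        if (old != null) {
            // Step 4: remove the old entry from the sorted table.
            batch.add(new SimpleStatement(
                    "DELETE FROM visits_in_sorted WHERE user_id = ? AND created = ? AND user_id_visitor = ?",
                    userId, old.getTimestamp("created"), visitorId));
        }
        batch.add(new SimpleStatement(
                "INSERT INTO visits_in (user_id, user_id_visitor, created) VALUES (?, ?, ?)",
                userId, visitorId, now));
        batch.add(new SimpleStatement(
                "INSERT INTO visits_in_sorted (user_id, created, user_id_visitor) VALUES (?, ?, ?)",
                userId, now, visitorId));
        session.execute(batch);
    }
}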

I guess using an MV here is more efficient, as there is less roundtrip
between C* and the app to do all that, and the MV does not require strong
consistency, as MV updates are always local and are eventually consistent when
the base table is. So there is also no need for distributed locks.

I ask all this as we now use CS 3.x and have been advised that 3.x is still
not considered really production ready.

I guess in a perfect world, this wouldn't even require an MV if SASI
indexes could be created over more than 1 column. E.g. in MySQL this case
is nothing else than a BTree. AFAIK SASI indices are also BTrees; filtering
by partition key (which should be done anyway) and sorting by a field
would perfectly do the trick. But from the docs, this is not possible right
now.
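
For reference, what is possible today is a single-column SASI index, e.g.
on the regular column created (a sketch; the index name is hypothetical).
It gives filtering within a partition, not the sorted multi-column access
the MV provides:

import com.datastax.driver.core.Session;

class SasiSketch {
    static void createIndex(Session session) {
        // SASI indexes exactly one column; combined (user_id, created)
        // filtering plus sorting is what the MV emulates instead.
        session.execute(
                "CREATE CUSTOM INDEX IF NOT EXISTS visits_in_created_idx " +
                "ON visits.visits_in (created) " +
                "USING 'org.apache.cassandra.index.sasi.SASIIndex'");
    }
}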

Does anyone see a better solution or are all my assumptions correct?

-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Rationale for using Hazelcast in front of Cassandra?

2016-10-07 Thread Benjamin Roth
@Peter: Thanks for that comment! That's pretty much what I thought when
reading the phrase about why not to use CS as a cache.

Thoughts to sth in front of sth else:
If your real-world case requires more performance, one option is always to
add a cache in front of it. How much overall gain you get from it
completely depends on your model, your infrastructure, your services (CS,
Memcache, Hazelcast, whatever), your demands on availability,
consistency, latency and so on.

There is no universal right or wrong.

Maybe you get 50% better performance with a Memcache in front of CS, or you
just use ScyllaDB and throw the memcache away. Or Memcache is not fail-safe
enough, or your cache needs replication, and then you maybe need something
like Hazelcast.
Can your app model deal with caches and their invalidation? Or will stale
caches be a problem in your app?

These are questions that should drive the decision. But in the end, every
single case is different and has to be benchmarked and analyzed separately.

2016-10-07 17:28 GMT+02:00 Peter Lin <wool...@gmail.com>:

>
> Cassandra is a database, not an in-memory cache. Please don't abuse
> Cassandra like that when there's plenty of existing distributed cache
> products designed for that purpose.
>
> That's like asking "why can't I drag race with a school bus?"
>
> You could and it might be fun, but that's not what it was designed for.
>
> On Fri, Oct 7, 2016 at 11:22 AM, KARR, DAVID <dk0...@att.com> wrote:
>
>> No, I haven’t “thought why people don’t use Cassandra as a cache”, that’s
>> why I’m asking this here.  I’m asking the community for their POV when it
>> might make sense to front Cassandra with Hazelcast.  This is even mentioned
>> as a use case in the Hazelcast documentation (“As a front layer for a
>> Cassandra back-end”), and I’m aware of at least one large private
>> enterprise that does this.
>>
>>
>>
>> *From:* Dorian Hoxha [mailto:dorian.ho...@gmail.com]
>> *Sent:* Friday, October 07, 2016 3:48 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Rationale for using Hazelcast in front of Cassandra?
>>
>>
>>
>> Primary-key select is pretty fast in rdbms too and they also have caches.
>> By "close to" you mean in latency ?
>>
>> Have you thought why people don't use cassandra as a cache ? While it
>> doesn't have LRU, it has TTL,replicatio,sharding.
>>
>>
>>
>> On Fri, Oct 7, 2016 at 12:00 AM, KARR, DAVID <dk0...@att.com> wrote:
>>
>> Clearly, with “traditional” RDBMSs, you tend to put a cache “close to”
>> the client.  However, I was under the impression that Cassandra nodes could
>> be positioned “close to” their clients, and Cassandra has its own cache (I
>> believe), so how effective would it be to put a cache in front of a cache?
>>
>>
>>
>> *From:* Dorian Hoxha [mailto:dorian.ho...@gmail.com]
>> *Sent:* Thursday, October 06, 2016 2:52 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Rationale for using Hazelcast in front of Cassandra?
>>
>>
>>
>> Maybe when you can have very hot keys that can give trouble to your
>> 3(replication) cassandra nodes ?
>>
>> Example: why does facebook use memcache ? They certainly have things
>> distributed on thousands of servers.
>>
>>
>>
>> On Thu, Oct 6, 2016 at 11:40 PM, KARR, DAVID <dk0...@att.com> wrote:
>>
>> I've seen use cases that briefly describe using Hazelcast as a
>> "front-end" for Cassandra, perhaps as a cache.  This seems counterintuitive
>> to me.  Can someone describe to me when this kind of architecture might
>> make sense?
>>
>>
>>
>>
>>
>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Cassandra listen port

2016-10-04 Thread Benjamin Roth
There are several ports for several services. They are all set in
cassandra.yaml

See here for complete documentation:
https://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html
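
From the client side, the one port an application usually needs is the
native transport port; a sketch with the Java driver (the address is
hypothetical):

import com.datastax.driver.core.Cluster;

class Connect {
    static Cluster build() {
        return Cluster.builder()
                .addContactPoint("10.0.0.1")
                .withPort(9042) // native_transport_port from cassandra.yaml
                .build();
    }
}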

2016-10-04 16:54 GMT+02:00 Mehdi Bada <mehdi.b...@dbi-services.com>:

> Hi all,
>
> What is the listen port parameter for Apache Cassandra? Does it exist?
> In comparison with MongoDB, in mongo it's possible to set the listen port
> in the mongod.conf (configuration file)
>
> Regards
> Mehdi
>
> *Mehdi Bada* | Consultant
> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15
> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
> mehdi.b...@dbi-services.com
> www.dbi-services.com
>
>
>
>
> *⇒ dbi services is recruiting Oracle & SQL Server experts ! – Join the
> team
> <http://www.dbi-services.com/fr/dbi-services-et-ses-collaborateurs/offres-emplois-opportunites-carrieres/>*
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Efficient model for a sorting

2016-10-04 Thread Benjamin Roth
I use the self-compiled master (3.10, ticktock). I had to fix a severe bug
on my own and decided to go with the latest code.
Would you consider 3.0.x to be more stable than 3.x?

2016-10-04 18:14 GMT+02:00 Vladimir Yudovin <vla...@winguzone.com>:

> Hi Benjamin!
>
> >we now use CS 3.x and have been advised that 3.x is still not considered
> really production ready.
>
> Did you consider using 3.0.9? Actually it's 3.0 with almost a year of
> fixes.
>
>
> Best regards, Vladimir Yudovin,
>
>
> *Winguzone Inc <https://winguzone.com?from=list> - Hosted Cloud Cassandra
> on Azure and SoftLayer.Launch your cluster in minutes.*
>
>
>  On Tue, 04 Oct 2016 07:27:54 -0400 *Benjamin Roth
> <benjamin.r...@jaumo.com>* wrote 
>
> Hi!
>
> I have a frequently used pattern which seems to be quite costly in CS. The
> pattern is always the same: I have a unique key and a sorting by a
> different field.
>
> To give an example, here a real life example from our model:
> CREATE TABLE visits.visits_in (
> user_id int,
> user_id_visitor int,
> created timestamp,
> PRIMARY KEY (user_id, user_id_visitor)
> ) WITH CLUSTERING ORDER BY (user_id_visitor ASC)
>
> CREATE MATERIALIZED VIEW visits.visits_in_sorted_mv AS
> SELECT user_id, created, user_id_visitor
> FROM visits.visits_in
> WHERE user_id IS NOT NULL AND created IS NOT NULL AND user_id_visitor
> IS NOT NULL
> PRIMARY KEY (user_id, created, user_id_visitor)
> WITH CLUSTERING ORDER BY (created DESC, user_id_visitor DESC)
>
> This simply represents people, that visited my profile sorted by date desc
> but only one entry per visitor.
> Other examples with the same pattern could be a whats-app-like inbox where
> the last message of each sender is shown by date desc. There are lots of
> examples for that pattern.
>
> E.g. in redis I'd just use a sorted set, where the key could be like
> "visits_${user_id}", set key would be user_id_visitor and score
> the created timestamp.
> In MySQL I'd create the table with PK on user_id + user_id_visitor and
> create an index on user_id + created
> In C* i use an MV.
>
> Is this the most efficient approach?
> I also could have done this without an MV but then the situation in our
> app would be far more complex.
> I know that denormalization is a common pattern in C* and I don't hesitate
> to use it but in this case, it is not as simple as it's not an append-only
> case but updates have to be handled correctly.
> If it is the first visit of a user, it's that simple, just 2 inserts in
> base table + denormalized table. But on a 2nd or 3rd visit, the 1st or 2nd
> visit has to be deleted from the denormalized table before. Otherwise the
> visit would not be unique any more.
> Handling this case without an MV requires a lot more effort, I guess even
> more effort than just using an MV.
> 1. You need kind of app-side locking to deal with race conditions
> 2. Read before write is required to determine if an old record has to be
> deleted
> 3. At least CL_QUORUM is required to make sure that read before write is
> always consistent
> 4. Old record has to be deleted on update
>
> I guess, using an MV here is more efficient as there is less roundtrip
> between C* and the app to do all that and the MV does not require strong
> consistency as MV updates are always local and are eventually consistent when
> the base table is. So there is also no need for distributed locks.
>
> I ask all this as we now use CS 3.x and have been advised that 3.x is
> still not considered really production ready.
>
> I guess in a perfect world, this wouldn't even require an MV if SASI
> indexes could be created over more than 1 column. E.g. in MySQL this case
> is nothing else than a BTree. AFAIK SASI indices are also BTrees, filtering
> by Partition Key (which should be done anyway) and sorting by a field
> would perfectly do the trick. But from the docs, this is not possible right
> now.
>
> Does anyone see a better solution or are all my assumptions correct?
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>
>
>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Efficient model for a sorting

2016-10-04 Thread Benjamin Roth
I have the impression that not tick-tock is the real problem, but that MVs
are not really battle-tested yet.
Depending on the model, they put much more complexity on a cluster and its
behaviour under heavy load. Especially if you are going to create an MV
with a different partition key than the base table, this might be a shot in
the head.
At least I was able to bring my cluster down many times just by throwing a
few queries too many at it or by running some big repairs with reaper.
Only for some days now have things seemed to go smoothly, after having
struggled for about 2 months with very different kinds of issues.

We'll see ... most probably I will stick with the latest version. After all
it seems to work ok, I gained a lot of experience in running and
troubleshooting and dealing with bugs, and maybe this way I am able to
contribute a bit to further development.

2016-10-04 18:26 GMT+02:00 Vladimir Yudovin <vla...@winguzone.com>:

> >Would you consider 3.0.x to be more stable than 3.x?
> I guess yes, but there are some discussions on this list:
>
> (C)* stable version after 3.5
> <https://lists.apache.org/thread.html/4e4e67175efd1207965eb528e098f35dd268fba0f66632924d8bd0a2@%3Cuser.cassandra.apache.org%3E>
> Upgrade from 3.0.6 to 3.7.
> <https://lists.apache.org/thread.html/14e383fabe1ea1750a3e3eec78a1490ae27484ed2d3424d9c3aeb9e2@%3Cuser.cassandra.apache.org%3E>
>
> It seems to be an eternal topic till the tick-tock approach stabilizes.
>
>
> Best regards, Vladimir Yudovin,
>
>
> *Winguzone Inc <https://winguzone.com?from=list> - Hosted Cloud Cassandra
> on Azure and SoftLayer.Launch your cluster in minutes.*
>
>
>  On Tue, 04 Oct 2016 12:19:13 -0400 *Benjamin Roth <benjamin.r...@jaumo.com>* wrote 
>
> I use the self-compiled master (3.10, ticktock). I had to fix a severe bug
> on my own and decided to go with the latest code.
> Would you consider 3.0.x to be more stable than 3.x?
>
> 2016-10-04 18:14 GMT+02:00 Vladimir Yudovin <vla...@winguzone.com>:
>
> Hi Benjamin!
>
> >we now use CS 3.x and have been advised that 3.x is still not considered
> really production ready.
>
> Did you consider using 3.0.9? Actually it's 3.0 with almost a year of
> fixes.
>
>
> Best regards, Vladimir Yudovin,
>
>
> *Winguzone Inc <https://winguzone.com?from=list> - Hosted Cloud Cassandra
> on Azure and SoftLayer.Launch your cluster in minutes.*
>
>
>  On Tue, 04 Oct 2016 07:27:54 -0400 *Benjamin Roth
> <benjamin.r...@jaumo.com>* wrote 
>
> Hi!
>
> I have a frequently used pattern which seems to be quite costly in CS. The
> pattern is always the same: I have a unique key and a sorting by a
> different field.
>
> To give an example, here a real life example from our model:
> CREATE TABLE visits.visits_in (
> user_id int,
> user_id_visitor int,
> created timestamp,
> PRIMARY KEY (user_id, user_id_visitor)
> ) WITH CLUSTERING ORDER BY (user_id_visitor ASC)
>
> CREATE MATERIALIZED VIEW visits.visits_in_sorted_mv AS
> SELECT user_id, created, user_id_visitor
> FROM visits.visits_in
> WHERE user_id IS NOT NULL AND created IS NOT NULL AND user_id_visitor
> IS NOT NULL
> PRIMARY KEY (user_id, created, user_id_visitor)
> WITH CLUSTERING ORDER BY (created DESC, user_id_visitor DESC)
>
> This simply represents people, that visited my profile sorted by date desc
> but only one entry per visitor.
> Other examples with the same pattern could be a whats-app-like inbox where
> the last message of each sender is shown by date desc. There are lots of
> examples for that pattern.
>
> E.g. in redis I'd just use a sorted set, where the key could be like
> "visits_${user_id}", set key would be user_id_visitor and score
> the created timestamp.
> In MySQL I'd create the table with PK on user_id + user_id_visitor and
> create an index on user_id + created
> In C* i use an MV.
>
> Is this the most efficient approach?
> I also could have done this without an MV but then the situation in our
> app would be far more complex.
> I know that denormalization is a common pattern in C* and I don't hesitate
> to use it but in this case, it is not as simple as it's not an append-only
> case but updates have to be handled correctly.
> If it is the first visit of a user, it's that simple, just 2 inserts in
> base table + denormalized table. But on a 2nd or 3rd visit, the 1st or 2nd
> visit has to be deleted from the denormalized table before. Otherwise the
> visit would not be unique any more.
> Handling this case without an MV requires a lot more effort, I guess even
> more effort than just using an MV.
> 1. You need kind o

Re: dtests jolokia fails to attach

2016-10-06 Thread Benjamin Roth
That did the trick! Thanks. I also noticed that one further down in the
contribution.md.
I guess it would also be a good idea to show this notice if jolokia fails
to attach.

Thanks guys!

2016-10-06 14:12 GMT+02:00 Marcus Eriksson <krum...@gmail.com>:

> It is this: "-XX:+PerfDisableSharedMem" - in your dtest you need to do
> "remove_perf_disable_shared_mem(node1)" before starting the node
>
> /Marcus
>
> On Thu, Oct 6, 2016 at 8:30 AM, Benjamin Roth <benjamin.r...@jaumo.com>
> wrote:
>
>> Maybe additional information, this is the CS command line for ccm node1:
>>
>> br   20376  3.2  8.6 2331136 708308 pts/5  Sl   06:10   0:30 java
>> -Xloggc:/home/br/.ccm/test/node1/logs/gc.log -ea
>> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42
>> -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=103
>> -XX:+AlwaysPreTouch -XX:-UseBiasedLocking -XX:+UseTLAB -XX:+ResizeTLAB
>> -XX:+UseNUMA -XX:+PerfDisableSharedMem -Djava.net.preferIPv4Stack=true
>> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
>> -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1
>> -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
>> -XX:CMSWaitDuration=1 -XX:+CMSParallelInitialMarkEnabled
>> -XX:+CMSEdenChunksRecordAlways -XX:+CMSClassUnloadingEnabled
>> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
>> -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime
>> -XX:+PrintPromotionFailure -XX:+UseGCLogFileRotation
>> -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M -Xms500M -Xmx500M -Xmn50M
>> -XX:+UseCondCardMark -XX:CompileCommandFile=/home/b
>> r/.ccm/test/node1/conf/hotspot_compiler 
>> -javaagent:/home/br/repos/cassandra/lib/jamm-0.3.0.jar
>> -Dcassandra.jmx.local.port=7100 
>> -Dcom.sun.management.jmxremote.authenticate=false
>> -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password
>> -Djava.library.path=/home/br/repos/cassandra/lib/sigar-bin
>> -Dcassandra.migration_task_wait_in_seconds=6
>> -Dcassandra.libjemalloc=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1
>> -Dlogback.configurationFile=logback.xml -Dcassandra.logdir=/var/log/cassandra
>> -Dcassandra.storagedir=/home/br/repos/cassandra/data
>> -Dcassandra-pidfile=/home/br/.ccm/test/node1/cassandra.pid -cp
>> /home/br/.ccm/test/node1/conf:/home/br/repos/cassandra/build
>> /classes/main:/home/br/repos/cassandra/build/classes/thrift
>> :/home/br/repos/cassandra/lib/HdrHistogram-2.1.9.jar:/home/
>> br/repos/cassandra/lib/ST4-4.0.8.jar:/home/br/repos/
>> cassandra/lib/airline-0.6.jar:/home/br/repos/cassandra/lib/
>> antlr-runtime-3.5.2.jar:/home/br/repos/cassandra/lib/asm-5.
>> 0.4.jar:/home/br/repos/cassandra/lib/caffeine-2.2.6.
>> jar:/home/br/repos/cassandra/lib/cassandra-driver-core-3.0.
>> 1-shaded.jar:/home/br/repos/cassandra/lib/commons-cli-1.1.ja
>> r:/home/br/repos/cassandra/lib/commons-codec-1.2.jar:/home/
>> br/repos/cassandra/lib/commons-lang3-3.1.jar:/home/br/repos/
>> cassandra/lib/commons-math3-3.2.jar:/home/br/repos/
>> cassandra/lib/compress-lzf-0.8.4.jar:/home/br/repos/
>> cassandra/lib/concurrent-trees-2.4.0.jar:/home/br/
>> repos/cassandra/lib/concurrentlinkedhashmap-lru-1.4.jar:/
>> home/br/repos/cassandra/lib/disruptor-3.0.1.jar:/home/br/
>> repos/cassandra/lib/ecj-4.4.2.jar:/home/br/repos/cassandra/
>> lib/guava-18.0.jar:/home/br/repos/cassandra/lib/high-
>> scale-lib-1.0.6.jar:/home/br/repos/cassandra/lib/hppc-0.5.
>> 4.jar:/home/br/repos/cassandra/lib/jackson-core-
>> asl-1.9.2.jar:/home/br/repos/cassandra/lib/jackson-mapper-
>> asl-1.9.2.jar:/home/br/repos/cassandra/lib/jamm-0.3.0.jar:/
>> home/br/repos/cassandra/lib/javax.inject.jar:/home/br/
>> repos/cassandra/lib/jbcrypt-0.3m.jar:/home/br/repos/
>> cassandra/lib/jcl-over-slf4j-1.7.7.jar:/home/br/repos/
>> cassandra/lib/jctools-core-1.2.1.jar:/home/br/repos/cassand
>> ra/lib/jflex-1.6.0.jar:/home/br/repos/cassandra/lib/jna-4.
>> 0.0.jar:/home/br/repos/cassandra/lib/joda-time-2.4.jar:/
>> home/br/repos/cassandra/lib/json-simple-1.1.jar:/home/br/
>> repos/cassandra/lib/libthrift-0.9.2.jar:/home/br/repos/
>> cassandra/lib/log4j-over-slf4j-1.7.7.jar:/home/br/repos
>> /cassandra/lib/logback-classic-1.1.3.jar:/home/br/repos/
>> cassandra/lib/logback-core-1.1.3.jar:/home/br/repos/cassand
>> ra/lib/lz4-1.3.0.jar:/home/br/repos/cassandra/lib/metrics-
>> core-3.1.0.jar:/home/br/repos/cassandra/lib/metrics-jvm-3.1.
>> 0.jar:/home/br/repos/cassandra/lib/metrics-logback-
>> 3.1.0.jar:/home/br/repos/cassandra/lib/netty-all-4.0.
>> 39.Final.jar:/home/br/repos/cassandra/l

Log traces of debug logs

2016-11-09 Thread Benjamin Roth
Hi!

Is there a way to tell logback to log the stack trace of a debug log entry? The
background is that I'd like to know from where a table flush is triggered.

Thanks guys!


Re: Log traces of debug logs

2016-11-09 Thread Benjamin Roth
I don't want to change the log level. I want to add a stack trace to the log entry.
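
Two options come to mind: logback's %caller conversion word in the pattern
layout prints caller data for every event, or, as a purely hypothetical
local patch in the Cassandra source, you can attach a synthetic throwable
to the one debug call you care about:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class FlushTraceSketch {
    private static final Logger logger =
            LoggerFactory.getLogger(FlushTraceSketch.class);

    void flush() {
        // slf4j/logback print the throwable's stack trace with the message,
        // revealing the code path that triggered the flush.
        logger.debug("flush requested", new RuntimeException("flush trace"));
    }
}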

On 09.11.2016 15:22, "Vladimir Yudovin" <vla...@winguzone.com> wrote:

> Hi,
>
> you can change the log level with the nodetool setlogginglevel command
>
> Best regards, Vladimir Yudovin,
>
> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud
> CassandraLaunch your cluster in minutes.*
>
>
>  On Wed, 09 Nov 2016 10:17:37 -0500 *Benjamin Roth <benjamin.r...@jaumo.com>* wrote 
>
> Hi!
>
> Is there a way to tell logback to log the trace of a debug log? The
> background is that i'd like to know from where a table flush is triggered.
>
> Thanks guys!
>
>
>


Re: Re: Re: A difficult data model with C*

2016-11-10 Thread Benjamin Roth
This is the reason why one would like to use an MV for it. An MV handles
this: it adds a clustering key while preserving the uniqueness of the
original PK.
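
Concretely, a sketch of such an MV for the user_views table discussed below
(the view name is hypothetical): the base PK (user_name, video_id) keeps one
row per video, and the view adds last_time as a clustering key for the sort:

import com.datastax.driver.core.Session;

class UserViewsSchema {
    static void create(Session session) {
        session.execute(
                "CREATE TABLE IF NOT EXISTS user_views (" +
                "  user_name text, video_id text, position int, last_time timestamp," +
                "  PRIMARY KEY (user_name, video_id))");
        // The MV PK must contain all base PK columns plus at most one other
        // column (last_time here); ordering by last_time DESC serves the
        // "recent videos" query directly.
        session.execute(
                "CREATE MATERIALIZED VIEW IF NOT EXISTS user_views_by_time AS " +
                "SELECT * FROM user_views " +
                "WHERE user_name IS NOT NULL AND video_id IS NOT NULL " +
                "AND last_time IS NOT NULL " +
                "PRIMARY KEY (user_name, last_time, video_id) " +
                "WITH CLUSTERING ORDER BY (last_time DESC)");
        // Latest ten distinct videos for a user:
        // SELECT video_id, position FROM user_views_by_time
        // WHERE user_name = ? LIMIT 10;
    }
}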

On 11.11.2016 02:33, "Gang Liu" wrote:

> I guess the original design is to keep one record per video per user; maybe
> their app will report many play records while the user is watching one video.
> So there will be many records if the primary key is changed to (user_name,
> last_time). Also,
> SELECT * FROM user_views WHERE user_name = ? LIMIT 10
> without grouping by video_id can't meet the business requirement.
>
> regards,
> Gang
>
>
> On Thu, Nov 10, 2016 at 6:47 PM, Carlos Alonso  wrote:
>
>> What about having something like
>>
>> CREATE TABLE user_views (
>>   user_name text,
>>   video_id text,
>>   position int,
>>   last_time timestamp,
>>   PRIMARY KEY(user_name, last_time)
>> ) WITH CLUSTERING ORDER BY (last_time DESC);
>>
>> Where you insert a record every time a user watches a video and then
>> have a batch task (every night maybe?) that deletes the extra rows that
>> are not needed anymore.
>> The query pattern for this is quite efficient as something like SELECT *
>> FROM user_views WHERE user_name = ? LIMIT 10;
>>
>> Regards
>>
>> Carlos Alonso | Software Engineer | @calonso
>> 
>>
>> On 10 November 2016 at 09:19, Vladimir Yudovin 
>> wrote:
>>
>>> >Do you mean the oldest one should be removed when a new play is added?
>>> Sure. As you described the issue "the last ten items may be adequate for
>>> the business"
>>>
>>> Best regards, Vladimir Yudovin,
>>>
>>> *Winguzone  - Hosted Cloud Cassandra.
>>> Launch your cluster in minutes.*
>>>
>>>
>>>  On Wed, 09 Nov 2016 20:47:05 -0500*Diamond ben
>>> >* wrote 
>>>
>>> The solution may work. However, the play list will grow over time and
>>> somebody may have tens of thousands of records, which will slow down the
>>> query and the sort. Do you mean the oldest one should be removed when a
>>> new play is added?
>>>
>>> BTW, the version is 2.1.16 in our live system.
>>>
>>>
>>> BRs,
>>>
>>> BEN
>>> --
>>>
>>> *From:* Vladimir Yudovin 
>>> *Sent:* 2016-11-09 18:11:26
>>> *To:* user
>>> *Subject:* Re: Re: A difficult data model with C*
>>>
>>> You are welcome! )
>>>
>>> >recent ten movies watched by the user within 30 days.
>>> In this case you can't use PRIMARY KEY (user_name, video_id), as
>>> video_id is required to fetch a row, so all this stuff may be
>>>
>>> CREATE TYPE play (video_id text, position int, last_time timestamp);
>>> CREATE TABLE recent (user_name text PRIMARY KEY, play_list
>>> LIST<FROZEN<play>>);
>>>
>>> You can easily retrieve the play list for a specific user by their ID.
>>> Instead of LIST you can use MAP; I don't think it matters for ten
>>> entries.
>>>
>>>
>>> Best regards, Vladimir Yudovin,
>>>
>>> *Winguzone  - Hosted Cloud Cassandra
>>> Launch your cluster in minutes.*
>>>
>>>
>>>  On Tue, 08 Nov 2016 22:29:48 -0500*ben ben
>>> >* wrote 
>>>
>>> Hi Vladimir Yudovin,
>>>
>>>
>>> Thank you very much for your detailed explanation. Maybe I didn't
>>> describe the requirement clearly. The use cases should be:
>>>
>>> 1. A user logs in to our app.
>>>
>>> 2. Show the recent ten movies watched by the user within 30 days.
>>>
>>> 3. The user can click any one of the ten movies and continue to watch
>>> from the last position she/he reached. BTW, a movie can be watched
>>> several times by a user and the last position is needed indeed.
>>>
>>> BRs,
>>>
>>> BEN
>>> --
>>>
>>> *From:* Vladimir Yudovin 
>>> *Sent:* 2016-11-08 22:35:48
>>> *To:* user
>>> *Subject:* Re: A difficult data model with C*
>>>
>>> Hi Ben,
>>>
>>> if you need a very limited number of positions (as you said, ten),
>>> maybe you can store them in a LIST of a UDT? Or just as a JSON string?
>>> So you'll have one row for each user-video pair.
>>>
>>> It can be something like this:
>>>
>>> CREATE TYPE play (position int, last_time timestamp);
>>> CREATE TABLE recent (user_name text, video_id text, review
>>> LIST<FROZEN<play>>, PRIMARY KEY (user_name, video_id));
>>>
>>> UPDATE recent set review = review + [(1234,12345)] where user_name='some
>>> user' AND video_id='great video';
>>> UPDATE recent set review = review + [(1234,123456)] where
>>> user_name='some user' AND video_id='great video';
>>> UPDATE recent set review = review + [(1234,1234567)] where
>>> user_name='some user' AND video_id='great video';
>>>
>>> You can delete the oldest entry by index:
>>> DELETE review[0] FROM recent WHERE user_name='some user' AND
>>> video_id='great video';
>>>
>>> or by value, if you know the oldest entry:
>>>
>>> UPDATE recent SET review = review - [(1234,12345)]  WHERE
>>> user_name='some user' AND video_id='great video';
>>>
>>> Best 

Re: Re: Re: A difficult data model with C*

2016-11-10 Thread Benjamin Roth
Yes, sorry. I was irritated by the fact that video_id wasn't part of it.
Anyway, probably an MV could be a way to go.

On 10.11.2016 13:38, "Carlos Alonso" <i...@mrcalonso.com> wrote:

Hi Ben, you're right, but in my example the last_time timestamp field is
actually part of the primary key.

Regards

Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>

On 10 November 2016 at 11:50, Benjamin Roth <benjamin.r...@jaumo.com> wrote:

> I pretty much guess the CQL you posted is invalid. You cannot set a
> clustering column that is not part of the primary key.
> But you can use a materialized view to append the last_time to the primary
> key and still preserve uniqueness of username + vedio_id (guess it is a
> typo in vedio).
>
> 2016-11-10 10:47 GMT+00:00 Carlos Alonso <i...@mrcalonso.com>:
>
>> What about having something like
>>
>> CREATE TABLE user_views (
>>   user_name text,
>>   video_id text,
>>   position int,
>>   last_time timestamp,
>>   PRIMARY KEY(user_name, last_time)
>> ) WITH CLUSTERING ORDER BY (last_time DESC);
>>
>> Where you insert a record every time a user watches a video and then
>> have a batch task (every night maybe?) that deletes the extra rows that
>> are not needed anymore.
>> The query pattern for this is quite efficient as something like SELECT *
>> FROM user_views WHERE user_name = ? LIMIT 10;
>>
>> Regards
>>
>> Carlos Alonso | Software Engineer | @calonso
>> <https://twitter.com/calonso>
>>
>> On 10 November 2016 at 09:19, Vladimir Yudovin <vla...@winguzone.com>
>> wrote:
>>
>>> >Do you mean the oldest one should be removed when a new play is added?
>>> Sure. As you described the issue "the last ten items may be adequate for
>>> the business"
>>>
>>> Best regards, Vladimir Yudovin,
>>>
>>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra.
>>> Launch your cluster in minutes.*
>>>
>>>
>>>  On Wed, 09 Nov 2016 20:47:05 -0500*Diamond ben
>>> <diamond@outlook.com <diamond@outlook.com>>* wrote 
>>>
>>> The solution may work. However, the play list will grow over time and
>>> somebody may have tens of thousands of records, which will slow down the
>>> query and the sort. Do you mean the oldest one should be removed when a
>>> new play is added?
>>>
>>> BTW, the version is 2.1.16 in our live system.
>>>
>>>
>>> BRs,
>>>
>>> BEN
>>> --
>>>
>>> *From:* Vladimir Yudovin <vla...@winguzone.com>
>>> *Sent:* 2016-11-09 18:11:26
>>> *To:* user
>>> *Subject:* Re: Re: A difficult data model with C*
>>>
>>> You are welcome! )
>>>
>>> >recent ten movies watched by the user within 30 days.
>>> In this case you can't use PRIMARY KEY (user_name, video_id), as
>>> video_id is required to fetch a row, so all this stuff may be
>>>
>>> CREATE TYPE play (video_id text, position int, last_time timestamp);
>>> CREATE TABLE recent (user_name text PRIMARY KEY, play_list
>>> LIST<FROZEN<play>>);
>>>
>>> You can easily retrieve the play list for a specific user by their ID.
>>> Instead of LIST you can use MAP; I don't think it matters for ten
>>> entries.
>>>
>>>
>>> Best regards, Vladimir Yudovin,
>>>
>>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra
>>> Launch your cluster in minutes.*
>>>
>>>
>>>  On Tue, 08 Nov 2016 22:29:48 -0500*ben ben
>>> <diamond@outlook.com <diamond@outlook.com>>* wrote 
>>>
>>> Hi Vladimir Yudovin,
>>>
>>>
>>> Thank you very much for your detailed explanation. Maybe I didn't
>>> describe the requirement clearly. The use cases should be:
>>>
>>> 1. A user logs in to our app.
>>>
>>> 2. Show the recent ten movies watched by the user within 30 days.
>>>
>>> 3. The user can click any one of the ten movies and continue to watch
>>> from the last position she/he reached. BTW, a movie can be watched
>>> several times by a user and the last position is needed indeed.
>>>
>>> BRs,
>>>
>>> BEN
>>> --
>>>
>>> *From:* Vladimir Yudovin <vla...@winguzone.com>
>>> *Sent:* 2016-11-08 22:35:48
>>> *To:* user
>>> *Subject:* Re: A difficult data model with C*

Re: Re: Re: A difficult data model with C*

2016-11-10 Thread Benjamin Roth
>> You can delete the oldest entry by index:
>> DELETE review[0] FROM recent WHERE user_name='some user' AND
>> video_id='great video';
>>
>> or by value, if you know the oldest entry:
>>
>> UPDATE recent SET review = review - [(1234,12345)]  WHERE user_name='some
>> user' AND video_id='great video';
>>
>> Best regards, Vladimir Yudovin,
>>
>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra
>> Launch your cluster in minutes.*
>>
>>
>>  On Mon, 07 Nov 2016 21:54:08 -0500*ben ben <diamond@outlook.com
>> <diamond@outlook.com>>* wrote 
>>
>>
>> Hi guys,
>>
>> We are maintaining a system for an on-line video service. ALL users'
>> viewing records of every movie are stored in C*. So she/he can continue to
>> enjoy the movie from the last point next time. The table is designed as
>> below:
>> CREATE TABLE recent (
>> user_name text,
>> vedio_id text,
>> position int,
>> last_time timestamp,
>> PRIMARY KEY (user_name, vedio_id)
>> )
>>
>> It worked well before. However, the records increase every day and the
>> last ten items may be adequate for the business. The current model uses
>> vedio_id as clustering key to keep one row per movie, but as you know, the
>> business prefers to order by last_time desc. If we use last_time as the
>> clustering key, there will be many records for a single movie and only the
>> most recent one is actually desired. So how to model that? Do you have any
>> suggestions?
>> Thanks!
>>
>>
>> BRs,
>> BEN
>>
>>
>>
>>
>>
>>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: large number of pending compactions, sstables steadily increasing

2016-11-07 Thread Benjamin Roth
;> > has fixed the issue in the past, but most recently I was getting OOM
>> errors,
>> > probably due to the large number of sstables. I upgraded to 2.2.7 and
>> am no
>> > longer getting OOM errors, but also it does not resolve the issue. I do
>> see
>> > this message in the logs:
>> >
>> >> INFO  [RMI TCP Connection(611)-10.9.2.218] 2016-08-17 01:50:01,985
>> >> CompactionManager.java:610 - Cannot perform a full major compaction as
>> >> repaired and unrepaired sstables cannot be compacted together. These
>> two set
>> >> of sstables will be compacted separately.
>> >
>> > Below are the 'nodetool tablestats' comparing a normal and the
>> problematic
>> > node. You can see problematic node has many many more sstables, and
>> they are
>> > all in level 1. What is the best way to fix this? Can I just delete
>> those
>> > sstables somehow then run a repair?
>> >>
>> >> Normal node
>> >>>
>> >>> keyspace: mykeyspace
>> >>>
>> >>> Read Count: 0
>> >>>
>> >>> Read Latency: NaN ms.
>> >>>
>> >>> Write Count: 31905656
>> >>>
>> >>> Write Latency: 0.051713177939359714 ms.
>> >>>
>> >>> Pending Flushes: 0
>> >>>
>> >>> Table: mytable
>> >>>
>> >>> SSTable count: 1908
>> >>>
>> >>> SSTables in each level: [11/4, 20/10, 213/100, 1356/1000,
>> 306, 0,
>> >>> 0, 0, 0]
>> >>>
>> >>> Space used (live): 301894591442
>> >>>
>> >>> Space used (total): 301894591442
>> >>>
>> >>>
>> >>>
>> >>> Problematic node
>> >>>
>> >>> Keyspace: mykeyspace
>> >>>
>> >>> Read Count: 0
>> >>>
>> >>> Read Latency: NaN ms.
>> >>>
>> >>> Write Count: 30520190
>> >>>
>> >>> Write Latency: 0.05171286705620116 ms.
>> >>>
>> >>> Pending Flushes: 0
>> >>>
>> >>> Table: mytable
>> >>>
>> >>> SSTable count: 14105
>> >>>
>> >>> SSTables in each level: [13039/4, 21/10, 206/100, 831, 0, 0,
>> 0,
>> >>> 0, 0]
>> >>>
>> >>> Space used (live): 561143255289
>> >>>
>> >>> Space used (total): 561143255289
>> >
>> > Thanks,
>> >
>> > Ezra
>>
>>
>>
>> --
>> Jens Rantil
>> Backend engineer
>> Tink AB
>>
>> Email: jens.ran...@tink.se
>> Phone: +46 708 84 18 32
>> Web: www.tink.se
>>
>> Facebook <https://www.facebook.com/#!/tink.se> Linkedin
>> <http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary>
>>  Twitter <https://twitter.com/tink>
>>
>>
>>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Priority for cassandra nodes in cluster

2016-11-12 Thread Benjamin Roth
1. From 15 years of experience running distributed services: don't mix
services on machines if you don't have to. Dedicate each server to a single
task if you can afford it. It is easier to manage and reduces risk in case
of overload or failure.
2. You can assign a different number of tokens to each node by setting
num_tokens in cassandra.yaml before you bootstrap that node, as sketched
below.
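
For illustration (a sketch, not a recommendation - suitable values depend
on your hardware), in cassandra.yaml before the node's first start:

# nodes 1 and 2: full share of the ring
num_tokens: 256

# node 3: owns roughly half as much data as the other nodes
num_tokens: 128

Note that num_tokens only takes effect at bootstrap; changing it later
effectively means wiping and re-bootstrapping the node.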

On 12.11.2016 22:48, "sat" wrote:

> Hi,
>
> We are planning to install a 3-node cluster in a production environment.
> Is it possible to give a weight or priority to the nodes in the cluster?
>
> E.g., we want more records to be written to the first 2 nodes and fewer to
> the 3rd node. We are considering this approach because we want to install
> another IO-intensive messaging server on the 3rd node, and we are asking
> for this approach in order to reduce the load on it.
>
>
> Thanks and Regards
> A.SathishKumar
>
>


Re: Cassandra Config as per server hardware for heavy write

2016-11-23 Thread Benjamin Roth
There is cassandra-stress to benchmark your cluster.

See docs here:
https://docs.datastax.com/en/cassandra/3.x/cassandra/tools/toolsCStress.html?hl=stress
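
For example, a quick write benchmark could look like this (the address, row
count and thread count are placeholders):

cassandra-stress write n=1000000 cl=ONE -rate threads=200 -node 10.0.0.1

Repeating the run with different thread counts quickly shows whether the
client or the cluster is the bottleneck.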

2016-11-23 9:09 GMT+01:00 Abhishek Kumar Maheshwari <
abhishek.maheshw...@timesinternet.in>:

> Hi Benjamin,
>
>
>
> I have 1 Cr (10 million) records in my Java ArrayList and yes, I am
> writing in sync mode. My table is as below:
>
>
>
> CREATE TABLE XXX_YY_MMS (
>
> date timestamp,
>
> userid text,
>
> time timestamp,
>
> xid text,
>
> addimid text,
>
> advcid bigint,
>
> algo bigint,
>
> alla text,
>
> aud text,
>
> bmid text,
>
> ctyid text,
>
> bid double,
>
> ctxid text,
>
> devipid text,
>
> gmid text,
>
> ip text,
>
> itcid bigint,
>
> iid text,
>
> metid bigint,
>
> osdid text,
>
> paid int,
>
> position text,
>
> pcid bigint,
>
> refurl text,
>
> sec text,
>
> siid bigint,
>
> tmpid bigint,
>
> xforwardedfor text,
>
> PRIMARY KEY (date, userid, time, xid)
>
> ) WITH CLUSTERING ORDER BY (userid ASC, time ASC, xid ASC)
>
> AND bloom_filter_fp_chance = 0.01
>
> AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>
> AND comment = ''
>
> AND compaction = {'class': 'org.apache.cassandra.db.compaction.
> SizeTieredCompactionStrategy'}
>
> AND compression = {'sstable_compression': 'org.apache.cassandra.io.
> compress.LZ4Compressor'}
>
> AND dclocal_read_repair_chance = 0.1
>
> AND default_time_to_live = 0
>
> AND gc_grace_seconds = 864000
>
> AND max_index_interval = 2048
>
> AND memtable_flush_period_in_ms = 0
>
> AND min_index_interval = 128
>
> AND read_repair_chance = 0.0
>
> AND speculative_retry = '99.0PERCENTILE';
>
>
>
> So please let me know what I am missing.
>
>
>
> And for this hardware below config is fine?
>
>
>
> concurrent_reads: 32
>
> concurrent_writes: 64
>
> concurrent_counter_writes: 32
>
> compaction_throughput_mb_per_sec: 32
>
> concurrent_compactors: 8
>
>
>
> thanks,
>
> Abhishek
>
>
>
> *From:* Benjamin Roth [mailto:benjamin.r...@jaumo.com]
> *Sent:* Wednesday, November 23, 2016 12:56 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Cassandra Config as per server hardware for heavy write
>
>
>
> This is ridiculously slow for that hardware setup. Sounds like you
> benchmark with a single thread and / or sync queries or very large writes.
>
> A setup like this should be easily able to handle tens of thousands of
> writes / s
>
>
>
> 2016-11-23 8:02 GMT+01:00 Jonathan Haddad <j...@jonhaddad.com>:
>
> How are you benchmarking that?
>
> On Tue, Nov 22, 2016 at 9:16 PM Abhishek Kumar Maheshwari <
> abhishek.maheshw...@timesinternet.in> wrote:
>
> Hi,
>
>
>
> I have 8 servers in my Cassandra Cluster. Each server has 64 GB ram and 40
> Cores and 8 SSD. Currently I have below config in Cassandra.yaml:
>
>
>
> concurrent_reads: 32
>
> concurrent_writes: 64
>
> concurrent_counter_writes: 32
>
> compaction_throughput_mb_per_sec: 32
>
> concurrent_compactors: 8
>
>
>
> With this configuration, I can write 1700 Request/Sec per server.
>
>
>
> But our desired write performance is 3000-4000 Request/Sec per server. As
> per my Understanding Max value for these parameters can be as below:
>
> concurrent_reads: 32
>
> concurrent_writes: 128 (8*16 cores)
>
> concurrent_counter_writes: 32
>
> compaction_throughput_mb_per_sec: 128
>
> concurrent_compactors: 8 or 16 (as I have 8 SSDs and 16 cores reserved
> for this)
>
>
>
> Please let me know this is fine or I need to tune some other parameters
> for speedup write.
>
>
>
>
>
> *Thanks & Regards,*
> *Abhishek Kumar Maheshwari*
> *+91- 805591 (Mobile)*
>
> Times Internet Ltd. | A Times of India Group Company
>
> FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
>
>
>
>
>
>
>
>
>
> --
>
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Cassandra Config as per server hardware for heavy write

2016-11-22 Thread Benjamin Roth
This is ridiculously slow for that hardware setup. It sounds like you are
benchmarking with a single thread and/or sync queries or very large writes.
A setup like this should easily be able to handle tens of thousands of
writes/s.

2016-11-23 8:02 GMT+01:00 Jonathan Haddad <j...@jonhaddad.com>:

> How are you benchmarking that?
> On Tue, Nov 22, 2016 at 9:16 PM Abhishek Kumar Maheshwari <
> abhishek.maheshw...@timesinternet.in> wrote:
>
>> Hi,
>>
>>
>>
>> I have 8 servers in my Cassandra Cluster. Each server has 64 GB ram and
>> 40 Cores and 8 SSD. Currently I have below config in Cassandra.yaml:
>>
>>
>>
>> concurrent_reads: 32
>>
>> concurrent_writes: 64
>>
>> concurrent_counter_writes: 32
>>
>> compaction_throughput_mb_per_sec: 32
>>
>> concurrent_compactors: 8
>>
>>
>>
>> With this configuration, I can write 1700 Request/Sec per server.
>>
>>
>>
>> But our desired write performance is 3000-4000 Request/Sec per server. As
>> per my Understanding Max value for these parameters can be as below:
>>
>> concurrent_reads: 32
>>
>> concurrent_writes: 128 (8*16 cores)
>>
>> concurrent_counter_writes: 32
>>
>> compaction_throughput_mb_per_sec: 128
>>
>> concurrent_compactors: 8 or 16 (as I have 8 SSDs and 16 cores reserved
>> for this)
>>
>>
>>
>> Please let me know this is fine or I need to tune some other parameters
>> for speedup write.
>>
>>
>>
>>
>>
>> *Thanks & Regards,*
>> *Abhishek Kumar Maheshwari*
>> *+91- 805591 (Mobile)*
>>
>> Times Internet Ltd. | A Times of India Group Company
>>
>> FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
>>
>>
>>
>>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Bootstrap fails on 3.10

2016-11-25 Thread Benjamin Roth
:617)
~[na:1.8.0_102]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_102]
WARN  [StreamReceiveTask:94] 2016-11-25 17:50:51,731
StorageService.java:1483 - Error during bootstrap.
org.apache.cassandra.streaming.StreamException: Stream failed
at
org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:88)
~[apache-cassandra-3.10.jar:3.10]
at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310)
[guava-18.0.jar:na]
at
com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
[guava-18.0.jar:na]
at
com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
[guava-18.0.jar:na]
at
com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
[guava-18.0.jar:na]
at
com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
[guava-18.0.jar:na]
at
org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:215)
[apache-cassandra-3.10.jar:3.10]
at
org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:191)
[apache-cassandra-3.10.jar:3.10]
at
org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:481)
[apache-cassandra-3.10.jar:3.10]
at
org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:571)
[apache-cassandra-3.10.jar:3.10]
at
org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:251)
[apache-cassandra-3.10.jar:3.10]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[na:1.8.0_102]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_102]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[na:1.8.0_102]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[na:1.8.0_102]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_102]

nodetool netstats output of 10.23.71.6, the node mentioned above in the
debug.log; obviously all files + bytes have been transferred
==
Mode: NORMAL
Bootstrap b998aec0-b2fd-11e6-a63d-75828fa8d45c
/10.23.71.8
Sending 1598 files, 60610896516 bytes total. Already sent 1598 files,
60610896516 bytes total
/var/lib/cassandra/data/log/log_fake-b130c05070e611e6986e29a4f0eae2e7/mc-97218-big-Data.db
29425392/29425392 bytes(100%) sent to idx:0/10.23.71.8
...

nodetool info on new node says
==
ID : 9dedcc9a-d951-4c7a-b794-434db1af960f
Gossip active  : true
Thrift active  : true
Native Transport active: true
Load   : 185.2 GiB
Generation No  : 1480071319
Uptime (seconds)   : 25082
Heap Memory (MB)   : 3607.60 / 15974.44
Off Heap Memory (MB)   : 405.70
Data Center: DC1
Rack   : RAC1
Exceptions : 0
Key Cache  : entries 1312994, size 100 MiB, capacity 100 MiB,
126413327 hits, 162672698 requests, 0.777 recent hit rate, 14400 save
period in seconds
Row Cache  : entries 0, size 0 bytes, capacity 0 bytes, 0 hits,
0 requests, NaN recent hit rate, 0 save period in seconds
Counter Cache  : entries 0, size 0 bytes, capacity 50 MiB, 0 hits,
0 requests, NaN recent hit rate, 7200 save period in seconds
Chunk Cache: entries 7680, size 480 MiB, capacity 480 MiB,
8277584 misses, 147262566 requests, 0.944 recent hit rate, 835.412
microseconds miss latency
Percent Repaired   : 4.232661592656687%
Token  : (node is not joined to the cluster)

Any idea whats going wrong?

The same thing happened when I bootstrapped a node last time. In the end I
just started the node with auto_bootstrap=false to get it up and running
and ran a repair afterwards. I'd like to avoid that repair and all the
inconsistencies this time.

-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Bootstrap fails on 3.10

2016-11-25 Thread Benjamin Roth
I proposed a quite simple fix for
https://issues.apache.org/jira/browse/CASSANDRA-12905

Sorry that I don't supply a patch. I am good at analysing code but totally
inexperienced with the workflows here.

2016-11-25 19:57 GMT+01:00 Benjamin Roth <benjamin.r...@jaumo.com>:

> Yes, I have MVs.
>
> It is also interesting that in the middle of bootstrapping (I cannot tell
> when exactly) it seemed like other nodes started to send hints to the
> bootstrapping node. When that happened, it seemed that every single
> HintVerb also failed with a WTE (WriteTimeoutException). At least the logs
> were completely flooded with WTEs.
> When I paused hints on all other nodes, the logs were quiet again.
>
> I completely restarted the bootstrap (deleted /var/lib/cassandra) - this
> time with hints paused from the beginning. We will see if that changes
> anything.
>
> I find it also quite weird that other nodes have hints for a bootstrapping
> node. Is that intended behaviour?
> And is it possible that streaming locks the whole CF? It looked like
> absolutely no hint could be delivered successfully.
>
> 2016-11-25 19:43 GMT+01:00 Paulo Motta <pauloricard...@gmail.com>:
>
>> If you have an MV table It seems you're hitting https://issues.apache.
>> org/jira/browse/CASSANDRA-12905. I will bump it's priority to critical
>> since it can prevent or difficult bootstrap.
>>
>> Did you try resuming bootstrap with "nodetool bootstrap resume" after the
>> failure? It may eventually succeed, since this is an MV lock contention
>> problem.
>>
>> 2016-11-25 15:59 GMT-02:00 Benjamin Roth <benjamin.r...@jaumo.com>:
>>
>>> Hi!
>>>
>>> Today I wanted a new node to join the cluster.
>>> When looking at netstats on all the old nodes, it seemed like the
>>> streaming sessions did complete.
>>> They all said that all files have been transferred. But looking at the
>>> debug.log the stream sessions finished with an error.
>>> Also after all streams have been done the node remains in state
>>> "JOINING".
>>>
>>> See logs:
>>>
>>> debug.log, last words
>>> 
>>> ERROR [StreamReceiveTask:94] 2016-11-25 17:50:51,712
>>> StreamSession.java:593 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
>>> Streaming error occurred on session with peer 10.23.71.6
>>> org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed
>>> out - received only 0 responses.
>>> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:525)
>>> ~[apache-cassandra-3.10.jar:3.10]
>>> at org.apache.cassandra.db.Keyspace.applyNotDeferrable(Keyspace.java:440)
>>> ~[apache-cassandra-3.10.jar:3.10]
>>> at org.apache.cassandra.db.Mutation.apply(Mutation.java:223)
>>> ~[apache-cassandra-3.10.jar:3.10]
>>> at org.apache.cassandra.db.Mutation.applyUnsafe(Mutation.java:242)
>>> ~[apache-cassandra-3.10.jar:3.10]
>>> at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletio
>>> nRunnable.run(StreamReceiveTask.java:205) ~[apache-cassandra-3.10.jar:3.
>>> 10]
>>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>> [na:1.8.0_102]
>>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>> [na:1.8.0_102]
>>> at 
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>> [na:1.8.0_102]
>>> at 
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>> [na:1.8.0_102]
>>> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_102]
>>> DEBUG [STREAM-OUT-/10.23.71.6:7000] 2016-11-25 17:50:51,713
>>> ConnectionHandler.java:388 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
>>> Sending Session Failed
>>> DEBUG [StreamReceiveTask:94] 2016-11-25 17:50:51,713
>>> StreamSession.java:472 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
>>> Finishing keep-alive task.
>>> DEBUG [StreamReceiveTask:94] 2016-11-25 17:50:51,713
>>> ConnectionHandler.java:120 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
>>> Closing stream connection handler on /10.23.71.6
>>> INFO  [StreamReceiveTask:94] 2016-11-25 17:50:51,719
>>> StreamResultFuture.java:187 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
>>> Session with /10.23.71.6 is complete
>>> DEBUG [StreamReceiveTask:94] 2016-11-25 17:50:51,719
>>> StreamCoordinator.java:146 - Finished connecting all sessions
>>> WARN  [StreamReceiveTask:94] 2016-11-25 17:50:51,723
>>> StreamResultFuture.java:214 - [Stream #b998aec0-b2fd-11e

Re: Bootstrap fails on 3.10

2016-11-25 Thread Benjamin Roth
Yes, I have MVs.

It is also interesting that in the middle of bootstrapping (I cannot tell
when exactly) it seemed like other nodes started to send hints to the
bootstrapping node. When that happened, it seemed that every single
HintVerb also failed with a WTE (WriteTimeoutException). At least the logs
were completely flooded with WTEs.
When I paused hints on all other nodes, the logs were quiet again.

I completely restarted the bootstrap (deleted /var/lib/cassandra) - this
time with hints paused from the beginning. We will see if that changes
anything.

I find it also quite weird that other nodes have hints for a bootstrapping
node. Is that intended behaviour?
And is it possible that streaming locks the whole CF? It looked like
absolutely no hint could be delivered successfully.

2016-11-25 19:43 GMT+01:00 Paulo Motta <pauloricard...@gmail.com>:

> If you have an MV table It seems you're hitting https://issues.apache.
> org/jira/browse/CASSANDRA-12905. I will bump it's priority to critical
> since it can prevent or difficult bootstrap.
>
> Did you try resuming bootstrap with "nodetool bootstrap resume" after the
> failure? It may eventually succeed, since this is an MV lock contention
> problem.
>
> 2016-11-25 15:59 GMT-02:00 Benjamin Roth <benjamin.r...@jaumo.com>:
>
>> Hi!
>>
>> Today I wanted a new node to join the cluster.
>> When looking at netstats on all the old nodes, it seemed like the
>> streaming sessions did complete.
>> They all said that all files have been transferred. But looking at the
>> debug.log the stream sessions finished with an error.
>> Also after all streams have been done the node remains in state "JOINING".
>>
>> See logs:
>>
>> debug.log, last words
>> 
>> ERROR [StreamReceiveTask:94] 2016-11-25 17:50:51,712
>> StreamSession.java:593 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
>> Streaming error occurred on session with peer 10.23.71.6
>> org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed
>> out - received only 0 responses.
>> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:525)
>> ~[apache-cassandra-3.10.jar:3.10]
>> at org.apache.cassandra.db.Keyspace.applyNotDeferrable(Keyspace.java:440)
>> ~[apache-cassandra-3.10.jar:3.10]
>> at org.apache.cassandra.db.Mutation.apply(Mutation.java:223)
>> ~[apache-cassandra-3.10.jar:3.10]
>> at org.apache.cassandra.db.Mutation.applyUnsafe(Mutation.java:242)
>> ~[apache-cassandra-3.10.jar:3.10]
>> at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletio
>> nRunnable.run(StreamReceiveTask.java:205) ~[apache-cassandra-3.10.jar:3.
>> 10]
>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> [na:1.8.0_102]
>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> [na:1.8.0_102]
>> at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> [na:1.8.0_102]
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> [na:1.8.0_102]
>> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_102]
>> DEBUG [STREAM-OUT-/10.23.71.6:7000] 2016-11-25 17:50:51,713
>> ConnectionHandler.java:388 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
>> Sending Session Failed
>> DEBUG [StreamReceiveTask:94] 2016-11-25 17:50:51,713
>> StreamSession.java:472 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
>> Finishing keep-alive task.
>> DEBUG [StreamReceiveTask:94] 2016-11-25 17:50:51,713
>> ConnectionHandler.java:120 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
>> Closing stream connection handler on /10.23.71.6
>> INFO  [StreamReceiveTask:94] 2016-11-25 17:50:51,719
>> StreamResultFuture.java:187 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
>> Session with /10.23.71.6 is complete
>> DEBUG [StreamReceiveTask:94] 2016-11-25 17:50:51,719
>> StreamCoordinator.java:146 - Finished connecting all sessions
>> WARN  [StreamReceiveTask:94] 2016-11-25 17:50:51,723
>> StreamResultFuture.java:214 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
>> Stream failed
>> ERROR [main] 2016-11-25 17:50:51,724 StorageService.java:1493 - Error
>> while waiting on bootstrap to complete. Bootstrap will have to be restarted.
>> java.util.concurrent.ExecutionException: 
>> org.apache.cassandra.streaming.StreamException:
>> Stream failed
>> at com.google.common.util.concurrent.AbstractFuture$Sync.
>> getValue(AbstractFuture.java:299) ~[guava-18.0.jar:na]
>> at 
>> com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
>> ~[guava-18.0.jar:na]
>> at 
>> com.g

Re: Java GC pauses, reality check

2016-11-25 Thread Benjamin Roth
Lol. The counter-proof is to use another memory model like ARC. That's why
I personally think Java is NOT the first choice for server applications.
But that's a philosophic discussion.

On 25.11.2016 23:38, "Kant Kodali" wrote:

> +1 Chris Lohfink response
>
> I would also restate the following sentence "java GC pauses are pretty
> much a fact of life" to "Any GC based system pauses are pretty much a
> fact of life".
>
> I would be more than happy to see if someone can counter prove.
>
>
>
> On Fri, Nov 25, 2016 at 1:41 PM, Chris Lohfink 
> wrote:
>
>> No tuning will eliminate gcs.
>>
>> 20-30 seconds is horrific and out of the ordinary. Most likely
>> implementing antipatterns and/or poorly configured. Sub 1s is realistic but
>> with some workloads still may require some tuning to maintain. Some
>> workloads are very unfriendly to GCs though (ie heavy tombstones, very wide
>> partitions).
>>
>> Chris
>>
>> On Fri, Nov 25, 2016 at 3:25 PM, S Ahmed  wrote:
>>
>>> Hello!
>>>
>>> From what I understand java GC pauses are pretty much a fact of life,
>>> but you can tune the jvm to reduce the likelihood of the frequency and
>>> length of GC pauses.
>>>
>>> When using Cassandra, how frequent or long have these pauses known to
>>> be?  Even with tuning, is it safe to assume they cannot be eliminated?
>>>
>>> Would a 20-30 second pause be something out of the ordinary?
>>>
>>> Thanks.
>>>
>>
>>
>


Re: Java GC pauses, reality check

2016-11-25 Thread Benjamin Roth
Thanks!

But getting back to the original issue:
I think the GC itself is not the root cause of such a long pause. I
remember having had issues with 1-minute GCs in the beginning. I also
experimented with larger and smaller heap sizes, different GCs (G1, CMS)
and different settings, but what helped in the end (as far as I remember -
don't nail me down on that) was to increase the number of memtable flush
writers. I could explain it like this:
If available memory is getting fuller and fuller, the GC has to run more
often and longer to reclaim the last available bit that is currently
required. Memtables use a considerable amount of it, and if they can't be
flushed in time, they grow and use more and more memory, putting more and
more pressure on the GC - also known as the GC death spiral. In my case I
never ran into an OOM crash, but the node became totally unresponsive.

I don't say this must be the case here, but it is one possible cause.

P.S.: In my case the memtable flush writers default was 2, AFAIR because I
had only one SSD, but the node could easily handle many more with 8 real
cores and an SSD.
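
For reference, the knob in question lives in cassandra.yaml; the value
below is only what a machine like the one described above might handle, not
a general recommendation:

memtable_flush_writers: 8

More flush writers allow memtables to be flushed in parallel, so they don't
pile up on the heap while a single writer is busy.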

2016-11-26 7:52 GMT+01:00 Work <jrother...@codojo.me>:

> I'm not affiliated with them, I've just been impressed by them. They have
> done amazing work in performance measurement. They discovered a major flaw
> in most performance testing ... I've never seen their pricing. But,
> recently, they made their product available for testing by developers. And
> they assured me that pricing is on a sliding scale depending upon
> utilization, and not ridiculous.
>
> - James
>
> Sent from my iPhone
>
> On Nov 25, 2016, at 10:40 PM, Benjamin Roth <benjamin.r...@jaumo.com>
> wrote:
>
> This sounds amazing but also expensive - I don't see pricing on their
> page. Are you able and allowed to tell a rough pricing range?
>
On 26.11.2016 04:33, "Harikrishnan Pillai" <hpil...@walmartlabs.com
> > wrote:
>
>> We are running azul zing in prod with 1 million reads/s and 100 K
>> writes/s with azul .we never had a major gc above 10 ms .
>>
>> Sent from my iPhone
>>
>> > On Nov 25, 2016, at 3:49 PM, Martin Schröder <mar...@oneiros.de> wrote:
>> >
>> > 2016-11-25 23:38 GMT+01:00 Kant Kodali <k...@peernova.com>:
>> >> I would also restate the following sentence "java GC pauses are pretty
>> much
>> >> a fact of life" to "Any GC based system pauses are pretty much a fact
>> of
>> >> life".
>> >>
>> >> I would be more than happy to see if someone can counter prove.
>> >
>> > Azul disagrees.
>> > https://www.azul.com/products/zing/pgc/
>> >
>> > Best
>> >   Martin
>>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Java GC pauses, reality check

2016-11-25 Thread Benjamin Roth
This sounds amazing but also expensive - I don't see pricing on their page.
Are you able and allowed to share a rough pricing range?

On 26.11.2016 04:33, "Harikrishnan Pillai" wrote:

> We are running Azul Zing in prod with 1 million reads/s and 100 K writes/s
> with Azul. We never had a major GC above 10 ms.
>
> Sent from my iPhone
>
> > On Nov 25, 2016, at 3:49 PM, Martin Schröder  wrote:
> >
> > 2016-11-25 23:38 GMT+01:00 Kant Kodali :
> >> I would also restate the following sentence "java GC pauses are pretty
> much
> >> a fact of life" to "Any GC based system pauses are pretty much a fact of
> >> life".
> >>
> >> I would be more than happy to see if someone can counter prove.
> >
> > Azul disagrees.
> > https://www.azul.com/products/zing/pgc/
> >
> > Best
> >   Martin
>


Re: Cassandra Config as per server hardware for heavy write

2016-11-23 Thread Benjamin Roth
01301 | INDIA
>
>
>
>
> *From:* siddharth verma [mailto:sidd.verma29.l...@gmail.com]
> *Sent:* Wednesday, November 23, 2016 2:23 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Cassandra Config as per server hardware for heavy write
>
>
>
> Hi Abhishek,
>
> You could check whether you are throttling on client side queries or on
> cassandra side.
>
> You could also use grafana to monitor the cluster as well.
>
> As you said you are using 100 threads, one can't be sure whether you are
> pushing the Cassandra cluster to its max limit.
>
>
>
> As Benjamin suggested, you could use cassandra stress tool.
>
>
>
> Lastly, if after everything (and you are sure that Cassandra seems slow)
> the TPS comes out to be the numbers you suggested, you could check your
> schema: many rows in one partition key, read queries, read/write load,
> write queries with Batch/LWT, compactions running, etc.
>
>
>
>
>
> For checking ONLY cassandra throughput, you could use cassandra-stress
> with any schema of your choice.
>
>
>
> Regards
>
>
>
>
>
> On Wed, Nov 23, 2016 at 2:07 PM, Vladimir Yudovin <vla...@winguzone.com>
> wrote:
>
> So do you see write speed saturation at this number of threads? Does
> doubling to 200 bring an increase?
>
>
>
>
>
> Best regards, Vladimir Yudovin,
>
> *Winguzone <https://winguzone.com?from=list> - Cloud Cassandra Hosting,
> Zero production time*
>
>
>
>
>
>  On Wed, 23 Nov 2016 03:31:32 -0500*Abhishek Kumar Maheshwari
> <abhishek.maheshw...@timesinternet.in
> <abhishek.maheshw...@timesinternet.in>>* wrote 
>
>
>
> No I am using 100 threads.
>
>
>
> *Thanks & Regards,*
> *Abhishek Kumar Maheshwari*
> *+91- 805591 (Mobile)*
>
> Times Internet Ltd. | A Times of India Group Company
>
> FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
>
>
>
>
> *From:* Vladimir Yudovin [mailto:vla...@winguzone.com]
> *Sent:* Wednesday, November 23, 2016 2:00 PM
> *To:* user <user@cassandra.apache.org>
> *Subject:* RE: Cassandra Config as per server hardware for heavy write
>
>
>
> >I have 1Cr records in my Java ArrayList and yes I am writing in sync mode.
>
> Is your Java program single threaded?
>
>
>
> Best regards, Vladimir Yudovin,
>
> *Winguzone <https://winguzone.com?from=list> - Cloud Cassandra Hosting,
> Zero production time*
>
>
>
>
>
>  On Wed, 23 Nov 2016 03:09:29 -0500*Abhishek Kumar Maheshwari
> <abhishek.maheshw...@timesinternet.in
> <abhishek.maheshw...@timesinternet.in>>* wrote 
>
>
>
> Hi Benjamin,
>
>
>
> I have 1Cr records in my Java ArrayList and yes I am writing in sync mode.
> My table is as below:
>
>
>
> CREATE TABLE XXX_YY_MMS (
>
> date timestamp,
>
> userid text,
>
> time timestamp,
>
> xid text,
>
> addimid text,
>
> advcid bigint,
>
> algo bigint,
>
> alla text,
>
> aud text,
>
> bmid text,
>
> ctyid text,
>
> bid double,
>
> ctxid text,
>
> devipid text,
>
> gmid text,
>
> ip text,
>
> itcid bigint,
>
> iid text,
>
> metid bigint,
>
> osdid text,
>
> paid int,
>
> position text,
>
> pcid bigint,
>
> refurl text,
>
> sec text,
>
> siid bigint,
>
> tmpid bigint,
>
> xforwardedfor text,
>
> PRIMARY KEY (date, userid, time, xid)
>
> ) WITH CLUSTERING ORDER BY (userid ASC, time ASC, xid ASC)
>
> AND bloom_filter_fp_chance = 0.01
>
> AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>
> AND comment = ''
>
> AND compaction = {'class': 'org.apache.cassandra.db.compaction.
> SizeTieredCompactionStrategy'}
>
> AND compression = {'sstable_compression': 'org.apache.cassandra.io.
> compress.LZ4Compressor'}
>
> AND dclocal_read_repair_chance = 0.1
>
> AND default_time_to_live = 0
>
> AND gc_grace_seconds = 864000
>
> AND max_index_interval = 2048
>
> AND memtable_flush_period_in_ms = 0
>
> AND min_index_interval = 128
>
> AND read_repair_chance = 0.0
>
> AND speculative_retry = '99.0PERCENTILE';
>
>
>
> So please let me 

Re: repair -pr in crontab

2016-11-24 Thread Benjamin Roth
I recommend using cassandra-reaper.
Using crons without proper monitoring will most likely not work as
expected.
There are some reaper forks on GitHub. You have to check which one works
with your Cassandra version. The original one from Spotify only works on
2.x, not on 3.x.

On 25.11.2016 07:31, "wxn...@zjqunshuo.com" wrote:

> Hi Artur,
> When I asked similar questions, someone addressed me to the below links
> and they are helpful.
>
> See http://www.datastax.com/dev/blog/repair-in-cassandra
> https://lostechies.com/ryansvihla/2015/09/25/cassandras-repair-should-be-
> called-required-maintenance/
> https://cassandra-zone.com/understanding-repairs/
>
> Cheers,
> -Simon
>
> *From:* Artur Siekielski 
> *Date:* 2016-11-10 04:22
> *To:* user 
> *Subject:* repair -pr in crontab
> Hi,
> the docs give me the impression that repairing should be run manually,
> and not put in crontab by default. Should each repair run be monitored
> manually?
>
> If I would like to put "repair -pr" in crontab for each node, with a few
> hours' difference between the runs, are there any risks with such a setup?
> Specifically:
> - if two or more "repair -pr" runs on different nodes are running at the
> same time, can it cause any problems besides high load?
> - can "repair -pr" be run simultaneously on all nodes at the same time?
> - I'm using the default gc_grace_period of 10 days. Are there any
> reasons to run repairing more often that once per 10 days, for a case
> when previous repairing fails?
> - how to monitor start and finish times of repairs, and whether the runs
> were successful? Is the "nodetool repair" command guaranteed to exit
> only after the repair is finished, and does it return a status code to
> the shell?
>
>


Re: repair -pr in crontab

2016-11-25 Thread Benjamin Roth
It is absolutely ok to run parallel repair -pr if
1. the ranges do not overlap, and
2. your cluster can handle the pressure - do not underestimate that.

In reaper you can tweak some settings like repair intensity to give your
cluster some time to breathe between repair slices.
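
If you do go the crontab route from the original question, the usual
approach is to stagger the runs; an illustrative sketch (days and times are
arbitrary, one line for each node's crontab):

# node1
0 2 * * 1  nodetool repair -pr
# node2
0 2 * * 3  nodetool repair -pr
# node3
0 2 * * 5  nodetool repair -pr

With gc_grace_seconds at the default 10 days (as in the original question),
a weekly -pr run per node keeps every range repaired well within the grace
period.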

2016-11-25 11:34 GMT+01:00 Artur Siekielski <a...@vhex.net>:

> Hi,
> yes, I read about how the repairing works, but the docs/blog posts lack
> practical recommendations and "best practices". For example, I found people
> having issues with running "repair -pr" simultaneously on all nodes, but
> it isn't clear whether it should be allowed.
>
> In the end I implemented rolling, sequential "repair -pr" run on all nodes
> (it's pretty easy to implement when you have Salt/Ansible, or even ssh).
>
>
> On 11/25/2016 07:30 AM, wxn...@zjqunshuo.com wrote:
>
>> Hi Artur,
>> When I asked similar questions, someone addressed me to the below links
>> and they are helpful.
>>
>> See http://www.datastax.com/dev/blog/repair-in-cassandra
>> https://lostechies.com/ryansvihla/2015/09/25/cassandras-
>> repair-should-be-called-required-maintenance/
>> https://cassandra-zone.com/understanding-repairs/
>>
>> Cheers,
>> -Simon
>>
>>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Java GC pauses, reality check

2016-11-27 Thread Benjamin Roth
Maybe I was not totally clear. Reference counting is of course done at
runtime but the compiler automates where + when to do the counting.
Before, the developer had to retain + release objects manually. Since ARC,
this is done by the compiler at file level.
Nothing is "free" in this world. There are also drawbacks on it. But there
is indeed no GC like in Java (at least not in Clang). Cycles have to be
avoided by the developer.
See here https://en.wikipedia.org/wiki/Automatic_Reference_Counting

2016-11-27 15:28 GMT+01:00 Jonathan Haddad <j...@jonhaddad.com>:

> Reference counting happens at run time, not compile time. It's not free
> either. Every time a reference is added, there's overhead in tracking it.
> It also doesn't catch cycles. You still need garbage collection to avoid
> memory leaks.
>
> On Sun, Nov 27, 2016 at 12:31 AM Benjamin Roth <benjamin.r...@jaumo.com>
> wrote:
>
>> Arc means Automatic Reference counting which is done at compilen time. Eg
>> Objektive c and Swift use this technique. There are absolutely No gc's. Its
>> a completely different memory Management technique.
>>
>> Why i dont like Java on Server side? Because gc is a pain in the ass. I
>> am doing this Business since over 15 years and running/maintaining Apps
>> that are build in c or c++ has never been such a pain.
>>
>> On the other Hand Java is easier to handle for Developers. And coding
>> plain c is also a pain.
>>
>> Thats why i Said its a philosophic discussion.
>> Anyway Cassandra rund on Java so We have to Deal with it.
>>
>> Am 27.11.2016 05:28 schrieb "Kant Kodali" <k...@peernova.com>:
>>
>> Benjamin Roth: How do you know Arc eliminates GC pauses completely? By
>> completely I mean no GC pauses whatsoever.
>>
>> When you say Java is NOT the First choice for Server Applications you
>> are generalizing it too much I would say since many of them fall under that
>> category. Either way the statement you made is purely subjective.
>>
>> On Fri, Nov 25, 2016 at 2:41 PM, Benjamin Roth <benjamin.r...@jaumo.com>
>> wrote:
>>
>> Lol. The counter proof is to use another memory Model like Arc. Thats why
>> i personally think Java is NOT the First choice for Server Applications.
>> But thats a philosophic discussion.
>>
>> On 25.11.2016 23:38, "Kant Kodali" <k...@peernova.com> wrote:
>>
>> +1 Chris Lohfink response
>>
>> I would also restate the following sentence "java GC pauses are pretty
>> much a fact of life" to "Any GC based system pauses are pretty much a
>> fact of life".
>>
>> I would be more than happy to see if someone can counter prove.
>>
>>
>>
>> On Fri, Nov 25, 2016 at 1:41 PM, Chris Lohfink <clohfin...@gmail.com>
>> wrote:
>>
>> No tuning will eliminate gcs.
>>
>> 20-30 seconds is horrific and out of the ordinary. Most likely
>> implementing antipatterns and/or poorly configured. Sub 1s is realistic but
>> with some workloads still may require some tuning to maintain. Some
>> workloads are very unfriendly to GCs though (ie heavy tombstones, very wide
>> partitions).
>>
>> Chris
>>
>> On Fri, Nov 25, 2016 at 3:25 PM, S Ahmed <sahmed1...@gmail.com> wrote:
>>
>> Hello!
>>
>> From what I understand java GC pauses are pretty much a fact of life, but
>> you can tune the jvm to reduce the likelihood of the frequency and length
>> of GC pauses.
>>
>> When using Cassandra, how frequent or long have these pauses known to
>> be?  Even with tuning, is it safe to assume they cannot be eliminated?
>>
>> Would a 20-30 second pause be something out of the ordinary?
>>
>> Thanks.
>>
>>
>>
>>
>>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: What is the size of each Virtual Node token range?

2016-11-28 Thread Benjamin Roth
A token does not identify a row. A token is a hash value of the partition
key, and the hash can have 2^64 different values. A collision is a normal
thing in a hash table, and it just means that different rows with the same
token simply go to the same (v-)node, just as if they were different but
in the same token range.
You could even compare this to the typical implementation of a hash table
in C, Java, Perl, whatever. A hash table is a kind of sparse array with
the hash key as index and a linked list (or a more complex implementation)
as value, where all entries with the same hash value are stored.
This simply makes it fast to find an entry by key without looping through
all the entries and comparing them with the key you are looking for.

This thesis is maybe more correct:
there can be no more than 2^64 nodes in a cluster, as then 2 nodes would
share exactly the same token, and this does not really make sense.
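
You can see this hashing directly in cqlsh with the token() function; an
illustrative query against a table whose partition key is user_name (with
the default Murmur3Partitioner, tokens are 64-bit values):

SELECT user_name, token(user_name) FROM recent LIMIT 3;

Two different partition keys can legitimately hash to the same token; they
then simply live on the same replicas.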

2016-11-28 17:28 GMT+01:00 Kant Kodali <k...@peernova.com>:

>
> 1) What is the size of each virtual node token range?
> 2) Are all vnode token ranges on one server of the same size?
> 3) If these token ranges are predefined, then isn't it implying that the
> maximum total number of rows in a server is also predefined?
>
> maximum total number of rows in a server = num_tokens_in_vnode_1 +
> num_tokens_in_vnode_2 + num_tokens_in_vnode_3 + ... +
> num_tokens_in_vnode_256
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Java GC pauses, reality check

2016-11-26 Thread Benjamin Roth
ARC means Automatic Reference Counting, which is done at compile time. E.g.
Objective-C and Swift use this technique. There are absolutely no GCs. It's
a completely different memory management technique.

Why don't I like Java on the server side? Because GC is a pain in the ass.
I have been doing this business for over 15 years, and running/maintaining
apps that are built in C or C++ has never been such a pain.

On the other hand, Java is easier to handle for developers. And coding
plain C is also a pain.

That's why I said it's a philosophic discussion.
Anyway, Cassandra runs on Java, so we have to deal with it.

On 27.11.2016 05:28, "Kant Kodali" <k...@peernova.com> wrote:

> Benjamin Roth: How do you know Arc eliminates GC pauses completely? By
> completely I mean no GC pauses whatsoever.
>
> When you say Java is NOT the First choice for Server Applications you are
> generalizing it too much I would say since many of them fall under that
> category. Either way the statement you made is purely subjective.
>
> On Fri, Nov 25, 2016 at 2:41 PM, Benjamin Roth <benjamin.r...@jaumo.com>
> wrote:
>
>> Lol. The counter proof is to use another memory Model like Arc. Thats why
>> i personally think Java is NOT the First choice for Server Applications.
>> But thats a philosophic discussion.
>>
>> On 25.11.2016 23:38, "Kant Kodali" <k...@peernova.com> wrote:
>>
>>> +1 Chris Lohfink response
>>>
>>> I would also restate the following sentence "java GC pauses are pretty
>>> much a fact of life" to "Any GC based system pauses are pretty much a
>>> fact of life".
>>>
>>> I would be more than happy to see if someone can counter prove.
>>>
>>>
>>>
>>> On Fri, Nov 25, 2016 at 1:41 PM, Chris Lohfink <clohfin...@gmail.com>
>>> wrote:
>>>
>>>> No tuning will eliminate gcs.
>>>>
>>>> 20-30 seconds is horrific and out of the ordinary. Most likely
>>>> implementing antipatterns and/or poorly configured. Sub 1s is realistic but
>>>> with some workloads still may require some tuning to maintain. Some
>>>> workloads are very unfriendly to GCs though (ie heavy tombstones, very wide
>>>> partitions).
>>>>
>>>> Chris
>>>>
>>>> On Fri, Nov 25, 2016 at 3:25 PM, S Ahmed <sahmed1...@gmail.com> wrote:
>>>>
>>>>> Hello!
>>>>>
>>>>> From what I understand java GC pauses are pretty much a fact of life,
>>>>> but you can tune the jvm to reduce the likelihood of the frequency and
>>>>> length of GC pauses.
>>>>>
>>>>> When using Cassandra, how frequent or long have these pauses known to
>>>>> be?  Even with tuning, is it safe to assume they cannot be eliminated?
>>>>>
>>>>> Would a 20-30 second pause be something out of the ordinary?
>>>>>
>>>>> Thanks.
>>>>>
>>>>
>>>>
>>>
>


Re: Java GC pauses, reality check

2016-11-26 Thread Benjamin Roth
You are of course right. There is no solution and no language that is a
perfect match for every situation, and every solution and language has its
own pros, cons, pitfalls and drawbacks.
Actually, the article you posted points at some aspects of ARC I wasn't
aware of yet.
Nevertheless, GC is an issue for Cassandra, otherwise this thread would not
exist, right? But we have to deal with it and get the best out of it.

Another option, besides optimizing your GC: you could check if
http://www.scylladb.com/ is an option for you.
They rewrote CS from scratch. The goal is to be completely compatible
with CS but to be much, much faster. Check their benchmarks and their
architecture.
I really do not want to belittle the work of all the Cassandra developers
- they did a great job - but what I have seen there looked very interesting
and promising! By the way, it's written in C++.


2016-11-27 7:06 GMT+01:00 Kant Kodali <k...@peernova.com>:

> Automatic Reference Counting sounds like a college-level idea that we
> have all been hearing about since GC was born! There seem to be a bunch of
> cons to ARC, as explained here
>
> https://www.quora.com/Why-doesnt-Apple-Swift-adopt-the-
> memory-management-method-of-garbage-collection-like-in-Java
>
> Maintaining C and C++ apps is never a pain? How about versioning and
> static-time libraries? There is work there too, so it's all pros and cons.
>
> "GC is a pain in the ass". How about seg faults? They aren't any less
> painful :)
>
> It's not only Cassandra that runs on the JVM. The majority of Apache
> projects run on the JVM for a reason.
>
> Bottom line: my point here is that there are pros and cons to every
> language. It doesn't make much sense to target one language.
>
>
>
>
>
>
> On Sat, Nov 26, 2016 at 9:31 PM, Benjamin Roth <benjamin.r...@jaumo.com>
> wrote:
>
>> Arc means Automatic Reference counting which is done at compilen time. Eg
>> Objektive c and Swift use this technique. There are absolutely No gc's. Its
>> a completely different memory Management technique.
>>
>> Why i dont like Java on Server side? Because gc is a pain in the ass. I
>> am doing this Business since over 15 years and running/maintaining Apps
>> that are build in c or c++ has never been such a pain.
>>
>> On the other Hand Java is easier to handle for Developers. And coding
>> plain c is also a pain.
>>
>> Thats why i Said its a philosophic discussion.
>> Anyway Cassandra rund on Java so We have to Deal with it.
>>
>> On 27.11.2016 05:28, "Kant Kodali" <k...@peernova.com> wrote:
>>
>>> Benjamin Roth: How do you know Arc eliminates GC pauses completely? By
>>> completely I mean no GC pauses whatsoever.
>>>
>>> When you say Java is NOT the First choice for Server Applications you
>>> are generalizing it too much I would say since many of them fall under that
>>> category. Either way the statement you made is purely subjective.
>>>
>>> On Fri, Nov 25, 2016 at 2:41 PM, Benjamin Roth <benjamin.r...@jaumo.com>
>>> wrote:
>>>
>>>> Lol. The counter proof is to use another memory Model like Arc. Thats
>>>> why i personally think Java is NOT the First choice for Server
>>>> Applications. But thats a philosophic discussion.
>>>>
>>>> On 25.11.2016 23:38, "Kant Kodali" <k...@peernova.com> wrote:
>>>>
>>>>> +1 to Chris Lohfink's response.
>>>>>
>>>>> I would also restate the sentence "java GC pauses are pretty much a
>>>>> fact of life" as "pauses in any GC-based system are pretty much a fact
>>>>> of life".
>>>>>
>>>>> I would be more than happy to see if someone can offer a counter-proof.
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Nov 25, 2016 at 1:41 PM, Chris Lohfink <clohfin...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> No tuning will eliminate GCs.
>>>>>>
>>>>>> 20-30 seconds is horrific and out of the ordinary. Most likely you are
>>>>>> implementing antipatterns and/or are poorly configured. Sub-1s is
>>>>>> realistic, but with some workloads it may still require some tuning to
>>>>>> maintain. Some workloads are very unfriendly to GCs, though (i.e.
>>>>>> heavy tombstones, very wide partitions).
>>>>>>
>>>>>> Chris
>>>>>>
>>>>>> On Fri, Nov 25, 2016 at 3:25 PM, S Ahmed <sahmed1...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello!
>>>>>>>
>>>>>>> From what I understand, java GC pauses are pretty much a fact of
>>>>>>> life, but you can tune the JVM to reduce the frequency and length of
>>>>>>> GC pauses.
>>>>>>>
>>>>>>> When using Cassandra, how frequent or long have these pauses known
>>>>>>> to be?  Even with tuning, is it safe to assume they cannot be 
>>>>>>> eliminated?
>>>>>>>
>>>>>>> Would a 20-30 second pause be something out of the ordinary?
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
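
As a concrete starting point for the tuning discussed in this thread, here is
a minimal sketch of a jvm.options excerpt, assuming Cassandra 3.x. The values
are illustrative assumptions only, not recommendations; the right numbers
depend on hardware and workload, and the CMS flags shipped by default would
need to be commented out before enabling G1.

# jvm.options excerpt - illustrative values only
-Xms16G                      # fixed heap size avoids resize pauses
-Xmx16G
-XX:+UseG1GC                 # G1 with a pause target instead of default CMS
-XX:MaxGCPauseMillis=500
-XX:+PrintGCDetails          # GC logging, to correlate pauses with workload
-XX:+PrintGCDateStamps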


Re: Storing videos in cassandra

2016-11-20 Thread Benjamin Roth
I haven't had the chance to test Pithos, but from an architectural standpoint
it MUST be slower than a haystack-like architecture (
https://code.facebook.com/posts/685565858139515/needle-in-a-haystack-efficient-storage-of-billions-of-photos/).
This is why we decided to move to SeaweedFS (
https://github.com/chrislusf/seaweedfs). It still has a central component
(master server), but it is so slim and well designed that you really need to
hit a Facebook-like scale to push it to its limits. Haystack is purely
designed for simplicity + speed.
If you need more functionality, like authentication, permission management,
multi-tenancy, whatever, it probably will not be the architecture of your
choice. Actually, there are a lot of solutions for a blob store. Some use
Cassandra; many of them have different approaches, each with its own pros
and cons for certain use cases.

2016-11-20 10:53 GMT+01:00 DuyHai Doan <doanduy...@gmail.com>:

> No idea, just contact them
>
> On Sun, Nov 20, 2016 at 5:45 AM, vvshvv <vvs...@gmail.com> wrote:
>
>> Hi Doan,
>>
>> Is there any performance test of Pithos?
>>
>>
>>
>> Sent from my Mi phone
>> On DuyHai Doan <doanduy...@gmail.com>, Nov 19, 2016 6:46 PM wrote:
>>
>> There is a project Pithos that stores blob in Cassandra and exposes them
>> via S3 compatible API:
>>
>> https://www.exoscale.ch/syslog/2016/08/15/object-storage-
>> cassandra-pithos/
>>
>> On Sat, Nov 19, 2016 at 1:36 PM, Kai Wang <dep...@gmail.com> wrote:
>>
>>> IIRC, I watched a presentation where they said Netflix stores almost
>>> everything in C* *except* video content and payment stuff.
>>>
>>> That was 1-2 years ago. Not sure if it's still the case.
>>>
>>> On Nov 14, 2016 12:03 PM, "raghavendra vutti" <
>>> raghu9raghaven...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>>  Just wanted to know: how do Hulu or Netflix store videos in Cassandra?
>>>>
>>>> Do they just use references to the video files in the form of URLs and
>>>> store those in the DB?
>>>>
>>>> Could someone please help me with this?
>>>>
>>>>
>>>> Thanks,
>>>> Raghavendra.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Storing videos in cassandra

2016-11-14 Thread Benjamin Roth
Some time ago, I stumbled across this:
https://github.com/chrislusf/seaweedfs
It is an open-source implementation of Facebook's Haystack design. I have no
experience with it yet, but we will evaluate it as a blob store to replace
our MogileFS installation, which stores over one billion images. From my
point of view it looks very promising, and probably much more
resource-friendly for this use case.

Maybe that helps ...

2016-11-14 19:52 GMT+01:00 Jon Haddad <jonathan.had...@gmail.com>:

> While Cassandra *can* be used this way, I don’t recommend it.  It’s going
> to be far cheaper and easier to maintain to store the data in an object
> store like S3, as Oskar recommended.
>
> > On Nov 14, 2016, at 10:16 AM, l...@airstreamcomm.net wrote:
> >
> > We store videos and files in Cassandra by chunking them into small
> > portions and saving them as blobs.  As for video, you could track the
> > file byte offset of each chunk and request the relevant pieces when
> > scrubbing to a particular portion of the video.
> >
> >> On Nov 14, 2016, at 11:02 AM, raghavendra vutti <
> raghu9raghaven...@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> Just wanted to know: how do Hulu or Netflix store videos in Cassandra?
> >>
> >> Do they just use references to the video files in the form of URLs and
> >> store those in the DB?
> >>
> >> Could someone please help me with this?
> >>
> >>
> >> Thanks,
> >> Raghavendra.
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >
> >
>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
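
As a rough CQL sketch of the chunking approach described in this thread
(keyspace, table and column names are made up for illustration, and the bind
markers assume a prepared statement):

-- Chunked blob storage: one partition per file, one row per chunk.
CREATE TABLE media.video_chunks (
    video_id    uuid,
    chunk_index int,     -- ordinal position of the chunk within the file
    byte_offset bigint,  -- starting byte of this chunk in the original file
    data        blob,    -- the chunk itself, e.g. a few hundred KB
    PRIMARY KEY (video_id, chunk_index)
);

-- Scrubbing to a position becomes a range read on the clustering key:
SELECT byte_offset, data FROM media.video_chunks
WHERE video_id = ? AND chunk_index >= ? LIMIT 8;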


Re: Introducing Cassandra 3.7 LTS

2016-11-02 Thread Benjamin Roth
You can build one on your own very easily. Just check out the desired git
repo and do this:

http://stackoverflow.com/questions/8989192/how-to-package-the-cassandra-source-code-into-debian-package
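
The linked answer boils down to roughly the following; treat this as a
sketch, since the exact prerequisites and flags may differ by OS and
Cassandra version:

# Approximate steps to build a .deb from a Cassandra source checkout.
sudo apt-get install ant dpkg-dev devscripts   # build prerequisites
git clone https://github.com/instaclustr/cassandra.git
cd cassandra
dpkg-buildpackage -uc -us   # uses the debian/ directory in the source tree
# The resulting .deb packages land in the parent directory.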

2016-11-02 17:35 GMT+01:00 Jesse Hodges <hodges.je...@gmail.com>:

> Just curious, has anybody created a debian package for this?
>
> Thanks, Jesse
>
> On Sat, Oct 22, 2016 at 7:45 PM, Kai Wang <dep...@gmail.com> wrote:
>
>> This is awesome! Stability is the king.
>>
>> Thank you so much!
>>
>> On Oct 19, 2016 2:56 PM, "Ben Bromhead" <b...@instaclustr.com> wrote:
>>
>>> Hi All
>>>
>>> I am proud to announce we are making available our production build of
>>> Cassandra 3.7 that we run at Instaclustr (both for ourselves and our
>>> customers). Our release of Cassandra 3.7 includes a number of backported
>>> patches from later versions of Cassandra e.g. 3.8 and 3.9 but doesn't
>>> include the new features of these releases.
>>>
>>> You can find our release of Cassandra 3.7 LTS on github here (
>>> https://github.com/instaclustr/cassandra). You can read more of our
>>> thinking and how this applies to our managed service here (
>>> https://www.instaclustr.com/blog/2016/10/19/patched-cassandra-3-7/).
>>>
>>> We also have an expanded FAQ about why and how we are approaching 3.x in
>>> this manner (https://github.com/instaclustr/cassandra#cassandra-37-lts),
>>> however I've included the top few question and answers below:
>>>
>>> *Is this a fork?*
>>> No, this is just Cassandra with a different release cadence, for those
>>> who want 3.x features but are slightly more risk-averse than the current
>>> schedule allows.
>>>
>>> *Why not just use the official release?*
>>> With the 3.x tick-tock branch we have encountered more instability than
>>> with the previous release cadence. We feel that releasing new features
>>> every other release makes it very hard for operators to stabilize their
>>> production environment without bringing in brand new features that are not
>>> battle tested. With the release of Cassandra 3.8 and 3.9 simultaneously the
>>> bug fix branch included new and real-world untested features, specifically
>>> CDC. We have decided to stick with Cassandra 3.7 and instead backport
>>> critical issues and maintain it ourselves rather than trying to stick with
>>> the current Apache Cassandra release cadence.
>>>
>>> *Why backport?*
>>> At Instaclustr we support and run a number of different versions of
>>> Apache Cassandra on behalf of our customers. Over the course of managing
>>> Cassandra for our customers we often encounter bugs. There are existing
>>> patches for some of them; others we patch ourselves. Generally, if we can,
>>> we try to wait for the next official Apache Cassandra release; however, to
>>> ensure our customers remain stable and running, we will sometimes backport
>>> bug fixes and write our own hotfixes (which are also submitted back to the
>>> community).
>>>
>>> *Why release it?*
>>> A number of our customers and people in the community have asked if we
>>> would make this available, which we are more than happy to do so. This
>>> repository represents what Instaclustr runs in production for Cassandra 3.7
>>> and this is our way of helping the community get a similar level of
>>> stability as what you would get from our managed service.
>>>
>>> Cheers
>>>
>>> Ben
>>>
>>>
>>>
>>> --
>>> Ben Bromhead
>>> CTO | Instaclustr <https://www.instaclustr.com/>
>>> +1 650 284 9692
>>> Managed Cassandra / Spark on AWS, Azure and Softlayer
>>>
>>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: commit log on NFS volume

2016-11-01 Thread Benjamin Roth
Using NFS for a distributed system like Cassandra is like putting a Ferrari
on a truck and going for a race with the truck. It is simply nonsense.

On 01.11.2016 19:39, "Vladimir Yudovin" wrote:

> Hi,
>
> it's not only a performance issue. In case of network problems the writer
> thread can be blocked; also, in case of failure, loss of data can occur.
>
> Best regards, Vladimir Yudovin,
>
> *Winguzone - Hosted Cloud Cassandra.
> Launch your cluster in minutes.*
>
>
>  On Tue, 01 Nov 2016 14:10:10 -0400, *John Sanda* wrote:
>
> I know that using NFS is discouraged, particularly for the commit log. Can
> anyone shed some light into what kinds of problems I might encounter aside
> from performance? The reason for my inquiry is because I have some
> deployments with Cassandra 2.2.1 that use NFS and are experiencing some
> problems like reoccurring corrupted commit log segments on start up:
>
> ERROR 19:38:42 Exiting due to error while processing commit log during 
> initialization.
> 
> org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: 
> Mutation checksum failure at 33296351 in CommitLog-5-1474325237114.log
> at 
> org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:622)
>  [apache-cassandra-2.2.1.jar]
> at 
> org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:492)
>  [apache-cassandra-2.2.1.jar]
> at 
> org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:388)
>  [apache-cassandra-2.2.1.jar]
> at 
> org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:147)
>  [apache-cassandra-2.2.1.jar]
> at 
> org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:189) 
> [apache-cassandra-2.2.1.jar]
> at 
> org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:169) 
> [apache-cassandra-2.2.1.jar]
> at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:266) 
> [apache-cassandra-2.2.1.jar]
> at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:488)
>  [apache-cassandra-2.2.1.jar]
> at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:595) 
> [apache-cassandra-2.2.1.jar]
>
>
> In one deployment after removing all of corrupted commit log segments I
> got a different error:
>
> Exception (java.lang.RuntimeException) encountered during startup: 
> java.nio.file.NoSuchFileException: 
> /cassandra_data/data/hawkular_metrics/metrics_idx-1ea881506ee311e6890b131c1bc89929/la-86-big-Statistics.db
> java.lang.RuntimeException: java.nio.file.NoSuchFileException: 
> /cassandra_data/data/hawkular_metrics/metrics_idx-1ea881506ee311e6890b131c1bc89929/la-86-big-Statistics.db
>   at 
> org.apache.cassandra.io.util.ChannelProxy.openChannel(ChannelProxy.java:55)
>   at 
> org.apache.cassandra.io.util.ChannelProxy.<init>(ChannelProxy.java:66)
>   at 
> org.apache.cassandra.io.util.RandomAccessReader.open(RandomAccessReader.java:78)
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:91)
>   at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:101)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:672)
>   at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:216)
>   at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:488)
>   at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:595)
> Caused by: java.nio.file.NoSuchFileException: 
> /cassandra_data/data/hawkular_metrics/metrics_idx-1ea881506ee311e6890b131c1bc89929/la-86-big-Statistics.db
>   at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>   at 
> sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
>   at java.nio.channels.FileChannel.open(FileChannel.java:287)
>   at java.nio.channels.FileChannel.open(FileChannel.java:335)
>   at 
> org.apache.cassandra.io.util.ChannelProxy.openChannel(ChannelProxy.java:51)
>   ... 8 more
>
>
> The latter error looks like it involves compaction and might be unrelated.
> I don't know if it matters, but I have commit log compression enabled in
> these environments.
>
> --
>
> - John
>
>
>
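
To make the "keep the commit log off NFS" advice concrete, a minimal
cassandra.yaml sketch would be the following (the path is illustrative):

# cassandra.yaml excerpt
commitlog_directory: /var/lib/cassandra/commitlog  # local disk, ideally a dedicated device
# Fail fast instead of limping along when a disk misbehaves:
commit_failure_policy: stop
disk_failure_policy: stop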


Re: commit log on NFS volume

2016-11-01 Thread Benjamin Roth
The second exception states that a file of an SSTable is missing. Is it
possible that you didn't only delete commit logs, or that the NFS mount is
stale or not mounted?

On 01.11.2016 19:52, "John Sanda" <john.sa...@gmail.com> wrote:

>> Using NFS for a distributed system like Cassandra is like putting a
>> Ferrari on a truck and going for a race with the truck. It is simply
>> nonsense.
>
>
> As I mentioned in my original post, I am aware that using NFS is
> considered bad and even documented as an anti-pattern. Your analogy,
> interesting as it may be, is not helpful. It is simply restating what has
> already been said. I don't even know that NFS is to blame for the
> CommitLogReplayException that I cited.
>
> On Tue, Nov 1, 2016 at 2:43 PM, Benjamin Roth <benjamin.r...@jaumo.com>
> wrote:
>
>> Using NFS for a distributed system like Cassandra is like putting a
>> Ferrari on a truck and going for a race with the truck. It is simply
>> nonsense.
>>
>> On 01.11.2016 19:39, "Vladimir Yudovin" <vla...@winguzone.com> wrote:
>>
>>> Hi,
>>>
>>> it's not only a performance issue. In case of network problems the writer
>>> thread can be blocked; also, in case of failure, loss of data can occur.
>>>
>>> Best regards, Vladimir Yudovin,
>>>
>>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra.
>>> Launch your cluster in minutes.*
>>>
>>>
>>>  On Tue, 01 Nov 2016 14:10:10 -0400, *John Sanda
>>> <john.sa...@gmail.com>* wrote:
>>>
>>> I know that using NFS is discouraged, particularly for the commit log.
>>> Can anyone shed some light into what kinds of problems I might encounter
>>> aside from performance? The reason for my inquiry is because I have some
>>> deployments with Cassandra 2.2.1 that use NFS and are experiencing some
>>> problems like reoccurring corrupted commit log segments on start up:
>>>
>>> ERROR 19:38:42 Exiting due to error while processing commit log during 
>>> initialization.
>>> 
>>> org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException:
>>>  Mutation checksum failure at 33296351 in CommitLog-5-1474325237114.log
>>> at 
>>> org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:622)
>>>  [apache-cassandra-2.2.1.jar]
>>> at 
>>> org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:492)
>>>  [apache-cassandra-2.2.1.jar]
>>> at 
>>> org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:388)
>>>  [apache-cassandra-2.2.1.jar]
>>> at 
>>> org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:147)
>>>  [apache-cassandra-2.2.1.jar]
>>> at 
>>> org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:189) 
>>> [apache-cassandra-2.2.1.jar]
>>> at 
>>> org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:169) 
>>> [apache-cassandra-2.2.1.jar]
>>> at 
>>> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:266)
>>>  [apache-cassandra-2.2.1.jar]
>>> at 
>>> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:488)
>>>  [apache-cassandra-2.2.1.jar]
>>> at 
>>> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:595) 
>>> [apache-cassandra-2.2.1.jar]
>>>
>>>
>>> In one deployment after removing all of corrupted commit log segments I
>>> got a different error:
>>>
>>> Exception (java.lang.RuntimeException) encountered during startup: 
>>> java.nio.file.NoSuchFileException: 
>>> /cassandra_data/data/hawkular_metrics/metrics_idx-1ea881506ee311e6890b131c1bc89929/la-86-big-Statistics.db
>>> java.lang.RuntimeException: java.nio.file.NoSuchFileException: 
>>> /cassandra_data/data/hawkular_metrics/metrics_idx-1ea881506ee311e6890b131c1bc89929/la-86-big-Statistics.db
>>> at 
>>> org.apache.cassandra.io.util.ChannelProxy.openChannel(ChannelProxy.java:55)
>>> at 
>>> org.apache.cassandra.io.util.ChannelProxy.<init>(ChannelProxy.java:66)
>>> at 
>>> org.apache.cassandra.io.util.RandomAccessReader.open(RandomAccessReader.java:78)
>>> at 
>>> org.apache.

Re: which one of the following choices is more efficient?

2016-10-26 Thread Benjamin Roth
If you have two tables that share the same PK, have few fields, and most of
the rows have values for all (or many) fields, merging them could save you
some space, as otherwise each PK has to be stored in both tables.
But I would avoid having "god tables" with too many fields.

In the end, again: it depends on your model. Think it through thoroughly.

2016-10-26 10:17 GMT+02:00 Kant Kodali <k...@peernova.com>:

> I guess the question can be rephrased as "What is the overhead of
> creating and maintaining an additional table?"
>
> On Wed, Oct 26, 2016 at 1:12 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> Depends on the use case. No one right answer.
>>
>> On Wed, Oct 26, 2016 at 1:03 PM, Kant Kodali <k...@peernova.com> wrote:
>>
>>> If one were given a choice of fitting all the data into one table vs.
>>> fitting the data into two tables while, say, keeping all the runtime and
>>> space complexity for CRUD operations the same in either case, which one
>>> would you choose and why?
>>>
>>>
>>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
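
A minimal CQL sketch of the space argument above (table and column names are
made up for illustration):

-- Two tables sharing the same key store that key twice ...
CREATE TABLE user_profile  (id uuid PRIMARY KEY, name text, bio text);
CREATE TABLE user_settings (id uuid PRIMARY KEY, locale text, theme text);

-- ... while a merged table stores it once, at the cost of a wider row:
CREATE TABLE user_combined (
    id     uuid PRIMARY KEY,
    name   text,
    bio    text,
    locale text,
    theme  text
);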


Re: Cannot mix counter and non counter columns in the same table

2016-11-01 Thread Benjamin Roth
Big Fat lol!!!

On 01.11.2016 19:02, "Ali Akhtar" wrote:

> ^ Stockholm syndrome :)
>
> On Tue, Nov 1, 2016 at 10:54 PM, Robert Wille  wrote:
>
>> I used to think it was terrible as well. But it really isn’t. Just put
>> your non-counter columns in a separate table with the same primary key. If
>> you want to query both counter and non-counter columns at the same time,
>> just query both tables at the same time with asynchronous queries.
>>
>> On Nov 1, 2016, at 7:29 AM, Ali Akhtar  wrote:
>>
>> That's a terrible gotcha rule.
>>
>> On Tue, Nov 1, 2016 at 6:27 PM, Cody Yancey  wrote:
>>
>>> In your table schema, you have KEYS and you have VALUES. Your KEYS are
>>> text, but they could be any non-counter type or compound thereof. KEYS
>>> obviously cannot ever be counters.
>>>
>>> Your VALUES, however, must be either all counters or all non-counters.
>>> The official example you posted conforms to this limitation.
>>>
>>> Thanks,
>>> Cody
>>>
>>> On Nov 1, 2016 7:16 AM, "Ali Akhtar"  wrote:
>>>
 I'm not referring to the primary key, just to other columns.

 My primary key is a text, and my table contains a mix of texts, ints,
 and timestamps.

 If I try to change one of the ints to a counter and run the create
 table query, I get the error ' Cannot mix counter and non counter
 columns in the same table'


 On Tue, Nov 1, 2016 at 6:11 PM, Cody Yancey  wrote:

> For counter tables, non-counter types are of course allowed in the
> primary key. Counters would be meaningless otherwise.
>
> Thanks,
> Cody
>
> On Nov 1, 2016 7:00 AM, "Ali Akhtar"  wrote:
>
>> In the documentation for counters:
>>
>> https://docs.datastax.com/en/cql/3.1/cql/cql_using/use_counter_t.html
>>
>> The example table is created via:
>>
>> CREATE TABLE counterks.page_view_counts
>>   (counter_value counter,
>>   url_name varchar,
>>   page_name varchar,
>>   PRIMARY KEY (url_name, page_name)
>> );
>>
>> Yet if I try to create a table with a mixture of texts, ints,
>> timestamps, and counters, i get the error ' Cannot mix counter and non
>> counter columns in the same table'
>>
>> Is that supposed to be allowed or not allowed, given that the
>> official example contains a mix of counters and non-counters?
>>
>

>>
>>
>
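
A minimal CQL sketch of the two-table pattern described above, with
hypothetical table and column names:

-- Counters live in a counter-only table ...
CREATE TABLE page_view_counts (
    url_name      varchar,
    page_name     varchar,
    counter_value counter,
    PRIMARY KEY (url_name, page_name)
);

-- ... and the non-counter columns go in a second table with the same key.
CREATE TABLE page_view_meta (
    url_name   varchar,
    page_name  varchar,
    last_title text,
    last_seen  timestamp,
    PRIMARY KEY (url_name, page_name)
);

A client can then issue the two SELECTs asynchronously and merge the rows by
key.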


Re: Java GC pauses, reality check

2016-11-27 Thread Benjamin Roth
I didn't even know there were plans to move to TPC in CS. Thanks for that
update. In any case, I will follow the development of both Scylla and CS and
am excited about the future of both!

On 27.11.2016 10:02, "Kant Kodali" <k...@peernova.com> wrote:

> Yes, I am well aware of ScyllaDB. It might be well written in C++, but the
> performance gain they are claiming has very little to do with moving from
> Java to C++. They made major design changes, such as moving away from SEDA
> to TPC and so on. Moreover, I would say it still needs to mature. Lots of
> users have complained that they cannot get benchmarks similar to the ones
> posted online, and I keep seeing comments stating that you need specific
> hardware and specific tuning mechanisms and so on (I don't mean to say
> that what ScyllaDB is claiming is wrong, as I certainly haven't verified
> it, but I do know for a fact that lots of people are having trouble
> reaching those benchmarks).
>
> SEDA to TPC is a very big change. Let's see how long it will take for
> Apache C*.
>
> https://issues.apache.org/jira/browse/CASSANDRA-10989
>
>
>
>
> On Sat, Nov 26, 2016 at 11:45 PM, Benjamin Roth <benjamin.r...@jaumo.com>
> wrote:
>
>> You are of course right. There is no solution and no language that is a
>> perfect match for every situation; every solution and language has its
>> own pros, cons, pitfalls and drawbacks.
>> Actually, the article you posted points at an aspect of ARC I wasn't
>> aware of yet.
>> Nevertheless, GC is an issue for Cassandra, otherwise this thread would
>> not exist, right? But we have to deal with it and get the best out of it.
>>
>> Another option, besides optimizing your GC: You could check if
>> http://www.scylladb.com/ is an option for you.
>> They rewrote CS from scratch. The goal is to be completely compatible
>> with CS but to be much, much faster. Check their benchmarks and their
>> architecture.
>> I really do not want to depreciate the work of all the Cassandra
>> developers - they did a great job - but what I have seen there looked very
>> interesting and promising! By the way, it's written in C++.
>>
>>
>> 2016-11-27 7:06 GMT+01:00 Kant Kodali <k...@peernova.com>:
>>
>>> Automatic reference counting sounds like a college-level idea that we
>>> have all been hearing about since GC was born! There seem to be a bunch
>>> of cons to ARC, as explained here:
>>>
>>> https://www.quora.com/Why-doesnt-Apple-Swift-adopt-the-memor
>>> y-management-method-of-garbage-collection-like-in-Java
>>>
>>> Is maintaining C and C++ apps never a pain? How about versioning and
>>> static-time libraries? There is work there too, so it's all pros and
>>> cons.
>>>
>>> "GC is a pain in the ass"? How about segfaults? They aren't any lesser a
>>> pain :)
>>>
>>> It's not only Cassandra that runs on the JVM. The majority of Apache
>>> projects run on the JVM for a reason.
>>>
>>> Bottom line: my point here is that every language has pros and cons. It
>>> doesn't make much sense to target one language.
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Sat, Nov 26, 2016 at 9:31 PM, Benjamin Roth <benjamin.r...@jaumo.com>
>>> wrote:
>>>
>>>> ARC means automatic reference counting, which is done at compile time;
>>>> e.g. Objective-C and Swift use this technique. There are absolutely no
>>>> GCs. It's a completely different memory management technique.
>>>>
>>>> Why don't I like Java on the server side? Because GC is a pain in the
>>>> ass. I have been in this business for over 15 years, and
>>>> running/maintaining apps that are built in C or C++ has never been such
>>>> a pain.
>>>>
>>>> On the other hand, Java is easier to handle for developers. And coding
>>>> plain C is also a pain.
>>>>
>>>> That's why I said it's a philosophical discussion.
>>>> Anyway, Cassandra runs on Java, so we have to deal with it.
>>>>
>>>> On 27.11.2016 05:28, "Kant Kodali" <k...@peernova.com> wrote:
>>>>
>>>>> Benjamin Roth: How do you know ARC eliminates GC pauses completely? By
>>>>> completely I mean no GC pauses whatsoever.
>>>>>
>>>>> When you say Java is NOT the first choice for server applications, you
>>>>> are generalizing too much, I would say, since many of them fall under
>>>>> that category. Either way, the statement you made is purely

Re: Batch size warnings

2016-12-07 Thread Benjamin Roth
I meant the MV thing.

On 07.12.2016 17:27, "Voytek Jarnot" <voytek.jar...@gmail.com> wrote:

> Sure, about which part?
>
> The default batch size warning threshold is 5 KB.
> I've increased it to 30 KB, and will need to increase it to 40 KB (8x the
> default setting) to avoid WARN log messages about batch sizes.  I do
> realize it's just a WARNing, but I may as well avoid those if I can
> configure them away.
> That said, having to increase it so substantially (and we're only dealing
> with 5 tables) is making me wonder if I'm not taking the correct approach
> in terms of using batches to guarantee atomicity.
>
> On Wed, Dec 7, 2016 at 10:13 AM, Benjamin Roth <benjamin.r...@jaumo.com>
> wrote:
>
>> Could you please be more specific?
>>
>> On 07.12.2016 17:10, "Voytek Jarnot" <voytek.jar...@gmail.com> wrote:
>>
>>> Should've mentioned - running 3.9.  Also - please do not recommend MVs:
>>> I tried, they're broken, we punted.
>>>
>>> On Wed, Dec 7, 2016 at 10:06 AM, Voytek Jarnot <voytek.jar...@gmail.com>
>>> wrote:
>>>
>>>> The low default value for batch_size_warn_threshold_in_kb is making me
>>>> wonder if I'm perhaps approaching the problem of atomicity in a non-ideal
>>>> fashion.
>>>>
>>>> With one data set duplicated/denormalized into 5 tables to support
>>>> queries, we use batches to ensure inserts make it to all or 0 tables.  This
>>>> works fine, but I've had to bump the warn threshold and fail threshold
>>>> substantially (8x higher for the warn threshold).  This - in turn - makes
>>>> me wonder, with a default setting so low, if I'm not solving this problem
>>>> in the canonical/standard way.
>>>>
>>>> Mostly just looking for confirmation that we're not unintentionally
>>>> doing something weird...
>>>>
>>>
>>>
>
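
The pattern under discussion, as a CQL sketch: one logical write fanned out
atomically to denormalized tables with a logged batch (table and column names
are illustrative; the bind markers assume a prepared statement):

BEGIN BATCH
  INSERT INTO events_by_id   (id, ts, payload)          VALUES (?, ?, ?);
  INSERT INTO events_by_day  (day, ts, id, payload)     VALUES (?, ?, ?, ?);
  INSERT INTO events_by_user (user_id, ts, id, payload) VALUES (?, ?, ?, ?);
APPLY BATCH;

The thresholds being raised here live in cassandra.yaml as
batch_size_warn_threshold_in_kb and batch_size_fail_threshold_in_kb.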


Re: Batch size warnings

2016-12-07 Thread Benjamin Roth
OK, thanks. I'm investigating a lot. There will be some improvements coming,
but I cannot promise they will solve all existing problems. We will see and
keep working on it.

On 07.12.2016 17:58, "Voytek Jarnot" <voytek.jar...@gmail.com> wrote:

> It's been about a month since I gave up on it, but it was very much
> related to the stuff you're dealing with ... Basically Cassandra just
> stepping on its own, er, tripping over its own feet streaming MVs.
>
> On Dec 7, 2016 10:45 AM, "Benjamin Roth" <benjamin.r...@jaumo.com> wrote:
>
>> I meant the MV thing.
>>
>> On 07.12.2016 17:27, "Voytek Jarnot" <voytek.jar...@gmail.com> wrote:
>>
>>> Sure, about which part?
>>>
>>> The default batch size warning threshold is 5 KB.
>>> I've increased it to 30 KB, and will need to increase it to 40 KB (8x the
>>> default setting) to avoid WARN log messages about batch sizes.  I do
>>> realize it's just a WARNing, but I may as well avoid those if I can
>>> configure them away.
>>> That said, having to increase it so substantially (and we're only dealing
>>> with 5 tables) is making me wonder if I'm not taking the correct approach
>>> in terms of using batches to guarantee atomicity.
>>>
>>> On Wed, Dec 7, 2016 at 10:13 AM, Benjamin Roth <benjamin.r...@jaumo.com>
>>> wrote:
>>>
>>>> Could you please be more specific?
>>>>
>>>> On 07.12.2016 17:10, "Voytek Jarnot" <voytek.jar...@gmail.com> wrote:
>>>>
>>>>> Should've mentioned - running 3.9.  Also - please do not recommend
>>>>> MVs: I tried, they're broken, we punted.
>>>>>
>>>>> On Wed, Dec 7, 2016 at 10:06 AM, Voytek Jarnot <
>>>>> voytek.jar...@gmail.com> wrote:
>>>>>
>>>>>> The low default value for batch_size_warn_threshold_in_kb is making
>>>>>> me wonder if I'm perhaps approaching the problem of atomicity in a
>>>>>> non-ideal fashion.
>>>>>>
>>>>>> With one data set duplicated/denormalized into 5 tables to support
>>>>>> queries, we use batches to ensure inserts make it to all or 0 tables.  
>>>>>> This
>>>>>> works fine, but I've had to bump the warn threshold and fail threshold
>>>>>> substantially (8x higher for the warn threshold).  This - in turn - makes
>>>>>> me wonder, with a default setting so low, if I'm not solving this problem
>>>>>> in the canonical/standard way.
>>>>>>
>>>>>> Mostly just looking for confirmation that we're not unintentionally
>>>>>> doing something weird...
>>>>>>
>>>>>
>>>>>
>>>


Re: node decommission throttled

2016-12-08 Thread Benjamin Roth
Just an educated guess: do you have materialized views? They are known to
stream very slowly.

On 08.12.2016 10:28, "Aleksandr Ivanov" wrote:

> Yes, I use compression.
> I tried without it, and that gave a ~15% increase in speed, but it is
> still too low (~35 Mbps).
>
> On the sending side there is no high CPU/IO/etc. utilization.
> But on the receiving node I see that one "STREAM-IN" thread takes 100%
> CPU, and it just doesn't scale, by design, since "each stream is a single
> thread" (
> http://www.mail-archive.com/user@cassandra.apache.org/msg42095.html)
>
>
>
>> > I'm trying to decommission one C* node from a 6-node cluster, and I
>> > see that outbound network traffic on this node doesn't go over
>> > ~30 Mb/s. It looks like it is throttled somewhere in C*.
>>
>> Do you use compression?  Try taking a thread dump and see what the
>> utilization of the sending threads are.
>>
>>
>> --
>> Eric Evans
>> john.eric.ev...@gmail.com
>>
>
>
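
Before blaming the single-threaded stream path alone, it is worth ruling out
the configured streaming throttle
(stream_throughput_outbound_megabits_per_sec in cassandra.yaml, 200 Mbps per
node by default); nodetool can read and change it live:

nodetool getstreamthroughput
nodetool setstreamthroughput 400   # raise the cap; 0 disables throttling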


Re: Are Materialized views persisted on disk?

2016-12-13 Thread Benjamin Roth
The word "materialized" implies that.

2016-12-13 20:34 GMT+01:00 Carl Yeksigian <c...@yeksigian.com>:

> Yes, they are stored on disk like a normal table.
>
> On Tue, Dec 13, 2016 at 2:31 PM, Kant Kodali <k...@peernova.com> wrote:
>
>> Are Materialized views persisted on disk? sorry for the naive question.
>>
>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Are Materialized views persisted on disk?

2016-12-13 Thread Benjamin Roth
It wasn't meant in a snarky way; it was a (too short) explanation. I'll try
to sum it up:

Materialized View:
The data that is represented by the view is stored persistently and updated
as soon as the underlying base data changes.
On RDBMS: Pro: Fast reads, Con: Slow(er) updates
On CS: Used to do filtering or sorting of the base table. Much slower write
path.

"Regular" View:
The base data is queried on demand. More or less a rewrite or alias of
another query.
On RDBMS: Pro: No updates required, Con: Probably slow reads, depending on
indexes.
On CS: Does not exist.

The term "materialized view" has been established by well known RDBMS like
oracle and behaves very similar in CS. In most RDBMS a view can have many
base tables. In CS an MV can have only one base table and has many more
restrictions compared to RDBMS.

2016-12-13 21:06 GMT+01:00 Jonathan Haddad <j...@jonhaddad.com>:

> People should be able to ask legit questions here without getting snarky
> answers; please don't do that.  Not everyone has the same background or
> knowledge that you do.
>
> On Tue, Dec 13, 2016 at 11:49 AM Benjamin Roth <benjamin.r...@jaumo.com>
> wrote:
>
>> The word "materialized" implies that.
>>
>> 2016-12-13 20:34 GMT+01:00 Carl Yeksigian <c...@yeksigian.com>:
>>
>> Yes, they are stored on disk like a normal table.
>>
>> On Tue, Dec 13, 2016 at 2:31 PM, Kant Kodali <k...@peernova.com> wrote:
>>
>> Are Materialized views persisted on disk? sorry for the naive question.
>>
>>
>>
>>
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 <+49%207161%203048806> · Fax +49 7161 304880-1
>> <+49%207161%203048801>
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
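
A minimal CQL sketch of a CS materialized view over a single base table
(names are illustrative):

CREATE TABLE users (
    id    uuid PRIMARY KEY,
    email text,
    name  text
);

-- The view filters/re-sorts the base table by a different key; every base
-- primary key column must appear in the view's key and be restricted to
-- non-null values.
CREATE MATERIALIZED VIEW users_by_email AS
    SELECT * FROM users
    WHERE email IS NOT NULL AND id IS NOT NULL
    PRIMARY KEY (email, id);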

