Re: Modeling Master Tables in Cassandra

2016-02-12 Thread Carlos Alonso
Hi Hari.

I'd suggest having a customers table like this:

CREATE TABLE customers (
  customerid UUID,
  name VARCHAR,
  email VARCHAR,
  phonenr VARCHAR,
  PRIMARY KEY(name, email, phonenr)
);

This way your inserts could be INSERT INTO customers (customerid, ...)
VALUES (...) IF NOT EXISTS;
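Written out in full (the literal values here are hypothetical), the conditional insert looks like this; Cassandra executes it as a lightweight transaction and returns an [applied] column telling you whether the row was actually created:

```cql
-- Only inserts if no row already exists for this (name, email, phonenr)
-- combination, which is the table's primary key.
INSERT INTO customers (name, email, phonenr, customerid)
VALUES ('John Doe', 'john@example.com', '555-0100', uuid())
IF NOT EXISTS;

-- A second insert with the same name/email/phonenr returns
-- [applied] = False and leaves the original row untouched.
```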
Afterwards, you can use your customerid in the dependent tables such as:

CREATE TABLE customeraction (
  customerid UUID,
  action VARCHAR,
  time TIMESTAMP,
  PRIMARY KEY(customerid, action, time)
  // Keys definition will, of course, depend on the access pattern.
);

Before wrapping up, I'd like to suggest denormalising a little using static
columns, if possible.

If you need to JOIN your customers with any of your dependent tables, that
will have to be done in application logic, as Cassandra doesn't support
joins. Instead you can denormalise using static columns, which introduces
almost no duplication, as a static value is stored only once per partition.

An example:

CREATE TABLE customeraction (
  customerid UUID,
  name VARCHAR STATIC,
  email VARCHAR STATIC,
  phonenr VARCHAR STATIC,
  action VARCHAR,
  time TIMESTAMP,
  PRIMARY KEY(customerid, action, time)
);

This way, you avoid client side joins.
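To illustrate (the values below are hypothetical): because name, email and phonenr are static, a single query on the partition returns the customer profile alongside every action row, with no client-side join:

```cql
SELECT name, email, action, time
FROM customeraction
WHERE customerid = 62c36092-82a1-3a00-93d1-46196ee77204;

--  name | email            | action | time
-- ------+------------------+--------+--------------------------
--  John | john@example.com | login  | 2016-02-12 09:00:00+0000
--  John | john@example.com | view   | 2016-02-12 09:05:00+0000
```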

Hope this helps!

Carlos Alonso | Software Engineer | @calonso 

On 12 February 2016 at 09:25, Harikrishnan A  wrote:

> Hello,
> I have a scenario where I need to create a customer master table in
> Cassandra with attributes like customerid, name, email, phonenr, etc.
> What is the best way to model such a table in Cassandra, keeping in mind
> that I will be using the customer id to populate customer information from
> other application workflows? While inserting, I need to make sure the
> customer profile doesn't already exist in this table, by verifying the
> combination of name + email + phonenr. Unfortunately I can't store the
> name, email and phonenr in some of the tables associated with customer
> data; those tables store only the customer id.
>
> Thanks & Regards,
> Hari
>


Cassandra eats all cpu cores, high load average

2016-02-12 Thread Skvazh Roman
Hello!
We have a cluster of 25 c3.4xlarge nodes (16 cores, 32 GiB) with attached 1.5 
TB 4000 PIOPS EBS drive.
Sometimes the user CPU on one or two nodes spikes to 100% and the load
average to 20-30, and read requests drop off.
Only restarting the Cassandra service on the affected node helps.
Please advise.

One big table with wide rows. 600 Gb per node.
LZ4Compressor
LeveledCompaction

concurrent compactors: 4
compactor throughput: tried from 16 to 128
Concurrent_readers: from 16 to 32
Concurrent_writers: 128


https://gist.github.com/rskvazh/de916327779b98a437a6


 JvmTop 0.8.0 alpha - 06:51:10,  amd64, 16 cpus, Linux 3.14.44-3, load avg 19.35
 http://code.google.com/p/jvmtop

 Profiling PID 9256: org.apache.cassandra.service.CassandraDa

  95.73% ( 4.31s) google.common.collect.AbstractIterator.tryToComputeN()
   1.39% ( 0.06s) com.google.common.base.Objects.hashCode()
   1.26% ( 0.06s) io.netty.channel.epoll.Native.epollWait()
   0.85% ( 0.04s) net.jpountz.lz4.LZ4JNI.LZ4_compress_limitedOutput()
   0.46% ( 0.02s) net.jpountz.lz4.LZ4JNI.LZ4_decompress_fast()
   0.26% ( 0.01s) com.google.common.collect.Iterators$7.computeNext()
   0.06% ( 0.00s) io.netty.channel.epoll.Native.eventFdWrite()


ttop:

2016-02-12T08:20:25.605+ Process summary
  process cpu=1565.15%
  application cpu=1314.48% (user=1354.48% sys=-40.00%)
  other: cpu=250.67%
  heap allocation rate 146mb/s
[000405] user=76.25% sys=-0.54% alloc= 0b/s - SharedPool-Worker-9
[000457] user=75.54% sys=-1.26% alloc= 0b/s - SharedPool-Worker-14
[000451] user=73.52% sys= 0.29% alloc= 0b/s - SharedPool-Worker-16
[000311] user=76.45% sys=-2.99% alloc= 0b/s - SharedPool-Worker-4
[000389] user=70.69% sys= 2.62% alloc= 0b/s - SharedPool-Worker-6
[000388] user=86.95% sys=-14.28% alloc= 0b/s - SharedPool-Worker-5
[000404] user=70.69% sys= 0.10% alloc= 0b/s - SharedPool-Worker-8
[000390] user=72.61% sys=-1.82% alloc= 0b/s - SharedPool-Worker-7
[000255] user=87.86% sys=-17.87% alloc= 0b/s - SharedPool-Worker-1
[000444] user=72.21% sys=-2.30% alloc= 0b/s - SharedPool-Worker-12
[000310] user=71.50% sys=-2.31% alloc= 0b/s - SharedPool-Worker-3
[000445] user=69.68% sys=-0.83% alloc= 0b/s - SharedPool-Worker-13
[000406] user=72.61% sys=-4.40% alloc= 0b/s - SharedPool-Worker-10
[000446] user=69.78% sys=-1.65% alloc= 0b/s - SharedPool-Worker-11
[000452] user=66.86% sys= 0.22% alloc= 0b/s - SharedPool-Worker-15
[000256] user=69.08% sys=-2.42% alloc= 0b/s - SharedPool-Worker-2
[004496] user=29.99% sys= 0.59% alloc=   30mb/s - CompactionExecutor:15
[004906] user=29.49% sys= 0.74% alloc=   39mb/s - CompactionExecutor:16
[010143] user=28.58% sys= 0.25% alloc=   26mb/s - CompactionExecutor:17
[000785] user=27.87% sys= 0.70% alloc=   38mb/s - CompactionExecutor:12
[012723] user= 9.09% sys= 2.46% alloc= 2977kb/s - RMI TCP 
Connection(2673)-127.0.0.1
[000555] user= 5.35% sys=-0.08% alloc=  474kb/s - SharedPool-Worker-24
[000560] user= 3.94% sys= 0.07% alloc=  434kb/s - SharedPool-Worker-22
[000557] user= 3.94% sys=-0.17% alloc=  339kb/s - SharedPool-Worker-25
[000447] user= 2.73% sys= 0.60% alloc=  436kb/s - SharedPool-Worker-19
[000563] user= 3.33% sys=-0.04% alloc=  460kb/s - SharedPool-Worker-20
[000448] user= 2.73% sys= 0.27% alloc=  414kb/s - SharedPool-Worker-21
[000554] user= 1.72% sys= 0.70% alloc=  232kb/s - SharedPool-Worker-26
[000558] user= 1.41% sys= 0.39% alloc=  213kb/s - SharedPool-Worker-23
[000450] user= 1.41% sys=-0.03% alloc=  158kb/s - SharedPool-Worker-17

Re: ORM layer for cassandra-java?

2016-02-12 Thread Atul Saroha
Thanks for the reply. We would go with solution 1.

One more thing, which might be a bug. We are using version 4.0.1, and the
query of solution 2 is not possible: c3 is a clustering key, and no option
is generated for it:

   1. we cannot use it in a setter
   (manager.dsl().update().fromBaseTable().<set option for c3>)
   2. nor after the where clause (the only available option is based on id:
   manager.dsl().update().fromBaseTable().c4_Set("1").where().id_Eq(id))
   in the DSL.


@Entity(keyspace = "ks" , table="PrimeUser")

> public class PrimeUser {
>
> @PartitionKey
>
>> @Column("id")
>>
> private int id;
>
> @Column("c1")
> @Static
> private String c1;
>
> @Column("c2")
> @Static
> private Boolean c2;
>
>
> @Column("c3")
> @ClusteringColumn(1)
> private String c3;
>
> @Column("c4")
> private String c4;
>
> }
>
> Regards,


-
Atul Saroha
*Sr. Software Engineer*
*M*: +91 8447784271 *T*: +91 124-415-6069 *EXT*: 12369
Plot # 362, ASF Centre - Tower A, Udyog Vihar,
 Phase -4, Sector 18, Gurgaon, Haryana 122016, INDIA

On Fri, Feb 12, 2016 at 3:01 PM, DuyHai Doan  wrote:

> "How could I achieve a delta update through this ORM, where I want to
> insert one row for the id, c3, c4 columns only"
>
> 2 ways:
>
> 1. Create an empty PrimeUser entity with only id, c3 and c4 values. Use
>   manager
>.crud()
>.insert(entity)
>.withInsertStrategy(InsertStrategy.NOT_NULL_FIELDS)
>.execute()
>
> This way, Achilles will extract only the 3 columns (id, c3, c4), generate
> only INSERT INTO primeuser(id,c3,c4) VALUES(...), and skip inserting null
> values.
>
>
> 2. Use the Update DSL
>
> manager
>.dsl()
>.update()
>.fromBaseTable()
>.c3_Set("")
>.c4_Set("")
>.where()
>.id_Eq(partitionKey)
>.execute()
>
>
> "Since Achilles does not have a persistence context like Hibernate does, to
> track what has been updated in my java entity and update only the change,
> as with the DynamicUpdate annotation."
>
>  This is made on-purpose. Having a proxy to intercept calls to setters and
> to track what has been updated would require a "read-before-write" e.g.
> load existing data first from Cassandra into the proxy. And
> read-before-write is an anti-pattern.
>
>  Optionally, one could create an empty proxy without read-before-write and
> intercept only data that have been added to the entity. And this is exactly
> what solution 1. does: create a new empty instance of PrimeUser, populate
> some values and user insert() with NOT_NULL_FIELDS insert strategy. The
> only difference is that Achilles creates INSERT statement instead of UPDATE.
>
>
>
>
>
> On Fri, Feb 12, 2016 at 9:02 AM, Atul Saroha 
> wrote:
>
>> Thanks Doan,
>>
>> We are now evaluating or nearly finalized to use Achilles.
>>
>> We are looking for one use case.
>> As I mentioned in above for static columns.
>>
>>> CREATE TABLE IF NOT EXISTS  ks.PrimeUser(
>>>   id int,
>>>   c1 text STATIC,
>>>   c2 boolean STATIC,
>>>   c3 text,
>>>   c4 text,
>>>   PRIMARY KEY (id, c3)
>>>
>> );
>>>
>> How could I achieve a delta update through this ORM, where I want to
>> insert one row for the id, c3, c4 columns only, or update one row for the
>> c4 column only against (id, c3)? If I use my entity PrimeUser.java and the
>> crud insert method, it will insert all columns, including the statics.
>> I think the only way is to use the update method of the dsl/Query API,
>> since Achilles does not have a persistence context like Hibernate does to
>> track what has been updated in my java entity and update only the change,
>> as with the DynamicUpdate annotation.
>> Or is there something I am missing here?
>>
>> Thanks , reply will be highly appreciated
>>
>>
>>
>> -
>> Atul Saroha
>> *Sr. Software Engineer*
>> *M*: +91 8447784271 *T*: +91 124-415-6069 *EXT*: 12369
>> Plot # 362, ASF Centre - Tower A, Udyog Vihar,
>>  Phase -4, Sector 18, Gurgaon, Haryana 122016, INDIA
>>
>> On Tue, Feb 9, 2016 at 8:32 PM, DuyHai Doan  wrote:
>>
>>> Look at Achilles and how it models Partition key & clustering columns:
>>>
>>>
>>> https://github.com/doanduyhai/Achilles/wiki/5-minutes-Tutorial#clustered-entities
>>>
>>>
>>>
>>> On Tue, Feb 9, 2016 at 12:48 PM, Atul Saroha 
>>> wrote:
>>>
 I know the most popular ORM api

1. Kundera :

 https://github.com/impetus-opensource/Kundera/wiki/Using-Compound-keys-with-Kundera
2. Hector (outdated- no longer in development)

 I am a bit confused about how to model this table into a java domain entity structure

 CREATE TABLE IF NOT EXISTS  ks.PrimeUser(

Re: ORM layer for cassandra-java?

2016-02-12 Thread DuyHai Doan
"How could I achieve a delta update through this ORM, where I want to
insert one row for the id, c3, c4 columns only"

2 ways:

1. Create an empty PrimeUser entity with only id, c3 and c4 values. Use
  manager
   .crud()
   .insert(entity)
   .withInsertStrategy(InsertStrategy.NOT_NULL_FIELDS)
   .execute()

This way, Achilles will extract only the 3 columns (id, c3, c4), generate
only INSERT INTO primeuser(id,c3,c4) VALUES(...), and skip inserting null
values.


2. Use the Update DSL

manager
   .dsl()
   .update()
   .fromBaseTable()
   .c3_Set("")
   .c4_Set("")
   .where()
   .id_Eq(partitionKey)
   .execute()


"Since Achilles does not have a persistence context like Hibernate does, to
track what has been updated in my java entity and update only the change,
as with the DynamicUpdate annotation."

 This is made on purpose. Having a proxy to intercept calls to setters and
to track what has been updated would require a "read-before-write", i.e.
loading the existing data from Cassandra into the proxy first. And
read-before-write is an anti-pattern.

 Optionally, one could create an empty proxy without read-before-write and
intercept only data that has been added to the entity. And this is exactly
what solution 1 does: create a new empty instance of PrimeUser, populate
some values and use insert() with the NOT_NULL_FIELDS insert strategy. The
only difference is that Achilles creates an INSERT statement instead of an
UPDATE.
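For reference, a sketch of the CQL the two approaches should boil down to (column names taken from the ks.PrimeUser table discussed in this thread; note that both INSERT and UPDATE are upserts in Cassandra, so neither requires a read-before-write):

```cql
-- Solution 1: crud insert with NOT_NULL_FIELDS -> only the populated
-- fields make it into the generated statement.
INSERT INTO ks.PrimeUser (id, c3, c4) VALUES (1, 'k', 'v');

-- Solution 2: update DSL -> the clustering column c3 belongs in the
-- WHERE clause (it cannot be SET, being part of the primary key).
UPDATE ks.PrimeUser SET c4 = 'v' WHERE id = 1 AND c3 = 'k';
```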





On Fri, Feb 12, 2016 at 9:02 AM, Atul Saroha 
wrote:

> Thanks Doan,
>
> We are now evaluating or nearly finalized to use Achilles.
>
> We are looking for one use case.
> As I mentioned in above for static columns.
>
>> CREATE TABLE IF NOT EXISTS  ks.PrimeUser(
>>   id int,
>>   c1 text STATIC,
>>   c2 boolean STATIC,
>>   c3 text,
>>   c4 text,
>>   PRIMARY KEY (id, c3)
>>
> );
>>
> How could I achieve a delta update through this ORM, where I want to
> insert one row for the id, c3, c4 columns only, or update one row for the
> c4 column only against (id, c3)? If I use my entity PrimeUser.java and the
> crud insert method, it will insert all columns, including the statics.
> I think the only way is to use the update method of the dsl/Query API,
> since Achilles does not have a persistence context like Hibernate does to
> track what has been updated in my java entity and update only the change,
> as with the DynamicUpdate annotation.
> Or is there something I am missing here?
>
> Thanks , reply will be highly appreciated
>
>
>
> -
> Atul Saroha
> *Sr. Software Engineer*
> *M*: +91 8447784271 *T*: +91 124-415-6069 *EXT*: 12369
> Plot # 362, ASF Centre - Tower A, Udyog Vihar,
>  Phase -4, Sector 18, Gurgaon, Haryana 122016, INDIA
>
> On Tue, Feb 9, 2016 at 8:32 PM, DuyHai Doan  wrote:
>
>> Look at Achilles and how it models Partition key & clustering columns:
>>
>>
>> https://github.com/doanduyhai/Achilles/wiki/5-minutes-Tutorial#clustered-entities
>>
>>
>>
>> On Tue, Feb 9, 2016 at 12:48 PM, Atul Saroha 
>> wrote:
>>
>>> I know the most popular ORM api
>>>
>>>1. Kundera :
>>>
>>> https://github.com/impetus-opensource/Kundera/wiki/Using-Compound-keys-with-Kundera
>>>2. Hector (outdated- no longer in development)
>>>
>>> I am a bit confused about how to model this table into a java domain entity structure
>>>
>>> CREATE TABLE IF NOT EXISTS  ks.PrimeUser(
>>>   id int,
>>>   c1 text STATIC,
>>>   c2 boolean STATIC,
>>>   c3 text,
>>>   c4 text,
>>>   PRIMARY KEY (id, c3)
>>> );
>>>
>>>
>>> One way is to create compound key based on id and c3 column as shown
>>> below.
>>>
 @Entity
 @Table(name="PrimeUser", schema="ks")
 public class PrimeUser
 {

 @EmbeddedId
 private CompoundKey key;

@Column
private String c1;
@Column
private String c2;
@Column
private String c4;
 }

 Here key has to be an Embeddable entity:

 @Embeddable
 public class CompoundKey
 {
 @Column private int id;
 @Column private String c1;
 }

 Then again, when we fetch the data based on id only, c1 and c2 will
>>> be duplicated across multiple objects, even though they are stored once per
>>> "id". The c3 column is a clustering key and its corresponding value is
>>> mapped to column c4. We avoid using a map here as it would cause a
>>> performance hit.
>>> We also have use cases to fetch the data based on both (id, c3).
>>>
>>> Is there any other ORM API which handles such a scenario?
>>>
>>>
>>>
>>>
>>> -
>>> Atul Saroha
>>> *Sr. Software Engineer*
>>> *M*: +91 8447784271 *T*: +91 124-415-6069 *EXT*: 12369

Re: ORM layer for cassandra-java?

2016-02-12 Thread Atul Saroha
Sorry, that was my misunderstanding of solution 2. Thanks for the solution

-
Atul Saroha
*Sr. Software Engineer*
*M*: +91 8447784271 *T*: +91 124-415-6069 *EXT*: 12369
Plot # 362, ASF Centre - Tower A, Udyog Vihar,
 Phase -4, Sector 18, Gurgaon, Haryana 122016, INDIA

On Fri, Feb 12, 2016 at 3:39 PM, Atul Saroha 
wrote:

> Thanks for the reply. We would go with solution 1.
>
> One more thing, which might be a bug. We are using version 4.0.1, and the
> query of solution 2 is not possible: c3 is a clustering key, and no option
> is generated for it:
>
>    1. we cannot use it in a setter
>    (manager.dsl().update().fromBaseTable().<set option for c3>)
>    2. nor after the where clause (the only available option is based on id:
>    manager.dsl().update().fromBaseTable().c4_Set("1").where().id_Eq(id))
>    in the DSL.
>
>
> @Entity(keyspace = "ks" , table="PrimeUser")
>
>> public class PrimeUser {
>>
>> @PartitionKey
>>
>>> @Column("id")
>>>
>> private int id;
>>
>> @Column("c1")
>> @Static
>> private String c1;
>>
>> @Column("c2")
>> @Static
>> private Boolean c2;
>>
>>
>> @Column("c3")
>> @ClusteringColumn(1)
>> private String c3;
>>
>> @Column("c4")
>> private String c4;
>>
>> }
>>
>> Regards,
>
>
>
> -
> Atul Saroha
> *Sr. Software Engineer*
> *M*: +91 8447784271 *T*: +91 124-415-6069 *EXT*: 12369
> Plot # 362, ASF Centre - Tower A, Udyog Vihar,
>  Phase -4, Sector 18, Gurgaon, Haryana 122016, INDIA
>
> On Fri, Feb 12, 2016 at 3:01 PM, DuyHai Doan  wrote:
>
>> "How could I achieve Delta Update through this ORM  where I want to
>> inserting one row for id,c3,c4 columns only"
>>
>> 2 ways:
>>
>> 1. Create an empty PrimeUser entity with only id, c3 and c4 values. Use
>>   manager
>>.crud()
>>.insert(entity)
>>.withInsertStrategy(InsertStrategy.NOT_NULL_FIELDS)
>>.execute()
>>
>> This way, Achilles will only extract the 3 columns (id, c3, c4) and
>> generates only INSERT INTO primeuser(id,c3,c4) VALUES(...) and skip
>> inserting null values
>>
>>
>> 2. Use the Update DSL
>>
>> manager
>>.dsl()
>>.update()
>>.fromBaseTable()
>>.c3_Set("")
>>.c4_Set("")
>>.where()
>>.id_Eq(partitionKey)
>>.execute()
>>
>>
>> "Since Achilies does not have persistence context like hibernate does,
>> to track what has beed updated in my java entity and update the change only
>> though DynamicUpdate anotation."
>>
>>  This is made on-purpose. Having a proxy to intercept calls to setters
>> and to track what has been updated would require a "read-before-write" e.g.
>> load existing data first from Cassandra into the proxy. And
>> read-before-write is an anti-pattern.
>>
>>  Optionally, one could create an empty proxy without read-before-write
>> and intercept only data that have been added to the entity. And this is
>> exactly what solution 1. does: create a new empty instance of PrimeUser,
>> populate some values and user insert() with NOT_NULL_FIELDS insert
>> strategy. The only difference is that Achilles creates INSERT statement
>> instead of UPDATE.
>>
>>
>>
>>
>>
>> On Fri, Feb 12, 2016 at 9:02 AM, Atul Saroha 
>> wrote:
>>
>>> Thanks Doan,
>>>
>>> We are now evaluating or nearly finalized to use Achilles.
>>>
>>> We are looking for one use case.
>>> As I mentioned in above for static columns.
>>>
 CREATE TABLE IF NOT EXISTS  ks.PrimeUser(
   id int,
   c1 text STATIC,
   c2 boolean STATIC,
   c3 text,
   c4 text,
   PRIMARY KEY (id, c3)

>>> );

>>> How could I achieve Delta Update through this ORM  where I want to
>>> inserting one row for id,c3,c4 columns only or updating one row for c4
>>> column only against ( id,c3). If I use my entity PrimeUser.java and crud
>>> insert method then it will insert all columns including static.
>>> I think that there is only one way which is to use update method of
>>> dsl/Query API. Since Achilies does not have persistence context like
>>> hibernate does, to track what has beed updated in my java entity and update
>>> the change only though DynamicUpdate anotation.
>>> Or there is something I am missing here?.
>>>
>>> Thanks , reply will be highly appreciated
>>>
>>>
>>>
>>> -
>>> Atul Saroha
>>> *Sr. Software Engineer*
>>> *M*: +91 8447784271 *T*: +91 124-415-6069 *EXT*: 12369
>>> Plot # 362, ASF Centre - Tower A, Udyog Vihar,
>>>  Phase -4, Sector 18, Gurgaon, Haryana 122016, INDIA
>>>
>>> On Tue, Feb 9, 2016 at 8:32 PM, DuyHai Doan 

Re: Cassandra eats all cpu cores, high load average

2016-02-12 Thread Julien Anguenot
Hey, 

What about compactions count when that is happening?

   J.


> On Feb 12, 2016, at 3:06 AM, Skvazh Roman  wrote:
> 
> Hello!
> We have a cluster of 25 c3.4xlarge nodes (16 cores, 32 GiB) with attached 1.5 
> TB 4000 PIOPS EBS drive.
> Sometimes one or two nodes user cpu spikes to 100%, load average to 20-30 - 
> read requests drops of.
> Only restart of this cassandra services helps.
> Please advice.
> 
> One big table with wide rows. 600 Gb per node.
> LZ4Compressor
> LeveledCompaction
> 
> concurrent compactors: 4
> compactor throughput: tried from 16 to 128
> Concurrent_readers: from 16 to 32
> Concurrent_writers: 128
> 
> 
> https://gist.github.com/rskvazh/de916327779b98a437a6
> 
> 
> JvmTop 0.8.0 alpha - 06:51:10,  amd64, 16 cpus, Linux 3.14.44-3, load avg 
> 19.35
> http://code.google.com/p/jvmtop
> 
> Profiling PID 9256: org.apache.cassandra.service.CassandraDa
> 
>  95.73% ( 4.31s) 
> google.common.collect.AbstractIterator.tryToComputeN()
>   1.39% ( 0.06s) com.google.common.base.Objects.hashCode()
>   1.26% ( 0.06s) io.netty.channel.epoll.Native.epollWait()
>   0.85% ( 0.04s) net.jpountz.lz4.LZ4JNI.LZ4_compress_limitedOutput()
>   0.46% ( 0.02s) net.jpountz.lz4.LZ4JNI.LZ4_decompress_fast()
>   0.26% ( 0.01s) com.google.common.collect.Iterators$7.computeNext()
>   0.06% ( 0.00s) io.netty.channel.epoll.Native.eventFdWrite()
> 
> 
> ttop:
> 
> 2016-02-12T08:20:25.605+ Process summary
>  process cpu=1565.15%
>  application cpu=1314.48% (user=1354.48% sys=-40.00%)
>  other: cpu=250.67%
>  heap allocation rate 146mb/s
> [000405] user=76.25% sys=-0.54% alloc= 0b/s - SharedPool-Worker-9
> [000457] user=75.54% sys=-1.26% alloc= 0b/s - SharedPool-Worker-14
> [000451] user=73.52% sys= 0.29% alloc= 0b/s - SharedPool-Worker-16
> [000311] user=76.45% sys=-2.99% alloc= 0b/s - SharedPool-Worker-4
> [000389] user=70.69% sys= 2.62% alloc= 0b/s - SharedPool-Worker-6
> [000388] user=86.95% sys=-14.28% alloc= 0b/s - SharedPool-Worker-5
> [000404] user=70.69% sys= 0.10% alloc= 0b/s - SharedPool-Worker-8
> [000390] user=72.61% sys=-1.82% alloc= 0b/s - SharedPool-Worker-7
> [000255] user=87.86% sys=-17.87% alloc= 0b/s - SharedPool-Worker-1
> [000444] user=72.21% sys=-2.30% alloc= 0b/s - SharedPool-Worker-12
> [000310] user=71.50% sys=-2.31% alloc= 0b/s - SharedPool-Worker-3
> [000445] user=69.68% sys=-0.83% alloc= 0b/s - SharedPool-Worker-13
> [000406] user=72.61% sys=-4.40% alloc= 0b/s - SharedPool-Worker-10
> [000446] user=69.78% sys=-1.65% alloc= 0b/s - SharedPool-Worker-11
> [000452] user=66.86% sys= 0.22% alloc= 0b/s - SharedPool-Worker-15
> [000256] user=69.08% sys=-2.42% alloc= 0b/s - SharedPool-Worker-2
> [004496] user=29.99% sys= 0.59% alloc=   30mb/s - CompactionExecutor:15
> [004906] user=29.49% sys= 0.74% alloc=   39mb/s - CompactionExecutor:16
> [010143] user=28.58% sys= 0.25% alloc=   26mb/s - CompactionExecutor:17
> [000785] user=27.87% sys= 0.70% alloc=   38mb/s - CompactionExecutor:12
> [012723] user= 9.09% sys= 2.46% alloc= 2977kb/s - RMI TCP 
> Connection(2673)-127.0.0.1
> [000555] user= 5.35% sys=-0.08% alloc=  474kb/s - SharedPool-Worker-24
> [000560] user= 3.94% sys= 0.07% alloc=  434kb/s - SharedPool-Worker-22
> [000557] user= 3.94% sys=-0.17% alloc=  339kb/s - SharedPool-Worker-25
> [000447] user= 2.73% sys= 0.60% alloc=  436kb/s - SharedPool-Worker-19
> [000563] user= 3.33% sys=-0.04% alloc=  460kb/s - SharedPool-Worker-20
> [000448] user= 2.73% sys= 0.27% alloc=  414kb/s - SharedPool-Worker-21
> [000554] user= 1.72% sys= 0.70% alloc=  232kb/s - SharedPool-Worker-26
> [000558] user= 1.41% sys= 0.39% alloc=  213kb/s - SharedPool-Worker-23
> [000450] user= 1.41% sys=-0.03% alloc=  158kb/s - SharedPool-Worker-17



Re: Cassandra eats all cpu cores, high load average

2016-02-12 Thread Skvazh Roman
After disabling binary, gossip and thrift, the node blocks with 16 active read stages:

[iadmin@ip-10-0-25-46 ~]$ nodetool tpstats
Pool NameActive   Pending  Completed   Blocked  All 
time blocked
MutationStage 0 0   19587002 0  
   0
ReadStage16122722 825762 0  
   0
RequestResponseStage  0 0   14281567 0  
   0
ReadRepairStage   0 0  37390 0  
   0
CounterMutationStage  0 0  0 0  
   0
MiscStage 0 0  0 0  
   0
HintedHandoff 0 0114 0  
   0
GossipStage   0 0  93775 0  
   0
CacheCleanupExecutor  0 0  0 0  
   0
InternalResponseStage 0 0  0 0  
   0
CommitLogArchiver 0 0  0 0  
   0
CompactionExecutor0 0  18523 0  
   0
ValidationExecutor0 0 18 0  
   0
MigrationStage0 0  6 0  
   0
AntiEntropyStage  0 0 60 0  
   0
PendingRangeCalculator0 0 89 0  
   0
Sampler   0 0  0 0  
   0
MemtableFlushWriter   0 0   2489 0  
   0
MemtablePostFlush 0 0   2562 0  
   0
MemtableReclaimMemory 128   2461 0  
   0

Message type   Dropped
READ 0
RANGE_SLICE  0
_TRACE   0
MUTATION 0
COUNTER_MUTATION 0
BINARY   0
REQUEST_RESPONSE 0
PAGED_RANGE  0
READ_REPAIR  0

> On 12 Feb 2016, at 17:45, Skvazh Roman  wrote:
> 
> There are 1-4 compactions running at that moment.
> We have many tombstones which do not get removed.
> DroppableTombstoneRatio is 5-6 (greater than 1).
> 
>> On 12 Feb 2016, at 15:53, Julien Anguenot  wrote:
>> 
>> Hey, 
>> 
>> What about compactions count when that is happening?
>> 
>>  J.
>> 
>> 
>>> On Feb 12, 2016, at 3:06 AM, Skvazh Roman  wrote:
>>> 
>>> Hello!
>>> We have a cluster of 25 c3.4xlarge nodes (16 cores, 32 GiB) with attached 
>>> 1.5 TB 4000 PIOPS EBS drive.
>>> Sometimes one or two nodes user cpu spikes to 100%, load average to 20-30 - 
>>> read requests drops of.
>>> Only restart of this cassandra services helps.
>>> Please advice.
>>> 
>>> One big table with wide rows. 600 Gb per node.
>>> LZ4Compressor
>>> LeveledCompaction
>>> 
>>> concurrent compactors: 4
>>> compactor throughput: tried from 16 to 128
>>> Concurrent_readers: from 16 to 32
>>> Concurrent_writers: 128
>>> 
>>> 
>>> https://gist.github.com/rskvazh/de916327779b98a437a6
>>> 
>>> 
>>> JvmTop 0.8.0 alpha - 06:51:10,  amd64, 16 cpus, Linux 3.14.44-3, load avg 
>>> 19.35
>>> http://code.google.com/p/jvmtop
>>> 
>>> Profiling PID 9256: org.apache.cassandra.service.CassandraDa
>>> 
>>> 95.73% ( 4.31s) 
>>> google.common.collect.AbstractIterator.tryToComputeN()
>>> 1.39% ( 0.06s) com.google.common.base.Objects.hashCode()
>>> 1.26% ( 0.06s) io.netty.channel.epoll.Native.epollWait()
>>> 0.85% ( 0.04s) net.jpountz.lz4.LZ4JNI.LZ4_compress_limitedOutput()
>>> 0.46% ( 0.02s) net.jpountz.lz4.LZ4JNI.LZ4_decompress_fast()
>>> 0.26% ( 0.01s) com.google.common.collect.Iterators$7.computeNext()
>>> 0.06% ( 0.00s) io.netty.channel.epoll.Native.eventFdWrite()
>>> 
>>> 
>>> ttop:
>>> 
>>> 2016-02-12T08:20:25.605+ Process summary
>>> process cpu=1565.15%
>>> application cpu=1314.48% (user=1354.48% sys=-40.00%)
>>> other: cpu=250.67%
>>> heap allocation rate 146mb/s
>>> [000405] user=76.25% sys=-0.54% alloc= 0b/s - SharedPool-Worker-9
>>> [000457] user=75.54% sys=-1.26% alloc= 0b/s - SharedPool-Worker-14
>>> [000451] user=73.52% sys= 0.29% alloc= 0b/s - SharedPool-Worker-16
>>> [000311] user=76.45% sys=-2.99% alloc= 0b/s - SharedPool-Worker-4
>>> [000389] user=70.69% sys= 2.62% alloc= 0b/s - SharedPool-Worker-6
>>> [000388] user=86.95% sys=-14.28% alloc= 0b/s - SharedPool-Worker-5
>>> [000404] user=70.69% sys= 0.10% alloc= 0b/s - SharedPool-Worker-8
>>> [000390] user=72.61% sys=-1.82% alloc= 0b/s - SharedPool-Worker-7
>>> [000255] user=87.86% sys=-17.87% alloc= 0b/s - SharedPool-Worker-1
>>> [000444] 

Re: Cassandra eats all cpu cores, high load average

2016-02-12 Thread Skvazh Roman
There are 1-4 compactions running at that moment.
We have many tombstones which do not get removed.
DroppableTombstoneRatio is 5-6 (greater than 1).

> On 12 Feb 2016, at 15:53, Julien Anguenot  wrote:
> 
> Hey, 
> 
> What about compactions count when that is happening?
> 
>   J.
> 
> 
>> On Feb 12, 2016, at 3:06 AM, Skvazh Roman  wrote:
>> 
>> Hello!
>> We have a cluster of 25 c3.4xlarge nodes (16 cores, 32 GiB) with attached 
>> 1.5 TB 4000 PIOPS EBS drive.
>> Sometimes one or two nodes user cpu spikes to 100%, load average to 20-30 - 
>> read requests drops of.
>> Only restart of this cassandra services helps.
>> Please advice.
>> 
>> One big table with wide rows. 600 Gb per node.
>> LZ4Compressor
>> LeveledCompaction
>> 
>> concurrent compactors: 4
>> compactor throughput: tried from 16 to 128
>> Concurrent_readers: from 16 to 32
>> Concurrent_writers: 128
>> 
>> 
>> https://gist.github.com/rskvazh/de916327779b98a437a6
>> 
>> 
>> JvmTop 0.8.0 alpha - 06:51:10,  amd64, 16 cpus, Linux 3.14.44-3, load avg 
>> 19.35
>> http://code.google.com/p/jvmtop
>> 
>> Profiling PID 9256: org.apache.cassandra.service.CassandraDa
>> 
>> 95.73% ( 4.31s) 
>> google.common.collect.AbstractIterator.tryToComputeN()
>>  1.39% ( 0.06s) com.google.common.base.Objects.hashCode()
>>  1.26% ( 0.06s) io.netty.channel.epoll.Native.epollWait()
>>  0.85% ( 0.04s) net.jpountz.lz4.LZ4JNI.LZ4_compress_limitedOutput()
>>  0.46% ( 0.02s) net.jpountz.lz4.LZ4JNI.LZ4_decompress_fast()
>>  0.26% ( 0.01s) com.google.common.collect.Iterators$7.computeNext()
>>  0.06% ( 0.00s) io.netty.channel.epoll.Native.eventFdWrite()
>> 
>> 
>> ttop:
>> 
>> 2016-02-12T08:20:25.605+ Process summary
>> process cpu=1565.15%
>> application cpu=1314.48% (user=1354.48% sys=-40.00%)
>> other: cpu=250.67%
>> heap allocation rate 146mb/s
>> [000405] user=76.25% sys=-0.54% alloc= 0b/s - SharedPool-Worker-9
>> [000457] user=75.54% sys=-1.26% alloc= 0b/s - SharedPool-Worker-14
>> [000451] user=73.52% sys= 0.29% alloc= 0b/s - SharedPool-Worker-16
>> [000311] user=76.45% sys=-2.99% alloc= 0b/s - SharedPool-Worker-4
>> [000389] user=70.69% sys= 2.62% alloc= 0b/s - SharedPool-Worker-6
>> [000388] user=86.95% sys=-14.28% alloc= 0b/s - SharedPool-Worker-5
>> [000404] user=70.69% sys= 0.10% alloc= 0b/s - SharedPool-Worker-8
>> [000390] user=72.61% sys=-1.82% alloc= 0b/s - SharedPool-Worker-7
>> [000255] user=87.86% sys=-17.87% alloc= 0b/s - SharedPool-Worker-1
>> [000444] user=72.21% sys=-2.30% alloc= 0b/s - SharedPool-Worker-12
>> [000310] user=71.50% sys=-2.31% alloc= 0b/s - SharedPool-Worker-3
>> [000445] user=69.68% sys=-0.83% alloc= 0b/s - SharedPool-Worker-13
>> [000406] user=72.61% sys=-4.40% alloc= 0b/s - SharedPool-Worker-10
>> [000446] user=69.78% sys=-1.65% alloc= 0b/s - SharedPool-Worker-11
>> [000452] user=66.86% sys= 0.22% alloc= 0b/s - SharedPool-Worker-15
>> [000256] user=69.08% sys=-2.42% alloc= 0b/s - SharedPool-Worker-2
>> [004496] user=29.99% sys= 0.59% alloc=   30mb/s - CompactionExecutor:15
>> [004906] user=29.49% sys= 0.74% alloc=   39mb/s - CompactionExecutor:16
>> [010143] user=28.58% sys= 0.25% alloc=   26mb/s - CompactionExecutor:17
>> [000785] user=27.87% sys= 0.70% alloc=   38mb/s - CompactionExecutor:12
>> [012723] user= 9.09% sys= 2.46% alloc= 2977kb/s - RMI TCP 
>> Connection(2673)-127.0.0.1
>> [000555] user= 5.35% sys=-0.08% alloc=  474kb/s - SharedPool-Worker-24
>> [000560] user= 3.94% sys= 0.07% alloc=  434kb/s - SharedPool-Worker-22
>> [000557] user= 3.94% sys=-0.17% alloc=  339kb/s - SharedPool-Worker-25
>> [000447] user= 2.73% sys= 0.60% alloc=  436kb/s - SharedPool-Worker-19
>> [000563] user= 3.33% sys=-0.04% alloc=  460kb/s - SharedPool-Worker-20
>> [000448] user= 2.73% sys= 0.27% alloc=  414kb/s - SharedPool-Worker-21
>> [000554] user= 1.72% sys= 0.70% alloc=  232kb/s - SharedPool-Worker-26
>> [000558] user= 1.41% sys= 0.39% alloc=  213kb/s - SharedPool-Worker-23
>> [000450] user= 1.41% sys=-0.03% alloc=  158kb/s - SharedPool-Worker-17
> 



Re: ORM layer for cassandra-java?

2016-02-12 Thread DuyHai Doan
"we cannot use it in a setter (manager.dsl().update().fromBaseTable().<set
option for c3>)" --> This is normal and intended: it is forbidden to update
a column which belongs to the primary key.

On Fri, Feb 12, 2016 at 1:50 PM, Atul Saroha 
wrote:

> Sorry, that was my misunderstanding of solution 2. Thanks for the solution
>
>
> -
> Atul Saroha
> *Sr. Software Engineer*
> *M*: +91 8447784271 *T*: +91 124-415-6069 *EXT*: 12369
> Plot # 362, ASF Centre - Tower A, Udyog Vihar,
>  Phase -4, Sector 18, Gurgaon, Haryana 122016, INDIA
>
> On Fri, Feb 12, 2016 at 3:39 PM, Atul Saroha 
> wrote:
>
>> Thanks for the reply. We would go with solution 1.
>>
>> One more thing, which might be a bug. We are using version 4.0.1, and the
>> query of solution 2 is not possible: c3 is a clustering key, and no option
>> is generated for it:
>>
>>    1. we cannot use it in a setter
>>    (manager.dsl().update().fromBaseTable().<set option for c3>)
>>    2. nor after the where clause (the only available option is based on id:
>>    manager.dsl().update().fromBaseTable().c4_Set("1").where().id_Eq(id))
>>    in the DSL.
>>
>>
>> @Entity(keyspace = "ks" , table="PrimeUser")
>>
>>> public class PrimeUser {
>>>
>>> @PartitionKey
>>>
 @Column("id")

>>> private int id;
>>>
>>> @Column("c1")
>>> @Static
>>> private String c1;
>>>
>>> @Column("c2")
>>> @Static
>>> private Boolean c2;
>>>
>>>
>>> @Column("c3")
>>> @ClusteringColumn(1)
>>> private String c3;
>>>
>>> @Column("c4")
>>> private String c4;
>>>
>>> }
>>>
>>> Regards,
>>
>>
>>
>> -
>> Atul Saroha
>> *Sr. Software Engineer*
>> *M*: +91 8447784271 *T*: +91 124-415-6069 *EXT*: 12369
>> Plot # 362, ASF Centre - Tower A, Udyog Vihar,
>>  Phase -4, Sector 18, Gurgaon, Haryana 122016, INDIA
>>
>> On Fri, Feb 12, 2016 at 3:01 PM, DuyHai Doan 
>> wrote:
>>
>>> "How could I achieve Delta Update through this ORM  where I want to
>>> inserting one row for id,c3,c4 columns only"
>>>
>>> 2 ways:
>>>
>>> 1. Create an empty PrimeUser entity with only id, c3 and c4 values. Use
>>>   manager
>>>.crud()
>>>.insert(entity)
>>>.withInsertStrategy(InsertStrategy.NOT_NULL_FIELDS)
>>>.execute()
>>>
>>> This way, Achilles will extract only the 3 columns (id, c3, c4), generate
>>> only INSERT INTO primeuser(id,c3,c4) VALUES(...), and skip inserting null
>>> values.
>>>
>>>
>>> 2. Use the Update DSL
>>>
>>> manager
>>>.dsl()
>>>.update()
>>>.fromBaseTable()
>>>.c3_Set("")
>>>.c4_Set("")
>>>.where()
>>>.id_Eq(partitionKey)
>>>.execute()
>>>
>>>
>>> "Since Achilies does not have persistence context like hibernate does,
>>> to track what has beed updated in my java entity and update the change only
>>> though DynamicUpdate anotation."
>>>
>>>  This is on purpose. Having a proxy to intercept calls to setters and to
>>> track what has been updated would require a "read-before-write", i.e.
>>> loading existing data from Cassandra into the proxy first. And
>>> read-before-write is an anti-pattern.
>>>
>>>  Optionally, one could create an empty proxy without read-before-write
>>> and intercept only the data that has been added to the entity. And this is
>>> exactly what solution 1 does: create a new empty instance of PrimeUser,
>>> populate some values and use insert() with the NOT_NULL_FIELDS insert
>>> strategy. The only difference is that Achilles creates an INSERT statement
>>> instead of an UPDATE.
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Feb 12, 2016 at 9:02 AM, Atul Saroha 
>>> wrote:
>>>
 Thanks Doan,

 We are now evaluating or nearly finalized to use Achilles.

 We are looking for one use case.
 As I mentioned in above for static columns.

> CREATE TABLE IF NOT EXISTS  ks.PrimeUser(
>   id int,
>   c1 text STATIC,
>   c2 boolean STATIC,
>   c3 text,
>   c4 text,
>   PRIMARY KEY (id, c3)
> );
 How could I achieve a delta update through this ORM, where I want to
 insert one row for the id, c3, c4 columns only, or update one row for the
 c4 column only against (id, c3)? If I use my entity PrimeUser.java and the
 crud insert method then it will insert all columns, including the statics.
 I think the only way is to use the update method of the DSL/Query API,
 since Achilles does not have a persistence context like Hibernate does, to
 track what has been updated in my java entity and update only the change
 through the DynamicUpdate annotation.
 Or is there something I am missing here?

 Thanks, a reply will be highly appreciated

Re: Cassandra eats all cpu cores, high load average

2016-02-12 Thread Julien Anguenot
At the time when the load is high and you have to restart, do you see any 
pending compactions when using `nodetool compactionstats`?

Possible to see a `nodetool compactionstats` taken *when* the load is too high? 
 Have you checked the size of your SSTables for that big table? Any large ones 
in there?  What about the Java HEAP configuration on these nodes?

If you have too many tombstones I would try to decrease gc_grace_seconds so 
they get cleared out earlier during compactions.
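For example (the table name here is hypothetical, and note that every node must be fully repaired within the new window, or deleted data can reappear):

```sql
-- gc_grace_seconds defaults to 864000 (10 days); lowering it lets
-- tombstones be purged sooner at the next compaction of that data.
ALTER TABLE ks.big_table WITH gc_grace_seconds = 86400;
```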

   J.

> On Feb 12, 2016, at 8:45 AM, Skvazh Roman  wrote:
> 
> There are 1-4 compactions at that moment.
> We have many tombstones which do not get removed.
> DroppableTombstoneRatio is 5-6 (greater than 1)
> 
>> On 12 Feb 2016, at 15:53, Julien Anguenot  wrote:
>> 
>> Hey, 
>> 
>> What about compactions count when that is happening?
>> 
>>  J.
>> 
>> 
>>> On Feb 12, 2016, at 3:06 AM, Skvazh Roman  wrote:
>>> 
>>> Hello!
>>> We have a cluster of 25 c3.4xlarge nodes (16 cores, 32 GiB) with attached 
>>> 1.5 TB 4000 PIOPS EBS drive.
>>> Sometimes user CPU on one or two nodes spikes to 100% and load average to
>>> 20-30 - read requests drop off.
>>> Only a restart of the Cassandra service helps.
>>> Please advise.
>>> 
>>> One big table with wide rows. 600 Gb per node.
>>> LZ4Compressor
>>> LeveledCompaction
>>> 
>>> concurrent compactors: 4
>>> compactor throughput: tried from 16 to 128
>>> Concurrent_readers: from 16 to 32
>>> Concurrent_writers: 128
>>> 
>>> 
>>> https://gist.github.com/rskvazh/de916327779b98a437a6
>>> 
>>> 
>>> JvmTop 0.8.0 alpha - 06:51:10,  amd64, 16 cpus, Linux 3.14.44-3, load avg 
>>> 19.35
>>> http://code.google.com/p/jvmtop
>>> 
>>> Profiling PID 9256: org.apache.cassandra.service.CassandraDa
>>> 
>>> 95.73% ( 4.31s) 
>>> google.common.collect.AbstractIterator.tryToComputeN()
>>> 1.39% ( 0.06s) com.google.common.base.Objects.hashCode()
>>> 1.26% ( 0.06s) io.netty.channel.epoll.Native.epollWait()
>>> 0.85% ( 0.04s) net.jpountz.lz4.LZ4JNI.LZ4_compress_limitedOutput()
>>> 0.46% ( 0.02s) net.jpountz.lz4.LZ4JNI.LZ4_decompress_fast()
>>> 0.26% ( 0.01s) com.google.common.collect.Iterators$7.computeNext()
>>> 0.06% ( 0.00s) io.netty.channel.epoll.Native.eventFdWrite()
>>> 
>>> 
>>> ttop:
>>> 
>>> 2016-02-12T08:20:25.605+ Process summary
>>> process cpu=1565.15%
>>> application cpu=1314.48% (user=1354.48% sys=-40.00%)
>>> other: cpu=250.67%
>>> heap allocation rate 146mb/s
>>> [000405] user=76.25% sys=-0.54% alloc= 0b/s - SharedPool-Worker-9
>>> [000457] user=75.54% sys=-1.26% alloc= 0b/s - SharedPool-Worker-14
>>> [000451] user=73.52% sys= 0.29% alloc= 0b/s - SharedPool-Worker-16
>>> [000311] user=76.45% sys=-2.99% alloc= 0b/s - SharedPool-Worker-4
>>> [000389] user=70.69% sys= 2.62% alloc= 0b/s - SharedPool-Worker-6
>>> [000388] user=86.95% sys=-14.28% alloc= 0b/s - SharedPool-Worker-5
>>> [000404] user=70.69% sys= 0.10% alloc= 0b/s - SharedPool-Worker-8
>>> [000390] user=72.61% sys=-1.82% alloc= 0b/s - SharedPool-Worker-7
>>> [000255] user=87.86% sys=-17.87% alloc= 0b/s - SharedPool-Worker-1
>>> [000444] user=72.21% sys=-2.30% alloc= 0b/s - SharedPool-Worker-12
>>> [000310] user=71.50% sys=-2.31% alloc= 0b/s - SharedPool-Worker-3
>>> [000445] user=69.68% sys=-0.83% alloc= 0b/s - SharedPool-Worker-13
>>> [000406] user=72.61% sys=-4.40% alloc= 0b/s - SharedPool-Worker-10
>>> [000446] user=69.78% sys=-1.65% alloc= 0b/s - SharedPool-Worker-11
>>> [000452] user=66.86% sys= 0.22% alloc= 0b/s - SharedPool-Worker-15
>>> [000256] user=69.08% sys=-2.42% alloc= 0b/s - SharedPool-Worker-2
>>> [004496] user=29.99% sys= 0.59% alloc=   30mb/s - CompactionExecutor:15
>>> [004906] user=29.49% sys= 0.74% alloc=   39mb/s - CompactionExecutor:16
>>> [010143] user=28.58% sys= 0.25% alloc=   26mb/s - CompactionExecutor:17
>>> [000785] user=27.87% sys= 0.70% alloc=   38mb/s - CompactionExecutor:12
>>> [012723] user= 9.09% sys= 2.46% alloc= 2977kb/s - RMI TCP 
>>> Connection(2673)-127.0.0.1
>>> [000555] user= 5.35% sys=-0.08% alloc=  474kb/s - SharedPool-Worker-24
>>> [000560] user= 3.94% sys= 0.07% alloc=  434kb/s - SharedPool-Worker-22
>>> [000557] user= 3.94% sys=-0.17% alloc=  339kb/s - SharedPool-Worker-25
>>> [000447] user= 2.73% sys= 0.60% alloc=  436kb/s - SharedPool-Worker-19
>>> [000563] user= 3.33% sys=-0.04% alloc=  460kb/s - SharedPool-Worker-20
>>> [000448] user= 2.73% sys= 0.27% alloc=  414kb/s - SharedPool-Worker-21
>>> [000554] user= 1.72% sys= 0.70% alloc=  232kb/s - SharedPool-Worker-26
>>> [000558] user= 1.41% sys= 0.39% alloc=  213kb/s - SharedPool-Worker-23
>>> [000450] user= 1.41% sys=-0.03% alloc=  158kb/s - SharedPool-Worker-17
>> 





Re: ORM layer for cassandra-java?

2016-02-12 Thread Atul Saroha
Thanks Doan,

We are now evaluating or nearly finalized to use Achilles.

We are looking for one use case.
As I mentioned in above for static columns.

> CREATE TABLE IF NOT EXISTS  ks.PrimeUser(
>   id int,
>   c1 text STATIC,
>   c2 boolean STATIC,
>   c3 text,
>   c4 text,
>   PRIMARY KEY (id, c3)
> );
How could I achieve a delta update through this ORM, where I want to
insert one row for the id, c3, c4 columns only, or update one row for the
c4 column only against (id, c3)? If I use my entity PrimeUser.java and the
crud insert method then it will insert all columns, including the statics.
I think the only way is to use the update method of the DSL/Query API,
since Achilles does not have a persistence context like Hibernate does, to
track what has been updated in my java entity and update only the change
through the DynamicUpdate annotation.
Or is there something I am missing here?

Thanks, a reply will be highly appreciated


-
Atul Saroha
*Sr. Software Engineer*
*M*: +91 8447784271 *T*: +91 124-415-6069 *EXT*: 12369
Plot # 362, ASF Centre - Tower A, Udyog Vihar,
 Phase -4, Sector 18, Gurgaon, Haryana 122016, INDIA

On Tue, Feb 9, 2016 at 8:32 PM, DuyHai Doan  wrote:

> Look at Achilles and how it models Partition key & clustering columns:
>
>
> https://github.com/doanduyhai/Achilles/wiki/5-minutes-Tutorial#clustered-entities
>
>
>
> On Tue, Feb 9, 2016 at 12:48 PM, Atul Saroha 
> wrote:
>
>> I know of the most popular ORM APIs:
>>
>>1. Kundera :
>>
>> https://github.com/impetus-opensource/Kundera/wiki/Using-Compound-keys-with-Kundera
>>2. Hector (outdated- no longer in development)
>>
>> I am a bit confused about how to model this table as a Java domain entity:
>>
>> CREATE TABLE IF NOT EXISTS  ks.PrimeUser(
>>   id int,
>>   c1 text STATIC,
>>   c2 boolean STATIC,
>>   c3 text,
>>   c4 text,
>>   PRIMARY KEY (id, c3)
>> );
>>
>>
>> One way is to create a compound key based on the id and c3 columns, as
>> shown below.
>>
>>> @Entity
>>> @Table(name="PrimeUser", schema="ks")
>>> public class PrimeUser
>>> {
>>>
>>> @EmbeddedId
>>> private CompoundKey key;
>>>
>>>@Column
>>>private String c1;
>>>@Column
>>>private Boolean c2;  // c2 is boolean in the table
>>>@Column
>>>private String c4;
>>> }
>>>
>>> Here key has to be an Embeddable entity:
>>>
>>> @Embeddable
>>> public class CompoundKey
>>> {
>>> @Column private int id;
>>> @Column private String c3;  // the clustering column, per PRIMARY KEY (id, c3)
>>> }
>>>
>>> Then again, when we fetch the data based on id only, c1 and c2 will be
>> duplicated across multiple objects, even though they are stored only once per
>> "id". The c3 column is the clustering key and its corresponding value is mapped
>> to column c4. We avoid using a map here as it would cause a performance hit.
>> Also we have use cases to fetch the data based on both (b1,c3).
>>
>> Is there any other ORM API which handles such a scenario?
>>
>>
>>
>>
>> -
>> Atul Saroha
>> *Sr. Software Engineer*
>> *M*: +91 8447784271 *T*: +91 124-415-6069 *EXT*: 12369
>> Plot # 362, ASF Centre - Tower A, Udyog Vihar,
>>  Phase -4, Sector 18, Gurgaon, Haryana 122016, INDIA
>>
>
>


Modeling Master Tables in Cassandra

2016-02-12 Thread Harikrishnan A
Hello,
I have a scenario where I need to create a customer master table in
Cassandra which has attributes like customerid, name, email, phonenr, etc.
What is the best way to model such a table in Cassandra, keeping in mind
that I will be using the customer id to populate customer information from
other application workflows? While inserting, I need to make sure the
customer profile doesn't already exist in this table, by verifying the
combination of name + email + phonenr. Unfortunately I can't store the
name, email, phonenr in some of the tables where I have an association with
customer data; instead those tables store only the customer id.
Thanks & Regards,
Hari

Re: Cassandra eats all cpu cores, high load average

2016-02-12 Thread Skvazh Roman
> Does the load decrease and the node answers requests “normally” when you do 
> disable auto-compaction? You actually see pending compactions on nodes having 
> high load correct?

Nope.

> All seems legit here. Using G1 GC?
Yes

Problems also occurred on nodes without pending compactions.



> On 12 Feb 2016, at 18:44, Julien Anguenot  wrote:
> 
>> 
>> On Feb 12, 2016, at 9:24 AM, Skvazh Roman > > wrote:
>> 
>> I have disabled autocompaction and stopped it on the high-load node.
> 
> Does the load decrease and the node answers requests “normally” when you do 
> disable auto-compaction? You actually see pending compactions on nodes having 
> high load correct?
> 
>> Heap is 8 GB. gc_grace is 86400.
>> All SSTables are about 200-300 MB.
> 
> All seems legit here. Using G1 GC?
> 
>> $ nodetool compactionstats
>> pending tasks: 14
> 
> Try to increase the compactors from 4 to 6-8 on a node, disable gossip and 
> let it finish compacting and put it back in the ring by enabling gossip. See 
> what happens.
> 
> The tombstone count is growing because auto-compactions are disabled 
> on these nodes. Probably not your issue.
> 
>J.
> 
> 
>> 
>> $ dstat -lvnr 10
>> ---load-avg--- ---procs--- --memory-usage- ---paging-- -dsk/total- 
>> ---system-- total-cpu-usage -net/total- --io/total-
>> 1m   5m  15m |run blk new| used  buff  cach  free|  in   out | read  writ| 
>> int   csw |usr sys idl wai hiq siq| recv  send| read  writ
>> 29.4 28.6 23.5|0.0   0 1.2|11.3G  190M 17.6G  407M|   0 0 |7507k 7330k|  
>> 13k   40k| 11   1  88   0   0   0|   0 0 |96.5  64.6
>> 29.3 28.6 23.5| 29   0 0.9|11.3G  190M 17.6G  408M|   0 0 |   0   
>> 189k|9822  2319 | 99   0   0   0   0   0| 138k  120k|   0  4.30
>> 29.4 28.6 23.6| 30   0 2.0|11.3G  190M 17.6G  408M|   0 0 |   0
>> 26k|8689  2189 |100   0   0   0   0   0| 139k  120k|   0  2.70
>> 29.4 28.7 23.6| 29   0 3.0|11.3G  190M 17.6G  408M|   0 0 |   0
>> 20k|8722  1846 | 99   0   0   0   0   0| 136k  120k|   0  1.50 ^C
>> 
>> 
>> JvmTop 0.8.0 alpha - 15:20:37,  amd64, 16 cpus, Linux 3.14.44-3, load avg 
>> 28.09
>> http://code.google.com/p/jvmtop 
>> 
>> PID 32505: org.apache.cassandra.service.CassandraDaemon
>> ARGS:
>> VMARGS: -ea -javaagent:/usr/share/cassandra/lib/jamm-0.3.0.jar 
>> -XX:+CMSCl[...]
>> VM: Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 1.8.0_65
>> UP:  8:31m  #THR: 334  #THRPEAK: 437  #THRCREATED: 4694 USER: cassandra
>> GC-Time:  0: 8m   #GC-Runs: 6378  #TotalLoadedClasses: 5926
>> CPU: 97.96% GC:  0.00% HEAP:6049m /7540m NONHEAP:  82m /  n/a
>> 
>>  TID   NAMESTATECPU  TOTALCPU 
>> BLOCKEDBY
>>447 SharedPool-Worker-45 RUNNABLE 60.47% 1.03%
>>343 SharedPool-Worker-2  RUNNABLE 56.46% 3.07%
>>349 SharedPool-Worker-8  RUNNABLE 56.43% 1.61%
>>456 SharedPool-Worker-25 RUNNABLE 55.25% 1.06%
>>483 SharedPool-Worker-40 RUNNABLE 53.06% 1.04%
>>475 SharedPool-Worker-53 RUNNABLE 52.31% 1.03%
>>464 SharedPool-Worker-20 RUNNABLE 52.00% 1.11%
>>577 SharedPool-Worker-71 RUNNABLE 51.73% 1.02%
>>404 SharedPool-Worker-10 RUNNABLE 51.10% 1.29%
>>486 SharedPool-Worker-34 RUNNABLE 51.06% 1.03%
>> Note: Only top 10 threads (according cpu load) are shown!
>> 
>> 
>>> On 12 Feb 2016, at 18:14, Julien Anguenot >> > wrote:
>>> 
>>> At the time when the load is high and you have to restart, do you see any 
>>> pending compactions when using `nodetool compactionstats`?
>>> 
>>> Possible to see a `nodetool compactionstats` taken *when* the load is too 
>>> high?  Have you checked the size of your SSTables for that big table? Any 
>>> large ones in there?  What about the Java HEAP configuration on these nodes?
>>> 
>>> If you have too many tombstones I would try to decrease gc_grace_seconds so 
>>> they get cleared out earlier during compactions.
>>> 
>>>  J.
>>> 
 On Feb 12, 2016, at 8:45 AM, Skvazh Roman > wrote:
 
 There are 1-4 compactions at that moment.
 We have many tombstones which do not get removed.
 DroppableTombstoneRatio is 5-6 (greater than 1)
 
> On 12 Feb 2016, at 15:53, Julien Anguenot  > wrote:
> 
> Hey, 
> 
> What about compactions count when that is happening?
> 
> J.
> 
> 
>> On Feb 12, 2016, at 3:06 AM, Skvazh Roman > > wrote:
>> 
>> Hello!
>> We have a cluster of 25 c3.4xlarge nodes (16 cores, 32 GiB) with 
>> attached 1.5 TB 4000 PIOPS EBS drive.
>> Sometimes one or 

Re: Cassandra eats all cpu cores, high load average

2016-02-12 Thread Julien Anguenot
If you are positive this is not compaction related, I would:

   1. check disk IOPS and latency on the EBS volume (dstat).
   2. turn GC logging on in cassandra-env.sh and use jstat to see what is 
happening to your HEAP.

I have been asking about compactions initially because one (1) big table 
written by all nodes and fully replicated to all nodes in the cluster would 
definitely trigger constant compactions on every node, depending on the write 
throughput.

   J. 

> On Feb 12, 2016, at 11:03 AM, Skvazh Roman  wrote:
> 
>> Does the load decrease and the node answers requests “normally” when you do 
>> disable auto-compaction? You actually see pending compactions on nodes 
>> having high load correct?
> 
> Nope.
> 
>> All seems legit here. Using G1 GC?
> Yes
> 
> Problems also occurred on nodes without pending compactions.
> 
> 
> 
>> On 12 Feb 2016, at 18:44, Julien Anguenot > > wrote:
>> 
>>> 
>>> On Feb 12, 2016, at 9:24 AM, Skvazh Roman >> > wrote:
>>> 
>>> I have disabled autocompaction and stopped it on the high-load node.
>> 
>> Does the load decrease and the node answers requests “normally” when you do 
>> disable auto-compaction? You actually see pending compactions on nodes 
>> having high load correct?
>> 
>>> Heap is 8 GB. gc_grace is 86400.
>>> All SSTables are about 200-300 MB.
>> 
>> All seems legit here. Using G1 GC?
>> 
>>> $ nodetool compactionstats
>>> pending tasks: 14
>> 
>> Try to increase the compactors from 4 to 6-8 on a node, disable gossip and 
>> let it finish compacting and put it back in the ring by enabling gossip. See 
>> what happens.
>> 
>> The tombstone count is growing because auto-compactions are disabled 
>> on these nodes. Probably not your issue.
>> 
>>J.
>> 
>> 
>>> 
>>> $ dstat -lvnr 10
>>> ---load-avg--- ---procs--- --memory-usage- ---paging-- -dsk/total- 
>>> ---system-- total-cpu-usage -net/total- --io/total-
>>> 1m   5m  15m |run blk new| used  buff  cach  free|  in   out | read  writ| 
>>> int   csw |usr sys idl wai hiq siq| recv  send| read  writ
>>> 29.4 28.6 23.5|0.0   0 1.2|11.3G  190M 17.6G  407M|   0 0 |7507k 7330k| 
>>>  13k   40k| 11   1  88   0   0   0|   0 0 |96.5  64.6
>>> 29.3 28.6 23.5| 29   0 0.9|11.3G  190M 17.6G  408M|   0 0 |   0   
>>> 189k|9822  2319 | 99   0   0   0   0   0| 138k  120k|   0  4.30
>>> 29.4 28.6 23.6| 30   0 2.0|11.3G  190M 17.6G  408M|   0 0 |   0
>>> 26k|8689  2189 |100   0   0   0   0   0| 139k  120k|   0  2.70
>>> 29.4 28.7 23.6| 29   0 3.0|11.3G  190M 17.6G  408M|   0 0 |   0
>>> 20k|8722  1846 | 99   0   0   0   0   0| 136k  120k|   0  1.50 ^C
>>> 
>>> 
>>> JvmTop 0.8.0 alpha - 15:20:37,  amd64, 16 cpus, Linux 3.14.44-3, load avg 
>>> 28.09
>>> http://code.google.com/p/jvmtop 
>>> 
>>> PID 32505: org.apache.cassandra.service.CassandraDaemon
>>> ARGS:
>>> VMARGS: -ea -javaagent:/usr/share/cassandra/lib/jamm-0.3.0.jar 
>>> -XX:+CMSCl[...]
>>> VM: Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 1.8.0_65
>>> UP:  8:31m  #THR: 334  #THRPEAK: 437  #THRCREATED: 4694 USER: cassandra
>>> GC-Time:  0: 8m   #GC-Runs: 6378  #TotalLoadedClasses: 5926
>>> CPU: 97.96% GC:  0.00% HEAP:6049m /7540m NONHEAP:  82m /  n/a
>>> 
>>>  TID   NAMESTATECPU  TOTALCPU 
>>> BLOCKEDBY
>>>447 SharedPool-Worker-45 RUNNABLE 60.47% 1.03%
>>>343 SharedPool-Worker-2  RUNNABLE 56.46% 3.07%
>>>349 SharedPool-Worker-8  RUNNABLE 56.43% 1.61%
>>>456 SharedPool-Worker-25 RUNNABLE 55.25% 1.06%
>>>483 SharedPool-Worker-40 RUNNABLE 53.06% 1.04%
>>>475 SharedPool-Worker-53 RUNNABLE 52.31% 1.03%
>>>464 SharedPool-Worker-20 RUNNABLE 52.00% 1.11%
>>>577 SharedPool-Worker-71 RUNNABLE 51.73% 1.02%
>>>404 SharedPool-Worker-10 RUNNABLE 51.10% 1.29%
>>>486 SharedPool-Worker-34 RUNNABLE 51.06% 1.03%
>>> Note: Only top 10 threads (according cpu load) are shown!
>>> 
>>> 
 On 12 Feb 2016, at 18:14, Julien Anguenot > wrote:
 
 At the time when the load is high and you have to restart, do you see any 
 pending compactions when using `nodetool compactionstats`?
 
 Possible to see a `nodetool compactionstats` taken *when* the load is too 
 high?  Have you checked the size of your SSTables for that big table? Any 
 large ones in there?  What about the Java HEAP configuration on these 
 nodes?
 
 If you have too many tombstones I would try to decrease gc_grace_seconds 
 so they get cleared out earlier during compactions.
 
  J.
 
> On Feb 12, 2016, at 8:45 AM, Skvazh Roman 

Re: Cassandra eats all cpu cores, high load average

2016-02-12 Thread Skvazh Roman
I have disabled autocompaction and stopped it on the high-load node.
The freezes hit all nodes sequentially, 2-6 simultaneously.

Heap is 8 GB. gc_grace is 86400.
All SSTables are about 200-300 MB.

$ nodetool compactionstats
pending tasks: 14


$ dstat -lvnr 10
---load-avg--- ---procs--- --memory-usage- ---paging-- -dsk/total- 
---system-- total-cpu-usage -net/total- --io/total-
 1m   5m  15m |run blk new| used  buff  cach  free|  in   out | read  writ| int 
  csw |usr sys idl wai hiq siq| recv  send| read  writ
29.4 28.6 23.5|0.0   0 1.2|11.3G  190M 17.6G  407M|   0 0 |7507k 7330k|  
13k   40k| 11   1  88   0   0   0|   0 0 |96.5  64.6
29.3 28.6 23.5| 29   0 0.9|11.3G  190M 17.6G  408M|   0 0 |   0   189k|9822 
 2319 | 99   0   0   0   0   0| 138k  120k|   0  4.30
29.4 28.6 23.6| 30   0 2.0|11.3G  190M 17.6G  408M|   0 0 |   026k|8689 
 2189 |100   0   0   0   0   0| 139k  120k|   0  2.70
29.4 28.7 23.6| 29   0 3.0|11.3G  190M 17.6G  408M|   0 0 |   020k|8722 
 1846 | 99   0   0   0   0   0| 136k  120k|   0  1.50 ^C


JvmTop 0.8.0 alpha - 15:20:37,  amd64, 16 cpus, Linux 3.14.44-3, load avg 28.09
 http://code.google.com/p/jvmtop

 PID 32505: org.apache.cassandra.service.CassandraDaemon
 ARGS:
 VMARGS: -ea -javaagent:/usr/share/cassandra/lib/jamm-0.3.0.jar -XX:+CMSCl[...]
 VM: Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 1.8.0_65
 UP:  8:31m  #THR: 334  #THRPEAK: 437  #THRCREATED: 4694 USER: cassandra
 GC-Time:  0: 8m   #GC-Runs: 6378  #TotalLoadedClasses: 5926
 CPU: 97.96% GC:  0.00% HEAP:6049m /7540m NONHEAP:  82m /  n/a

  TID   NAMESTATECPU  TOTALCPU BLOCKEDBY
447 SharedPool-Worker-45 RUNNABLE 60.47% 1.03%
343 SharedPool-Worker-2  RUNNABLE 56.46% 3.07%
349 SharedPool-Worker-8  RUNNABLE 56.43% 1.61%
456 SharedPool-Worker-25 RUNNABLE 55.25% 1.06%
483 SharedPool-Worker-40 RUNNABLE 53.06% 1.04%
475 SharedPool-Worker-53 RUNNABLE 52.31% 1.03%
464 SharedPool-Worker-20 RUNNABLE 52.00% 1.11%
577 SharedPool-Worker-71 RUNNABLE 51.73% 1.02%
404 SharedPool-Worker-10 RUNNABLE 51.10% 1.29%
486 SharedPool-Worker-34 RUNNABLE 51.06% 1.03%
 Note: Only top 10 threads (according cpu load) are shown!


> On 12 Feb 2016, at 18:14, Julien Anguenot  wrote:
> 
> At the time when the load is high and you have to restart, do you see any 
> pending compactions when using `nodetool compactionstats`?
> 
> Possible to see a `nodetool compactionstats` taken *when* the load is too 
> high?  Have you checked the size of your SSTables for that big table? Any 
> large ones in there?  What about the Java HEAP configuration on these nodes?
> 
> If you have too many tombstones I would try to decrease gc_grace_seconds so 
> they get cleared out earlier during compactions.
> 
>   J.
> 
>> On Feb 12, 2016, at 8:45 AM, Skvazh Roman  wrote:
>> 
>> There are 1-4 compactions at that moment.
>> We have many tombstones which do not get removed.
>> DroppableTombstoneRatio is 5-6 (greater than 1)
>> 
>>> On 12 Feb 2016, at 15:53, Julien Anguenot  wrote:
>>> 
>>> Hey, 
>>> 
>>> What about compactions count when that is happening?
>>> 
>>> J.
>>> 
>>> 
 On Feb 12, 2016, at 3:06 AM, Skvazh Roman  wrote:
 
 Hello!
 We have a cluster of 25 c3.4xlarge nodes (16 cores, 32 GiB) with attached 
 1.5 TB 4000 PIOPS EBS drive.
 Sometimes user CPU on one or two nodes spikes to 100% and load average to 
 20-30 - read requests drop off.
 Only a restart of the Cassandra service helps.
 Please advise.
 
 One big table with wide rows. 600 Gb per node.
 LZ4Compressor
 LeveledCompaction
 
 concurrent compactors: 4
 compactor throughput: tried from 16 to 128
 Concurrent_readers: from 16 to 32
 Concurrent_writers: 128
 
 
 https://gist.github.com/rskvazh/de916327779b98a437a6
 
 
 JvmTop 0.8.0 alpha - 06:51:10,  amd64, 16 cpus, Linux 3.14.44-3, load avg 
 19.35
 http://code.google.com/p/jvmtop
 
 Profiling PID 9256: org.apache.cassandra.service.CassandraDa
 
 95.73% ( 4.31s) 
 google.common.collect.AbstractIterator.tryToComputeN()
 1.39% ( 0.06s) com.google.common.base.Objects.hashCode()
 1.26% ( 0.06s) io.netty.channel.epoll.Native.epollWait()
 0.85% ( 0.04s) net.jpountz.lz4.LZ4JNI.LZ4_compress_limitedOutput()
 0.46% ( 0.02s) net.jpountz.lz4.LZ4JNI.LZ4_decompress_fast()
 0.26% ( 0.01s) com.google.common.collect.Iterators$7.computeNext()
 0.06% ( 0.00s) io.netty.channel.epoll.Native.eventFdWrite()
 
 
 ttop:
 
 2016-02-12T08:20:25.605+ Process summary
 process cpu=1565.15%
 application 

Re: Cassandra eats all cpu cores, high load average

2016-02-12 Thread Julien Anguenot

> On Feb 12, 2016, at 9:24 AM, Skvazh Roman  wrote:
> 
> I have disabled autocompaction and stopped it on the high-load node.

Does the load decrease and the node answers requests “normally” when you do 
disable auto-compaction? You actually see pending compactions on nodes having 
high load correct?

> Heap is 8 GB. gc_grace is 86400.
> All SSTables are about 200-300 MB.

All seems legit here. Using G1 GC?

> $ nodetool compactionstats
> pending tasks: 14

Try to increase the compactors from 4 to 6-8 on a node, disable gossip and let 
it finish compacting and put it back in the ring by enabling gossip. See what 
happens.

The tombstone count is growing because auto-compactions are disabled on 
these nodes. Probably not your issue.

   J.


> 
> $ dstat -lvnr 10
> ---load-avg--- ---procs--- --memory-usage- ---paging-- -dsk/total- 
> ---system-- total-cpu-usage -net/total- --io/total-
> 1m   5m  15m |run blk new| used  buff  cach  free|  in   out | read  writ| 
> int   csw |usr sys idl wai hiq siq| recv  send| read  writ
> 29.4 28.6 23.5|0.0   0 1.2|11.3G  190M 17.6G  407M|   0 0 |7507k 7330k|  
> 13k   40k| 11   1  88   0   0   0|   0 0 |96.5  64.6
> 29.3 28.6 23.5| 29   0 0.9|11.3G  190M 17.6G  408M|   0 0 |   0   
> 189k|9822  2319 | 99   0   0   0   0   0| 138k  120k|   0  4.30
> 29.4 28.6 23.6| 30   0 2.0|11.3G  190M 17.6G  408M|   0 0 |   0
> 26k|8689  2189 |100   0   0   0   0   0| 139k  120k|   0  2.70
> 29.4 28.7 23.6| 29   0 3.0|11.3G  190M 17.6G  408M|   0 0 |   0
> 20k|8722  1846 | 99   0   0   0   0   0| 136k  120k|   0  1.50 ^C
> 
> 
> JvmTop 0.8.0 alpha - 15:20:37,  amd64, 16 cpus, Linux 3.14.44-3, load avg 
> 28.09
> http://code.google.com/p/jvmtop
> 
> PID 32505: org.apache.cassandra.service.CassandraDaemon
> ARGS:
> VMARGS: -ea -javaagent:/usr/share/cassandra/lib/jamm-0.3.0.jar -XX:+CMSCl[...]
> VM: Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 1.8.0_65
> UP:  8:31m  #THR: 334  #THRPEAK: 437  #THRCREATED: 4694 USER: cassandra
> GC-Time:  0: 8m   #GC-Runs: 6378  #TotalLoadedClasses: 5926
> CPU: 97.96% GC:  0.00% HEAP:6049m /7540m NONHEAP:  82m /  n/a
> 
>  TID   NAMESTATECPU  TOTALCPU 
> BLOCKEDBY
>447 SharedPool-Worker-45 RUNNABLE 60.47% 1.03%
>343 SharedPool-Worker-2  RUNNABLE 56.46% 3.07%
>349 SharedPool-Worker-8  RUNNABLE 56.43% 1.61%
>456 SharedPool-Worker-25 RUNNABLE 55.25% 1.06%
>483 SharedPool-Worker-40 RUNNABLE 53.06% 1.04%
>475 SharedPool-Worker-53 RUNNABLE 52.31% 1.03%
>464 SharedPool-Worker-20 RUNNABLE 52.00% 1.11%
>577 SharedPool-Worker-71 RUNNABLE 51.73% 1.02%
>404 SharedPool-Worker-10 RUNNABLE 51.10% 1.29%
>486 SharedPool-Worker-34 RUNNABLE 51.06% 1.03%
> Note: Only top 10 threads (according cpu load) are shown!
> 
> 
>> On 12 Feb 2016, at 18:14, Julien Anguenot  wrote:
>> 
>> At the time when the load is high and you have to restart, do you see any 
>> pending compactions when using `nodetool compactionstats`?
>> 
>> Possible to see a `nodetool compactionstats` taken *when* the load is too 
>> high?  Have you checked the size of your SSTables for that big table? Any 
>> large ones in there?  What about the Java HEAP configuration on these nodes?
>> 
>> If you have too many tombstones I would try to decrease gc_grace_seconds so 
>> they get cleared out earlier during compactions.
>> 
>>  J.
>> 
>>> On Feb 12, 2016, at 8:45 AM, Skvazh Roman  wrote:
>>> 
>>> There are 1-4 compactions at that moment.
>>> We have many tombstones which do not get removed.
>>> DroppableTombstoneRatio is 5-6 (greater than 1)
>>> 
 On 12 Feb 2016, at 15:53, Julien Anguenot  wrote:
 
 Hey, 
 
 What about compactions count when that is happening?
 
 J.
 
 
> On Feb 12, 2016, at 3:06 AM, Skvazh Roman  wrote:
> 
> Hello!
> We have a cluster of 25 c3.4xlarge nodes (16 cores, 32 GiB) with attached 
> 1.5 TB 4000 PIOPS EBS drive.
> Sometimes user CPU on one or two nodes spikes to 100% and load average to 
> 20-30 - read requests drop off.
> Only a restart of the Cassandra service helps.
> Please advise.
> 
> One big table with wide rows. 600 Gb per node.
> LZ4Compressor
> LeveledCompaction
> 
> concurrent compactors: 4
> compactor throughput: tried from 16 to 128
> Concurrent_readers: from 16 to 32
> Concurrent_writers: 128
> 
> 
> https://gist.github.com/rskvazh/de916327779b98a437a6
> 
> 
> JvmTop 0.8.0 alpha - 06:51:10,  amd64, 16 cpus, Linux 3.14.44-3, load avg 
> 19.35
> http://code.google.com/p/jvmtop
> 
> Profiling PID 9256: 

Re: Schema Versioning

2016-02-12 Thread John Sanda
If you are interested in a solution that maintains migration scripts, there
are at least a few projects available:

https://github.com/comeara/pillar - Runs on the JVM and is written in Scala.
Scripts are CQL files.
https://github.com/Contrast-Security-OSS/cassandra-migration - Runs on the
JVM and is, I believe, a port of Flyway.
https://github.com/hsgubert/cassandra_migrations - Ruby-based and similar
to ActiveRecord migrations.
https://github.com/jsanda/cassalog - A project I have started. Runs on the
JVM and scripts are Groovy files.

On Thu, Feb 11, 2016 at 4:57 AM, Carlos Alonso  wrote:

> Here we use the Cassanity gem: https://github.com/jnunemaker/cassanity
> This one suggests using schema migration files that are then registered in
> a column family to keep track of the version.
>
> Carlos Alonso | Software Engineer | @calonso 
>
> On 10 February 2016 at 21:29, Alex Popescu  wrote:
>
>>
>> On Wed, Feb 10, 2016 at 12:05 PM, Joe Bako  wrote:
>>
>>> Modern RDBMS tools can compare schemas between DDL object definitions
>>> and live databases and generate change scripts accordingly.  Older
>>> techniques included maintaining a version and script table in the database,
>>> storing schema change scripts in a sequential fashion on disk, and
>>> iterating over them to apply them against the target database based on
>>> whether they had been run previously or not (indicated in the script table).
>>
>>
>> Using DevCenter will give you some of these features (and future versions
>> will add more). Just to give you a quick example, if using DevCenter to
>> make schema changes it will offer the options of saving the final
>> definition or just the set of changes applied (to an existing CQL file or a
>> new one).
>>
>>
>> --
>> Bests,
>>
>> Alex Popescu | @al3xandru
>> Sen. Product Manager @ DataStax
>>
>>
>


-- 

- John


Faster version of 'nodetool status'

2016-02-12 Thread Kevin Burton
Is there a faster way to get the output of 'nodetool status' ?

I want us to more aggressively monitor for 'nodetool status' and boxes
being DN...

I was thinking something like jolokia and REST but I'm not sure if there
are variables exported by jolokia for nodetool status.

Thoughts?

-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile



Re: Faster version of 'nodetool status'

2016-02-12 Thread Paulo Motta
There was a recent performance inefficiency in nodetool status with virtual
nodes that will be fixed in upcoming releases (CASSANDRA-7238), so it
should be faster once that fix lands.

You can also query StorageServiceMBean.getLiveNodes() via JMX (jolokia or
some other jmx client). For a list of useful management/status methods via
JMX see
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StorageServiceMBean.java
.
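To make the JMX route concrete, here is a small Python sketch that reads the StorageService MBean through a Jolokia agent over HTTP. The endpoint URL (Jolokia's default port 8778) and the host are assumptions you would adapt; the LiveNodes and UnreachableNodes attribute names come from the StorageServiceMBean source linked above.

```python
import json
import urllib.request

# Assumed Jolokia endpoint; adjust host/port for your agent setup.
JOLOKIA_URL = ("http://localhost:8778/jolokia/read/"
               "org.apache.cassandra.db:type=StorageService/"
               "LiveNodes,UnreachableNodes")

def parse_node_status(payload):
    """Extract live/unreachable node lists from a Jolokia read response."""
    value = payload["value"]
    return {
        "live": sorted(value.get("LiveNodes") or []),
        "down": sorted(value.get("UnreachableNodes") or []),
    }

def fetch_node_status(url=JOLOKIA_URL):
    """Query a live Jolokia agent (not exercised in the offline demo below)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_node_status(json.load(resp))

if __name__ == "__main__":
    # Offline demo with a sample payload shaped like a Jolokia response.
    sample = {"value": {"LiveNodes": ["10.0.0.2", "10.0.0.1"],
                        "UnreachableNodes": ["10.0.0.3"]}}
    print(parse_node_status(sample))
```

Polling an HTTP endpoint like this is much cheaper than forking `nodetool status` on every check, since it skips JVM startup and the JMX handshake.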



2016-02-12 15:12 GMT-03:00 Kevin Burton :

> Is there a faster way to get the output of 'nodetool status' ?
>
> I want us to more aggressively monitor for 'nodetool status' and boxes
> being DN...
>
> I was thinking something like jolokia and REST but I'm not sure if there
> are variables exported by jolokia for nodetool status.
>
> Thoughts?
>
> --
>
> We’re hiring if you know of any awesome Java Devops or Linux Operations
> Engineers!
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> 
>
>


Re: Increase compaction performance

2016-02-12 Thread Michał Łowicki
I had to decrease streaming throughput to 10 (from the default 200) in
order to avoid the effect of a rising number of SSTables and compaction
tasks while running repair. It's working very slowly, but it's stable and
doesn't hurt the whole cluster. I will try to adjust the configuration
gradually to see if I can make it any better. Thanks!

On Thu, Feb 11, 2016 at 8:10 PM, Michał Łowicki  wrote:

>
>
> On Thu, Feb 11, 2016 at 5:38 PM, Alain RODRIGUEZ 
> wrote:
>
>> Also, are you using incremental repairs (not sure about the available
>> options in Spotify Reaper) what command did you run ?
>>
>>
> No.
>
>
>> 2016-02-11 17:33 GMT+01:00 Alain RODRIGUEZ :
>>
>>> CPU load is fine, SSD disks below 30% utilization, no long GC pauses
>>>
>>>
>>>
>>> What is your current compaction throughput ?  The current value of
>>> 'concurrent_compactors' (cassandra.yaml or through JMX) ?
>>>
>>
>
> Throughput was initially set to 1024 and I've gradually increased it to
> 2048, 4K and 16K but haven't seen any changes. Tried to change it both from
> `nodetool` and also cassandra.yaml (with restart after changes).
>
>
>>
>>> nodetool getcompactionthroughput
>>>
>>> How to speed up compaction? I increased compaction throughput and
 concurrent compactors but saw no change. It seems there are plenty of
 idle resources but I can't force C* to use them.

>>>
>>> You might want to try un-throttle the compaction throughput through:
>>>
>>> nodetool setcompactionthroughput 0
>>>
>>> Choose a canary node. Monitor pending compactions and disk throughput
>>> (make sure the server is OK too - CPU...)
>>>
>>
>
> Yes, I'll try it out, but if increasing it 16 times didn't help I'm a bit
> sceptical about it.
>
>
>>
>>> Some other information could be useful:
>>>
>>> What is your number of cores per machine and the compaction strategies
>>> for the 'most compacting' tables. What are write/update patterns, any TTL
>>> or tombstones ? Do you use a high number of vnodes ?
>>>
>>
> I'm using bare-metal boxes, 40 CPUs, 64 GB RAM, 2 SSDs each. num_tokens is
> set to 256.
>
> Using LCS for all tables. Write/update heavy. No warnings about a large
> number of tombstones, but we're removing items frequently.
>
>
>
>>
>>> Also, what is your repair routine and your value for gc_grace_seconds?
>>> When was your last repair, and do you think your cluster is suffering
>>> from high entropy?
>>>
>>
> We've been having problems with repair for months (CASSANDRA-9935).
> gc_grace_seconds is set to 345600 now. Yes, as we haven't run it
> successfully for a long time, I guess the cluster is suffering from high
> entropy.
>
>
>>
>>> You can lower the stream throughput to make sure nodes can cope with
>>> what repairs are feeding them.
>>>
>>> nodetool getstreamthroughput
>>> nodetool setstreamthroughput X
>>>
>>
> Yes, this sounds interesting. As we've been having problems with repair
> for months, it could be that lots of data is transferred between nodes.
>
> Thanks!
>
>
>>
>>> C*heers,
>>>
>>> -
>>> Alain Rodriguez
>>> France
>>>
>>> The Last Pickle
>>> http://www.thelastpickle.com
>>>
>>> 2016-02-11 16:55 GMT+01:00 Michał Łowicki :
>>>
 Hi,

 Using 2.1.12 across 3 DCs. Each DC has 8 nodes. Trying to run repair
 using Cassandra Reaper, but after a couple of hours nodes are full of
 pending compaction tasks (regular ones, not the validation ones).

 CPU load is fine, SSD disks below 30% utilization, no long GC pauses.

 How to speed up compaction? I increased compaction throughput and
 concurrent compactors but saw no change. It seems there are plenty of
 idle resources but I can't force C* to use them.

 Any clue where there might be a bottleneck?


 --
 BR,
 Michał Łowicki


>>>
>>
>
>
> --
> BR,
> Michał Łowicki
>



-- 
BR,
Michał Łowicki
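A side note on watching a repair like the one above: a cheap way to track whether the compaction backlog keeps growing is to parse the "pending tasks" line of `nodetool compactionstats`. The sketch below is a hypothetical monitor; the exact wording of that line may differ between Cassandra versions.

```python
import re
import subprocess

def pending_tasks(output):
    """Return the pending-task count parsed from compactionstats text."""
    m = re.search(r"pending tasks:\s*(\d+)", output)
    if m is None:
        raise ValueError("no 'pending tasks' line found")
    return int(m.group(1))

def check_backlog(threshold=100):
    """Run nodetool and warn when the backlog exceeds the threshold."""
    out = subprocess.run(["nodetool", "compactionstats"],
                         capture_output=True, text=True, check=True).stdout
    n = pending_tasks(out)
    if n > threshold:
        print(f"WARN: {n} pending compactions")
    return n

if __name__ == "__main__":
    # Offline demo on canned output; call check_backlog() on a real node.
    print(pending_tasks("pending tasks: 42\n"))
```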


Re: Security labels

2016-02-12 Thread oleg yusim
Jack,

I updated my document with all the security gaps I was able to find and
posted it here:
https://docs.google.com/document/d/13-yu-1a0MMkBiJFPNkYoTd1Hzed9tgKltWi6hFLZbsk/edit?usp=sharing

Thanks,

Oleg

On Thu, Feb 11, 2016 at 4:09 PM, oleg yusim  wrote:

> Jack,
>
> I asked my management whether I can share my assessment spreadsheet (the
> whole thing, with gaps and desired configurations) with the community.
> Let's wait for their answer. I will definitely update the document I
> shared with the rest of the gaps, so you guys will have those for sure.
>
> Now, in case my management says no:
>
> 1) Here: http://iase.disa.mil/stigs/Pages/a-z.aspx a document titled
> vRealize Operations STIG will be published. As part of it, there will be a
> Cassandra STIG (Cassandra is part of VMware's vRealize Operations product).
> This STIG will contain only suggestions on the right (from the security
> point of view) configuration, where it can be configured.
> 2) The community will have a full list of gaps (things which are needed
> but can't be configured) after I update my document.
> 3) The rest of the assessment consists of Not Applicable and Applicable -
> Inherently Meets items, which nobody is interested in.
> 4) Also, when the STIG for vRealize Operations is published, look on the
> VMware site for the Security Guidelines for vRealize Operations. They will
> be posted publicly and you will be able to download them free of charge.
> Those will include the mitigations which VMware implemented for some of
> the Cassandra gaps.
>
> Thanks,
>
> Oleg
>
> On Thu, Feb 11, 2016 at 2:55 PM, Jack Krupansky 
> wrote:
>
>> Thanks for putting the items together in a list. This allows people to
>> see things with more context. Give people in the user community a little
>> time to respond. A week, maybe. Hopefully some of the senior Cassandra
>> committers will take a look as well.
>>
>> Will the final assessment become a public document or is it strictly
>> internal for your employer? I know there is a database of these
>> assessments, but I don't know who controls what becomes public and when.
>>
>> -- Jack Krupansky
>>
>> On Thu, Feb 11, 2016 at 3:23 PM, oleg yusim  wrote:
>>
>>> Hi Dani,
>>>
>>> As promised, I sort of put all my questions under the "one roof". I
>>> would really appreciate you opinion on them.
>>>
>>> https://drive.google.com/open?id=0B2L9nW4Cyj41YWd1UkI4ZXVPYmM
>>>
>>> Thanks,
>>>
>>> Oleg
>>>
>>> On Fri, Jan 29, 2016 at 3:28 PM, Dani Traphagen <
>>> dani.trapha...@datastax.com> wrote:
>>>
 ​Hi Oleg,

 Thanks that helped clear things up! This sounds like a daunting task. I
 wish you all the best with it.

 Cheers,
 Dani​

 On Fri, Jan 29, 2016 at 10:03 AM, oleg yusim 
 wrote:

> Dani,
>
> I really appreciate your response. Actually, session timeouts and
> security labels are two different topics (the first is about an attack
> where somebody opens, say, an SSH window to the DB, leaves his machine
> unattended, and somebody else steals his session; the second is about
> enabling the DB to support what is called the MAC access model - which
> stands for mandatory access control. It is widely used in government and
> the military, but not outside of it; we are all used to the DAC access
> control model). However, I think you are right and I should move all my
> queries under one big roof and call this thread "Security". I will do
> this today.
>
> Now, about what you have said: I just answered the same to Jon in the
> Session Timeout thread, but will quickly recap here. I understand that
> Cassandra's architecture was aimed and tailored at a completely different
> type of scenario. However, unfortunately, that doesn't mean that Cassandra
> is not vulnerable to the same set of attacks a relational database would
> be vulnerable to. It just means Cassandra is not protected against those
> attacks, because protection against them was not thought of when the
> database was created. I already gave the AAA and session timeout examples
> in Jon's thread, and those are just a few of many.
>
> Now, what I'm trying to do is create a STIG - a federal security
> compliance document - which will assess Cassandra against SRG concepts
> (federal security compliance recommendations for databases overall) and
> will highlight what is not met and can't be in the current design (i.e.
> what system architects should keep in mind and what they need to
> compensate for with other controls at different layers of the system
> model) and what can be met, either with configuration or with a little
> enhancement (and how).
>
> That document would be of great help for Cassandra as a product
> because it would allow it to be marketed as a product with existing
> security assessment and 

Re: Security assessment of Cassandra

2016-02-12 Thread oleg yusim
Greetings,

Following Jack's and Matt's suggestions, I moved the doc to Google Docs and
added to it all the security gaps in Cassandra I was able to discover
(please see the second table, below the first one).

Here is an updated link to my document:

https://docs.google.com/document/d/13-yu-1a0MMkBiJFPNkYoTd1Hzed9tgKltWi6hFLZbsk/edit?usp=sharing

Thanks,

Oleg

On Thu, Feb 11, 2016 at 2:29 PM, oleg yusim  wrote:

> Greetings,
>
> Performing security assessment of Cassandra with the goal of generating
> STIG for Cassandra (iase.disa.mil/stigs/Pages/a-z.aspx) I ran across some
> questions regarding the way certain security features are implemented (or
> not) in Cassandra.
>
> I composed the list of questions on these topics, which I wasn't able to
> find definitive answer to anywhere else and posted it here:
>
> https://drive.google.com/open?id=0B2L9nW4Cyj41YWd1UkI4ZXVPYmM
>
> It is shared with all the members of this list, and any member is welcome
> to comment on the document (there is a place for community comments
> specially reserved near each question, alongside my take on it).
>
> I would greatly appreciate Cassandra community help here.
>
> Thanks,
>
> Oleg
>


Re: Session timeout

2016-02-12 Thread oleg yusim
Jack,

I updated my document with all the security gaps I was able to discover
(see the second table, below the first one). I also moved the document to
Google Docs from Word doc, shared on Google Drive, following Matt's
suggestion.

Please, see the updated link:
https://docs.google.com/document/d/13-yu-1a0MMkBiJFPNkYoTd1Hzed9tgKltWi6hFLZbsk/edit?usp=sharing

Thanks,

Oleg

On Thu, Feb 11, 2016 at 3:52 PM, oleg yusim  wrote:

> Jack,
>
> This document doesn't cover all the areas where users will need to engage
> in explicit mitigation; it only covers those I wasn't sure about. But you
> are making a good point here. Let me update the document with the rest of
> the gaps, so the community will have a complete list.
>
> Thanks,
>
> Oleg
>
> On Thu, Feb 11, 2016 at 3:38 PM, Jack Krupansky 
> wrote:
>
>> Thanks! A useful contribution, no matter what the outcome. I trust your
>> reading of the doc, so I don't expect a lot of change to the responses,
>> but we'll see. At a minimum, it will probably be good to have a doc that
>> highlights areas where users will need to engage in explicit mitigation
>> efforts if their infrastructure does not implicitly effect mitigation
>> for various security exposures.
>>
>> -- Jack Krupansky
>>
>> On Thu, Feb 11, 2016 at 3:21 PM, oleg yusim  wrote:
>>
>>> Robert, Jack, Bryan,
>>>
>>> As you suggested, I put together a document titled
>>> Cassandra_Security_Topics_to_Discuss, put it on Google Drive, and shared
>>> it with everybody on this list. The document contains the list of
>>> questions I have on Cassandra, my take on them, and a place for any
>>> notes the community would like to make.
>>>
>>> Please, review. Any help would be appreciated greatly.
>>>
>>> https://drive.google.com/open?id=0B2L9nW4Cyj41YWd1UkI4ZXVPYmM
>>>
>>> Oleg
>>>
>>> On Fri, Jan 29, 2016 at 6:30 PM, Bryan Cheng 
>>> wrote:
>>>
 To throw my (unsolicited) 2 cents into the ring, Oleg, you work for a
 well-funded and fairly large company. You are certainly free to continue
 using the list and asking for community support (I am definitely not in any
 position to tell you otherwise, anyway), but that community support is by
 definition ad-hoc and best effort. Furthermore, your questions range from
 trivial to, as Jonathan mentioned earlier, concepts that many of us have
 no reason to consider at this time (perhaps your work will convince us
 otherwise- but you'll need to finish it first ;) )

 What I'm getting at here is that perhaps, if you need faster, deeper
 level, and more elaborate support than this list can provide, you should
 look into the services of a paid Cassandra support company like Datastax.

 On Fri, Jan 29, 2016 at 3:34 PM, Robert Coli 
 wrote:

> On Fri, Jan 29, 2016 at 3:12 PM, Jack Krupansky <
> jack.krupan...@gmail.com> wrote:
>
>> One last time, I'll simply renew my objection to the way you are
>> abusing this list.
>>
>
> FWIW, while I appreciate that OP (Oleg) is attempting to do a service
> for the community, I agree that the flood of single topic, context-lacking
> posts regarding deep internals of Cassandra is likely to inspire the
> opposite of a helpful response.
>
> This is important work, however, so hopefully we can collectively find
> a way through the meta and can discuss this topic without acrimony! :D
>
> =Rob
>
>


>>>
>>
>


Sudden disk usage

2016-02-12 Thread Branton Davis
One of our clusters had a strange thing happen tonight.  It's a 3 node
cluster, running 2.1.10.  The primary keyspace has RF 3, vnodes with 256
tokens.

This evening, over the course of about 6 hours, disk usage increased from
around 700GB to around 900GB on only one node.  I was at a loss as to what
was happening and, on a whim, decided to run nodetool cleanup on the
instance.  I had no reason to believe that it was necessary, as no nodes
were added or tokens moved (not intentionally, anyhow).  But it immediately
cleared up that extra space.

I'm pretty lost as to what would have happened here.  Any ideas where to
look?

Thanks!


Re: Cassandra eats all cpu cores, high load average

2016-02-12 Thread Jack Krupansky
Wide rows? How wide? How many rows per partition, typically and at the
extreme? How many clustering columns?

When you restart the node does it revert to completely normal response?

Which release of Cassandra?

Does every node eventually hit this problem?

After a restart, how long before the problem recurs for that node?

-- Jack Krupansky

On Fri, Feb 12, 2016 at 4:06 AM, Skvazh Roman  wrote:

> Hello!
> We have a cluster of 25 c3.4xlarge nodes (16 cores, 32 GiB) with attached
> 1.5 TB 4000 PIOPS EBS drives.
> Sometimes user CPU on one or two nodes spikes to 100% and load average to
> 20-30, and read requests drop off.
> Only a restart of the Cassandra service helps.
> Please advise.
>
> One big table with wide rows. 600 Gb per node.
> LZ4Compressor
> LeveledCompaction
>
> concurrent compactors: 4
> compactor throughput: tried from 16 to 128
> Concurrent_readers: from 16 to 32
> Concurrent_writers: 128
>
>
> https://gist.github.com/rskvazh/de916327779b98a437a6
>
>
>  JvmTop 0.8.0 alpha - 06:51:10,  amd64, 16 cpus, Linux 3.14.44-3, load avg
> 19.35
>  http://code.google.com/p/jvmtop
>
>  Profiling PID 9256: org.apache.cassandra.service.CassandraDa
>
>   95.73% ( 4.31s)
> google.common.collect.AbstractIterator.tryToComputeN()
>1.39% ( 0.06s) com.google.common.base.Objects.hashCode()
>1.26% ( 0.06s) io.netty.channel.epoll.Native.epollWait()
>0.85% ( 0.04s) net.jpountz.lz4.LZ4JNI.LZ4_compress_limitedOutput()
>0.46% ( 0.02s) net.jpountz.lz4.LZ4JNI.LZ4_decompress_fast()
>0.26% ( 0.01s) com.google.common.collect.Iterators$7.computeNext()
>0.06% ( 0.00s) io.netty.channel.epoll.Native.eventFdWrite()
>
>
> ttop:
>
> 2016-02-12T08:20:25.605+ Process summary
>   process cpu=1565.15%
>   application cpu=1314.48% (user=1354.48% sys=-40.00%)
>   other: cpu=250.67%
>   heap allocation rate 146mb/s
> [000405] user=76.25% sys=-0.54% alloc= 0b/s - SharedPool-Worker-9
> [000457] user=75.54% sys=-1.26% alloc= 0b/s - SharedPool-Worker-14
> [000451] user=73.52% sys= 0.29% alloc= 0b/s - SharedPool-Worker-16
> [000311] user=76.45% sys=-2.99% alloc= 0b/s - SharedPool-Worker-4
> [000389] user=70.69% sys= 2.62% alloc= 0b/s - SharedPool-Worker-6
> [000388] user=86.95% sys=-14.28% alloc= 0b/s - SharedPool-Worker-5
> [000404] user=70.69% sys= 0.10% alloc= 0b/s - SharedPool-Worker-8
> [000390] user=72.61% sys=-1.82% alloc= 0b/s - SharedPool-Worker-7
> [000255] user=87.86% sys=-17.87% alloc= 0b/s - SharedPool-Worker-1
> [000444] user=72.21% sys=-2.30% alloc= 0b/s - SharedPool-Worker-12
> [000310] user=71.50% sys=-2.31% alloc= 0b/s - SharedPool-Worker-3
> [000445] user=69.68% sys=-0.83% alloc= 0b/s - SharedPool-Worker-13
> [000406] user=72.61% sys=-4.40% alloc= 0b/s - SharedPool-Worker-10
> [000446] user=69.78% sys=-1.65% alloc= 0b/s - SharedPool-Worker-11
> [000452] user=66.86% sys= 0.22% alloc= 0b/s - SharedPool-Worker-15
> [000256] user=69.08% sys=-2.42% alloc= 0b/s - SharedPool-Worker-2
> [004496] user=29.99% sys= 0.59% alloc=   30mb/s - CompactionExecutor:15
> [004906] user=29.49% sys= 0.74% alloc=   39mb/s - CompactionExecutor:16
> [010143] user=28.58% sys= 0.25% alloc=   26mb/s - CompactionExecutor:17
> [000785] user=27.87% sys= 0.70% alloc=   38mb/s - CompactionExecutor:12
> [012723] user= 9.09% sys= 2.46% alloc= 2977kb/s - RMI TCP
> Connection(2673)-127.0.0.1
> [000555] user= 5.35% sys=-0.08% alloc=  474kb/s - SharedPool-Worker-24
> [000560] user= 3.94% sys= 0.07% alloc=  434kb/s - SharedPool-Worker-22
> [000557] user= 3.94% sys=-0.17% alloc=  339kb/s - SharedPool-Worker-25
> [000447] user= 2.73% sys= 0.60% alloc=  436kb/s - SharedPool-Worker-19
> [000563] user= 3.33% sys=-0.04% alloc=  460kb/s - SharedPool-Worker-20
> [000448] user= 2.73% sys= 0.27% alloc=  414kb/s - SharedPool-Worker-21
> [000554] user= 1.72% sys= 0.70% alloc=  232kb/s - SharedPool-Worker-26
> [000558] user= 1.41% sys= 0.39% alloc=  213kb/s - SharedPool-Worker-23
> [000450] user= 1.41% sys=-0.03% alloc=  158kb/s - SharedPool-Worker-17


Re: installing DSE

2016-02-12 Thread Bhuvan Rawal
I believe you missed this note:

   1. Attention: Depending on your environment, you might need to replace @
   in your email address with %40 and escape any character in your password
   that is used in your operating system's command line. Examples: \! and \|.
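To illustrate the escaping the note describes, standard percent-encoding produces exactly the form the repo URL needs. The credentials below are purely hypothetical.

```python
from urllib.parse import quote

# Hypothetical DataStax Academy credentials, for illustration only.
user = "some.user@gmail.com"
password = "p@ss!word"

# safe="" forces every reserved character to be encoded, including '@'.
enc_user = quote(user, safe="")
enc_pass = quote(password, safe="")

print(enc_user)   # some.user%40gmail.com
print(enc_pass)   # p%40ss%21word
```

The encoded values are what goes into the `baseurl` line of /etc/yum.repos.d/datastax.repo in place of the raw email address and password.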


On Sat, Feb 13, 2016 at 3:15 AM, Ted Yu  wrote:

> Hi,
> I followed this guide:
>
> https://docs.datastax.com/en/datastax_enterprise/4.5/datastax_enterprise/install/installRHELdse.html
>
> and populated /etc/yum.repos.d/datastax.repo with DataStax Academy account
> info.
>
> [Errno 14] PYCURL ERROR 6 - "Couldn't resolve host 'gmail.com:p
> as...@rpm.datastax.com'"
> Trying other mirror.
>
> Can someone give me a hint?
>
> Thanks
>


Re: installing DSE

2016-02-12 Thread Ted Yu
I have one seed node and one non-seed node managed by opscenter.

I previously ran Cassandra daemons on 3 other non-seed nodes.

How do I let opscenter discover these 3 other nodes ?

Thanks

On Fri, Feb 12, 2016 at 2:17 PM, Ted Yu  wrote:

> When I retried node addition from opscenter UI, I got this:
>
>  Installation stage failed: The following packages are already installed:
> dse-libsqoop, dse-full, dse-libmahout, dse-liblog4j, dse-libspark,
> dse-libsolr, dse-libtomcat, dse- demos, dse-libcassandra; The following
> packages are already installed: dse-libsqoop, dse-full, dse-libmahout,
> dse-liblog4j, dse-libspark, dse-libsolr, dse-libtomcat, dse-demos,
>  dse-libcassandra; The following packages are already installed:
> dse-libsqoop, dse-full, dse-libmahout, dse-liblog4j, dse-libspark,
> dse-libsolr, dse-libtomcat, dse-demos, dse-libcassandra
>
> How can I resume ?
>
> On Fri, Feb 12, 2016 at 2:01 PM, Bhuvan Rawal  wrote:
>
>> I believe you missed this note :
>>
>>1. Attention: Depending on your environment, you might need to
>>replace @ in your email address with %40 and escape any character in
>>your password that is used in your operating system's command line.
>>Examples: \! and \|.
>>
>>
>> On Sat, Feb 13, 2016 at 3:15 AM, Ted Yu  wrote:
>>
>>> Hi,
>>> I followed this guide:
>>>
>>> https://docs.datastax.com/en/datastax_enterprise/4.5/datastax_enterprise/install/installRHELdse.html
>>>
>>> and populated /etc/yum.repos.d/datastax.repo with DataStax Academy
>>> account info.
>>>
>>> [Errno 14] PYCURL ERROR 6 - "Couldn't resolve host 'gmail.com:p
>>> as...@rpm.datastax.com'"
>>> Trying other mirror.
>>>
>>> Can someone give me hint ?
>>>
>>> Thanks
>>>
>>
>>
>


Can't bootstrap a node

2016-02-12 Thread Brian Picciano
I posted this on IRC but wasn't able to get any help. I have two
nodes running 3.0.3. They're in different datacenters, connected by
openvpn. When I go to bootstrap the new node it handshakes fine, but always
gets this error while transferring data:

http://gobin.io/oMll

If I follow the log's advice and run "nodetool bootstrap resume" I get the
following:

http://gobin.io/kkSu

I'm fairly confident this is not a connection issue, ping is sub-50ms, and
there isn't any packet loss that I can see. Any help would be greatly
appreciated, I'd also be happy to give any further debugging info that
might help. Thanks!


installing DSE

2016-02-12 Thread Ted Yu
Hi,
I followed this guide:
https://docs.datastax.com/en/datastax_enterprise/4.5/datastax_enterprise/install/installRHELdse.html

and populated /etc/yum.repos.d/datastax.repo with DataStax Academy account
info.

[Errno 14] PYCURL ERROR 6 - "Couldn't resolve host 'gmail.com:p
as...@rpm.datastax.com'"
Trying other mirror.

Can someone give me a hint?

Thanks


Re: installing DSE

2016-02-12 Thread Ted Yu
When I retried node addition from opscenter UI, I got this:

 Installation stage failed: The following packages are already installed:
dse-libsqoop, dse-full, dse-libmahout, dse-liblog4j, dse-libspark,
dse-libsolr, dse-libtomcat, dse- demos, dse-libcassandra; The following
packages are already installed: dse-libsqoop, dse-full, dse-libmahout,
dse-liblog4j, dse-libspark, dse-libsolr, dse-libtomcat, dse-demos,
 dse-libcassandra; The following packages are already installed:
dse-libsqoop, dse-full, dse-libmahout, dse-liblog4j, dse-libspark,
dse-libsolr, dse-libtomcat, dse-demos, dse-libcassandra

How can I resume ?

On Fri, Feb 12, 2016 at 2:01 PM, Bhuvan Rawal  wrote:

> I believe you missed this note :
>
>1. Attention: Depending on your environment, you might need to replace
>@ in your email address with %40 and escape any character in your
>password that is used in your operating system's command line. Examples:
>\! and \|.
>
>
> On Sat, Feb 13, 2016 at 3:15 AM, Ted Yu  wrote:
>
>> Hi,
>> I followed this guide:
>>
>> https://docs.datastax.com/en/datastax_enterprise/4.5/datastax_enterprise/install/installRHELdse.html
>>
>> and populated /etc/yum.repos.d/datastax.repo with DataStax Academy
>> account info.
>>
>> [Errno 14] PYCURL ERROR 6 - "Couldn't resolve host 'gmail.com:p
>> as...@rpm.datastax.com'"
>> Trying other mirror.
>>
>> Can someone give me hint ?
>>
>> Thanks
>>
>
>


Re: Modeling Master Tables in Cassandra

2016-02-12 Thread Harikrishnan A
Thanks Carlos... This certainly helps.

Sent from Yahoo Mail on Android

On Fri, Feb 12, 2016 at 2:02 AM, Carlos Alonso wrote:
Hi Hari.

I'd suggest having a customers table like this:

CREATE TABLE customers (
  customerid UUID,
  name VARCHAR,
  email VARCHAR,
  phonenr VARCHAR,
  PRIMARY KEY(name, email, phonenr)
)

This way your inserts could be INSERT INTO customers (customerid, ...)
VALUES (...) IF NOT EXISTS;
Afterwards, you can use your customerid in the dependent tables such as:

CREATE TABLE customeraction (
  customerid UUID,
  action VARCHAR,
  time TIMESTAMP,
  PRIMARY KEY(customerid, action, time)
  // Keys definition will, of course, depend on the access pattern.
)

Before wrapping up I'd like to suggest denormalising a little bit using
statics if possible.

In case you need to JOIN your customers with any of your dependent tables,
that will have to be done in application logic as Cassandra doesn't support
such a feature. Instead you can denormalise using statics, which will
actually duplicate almost no data as the static is saved only once per
partition.

An example:

CREATE TABLE customeraction (
  customerid UUID,
  name VARCHAR STATIC,
  email VARCHAR STATIC,
  phonenr VARCHAR STATIC,
  action VARCHAR,
  time TIMESTAMP,
  PRIMARY KEY(customerid, action, time)
)

This way, you avoid client side joins.

Hope this helps!

Carlos Alonso | Software Engineer | @calonso

On 12 February 2016 at 09:25, Harikrishnan A  wrote:

Hello,
I have a scenario where I need to create a customer master table in
cassandra which has attributes like customerid, name, email, phonenr, etc.
What is the best way to model such a table in cassandra, keeping in mind
that I will be using customer id to populate customer information from
other application work flows. While inserting, I need to make sure the
customer profile doesn't already exist in this table by verifying the
combination of name + email + phonenr. Unfortunately I can't store the
name, email, phonenr in some of the tables where I have an association
with customer data; instead those tables store only the customer id.

Thanks & Regards,
Hari
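One complementary trick, not from the thread itself: since uniqueness here is defined by the (name, email, phonenr) triple, the customerid can be derived deterministically from that triple with a name-based UUID, so a retried insert produces the same id instead of a new one. The namespace UUID and the normalisation rules below are illustrative assumptions, and this complements, rather than replaces, the IF NOT EXISTS check.

```python
import uuid

# Arbitrary illustrative namespace for customer ids.
CUSTOMER_NS = uuid.uuid5(uuid.NAMESPACE_DNS, "customers.example.com")

def customer_id(name, email, phonenr):
    """Derive a stable UUID from the identifying triple."""
    # Normalise so trivial case/whitespace differences map to one id.
    key = "\x1f".join((name.strip().lower(),
                       email.strip().lower(),
                       phonenr.strip()))
    return uuid.uuid5(CUSTOMER_NS, key)

a = customer_id("Hari", "hari@example.com", "555-0100")
b = customer_id(" hari", "HARI@example.com", "555-0100")
assert a == b  # same triple -> same id, so re-inserts are idempotent
```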