Re: Recommendations for an embedded Cassandra and Unit Tests

2016-01-11 Thread Richard L. Burton III
What I'm noticing with these projects is that they don't handle CQL files
properly. For example, cassandra-unit dies when a string literal contains a ;
character. Their parsing logic is very primitive: they simply look for ; to
denote the end of a statement.

Is there any class in Cassandra I could use that given a *.cql file, it'll
return a list of statements inside of it?

Looking at CQLParser, it's only good for parsing a single statement vs. a
file that contains multiple statements.
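
For illustration only, here is a minimal sketch of a quote-aware splitter, written in Python rather than against any Cassandra class. The function name split_cql_statements is hypothetical. It assumes single-quoted strings and double-quoted identifiers escape a quote by doubling it, and it skips --, // and /* */ comments. It does not handle $$-quoted strings or BEGIN BATCH blocks.

```python
def split_cql_statements(script: str) -> list:
    """Split a CQL script on ';' while ignoring ';' inside quotes and comments."""
    statements = []
    current = []
    i = 0
    n = len(script)
    quote = None  # holds "'" or '"' while inside a quoted literal/identifier
    while i < n:
        ch = script[i]
        if quote:
            current.append(ch)
            if ch == quote:
                # A doubled quote ('' or "") is an escaped quote, not the end.
                if i + 1 < n and script[i + 1] == quote:
                    current.append(quote)
                    i += 1
                else:
                    quote = None
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif script.startswith("--", i) or script.startswith("//", i):
            # Line comment: jump to the end of the line.
            end = script.find("\n", i)
            i = n if end == -1 else end
            continue
        elif script.startswith("/*", i):
            # Block comment: jump past the closing marker.
            end = script.find("*/", i + 2)
            i = n if end == -1 else end + 2
            continue
        elif ch == ";":
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
        else:
            current.append(ch)
        i += 1
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements
```

Each returned string could then be handed to whatever executes a single statement, which is the part a plain split on ; gets wrong.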


On Mon, Jan 11, 2016 at 3:06 PM, DuyHai Doan  wrote:

> Achilles 4.x does offer an embedded Cassandra server support with some
> utility classes like ScriptExecutor. It supports C* 2.2 currently :
>
> https://github.com/doanduyhai/Achilles/wiki/CQL-embedded-cassandra-server
> On 11 Jan 2016 at 20:47, "Richard L. Burton III" wrote:
>
>> I'm looking to see what's recommended for an embedded version of
>> Cassandra, just for unit testing.
>>
>> I'm looking at https://github.com/jsevellec/cassandra-unit/wiki but I
>> wanted to see if there was a better recommendation.
>>
>> --
>> -Richard L. Burton III
>> @rburton
>>
>


-- 
-Richard L. Burton III
@rburton


Re: sstableloader throughput

2016-01-11 Thread Noorul Islam Kamal Malmiyoda
On Mon, Jan 11, 2016 at 10:25 PM, Jeff Jirsa  wrote:
>
> Make sure streaming throughput isn’t throttled on the destination cluster.
>


How do I do that? Is stream_throughput_outbound_megabits_per_sec the
relevant attribute in cassandra.yaml?

I think we can set that on the fly using nodetool setstreamthroughput.

I ran

nodetool setstreamthroughput 0

on the target machine, but that doesn't improve the average throughput.
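
As a side note on units: the yaml setting is expressed in megabits per second, while the rate reported in this thread is in MB/s. A small sketch of the conversion, assuming 1 byte = 8 bits and ignoring protocol overhead (the helper name is hypothetical):

```python
def mb_per_sec_to_megabits(mb_per_sec: float) -> float:
    """Convert megabytes per second to megabits per second (1 byte = 8 bits)."""
    return mb_per_sec * 8

# Average rate reported earlier in this thread, expressed in the yaml's unit.
observed_megabits = mb_per_sec_to_megabits(18)
```

Comparing the observed figure against the configured limit in the same unit helps tell whether the throttle could be the bottleneck at all.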

Thanks and Regards
Noorul

> Stream from more machines (divide sstables between a bunch of machines, run
> in parallel).
>
> On 1/11/16, 5:21 AM, "Noorul Islam K M"  wrote:
>
>>
>>I need to stream data to a new cluster using sstableloader. I
>>spawned a machine with 32 cores, assuming that sstableloader scales with
>>the number of cores. But it doesn't look like it does.
>>
>>I am getting an average throughput of 18 MB/s, which seems to be pretty
>>low (I might be wrong).
>>
>>Is there any way to increase the throughput? OpsCenter data on the target
>>cluster shows very few write requests per second.
>>
>>Thanks and Regards
>>Noorul


Re: Modeling contact list, plain table or List

2016-01-11 Thread Carlos Alonso
I have never used Materialized Views so maybe this suggestion is not
possible, but in this case, wouldn't it make sense to define the
materialized view as

is_favourite IS TRUE
instead of
is_favourite IS NOT NULL?

Carlos Alonso | Software Engineer | @calonso 

On 10 January 2016 at 09:59, DuyHai Doan  wrote:

> Try this
>
> CREATE TABLE communication.user_contact_list (
>   user_id uuid,
>   contact_id uuid,
>   contact_name text,
>   created_at timeuuid,
>   is_favorite boolean,
>   favorite_at timestamp,
>   PRIMARY KEY (user_id, contact_name, contact_id)
> );
>
> CREATE MATERIALIZED VIEW communication.user_favorite_contact_list
> AS SELECT * FROM communication.user_contact_list
> WHERE user_id IS NOT NULL AND contact_name IS NOT NULL
> AND contact_id IS NOT NULL AND is_favorite IS NOT NULL
> PRIMARY KEY(user_id, is_favorite, contact_name, contact_id)
>
> If the flag is_favorite is not updated very often the write perf hit due
> to materialized view is acceptable.
>
> On Sat, Jan 9, 2016 at 11:57 PM, Isaac P.  wrote:
>
>> Jack/ Michael,
>>
>> Thanks for answering.
>>
>> How big?: Less then one hundred contacts by user is the normal.
>>
>> Update requirements: The UPDATE requirements are all around  each user
>> “favoriting/unfavoriting” the contacts . Deleting is not very frequent.
>>
>> Does that mean that in C* 3.02 , for this use case to work, the contact
>> name  must be part of a  composite partition key in order to allow sorting
>> by contact_name like this ? :
>>
>> CREATE TABLE communication.user_contact_list (
>> user_id uuid,
>> contact_name text,
>> is_favorite boolean,
>> contact_id uuid,
>> created_at timeuuid,
>> favorite_at timestamp,
>> PRIMARY KEY ((user_id, contact_name), is_favorite)
>> )  WITH CLUSTERING ORDER BY (contact_name ASC);
>>
>> Query: Select * from user_contact_list where user_id = :userid and
>> is_favorite = true order by contact_name asc;
>>
>> Looks like each contact as a row/clustering key will be the way to go.
>>
>> Thanks
>>
>> IPVP
>>
>>
>> From: Laing, Michael
>> Reply: user@cassandra.apache.org
>> Date: January 9, 2016 at 11:51:27 AM
>> To: user@cassandra.apache.org
>> Subject: Re: Modeling contact list, plain table or List
>>
>> Note that in C* 3.02 the second query is invalid:
>>
>> cqlsh> Select * from communication.user_contact_list where user_id =
>> 98f50f00-b6d5-11e5-afec-6003089bf572 and is_favorite = true order
>> by contact_name asc;
>>
>> *InvalidRequest: code=2200 [Invalid query] message="PRIMARY KEY column
>> "is_favorite" cannot be restricted as preceding column "contact_name" is
>> not restricted"*
>>
>> On Fri, Jan 8, 2016 at 6:50 PM, Jack Krupansky 
>> wrote:
>>
>>> How big is each contact list expected to be? Dozens? Hundreds?
>>> Thousands? If just dozens, a simple list column would seem sufficient. If
>>> thousands, the row (not partition) would get kind of bloated.
>>>
>>> What requirements do you have for updating? If updating contacts and
>>> lots of contacts, I think I'd prefer each contact as a row/clustering key.
>>> Nice to be able to do selective queries to return slices of the clustering
>>> key values, which is not so easy if they are all just a single list column.
>>>
>>> -- Jack Krupansky
>>>
>>> On Fri, Jan 8, 2016 at 6:31 PM, Isaac P.  wrote:
>>>
>>>> Hi everyone
>>>>
>>>> What would perform better while modeling a simple user contact list
>>>> that will be used mainly to select the recipients for/from/to messages?
>>>>
>>>> a) Individual rows for each (user, contact) pair, so a select would fetch
>>>> all the rows to retrieve all the contacts of a given user.
>>>>
>>>> or
>>>>
>>>> b) A single row for each user containing the List UDT.
>>>>
>>>> Aside from the basic CRUD, the queries will be the following ones:
>>>>
>>>> Select * from user_contact_list where user_id = :userid order by
>>>> contact_name asc
>>>>
>>>> Select * from user_contact_list where user_id = :userid and is_favorite
>>>> = true order by contact_name asc
>>>>
>>>> After reading this
>>>> https://docs.datastax.com/en/cql/3.0/cql/ddl/ddl_compound_keys_c.html
>>>> the table is looking like this:
>>>>
>>>> CREATE TABLE communication.user_contact_list (
>>>> user_id uuid,
>>>> contact_id uuid,
>>>> contact_name text,
>>>> created_at timeuuid,
>>>> is_favorite boolean,
>>>> favorite_at timestamp,
>>>> PRIMARY KEY (user_id, contact_name, is_favorite)
>>>> );
>>>>
>>>> Any guidance will be appreciated.
>>>>
>>>> Thanks
>>>>
>>>> --
>>>> IPVP


>>>
>>
>


Re: Too many compactions, maybe keyspace system?

2016-01-11 Thread Robert Coli
The lines you are looking for look like this :

INFO [CompactionExecutor:48] 2016-01-12 09:07:59,995 CompactionTask.java
(line 120) Compacting
[SSTableReader(path='/usr/local/cassandra/data/system/sstable_activity/system-sstable_activity-jb-4959-Data.db'),
SSTableReader(path='/usr/local/cassandra/data/system/sstable_activity/system-sstable_activity-jb-4960-Data.db'),
SSTableReader(path='/usr/local/cassandra/data/system/sstable_activity/system-sstable_activity-jb-4961-Data.db'),
SSTableReader(path='/usr/local/cassandra/data/system/sstable_activity/system-sstable_activity-jb-4962-Data.db')]

and

INFO [CompactionExecutor:47] 2016-01-12 04:06:43,946 CompactionTask.java
(line 299) Compacted 4 sstables to
[/usr/local/cassandra/data/system/compaction_history/system-compaction_history-jb-3532,].
 14,413 bytes to 13,334 (~92% of original) in 52,135ms = 0.000244MB/s.  156
total partitions merged to 149.  Partition merge counts were {1:156, }

There are only 4 of the compaction completed messages in the log you
attached, tiny ones.

The log doesn't look like much compaction is occurring?
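
To count and size these messages across a whole log, a small parser may help. The following is only a sketch: the regular expression is written against the "Compacted" line format quoted above and may not match other Cassandra versions, and parse_compacted is a hypothetical helper name. The rate is recomputed from the output size in MiB, an assumption that happens to reproduce the logged figure for this sample.

```python
import re

# Matches the summary part of a "Compacted" line as shown in this thread.
COMPACTED_RE = re.compile(
    r"Compacted (?P<count>\d+) sstables to .*?"
    r"(?P<before>[\d,]+) bytes to (?P<after>[\d,]+) .*? in (?P<ms>[\d,]+)ms",
    re.DOTALL,
)

def parse_compacted(line: str) -> dict:
    """Extract sstable count, byte sizes, duration and rate from one log entry."""
    match = COMPACTED_RE.search(line)
    if match is None:
        raise ValueError("not a compaction-completed line")
    before = int(match["before"].replace(",", ""))
    after = int(match["after"].replace(",", ""))
    millis = int(match["ms"].replace(",", ""))
    return {
        "sstables": int(match["count"]),
        "bytes_before": before,
        "bytes_after": after,
        "millis": millis,
        # Output bytes converted to MiB, divided by elapsed seconds.
        "mb_per_sec": after / (1024 * 1024) / (millis / 1000),
    }

sample = (
    "INFO [CompactionExecutor:47] 2016-01-12 04:06:43,946 CompactionTask.java "
    "(line 299) Compacted 4 sstables to "
    "[/usr/local/cassandra/data/system/compaction_history/system-compaction_history-jb-3532,]. "
    " 14,413 bytes to 13,334 (~92% of original) in 52,135ms = 0.000244MB/s.  156 "
    "total partitions merged to 149.  Partition merge counts were {1:156, }"
)
stats = parse_compacted(sample)
```

Summing bytes_before over all matches in system.log gives a rough idea of how much data compaction actually processed.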

=Rob


Re: Sorting & pagination in apache cassandra 2.1

2016-01-11 Thread anuja jain
What is the alternative if my Cassandra version is prior to 3.0
(specifically 2.1) and is already in production?

Also, the docs at

https://docs.datastax.com/en/datastax_enterprise/4.6/datastax_enterprise/srch/srchCapazty.html
say that we need to do capacity planning if we want to search using Solr.
What does that mean, and what is the alternative when we do not know the
size of the data?

 Thanks,

Anuja



On Fri, Jan 8, 2016 at 12:15 AM, Tyler Hobbs  wrote:

>
> On Thu, Jan 7, 2016 at 6:45 AM, anuja jain  wrote:
>
>> My question is, what is the alternative if we need to order by col3 or
>> col4 in my above example without including col2 in order by clause.
>>
>
> The server-side alternative is to create a second table (or a materialized
> view, if you're using 3.0+) that uses a different clustering order.
> Cassandra purposefully only supports simple and efficient queries that can
> be handled quickly (with a few exceptions), and arbitrary ordering is not
> part of that, especially if you consider complications like paging.
>
>
> --
> Tyler Hobbs
> DataStax 
>


Re: Too many compactions, maybe keyspace system?

2016-01-11 Thread Shuo Chen
The attachment is the result of running:

grep -i GC /usr/local/cassandra/system.log > gc.log

On Tue, Jan 12, 2016 at 1:12 PM, Shuo Chen  wrote:

> My assumption is that lots of pending compaction tasks fill up memory
> and trigger full GC. The full GC chokes the process and slows down
> compaction, which causes more pending compaction tasks and more pressure
> on memory.
>
> Is there a way to list the concrete details of pending compaction tasks?
>
> On Tue, Jan 12, 2016 at 11:41 AM, Robert Coli 
> wrote:
>
>> The lines you are looking for look like this :
>>
>> INFO [CompactionExecutor:48] 2016-01-12 09:07:59,995 CompactionTask.java
>> (line 120) Compacting
>> [SSTableReader(path='/usr/local/cassandra/data/system/sstable_activity/system-sstable_activity-jb-4959-Data.db'),
>> SSTableReader(path='/usr/local/cassandra/data/system/sstable_activity/system-sstable_activity-jb-4960-Data.db'),
>> SSTableReader(path='/usr/local/cassandra/data/system/sstable_activity/system-sstable_activity-jb-4961-Data.db'),
>> SSTableReader(path='/usr/local/cassandra/data/system/sstable_activity/system-sstable_activity-jb-4962-Data.db')]
>>
>> and
>>
>> INFO [CompactionExecutor:47] 2016-01-12 04:06:43,946 CompactionTask.java
>> (line 299) Compacted 4 sstables to
>> [/usr/local/cassandra/data/system/compaction_history/system-compaction_history-jb-3532,].
>>  14,413 bytes to 13,334 (~92% of original) in 52,135ms = 0.000244MB/s.  156
>> total partitions merged to 149.  Partition merge counts were {1:156, }
>>
>> There are only 4 of the compaction completed messages in the log you
>> attached, tiny ones.
>>
>> The log doesn't look like much compaction is occurring?
>>
>> =Rob
>>
>>
>
>
> --
> *陈硕* *Shuo Chen*
> chenatu2...@gmail.com
> chens...@whaty.com
>



-- 
*陈硕* *Shuo Chen*
chenatu2...@gmail.com
chens...@whaty.com


gc.log
Description: Binary data


Upgrade from 2.0.x to 2.2.x documentation missing

2016-01-11 Thread Amit Singh F
Hi,

We are currently on Cassandra 2.0.14 in production. Since it is going to be EOL
soon, we are planning to upgrade to Cassandra 2.2.4
(http://cassandra.apache.org/download/), which is the current production-ready
version. While doing some analysis, we found that the DataStax documentation
(http://docs.datastax.com/en/upgrade/doc/upgrade/cassandra/upgradeC_c.html)
has no entry for the 2.2 branch explaining how to reach 2.2.x from 2.0.x.

Can somebody guide us on the upgrade path to follow when upgrading from 2.0.x
to 2.2.x?
A quick response will be highly appreciated. Thanks in advance.


Regards
Amit Singh


Re: what consistency level should I set when using IF NOT EXIST or UPDATE IF statements ?

2016-01-11 Thread Hiroyuki Yamada
Can anyone answer my questions?
I think the current DataStax documents, including the Python driver's, don't
describe precisely how we should set consistency with lightweight
transactions.

Regards,
Hiro

On Fri, Jan 8, 2016 at 11:48 AM, Hiroyuki Yamada  wrote:

> Thanks Tyler.
>
> I've read the Python document and it's a bit clearer than before,
> but I'm still confused about which combinations make lightweight transaction
> operations work correctly.
>
> So, let me clarify the conditions where lightweight transactions work.
>
> QUORUM conditional write -> QUORUM read => OK (meets linearizability)
> ANY conditional write -> SERIAL read =>  OK (meets linearizability)
> ONE conditional write -> SERIAL read => OK ?
> SERIAL conditional write -> ??? read => ERROR for some reasons (why?)
>
> My first question: is my understanding of the top 2 conditions correct?
> My second question: is "ONE conditional write -> SERIAL read" OK?
> Also, why does a SERIAL conditional write fail, even though a SERIAL
> conditional write followed by (for example) an ANY read seems logically OK?
>
> The following document seems to say that we can specify SERIAL in writes.
> So when should I use SERIAL in writes, other than in conditional writes
> (which fail)?
> <
> https://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
> >
>
>
> Thanks,
> Hiro
>
>
>
> On Fri, Jan 8, 2016 at 2:44 AM, Tyler Hobbs  wrote:
>
>> The python driver docs explain this pretty well, I think:
>> http://datastax.github.io/python-driver/api/cassandra/query.html#cassandra.query.Statement.serial_consistency_level
>>
>> On Thu, Jan 7, 2016 at 3:44 AM, Hiroyuki Yamada 
>> wrote:
>>
>>> Hi,
>>>
>>> I've been doing some POCs of lightweight transactions and
>>> I came up with some questions, so please let me ask them here.
>>>
>>> So the question is:
>>> what consistency level should I set when using IF NOT EXISTS or UPDATE IF
>>> statements?
>>>
>>> I used the statements with ONE and QUORUM first, then it seems fine.
>>> But, when I set SERIAL, it gave me the following error.
>>>
>>> === error message ===
>>> Caused by: com.datastax.driver.core.exceptions.InvalidQueryException:
>>> SERIAL is not supported as conditional update commit consistency. Use ANY
>>> if you mean "make sure it is accepted but I don't care how many replicas
>>> commit it for non-SERIAL reads"
>>> === error message ===
>>>
>>>
>>> So, I'm wondering what SERIAL is for when writing (and reading) and
>>> what the differences are between setting ONE, QUORUM and ANY when using
>>> IF NOT EXISTS or UPDATE IF statements.
>>>
>>> Could you give me some advice?
>>>
>>> Thanks,
>>> Hiro
>>>
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Tyler Hobbs
>> DataStax 
>>
>
>


Re: Too many compactions, maybe keyspace system?

2016-01-11 Thread Shuo Chen
My assumption is that lots of pending compaction tasks fill up memory
and trigger full GC. The full GC chokes the process and slows down
compaction, which causes more pending compaction tasks and more pressure
on memory.

Is there a way to list the concrete details of pending compaction tasks?

On Tue, Jan 12, 2016 at 11:41 AM, Robert Coli  wrote:

> The lines you are looking for look like this :
>
> INFO [CompactionExecutor:48] 2016-01-12 09:07:59,995 CompactionTask.java
> (line 120) Compacting
> [SSTableReader(path='/usr/local/cassandra/data/system/sstable_activity/system-sstable_activity-jb-4959-Data.db'),
> SSTableReader(path='/usr/local/cassandra/data/system/sstable_activity/system-sstable_activity-jb-4960-Data.db'),
> SSTableReader(path='/usr/local/cassandra/data/system/sstable_activity/system-sstable_activity-jb-4961-Data.db'),
> SSTableReader(path='/usr/local/cassandra/data/system/sstable_activity/system-sstable_activity-jb-4962-Data.db')]
>
> and
>
> INFO [CompactionExecutor:47] 2016-01-12 04:06:43,946 CompactionTask.java
> (line 299) Compacted 4 sstables to
> [/usr/local/cassandra/data/system/compaction_history/system-compaction_history-jb-3532,].
>  14,413 bytes to 13,334 (~92% of original) in 52,135ms = 0.000244MB/s.  156
> total partitions merged to 149.  Partition merge counts were {1:156, }
>
> There are only 4 of the compaction completed messages in the log you
> attached, tiny ones.
>
> The log doesn't look like much compaction is occurring?
>
> =Rob
>
>


-- 
*陈硕* *Shuo Chen*
chenatu2...@gmail.com
chens...@whaty.com


Re: Sorting & pagination in apache cassandra 2.1

2016-01-11 Thread anuja jain
One more question: what does it mean that "Cassandra inherently sorts data"?
For example, I have a table with this schema:

CREATE TABLE users (
  user_name varchar PRIMARY KEY,
  password varchar,
  gender varchar,
  session_token varchar,
  state varchar,
  birth_year bigint
);

I inserted data in random order, but when I run a select statement I get the
data sorted by birth_year. Why does this happen?

cqlsh:learning> select * from users;

 user_name | birth_year | gender | password | session_token | state
-----------+------------+--------+----------+---------------+---------
      John |       1979 |      M |     qwer |           abc |      JK
   Dharini |       1980 |      F |      Xyz |           abc | Gujarat
     Keval |       1990 |      M |      DDD |           abc |      WB
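
One likely explanation, sketched below: with a single-column primary key, user_name is the partition key, and an unrestricted SELECT returns partitions in token order (a hash of the partition key), not in insertion order and not by birth_year. With only three rows, matching the birth_year order would then be a coincidence. The sketch uses MD5 as a stand-in hash and toy_token is a hypothetical name; Cassandra's default Murmur3Partitioner produces different tokens, so the actual order will differ.

```python
import hashlib

def toy_token(partition_key: str) -> int:
    """Stand-in token: first 8 bytes of an MD5 digest as a signed integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

# (user_name, birth_year) pairs taken from the output in this message.
rows = [("John", 1979), ("Dharini", 1980), ("Keval", 1990)]

# A full scan walks partitions by token, regardless of insertion order.
scan_order = sorted(rows, key=lambda row: toy_token(row[0]))
```

Adding more rows should quickly show that the result is not ordered by birth_year.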

On Tue, Jan 12, 2016 at 12:52 PM, anuja jain  wrote:

> What is the alternative if my Cassandra version is prior to 3.0
> (specifically 2.1) and is already in production?
>
> Also, the docs at
>
> https://docs.datastax.com/en/datastax_enterprise/4.6/datastax_enterprise/srch/srchCapazty.html
> say that we need to do capacity planning if we want to search using Solr.
> What does that mean, and what is the alternative when we do not know the
> size of the data?
>
>  Thanks,
>
> Anuja
>
>
>
> On Fri, Jan 8, 2016 at 12:15 AM, Tyler Hobbs  wrote:
>
>>
>> On Thu, Jan 7, 2016 at 6:45 AM, anuja jain  wrote:
>>
>>> My question is, what is the alternative if we need to order by col3 or
>>> col4 in my above example without including col2 in order by clause.
>>>
>>
>> The server-side alternative is to create a second table (or a
>> materialized view, if you're using 3.0+) that uses a different clustering
>> order.  Cassandra purposefully only supports simple and efficient queries
>> that can be handled quickly (with a few exceptions), and arbitrary ordering
>> is not part of that, especially if you consider complications like paging.
>>
>>
>> --
>> Tyler Hobbs
>> DataStax 
>>
>
>


Re: Modeling contact list, plain table or List

2016-01-11 Thread Jack Krupansky
You specify a userid and contactname to delete a single contact row in the
base table. The MV is then updated accordingly.
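
To illustrate with a toy in-memory model (this is not how Cassandra implements materialized views, and ContactStore is a hypothetical class): the base table is keyed by (userid, contactname), the view by (userid, isfavorite) plus contactname, and a delete on the base key is enough to locate and remove the matching view entry.

```python
class ContactStore:
    """Toy model: a base table keyed by (userid, contactname) plus a derived view."""

    def __init__(self):
        self.base = {}  # (userid, contactname) -> isfavorite
        self.view = {}  # (userid, isfavorite) -> set of contactname

    def upsert(self, userid, contactname, isfavorite):
        self.delete(userid, contactname)  # drop any stale view entry first
        self.base[(userid, contactname)] = isfavorite
        self.view.setdefault((userid, isfavorite), set()).add(contactname)

    def delete(self, userid, contactname):
        # The full base primary key identifies exactly one row.
        old = self.base.pop((userid, contactname), None)
        if old is not None:
            self.view[(userid, old)].discard(contactname)

    def favorites(self, userid):
        return sorted(self.view.get((userid, True), set()))

store = ContactStore()
store.upsert(1, "alice", True)
store.upsert(1, "bob", True)
store.delete(1, "bob")  # removes one contact, not the whole list
```

The other contacts of user 1 stay in both structures, which is the behaviour described for the real base table and its view.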

-- Jack Krupansky

On Mon, Jan 11, 2016 at 4:20 PM, I PVP  wrote:

> Well… the way it is now, it is not possible to delete a specific contact
> row from the base table at all, because a DELETE statement only works with
> PK columns in the WHERE clause. Non-PK columns cannot be in the DELETE WHERE
> clause.
> https://docs.datastax.com/en/cql/3.3/cql/cql_reference/delete_r.html
>
> The way it is now, it is only possible to delete the entire contact list
> for that specific user.
> It looks like I will need to:
> 1) SELECT all rows from user_contact, excluding the one that the user wants
> to get rid of.
> 2) DELETE all the user_contact rows for that particular user.
> 3) INSERT the result of 1).
>
> Is that the proper way to achieve it, or am I missing some point in the
> modeling that would allow me to delete a specific contact row and still
> comply with the select requirements?
>
> Thanks
> --
> IPVP
>
>
> From: Jack Krupansky
> Reply: user@cassandra.apache.org
> Date: January 11, 2016 at 7:00:04 PM
> To: user@cassandra.apache.org
> Subject: Re: Modeling contact list, plain table or List
>
> That's the beauty of MV - Cassandra automatically updates the MVs when the
> base table changes, including deletions, which is why all of the PK columns
> from the base table needed to be in the MV PK.
>
> -- Jack Krupansky
>
> On Mon, Jan 11, 2016 at 3:41 PM, I PVP  wrote:
>
>> The table and materialized view below will solve the SELECT requirements
>> of my current application.
>> The challenge now is when the user decides to DELETE one specific contact
>> from their contact list. I could add the objectid to a composite partition
>> key together with the userid, but that would make the SELECT unworkable.
>>
>>  Any ideas/suggestions?
>>
>>
>> CREATE TABLE communication.user_contact (
>> userid int,
>> contactname text,
>> contactid int,
>> createdat timeuuid,
>> favoriteat timestamp,
>> isfavorite boolean,
>> objectid timeuuid,
>> PRIMARY KEY (userid, contactname)
>> ) WITH CLUSTERING ORDER BY ( contactname DESC )
>>
>>
>> CREATE MATERIALIZED VIEW communication.user_contact_by_favorite AS
>> SELECT userid, isfavorite, contactname, contactid, createdat, favoriteat,
>> objectid
>> FROM user_contact
>> WHERE userid IS NOT NULL AND isfavorite IS NOT NULL AND contactname IS
>> NOT NULL
>> PRIMARY KEY ( ( userid, isfavorite ), contactname )
>> WITH CLUSTERING ORDER BY ( contactname DESC )
>>
>> Thanks
>>
>> --
>> IPVP
>>
>>
>> From: DuyHai Doan
>> Reply: user@cassandra.apache.org
>> Date: January 11, 2016 at 11:14:10 AM
>> To: user@cassandra.apache.org
>> Subject: Re: Modeling contact list, plain table or List
>>
>> In the current iteration of materialized view, it is still not possible
>> to have WHERE clause other than IS NOT NULL so is_favourite IS TRUE
>> won't work.
>>
>> Still there is a JIRA created to support this feature :
>> https://issues.apache.org/jira/browse/CASSANDRA-10368
>>
>> About cardinality of favorite vs non-favorites, it doesn't matter in this
>> case because the OP said "Less then one hundred contacts by user is the
>> normal."
>>
>> So even if all contacts are stuck in one unique favorite state, the
>> materialized view partition for one user is at most 100. Even for extreme
>> edge case with users having 10 000 contacts, it's still a manageable
>> partition size for C*.
>>
>> But I agree it is important to know before-hand the favorite/non-favorite
>> update frequency since it will impact the write throughput on the MV.
>>
>> For more details on materialized view impl and performance:
>> http://www.doanduyhai.com/blog/?p=1930
>>
>> On Mon, Jan 11, 2016 at 1:36 PM, Jack Krupansky wrote:
>>
>>> The new Materialized View feature is just an automated way of creating
>>> and maintaining what people used to call a "query table", which is the
>>> traditional Cassandra data modeling technique for performing queries on
>>> columns other than the primary key for a table - you store the same
>>> columns in different tables using different columns for the primary key.
>>>
>>> One also needs to be careful to include all columns of the original
>>> primary key in each MV primary key - in addition to whatever column(s) are
>>> to be used for indexing in each MV (so that Cassandra can find the old row
>>> when it needs to update the MV when the base table row changes, such as on
>>> a deletion.)
>>>
>>> But before creating MVs, you first need to answer questions about how
>>> the app needs to query 

Re: Modeling contact list, plain table or List

2016-01-11 Thread Jonathan Haddad
In general I advise people avoid lists and use Maps or Sets instead.

Using this data model, for instance, it's easy to remove a specific Address
from a user:

CREATE TYPE address (
  street text,
  city text,
  zip_code int
);

CREATE TABLE user (
    user_id int primary key,
    addresses map<text, frozen<address>>
);

When I want to remove one of the addresses from a user, I can do this:

cqlsh:test> delete addresses['home'] from user where user_id =  1;


Hope that helps,
Jon


On Mon, Jan 11, 2016 at 1:20 PM I PVP  wrote:

> Well… the way it is now, it is not possible to delete a specific contact
> row from the base table at all, because a DELETE statement only works with
> PK columns in the WHERE clause. Non-PK columns cannot be in the DELETE WHERE
> clause.
> https://docs.datastax.com/en/cql/3.3/cql/cql_reference/delete_r.html
>
> The way it is now, it is only possible to delete the entire contact list
> for that specific user.
> It looks like I will need to:
> 1) SELECT all rows from user_contact, excluding the one that the user wants
> to get rid of.
> 2) DELETE all the user_contact rows for that particular user.
> 3) INSERT the result of 1).
>
> Is that the proper way to achieve it, or am I missing some point in the
> modeling that would allow me to delete a specific contact row and still
> comply with the select requirements?
>
> Thanks
> --
> IPVP
>
>
> From: Jack Krupansky
> Reply: user@cassandra.apache.org
> Date: January 11, 2016 at 7:00:04 PM
> To: user@cassandra.apache.org
> Subject: Re: Modeling contact list, plain table or List
>
> That's the beauty of MV - Cassandra automatically updates the MVs when the
> base table changes, including deletions, which is why all of the PK columns
> from the base table needed to be in the MV PK.
>
> -- Jack Krupansky
>
> On Mon, Jan 11, 2016 at 3:41 PM, I PVP  wrote:
>
>> The table and materialized view below will solve the SELECT requirements
>> of my current application.
>> The challenge now is when the user decides to DELETE one specific contact
>> from their contact list. I could add the objectid to a composite partition
>> key together with the userid, but that would make the SELECT unworkable.
>>
>>  Any ideas/suggestions?
>>
>>
>> CREATE TABLE communication.user_contact (
>> userid int,
>> contactname text,
>> contactid int,
>> createdat timeuuid,
>> favoriteat timestamp,
>> isfavorite boolean,
>> objectid timeuuid,
>> PRIMARY KEY (userid, contactname)
>> ) WITH CLUSTERING ORDER BY ( contactname DESC )
>>
>>
>> CREATE MATERIALIZED VIEW communication.user_contact_by_favorite AS
>> SELECT userid, isfavorite, contactname, contactid, createdat, favoriteat,
>> objectid
>> FROM user_contact
>> WHERE userid IS NOT NULL AND isfavorite IS NOT NULL AND contactname IS
>> NOT NULL
>> PRIMARY KEY ( ( userid, isfavorite ), contactname )
>> WITH CLUSTERING ORDER BY ( contactname DESC )
>>
>> Thanks
>>
>> --
>> IPVP
>>
>>
>> From: DuyHai Doan
>> Reply: user@cassandra.apache.org
>> Date: January 11, 2016 at 11:14:10 AM
>> To: user@cassandra.apache.org
>> Subject: Re: Modeling contact list, plain table or List
>>
>> In the current iteration of materialized view, it is still not possible
>> to have WHERE clause other than IS NOT NULL so is_favourite IS TRUE
>> won't work.
>>
>> Still there is a JIRA created to support this feature :
>> https://issues.apache.org/jira/browse/CASSANDRA-10368
>>
>> About cardinality of favorite vs non-favorites, it doesn't matter in this
>> case because the OP said "Less then one hundred contacts by user is the
>> normal."
>>
>> So even if all contacts are stuck in one unique favorite state, the
>> materialized view partition for one user is at most 100. Even for extreme
>> edge case with users having 10 000 contacts, it's still a manageable
>> partition size for C*.
>>
>> But I agree it is important to know before-hand the favorite/non-favorite
>> update frequency since it will impact the write throughput on the MV.
>>
>> For more details on materialized view impl and performance:
>> http://www.doanduyhai.com/blog/?p=1930
>>
>> On Mon, Jan 11, 2016 at 1:36 PM, Jack Krupansky wrote:
>>
>>> The new Materialized View feature is just an automated way of creating
>>> and maintaining what people used to call a "query table", which is the
>>> traditional Cassandra data modeling technique for performing queries on
>>> columns other than the primary key for a table - you store the same
>>> columns in different tables using different columns for the primary key.
>>>
>>> One also needs to be careful to include all columns of the original
>>> primary key in each 

Re: Modeling contact list, plain table or List

2016-01-11 Thread I PVP
The table and materialized view below will solve the SELECT requirements of my
current application.
The challenge now is when the user decides to DELETE one specific contact from
their contact list. I could add the objectid to a composite partition key
together with the userid, but that would make the SELECT unworkable.

 Any ideas/suggestions?


CREATE TABLE communication.user_contact (
userid int,
contactname text,
contactid int,
createdat timeuuid,
favoriteat timestamp,
isfavorite boolean,
objectid timeuuid,
PRIMARY KEY (userid, contactname)
) WITH CLUSTERING ORDER BY ( contactname DESC )


CREATE MATERIALIZED VIEW communication.user_contact_by_favorite AS
SELECT userid, isfavorite, contactname, contactid, createdat, favoriteat, 
objectid
FROM user_contact
WHERE userid IS NOT NULL AND isfavorite IS NOT NULL AND contactname IS NOT NULL
PRIMARY KEY ( ( userid, isfavorite ), contactname )
WITH CLUSTERING ORDER BY ( contactname DESC )

Thanks

--
IPVP


From: DuyHai Doan
Reply: user@cassandra.apache.org
Date: January 11, 2016 at 11:14:10 AM
To: user@cassandra.apache.org
Subject: Re: Modeling contact list, plain table or List

In the current iteration of materialized view, it is still not possible to have 
WHERE clause other than IS NOT NULL so is_favourite IS TRUE won't work.

Still there is a JIRA created to support this feature : 
https://issues.apache.org/jira/browse/CASSANDRA-10368

About cardinality of favorite vs non-favorites, it doesn't matter in this case
because the OP said "Less then one hundred contacts by user is the normal."

So even if all contacts are stuck in one unique favorite state, the 
materialized view partition for one user is at most 100. Even for extreme edge 
case with users having 10 000 contacts, it's still a manageable partition size 
for C*.

But I agree it is important to know before-hand the favorite/non-favorite 
update frequency since it will impact the write throughput on the MV.

For more details on materialized view impl and performance: 
http://www.doanduyhai.com/blog/?p=1930

On Mon, Jan 11, 2016 at 1:36 PM, Jack Krupansky wrote:
The new Materialized View feature is just an automated way of creating and
maintaining what people used to call a "query table", which is the traditional
Cassandra data modeling technique for performing queries on columns other than
the primary key for a table - you store the same columns in different tables
using different columns for the primary key.

One also needs to be careful to include all columns of the original primary key 
in each MV primary key - in addition to whatever column(s) are to be used for 
indexing in each MV (so that Cassandra can find the old row when it needs to 
update the MV when the base table row changes, such as on a deletion.)

But before creating MVs, you first need to answer questions about how the app 
needs to query the data. Even with MV, conceptualizing queries needs to precede 
data modeling.

For example: what is the cardinality of favorites vs. non-favorites? Does the
app even need to query by favorites, as opposed to querying all contacts and
retrieving is_favorite as simply a non-key column value? Do favorites need
to be retrieved separately from non-favorites? What are the frequency and
latency requirements for querying by favorite status? Once these questions are
answered, decisions can be made about data modeling.

-- Jack Krupansky

On Mon, Jan 11, 2016 at 5:13 AM, Carlos Alonso wrote:
I have never used Materialized Views so maybe this suggestion is not possible, 
but in this case, wouldn't it make sense to define the materialized view as

is_favourite IS TRUE
instead of
is_favourite IS NOT NULL?

Carlos Alonso | Software Engineer | @calonso

On 10 January 2016 at 09:59, DuyHai Doan wrote:
Try this

CREATE TABLE communication.user_contact_list (
  user_id uuid,
  contact_id uuid,
  contact_name text,
  created_at timeuuid,
  is_favorite boolean,
  favorite_at timestamp,
  PRIMARY KEY (user_id, contact_name, contact_id)
);

CREATE MATERIALIZED VIEW communication.user_favorite_contact_list
AS SELECT * FROM communication.user_contact_list
WHERE user_id IS NOT NULL AND contact_name IS NOT NULL
AND contact_id IS NOT NULL AND is_favorite IS NOT NULL
PRIMARY KEY(user_id, is_favorite, contact_name, contact_id);

If the flag is_favorite is not updated very often the write perf hit due to 
materialized view is acceptable.

On Sat, Jan 9, 2016 at 11:57 PM, Isaac P. wrote:
Jack/ Michael,

Thanks for answering.

How big?: Less then one 

Re: Modeling contact list, plain table or List

2016-01-11 Thread Jack Krupansky
That's the beauty of MV - Cassandra automatically updates the MVs when the
base table changes, including deletions, which is why all of the PK columns
from the base table needed to be in the MV PK.
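
To make the bookkeeping concrete, here is a toy in-memory sketch in Python.
It is only an illustration under stated assumptions, not how Cassandra
implements views: the dicts `base` and `view` and the helper names are
hypothetical, and the key layout follows the user_contact_list example from
earlier in the thread. Because the view key contains every base PK column,
the old view row can always be found from the base key plus the old value of
is_favorite.

```python
# Toy in-memory model of a base table and one materialized view.
# Illustration only: this is NOT how Cassandra implements views internally.

base = {}  # (user_id, contact_name, contact_id) -> is_favorite
view = {}  # (user_id, is_favorite, contact_name, contact_id) -> True


def view_key(base_key, is_favorite):
    """Build the view key: the base key with is_favorite spliced in."""
    user_id, contact_name, contact_id = base_key
    return (user_id, is_favorite, contact_name, contact_id)


def upsert(base_key, is_favorite):
    """Write a base row and keep the view in step with it."""
    old = base.get(base_key)
    if old is not None:
        # The stale view row is located from the base key plus the old value.
        view.pop(view_key(base_key, old), None)
    base[base_key] = is_favorite
    view[view_key(base_key, is_favorite)] = True


def delete(base_key):
    """Delete a base row; the matching view row goes with it."""
    old = base.pop(base_key, None)
    if old is not None:
        view.pop(view_key(base_key, old), None)


def favorites(user_id):
    """Contact names for one user where is_favorite is True, sorted by name."""
    return sorted(k[2] for k in view if k[0] == user_id and k[1] is True)
```

Deleting ("u1", "alice", "c1") from `base` via delete() removes exactly one
row from `view`, which is the behaviour described above: the application
only ever writes to the base table.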

-- Jack Krupansky

On Mon, Jan 11, 2016 at 3:41 PM, I PVP  wrote:

> The below table and materialized view will solve the SELECT requirements
> of my current application .
> The challenge now is when the user decides to DELETE one specific contact
> from his contact list. I could add the objectid to a composite partition
> key together with the userid. But that would make the SELECT inviable.
>
>  Any ideas/suggestions?
>
>
> CREATE TABLE communication.user_contact (
> userid int,
> contactname text,
> contactid int,
> createdat timeuuid,
> favoriteat timestamp,
> isfavorite boolean,
> objectid timeuuid,
> PRIMARY KEY (userid, contactname)
> ) WITH CLUSTERING ORDER BY ( contactname DESC )
>
>
> CREATE MATERIALIZED VIEW communication.user_contact_by_favorite AS
> SELECT userid, isfavorite, contactname, contactid, createdat, favoriteat,
> objectid
> FROM user_contact
> WHERE userid IS NOT NULL AND isfavorite IS NOT NULL AND contactname IS NOT
> NULL
> PRIMARY KEY ( ( userid, isfavorite ), contactname )
> WITH CLUSTERING ORDER BY ( contactname DESC )
>
> Thanks
>
> --
> IPVP
>
>
> From: DuyHai Doan
> Reply: user@cassandra.apache.org
> Date: January 11, 2016 at 11:14:10 AM
> To: user@cassandra.apache.org
> Subject: Re: Modeling contact list, plain table or List
>
> In the current iteration of materialized view, it is still not possible to
> have WHERE clause other than IS NOT NULL so is_favourite IS TRUE won't
> work.
>
> Still there is a JIRA created to support this feature :
> https://issues.apache.org/jira/browse/CASSANDRA-10368
>
> About cardinality of favorite vs non-favorites, it doesn't matter in this
> case because the OP said "Less then one hundred contacts by user is the
> normal."
>
> So even if all contacts are stuck in one unique favorite state, the
> materialized view partition for one user is at most 100. Even for extreme
> edge case with users having 10 000 contacts, it's still a manageable
> partition size for C*.
>
> But I agree it is important to know before-hand the favorite/non-favorite
> update frequency since it will impact the write throughput on the MV.
>
> For more details on materialized view impl and performance:
> http://www.doanduyhai.com/blog/?p=1930
>
> On Mon, Jan 11, 2016 at 1:36 PM, Jack Krupansky 
> wrote:
>
>> The new Materialized View feature is just an automated way of creating
>> and maintaining what people used to call a "query table", which is the
>> traditional Cassandra data modeling technique for performing queries on
>> columns other than the primary key for a table - you store the same columns
>> in different tables using different columns for the primary key.
>>
>> One also needs to be careful to include all columns of the original
>> primary key in each MV primary key - in addition to whatever column(s) are
>> to be used for indexing in each MV (so that Cassandra can find the old row
>> when it needs to update the MV when the base table row changes, such as on
>> a deletion.)
>>
>> But before creating MVs, you first need to answer questions about how the
>> app needs to query the data. Even with MV, conceptualizing queries needs to
>> precede data modeling.
>>
>> For example, what is the cardinality of favorites vs. non-favorites, does
>> the app even need to query by favorites, as opposed to querying all
>> contacts and retrieving is_favorite as simply a non-key column value,
>> whether favorites need to be retrieved separately from non-favorites, the
>> frequency and latency requirements for query by favorite status, etc. Once
>> these questions are answered, decisions can be made about data modeling.
>>
>> -- Jack Krupansky
>>
>> On Mon, Jan 11, 2016 at 5:13 AM, Carlos Alonso 
>> wrote:
>>
>>> I have never used Materialized Views so maybe this suggestion is not
>>> possible, but in this case, wouldn't it make sense to define the
>>> materialized view as
>>>
>>> is_favourite IS TRUE
>>> instead of
>>> is_favourite IS NOT NULL?
>>>
>>> Carlos Alonso | Software Engineer | @calonso
>>> 
>>>
>>> On 10 January 2016 at 09:59, DuyHai Doan  wrote:
>>>
>>>> Try this
>>>>
>>>> CREATE TABLE communication.user_contact_list (
>>>>   user_id uuid,
>>>>   contact_id uuid,
>>>>   contact_name text,
>>>>   created_at timeuuid,
>>>>   is_favorite boolean,
>>>>   favorite_at timestamp,
>>>>   PRIMARY KEY (user_id, contact_name, contact_id)
>>>> );
>>>>
>>>> CREATE MATERIALIZED VIEW communication.user_favorite_contact_list
>>>> AS SELECT * FROM communication.user_contact_list
>>>> WHERE user_id IS NOT NULL AND 

Re: How do I upgrade from 2.0.16 to 2.0.17 in my case????

2016-01-11 Thread Vasileios Vlachos
Thanks Michael,

I'll try that then. I need to figure out how to do it with Ubuntu's upstart
because I've not done it before.
On 7 Jan 2016 4:25 pm, "Michael Shuler"  wrote:

> On 01/07/2016 07:52 AM, Vasileios Vlachos wrote:
> > Hello,
> >
> > My problem is described in CASSANDRA-10872. I upgraded a
> > second node on the same cluster in case there was something special with
> > the first node but I experienced identical behaviour. Both
> > cassandra-env.sh and cassandra-rackdc.properties were replaced
> > causing the node to come up in the default data centre DC1.
> >
> > What is the best way to upgrade to 2.0.17 in a safe manner in this case?
> > How do we work around this?
>
> I've made a bit of headway on this, but don't have this automated in CI
> fully, yet. In quick tests, I get prompted on upgrade when my config
> files have changed from the originals, similar to your later comment on
> that JIRA. This replacement without prompt could be a system
> configuration to not prompt you(?). I'm not sure how one would change
> that behavior system-wide, since I've never turned this knob, but I'd
> suggest looking at debconf options.
>
> I'm in favor of CASSANDRA-2356, and with the beginning of tick-tock
> releases, this is a good time to get this in as a new feature. As for
> configuring your existing system to not restart services on upgrade, see
> https://people.debian.org/~hmh/invokerc.d-policyrc.d-specification.txt
> for setting up a local policy to behave as you wish.
>
> --
> Michael
>


Re: Using CCM with Opscenter and manual agent installation

2016-01-11 Thread Giampaolo Trapasso
> I believe the issue is just jmx_host needing to be set to 'localhost'
Yes, that solved it. Thanks!

giampaolo


2016-01-08 5:17 GMT+01:00 Nick Bailey :

> stomp_interface is the address to connect back to the central OpsCenter
> daemon with, so 127.0.0.1 should be correct. I believe the issue is just
> jmx_host needing to be set to 'localhost'
>
> On Thu, Jan 7, 2016 at 8:50 PM, Michael Shuler 
> wrote:
>
>> On 01/07/2016 08:46 PM, Michael Shuler wrote:
>> > I'm not sure exactly what that service is, but if all 4 nodes (which are
>> > all really localhost aliases) are attempting to bind to the same IP:port
>> > for that stomp connection, they could be stepping on one another. Should
>> > those be 127.0.0.1 for node1, 127.0.0.12 for node2, etc.?
>>
>> Since accurate typing is eluding me..
>>
>> Should the stomp connection be 127.0.0.1 for node1, 127.0.0.2 for node2,
>> 127.0.0.3 for node3, 127.0.0.4 for node4?
>>
>> --
>> :)
>> Michael
>>
>
>


Re: Modeling contact list, plain table or List

2016-01-11 Thread DuyHai Doan
In the current iteration of materialized views, the WHERE clause still cannot
contain anything other than IS NOT NULL, so is_favourite IS TRUE won't work.

There is, however, a JIRA ticket open to support this feature:
https://issues.apache.org/jira/browse/CASSANDRA-10368

About cardinality of favorite vs non-favorites, it doesn't matter in this
case because the OP said "Less then one hundred contacts by user is the
normal."

So even if all contacts end up in the same favorite state, the
materialized view partition for one user holds at most 100 rows. Even in the
extreme edge case of users having 10 000 contacts, it's still a manageable
partition size for C*.
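
As a rough back-of-envelope check, sketched in Python: the 200 bytes per
contact row below is purely an illustrative assumption (it is not a measured
figure and does not appear in this thread), and `partition_bytes` is a
hypothetical helper name.

```python
# Assumption: ~200 bytes per contact row in the view (names, ids, timestamps).
# This figure is illustrative, not measured.
ASSUMED_ROW_BYTES = 200


def partition_bytes(contacts, row_bytes=ASSUMED_ROW_BYTES):
    """Rough size of one user's view partition if every contact lands in it."""
    return contacts * row_bytes


for contacts in (100, 10_000):
    size_kib = partition_bytes(contacts) / 1024
    print(f"{contacts} contacts -> ~{size_kib:.0f} KiB")
```

Under that assumption even the 10 000-contact case comes out at a couple of
megabytes per partition.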

But I agree it is important to know the favorite/non-favorite update
frequency beforehand, since it will impact the write throughput on the MV.

For more details on materialized view impl and performance:
http://www.doanduyhai.com/blog/?p=1930

On Mon, Jan 11, 2016 at 1:36 PM, Jack Krupansky 
wrote:

> The new Materialized View feature is just an automated way of creating and
> maintaining what people used to call a "query table", which is the
> traditional Cassandra data modeling technique for performing queries on
> columns other than the primary key for a table - you store the same columns
> in different tables using different columns for the primary key.
>
> One also needs to be careful to include all columns of the original
> primary key in each MV primary key - in addition to whatever column(s) are
> to be used for indexing in each MV (so that Cassandra can find the old row
> when it needs to update the MV when the base table row changes, such as on
> a deletion.)
>
> But before creating MVs, you first need to answer questions about how the
> app needs to query the data. Even with MV, conceptualizing queries needs to
> precede data modeling.
>
> For example, what is the cardinality of favorites vs. non-favorites, does
> the app even need to query by favorites, as opposed to querying all
> contacts and retrieving is_favorite as simply a non-key column value,
> whether favorites need to be retrieved separately from non-favorites, the
> frequency and latency requirements for query by favorite status, etc. Once
> these questions are answered, decisions can be made about data modeling.
>
> -- Jack Krupansky
>
> On Mon, Jan 11, 2016 at 5:13 AM, Carlos Alonso  wrote:
>
>> I have never used Materialized Views so maybe this suggestion is not
>> possible, but in this case, wouldn't it make sense to define the
>> materialized view as
>>
>> is_favourite IS TRUE
>> instead of
>> is_favourite IS NOT NULL?
>>
>> Carlos Alonso | Software Engineer | @calonso
>> 
>>
>> On 10 January 2016 at 09:59, DuyHai Doan  wrote:
>>
>>> Try this
>>>
>>> CREATE TABLE communication.user_contact_list (
>>>   user_id uuid,
>>>   contact_id uuid,
>>>   contact_name text,
>>>   created_at timeuuid,
>>>   is_favorite boolean,
>>>   favorite_at timestamp,
>>>   PRIMARY KEY (user_id, contact_name, contact_id)
>>> );
>>>
>>> CREATE MATERIALIZED VIEW communication.user_favorite_contact_list
>>> AS SELECT * FROM communication.user_contact_list
>>> WHERE user_id IS NOT NULL AND contact_name IS NOT NULL
>>> AND contact_id IS NOT NULL AND is_favorite IS NOT NULL
>>> PRIMARY KEY(user_id, is_favorite, contact_name, contact_id);
>>>
>>> If the flag is_favorite is not updated very often the write perf hit due
>>> to materialized view is acceptable.
>>>
>>> On Sat, Jan 9, 2016 at 11:57 PM, Isaac P.  wrote:
>>>
>>>> Jack/ Michael,
>>>>
>>>> Thanks for answering.
>>>>
>>>> How big?: Less then one hundred contacts by user is the normal.
>>>>
>>>> Update requirements: The UPDATE requirements are all around each user
>>>> “favoriting/unfavoriting” the contacts . Deleting is not very frequent.
>>>>
>>>> Does that mean that in C* 3.02 , for this use case to work, the contact
>>>> name must be part of a composite partition key in order to allow sorting
>>>> by contact_name like this ? :
>>>>
>>>> CREATE TABLE communication.user_contact_list (
>>>> user_id uuid,
>>>> contact_name text,
>>>> is_favorite boolean,
>>>> contact_id uuid,
>>>> created_at timeuuid,
>>>> favorite_at timestamp,
>>>> PRIMARY KEY ((user_id, contact_name), is_favorite)
>>>> )  WITH CLUSTERING ORDER BY (contact_name ASC);
>>>>
>>>> Query: Select * from user_contact_list where user_id = :userid and
>>>> is_favorite = true order by contact_name asc;
>>>>
>>>> Looks like each contact as a row/clustering key will be the way to go.
>>>>
>>>> Thanks
>>>>
>>>> IPVP
>>>>
>>>>
>>>> From: Laing, Michael
>>>> Reply: user@cassandra.apache.org
>>>> Date: January 9, 2016 at 11:51:27 AM
>>>> To: user@cassandra.apache.org
>>>> Subject:  Re: Modeling contact list, plain table 

Re: Modeling contact list, plain table or List

2016-01-11 Thread Jack Krupansky
The new Materialized View feature is just an automated way of creating and
maintaining what people used to call a "query table", which is the
traditional Cassandra data modeling technique for performing queries on
columns other than the primary key for a table - you store the same columns
in different tables using different columns for the primary key.

One also needs to be careful to include all columns of the original primary
key in each MV primary key - in addition to whatever column(s) are to be
used for indexing in each MV (so that Cassandra can find the old row when
it needs to update the MV when the base table row changes, such as on a
deletion.)

But before creating MVs, you first need to answer questions about how the
app needs to query the data. Even with MV, conceptualizing queries needs to
precede data modeling.

For example: what is the cardinality of favorites vs. non-favorites? Does
the app even need to query by favorites, as opposed to querying all
contacts and retrieving is_favorite as simply a non-key column value? Do
favorites need to be retrieved separately from non-favorites? What are the
frequency and latency requirements for queries by favorite status? Once
these questions are answered, decisions can be made about data modeling.

-- Jack Krupansky

On Mon, Jan 11, 2016 at 5:13 AM, Carlos Alonso  wrote:

> I have never used Materialized Views so maybe this suggestion is not
> possible, but in this case, wouldn't it make sense to define the
> materialized view as
>
> is_favourite IS TRUE
> instead of
> is_favourite IS NOT NULL?
>
> Carlos Alonso | Software Engineer | @calonso 
>
> On 10 January 2016 at 09:59, DuyHai Doan  wrote:
>
>> Try this
>>
>> CREATE TABLE communication.user_contact_list (
>>   user_id uuid,
>>   contact_id uuid,
>>   contact_name text,
>>   created_at timeuuid,
>>   is_favorite boolean,
>>   favorite_at timestamp,
>>   PRIMARY KEY (user_id, contact_name, contact_id)
>> );
>>
>> CREATE MATERIALIZED VIEW communication.user_favorite_contact_list
>> AS SELECT * FROM communication.user_contact_list
>> WHERE user_id IS NOT NULL AND contact_name IS NOT NULL
>> AND contact_id IS NOT NULL AND is_favorite IS NOT NULL
>> PRIMARY KEY(user_id, is_favorite, contact_name, contact_id);
>>
>> If the flag is_favorite is not updated very often the write perf hit due
>> to materialized view is acceptable.
>>
>> On Sat, Jan 9, 2016 at 11:57 PM, Isaac P.  wrote:
>>
>>> Jack/ Michael,
>>>
>>> Thanks for answering.
>>>
>>> How big?: Less then one hundred contacts by user is the normal.
>>>
>>> Update requirements: The UPDATE requirements are all around  each user
>>> “favoriting/unfavoriting” the contacts . Deleting is not very frequent.
>>>
>>> Does that mean that in C* 3.02 , for this use case to work, the contact
>>> name  must be part of a  composite partition key in order to allow sorting
>>> by contact_name like this ? :
>>>
>>> CREATE TABLE communication.user_contact_list (
>>> user_id uuid,
>>> contact_name text,
>>> is_favorite boolean,
>>> contact_id uuid,
>>> created_at timeuuid,
>>> favorite_at timestamp,
>>> PRIMARY KEY ((user_id, contact_name), is_favorite)
>>> )  WITH CLUSTERING ORDER BY (contact_name ASC);
>>>
>>> Query: Select * from user_contact_list where user_id = :userid and
>>> is_favorite = true order by contact_name asc;
>>>
>>> Looks like each contact as a row/clustering key will be the way to go.
>>>
>>> Thanks
>>>
>>> IPVP
>>>
>>>
>>> From: Laing, Michael
>>> Reply: user@cassandra.apache.org
>>> Date: January 9, 2016 at 11:51:27 AM
>>> To: user@cassandra.apache.org
>>> Subject: Re: Modeling contact list, plain table or List
>>>
>>> Note that in C* 3.02 the second query is invalid:
>>>
>>> cqlsh> Select * from communication.user_contact_list where user_id =
>>> 98f50f00-b6d5-11e5-afec-6003089bf572 and is_favorite = true order
>>> by contact_name asc;
>>>
>>> *InvalidRequest: code=2200 [Invalid query] message="PRIMARY KEY column
>>> "is_favorite" cannot be restricted as preceding column "contact_name" is
>>> not restricted"*
>>>
>>> On Fri, Jan 8, 2016 at 6:50 PM, Jack Krupansky wrote:
>>>
>>>> How big is each contact list expected to be? Dozens? Hundreds?
>>>> Thousands? If just dozens, a simple list column would seem sufficient. If
>>>> thousands, the row (not partition) would get kind of bloated.
>>>>
>>>> What requirements do you have for updating? If updating contacts and
>>>> lots of contacts, I think I'd prefer each contact as a row/clustering key.
>>>> Nice to be able to do selective queries to return slices of the clustering
>>>> key values, which is not so easy if they are all just a single list column.
>>>>
>>>> -- Jack Krupansky
>>>>
>>>> On Fri, Jan 8, 2016 at 

sstableloader throughput

2016-01-11 Thread Noorul Islam K M

I need to stream data to a new cluster using sstableloader. I
spawned a machine with 32 cores, assuming that sstableloader scales with
the number of cores, but that doesn't appear to be the case.

I am getting an average throughput of 18 MB/s which seems to be pretty
low (I might be wrong).

Is there any way to increase the throughput? OpsCenter data on the target
cluster shows very few write requests per second.

Thanks and Regards
Noorul


Re: sstableloader throughput

2016-01-11 Thread Jeff Jirsa

Make sure streaming throughput isn’t throttled on the destination cluster. 

Stream from more machines (divide sstables between a bunch of machines, run in 
parallel).
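
A minimal Python sketch of the "divide sstables" step, under stated
assumptions: file names are assumed to end in a dash-separated component
(e.g. 'ks-cf-jb-12-Data.db'), and `sstable_id` / `split_sstables` are
hypothetical helper names. All components of one SSTable are kept together,
and whole SSTables are assigned greedily to the currently lightest machine.

```python
import heapq
from collections import defaultdict


def sstable_id(filename):
    """Group key for one SSTable: the file name minus its component suffix.

    Assumes names like 'ks-cf-jb-12-Data.db', where the last dash-separated
    part is the component (Data.db, Index.db, ...).
    """
    return filename.rsplit("-", 1)[0]


def split_sstables(files, machines):
    """Greedily assign whole SSTables to `machines` buckets, balancing bytes.

    `files` is an iterable of (filename, size_in_bytes) pairs.
    Returns a list of `machines` lists of filenames.
    """
    groups = defaultdict(list)
    for name, size in files:
        groups[sstable_id(name)].append((name, size))

    # Largest SSTables first, each going to the currently lightest bucket.
    ordered = sorted(
        groups.values(), key=lambda g: sum(s for _, s in g), reverse=True
    )
    heap = [(0, i) for i in range(machines)]
    heapq.heapify(heap)
    buckets = [[] for _ in range(machines)]
    for group in ordered:
        load, i = heapq.heappop(heap)
        buckets[i].extend(name for name, _ in group)
        heapq.heappush(heap, (load + sum(s for _, s in group), i))
    return buckets
```

Each machine would then copy its bucket into its own keyspace/table
directory and run sstableloader against it, so the streams run in parallel.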







On 1/11/16, 5:21 AM, "Noorul Islam K M"  wrote:

>
>I have a need to stream data to new cluster using sstableloader. I
>spawned a machine with 32 cores assuming that sstableloader scaled with
>respect to cores. But it doesn't look like so.
>
>I am getting an average throughput of 18 MB/s which seems to be pretty
>low (I might be wrong).
>
>Is there any way to increase the throughput. OpsCenter data on target
>cluster shows very less write requests / second.
>
>Thanks and Regards
>Noorul



Cassandra 1.2 & Compressed Data

2016-01-11 Thread Ken Hancock
We were running a contrived system test last week trying to measure the
effect that compaction was having on our I/O and read performance.  As a
test, we set compaction throughput to 1MB/sec.

As expected, we fell greatly behind and the number of SSTables grew.
Unexpectedly, we went OOM.

One of my CFs had 1127 SSTables and those SSTables had a Retained Heap of
almost 1GB.  This was after stopping both compaction and all reads and
writes as well as executing a full GC.

Here's the heap dump summary for a single CF:

Class Name                                                     | Objects | Shallow Heap |  Retained Heap

org.apache.cassandra.io.sstable.SSTableReader                  |   1,127 |      117,208 | >= 985,675,936
|- org.apache.cassandra.io.sstable.SSTableMetadata             |   1,127 |       63,112 |   >= 7,284,600
|- java.util.concurrent.atomic.AtomicLong                      |   2,254 |       54,096 |      >= 54,096
|- org.apache.cassandra.db.DecoratedKey                        |   2,254 |       54,096 |     >= 378,672
|- org.apache.cassandra.io.sstable.SSTableDeletingTask         |   1,127 |       45,080 |      >= 45,080
|- org.apache.cassandra.io.util.CompressedPoolingSegmentedFile |   1,127 |       45,080 | >= 969,094,776
|- org.apache.cassandra.io.sstable.Descriptor                  |   1,127 |       45,080 |     >= 483,696
|- org.apache.cassandra.io.sstable.BloomFilterTracker          |   1,127 |       45,080 |      >= 99,176
|- org.apache.cassandra.io.util.MmappedSegmentedFile           |   1,127 |       45,080 |     >= 360,640
|- java.util.concurrent.atomic.AtomicBoolean                   |   2,254 |       36,064 |      >= 36,064
|- org.apache.cassandra.utils.Murmur3BloomFilter               |   1,127 |       27,048 |      >= 81,144
|- org.apache.cassandra.io.sstable.IndexSummary                |   1,127 |       27,048 |   >= 7,896,104
|- java.util.concurrent.CopyOnWriteArraySet                    |   1,127 |       18,032 |     >= 153,272
|- java.util.concurrent.atomic.AtomicInteger                   |   1,127 |       18,032 |      >= 18,032
|- org.apache.cassandra.config.CFMetaData                      |       1 |          120 |          1,608
|- org.apache.cassandra.cache.AutoSavingCache                  |       1 |           40 |             56
|- java.lang.Class                                             |       1 |           16 |             16
|- org.apache.cassandra.dht.Murmur3Partitioner                 |       1 |           16 |             32


The retained heap is almost all in io.util.CompressedPoolingSegmentedFile.
Specifically, it is used up by the compressed ByteBuffer field of each
io.compress.CompressedRandomAccessReader.

I'm not familiar with the Cassandra source code, but here's how I'm reading
it.  An SSTable is segmented and a ConcurrentLinkedQueue (appears unbounded)
is created which will contain a Reader for each segment.  Since this table
is compressed, each segment has an
io.compress.CompressedRandomAccessReader.  CompressedRandomAccessReader
allocates an on-heap ByteBuffer, buffer, to receive decompressed data.

It appears this buffer is only released when the SSTable is closed, i.e.
when it's compacted away or Cassandra shuts down.

In our case, we had a contrived test where compaction was essentially
disabled.  However, if I have a huge table which will not get compacted
for weeks (STCS), it seems that for each segment Cassandra will allocate a
CompressedRandomAccessReader which will allocate a 65K decompression buffer
for each segment that is read, and those buffers will never get freed and are
unbounded.  My reading is that the memory requirements in Cassandra 1.2.18 for
compressed data become unbounded and can grow with the amount of
compressed data that is read.

Searching Jira, I found https://issues.apache.org/jira/browse/CASSANDRA-5661
which sounds like the fix effectively orphaned Cassandra 1.2:

"Reader pooling was introduced in CASSANDRA-4942 but pooled
RandomAccessReaders are never cleaned up until the SSTableReader is closed.
So memory use is "the worst case simultaneous RAR we had open for this
file, forever."

We should introduce a global limit on how much memory to use for RAR, and
evict old ones."

I'm not clear how the "simultaneous" comment above applies.  If I'm reading
this correctly, STCS combined with compressed data is a ticking time bomb for
Cassandra 1.2.
Hopefully someone with more knowledge of the source code can let me know if
my analysis is correct.
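
As a rough sanity check on those numbers, a small Python calculation. It
assumes one 64 KiB buffer per pooled reader (the message above says "65K")
and ignores object overhead, so the result is only an order-of-magnitude
estimate, not a statement about Cassandra's internals.

```python
# Figures copied from the heap dump summary above.
SSTABLES = 1_127
RETAINED_BYTES = 969_094_776  # CompressedPoolingSegmentedFile, retained heap

# Assumption: each pooled reader holds roughly one 64 KiB buffer, and object
# overhead is ignored.
BUFFER_BYTES = 64 * 1024

buffers = RETAINED_BYTES // BUFFER_BYTES
per_sstable = buffers / SSTABLES

print(f"~{buffers} buffers, ~{per_sstable:.1f} per SSTable")
```

Under that assumption the retained heap corresponds to roughly thirteen
pooled buffers per SSTable, which fits the reading that readers accumulate
per segment and are not released while the SSTable stays open.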


Re: Too many compactions, maybe keyspace system?

2016-01-11 Thread Robert Coli
On Sat, Jan 9, 2016 at 8:23 AM, Shuo Chen  wrote:

> I don't know what exactly compaction logs is like in system.log. But I see
> logs like this in system.log, I think maybe this is the compaction log
>

grep -i compact /path/to/system.log

=Rob


Recommendations for an embedded Cassandra and Unit Tests

2016-01-11 Thread Richard L. Burton III
I'm looking to see what's recommended for an embedded version of Cassandra,
just for unit testing.

I'm looking at https://github.com/jsevellec/cassandra-unit/wiki but I
wanted to see if there was a better recommendation?

-- 
-Richard L. Burton III
@rburton


Re: Recommendations for an embedded Cassandra and Unit Tests

2016-01-11 Thread DuyHai Doan
Achilles 4.x does offer embedded Cassandra server support with some
utility classes like ScriptExecutor. It currently supports C* 2.2:

https://github.com/doanduyhai/Achilles/wiki/CQL-embedded-cassandra-server
On 11 Jan 2016 20:47, "Richard L. Burton III" wrote:

> I'm looking to see what's recommended for an embedded version of
> Cassandra, just for unit testing.
>
> I'm looking at https://github.com/jsevellec/cassandra-unit/wiki but I
> wanted to see if there was a better recommendation?
>
> --
> -Richard L. Burton III
> @rburton
>