Gradle script to execute cql3 scripts

2013-09-03 Thread dawood abdullah
I have a requirement to execute CQL3 scripts through Gradle. Is there a
Cassandra plugin for Gradle to do this, or is there any other way I can
execute CQL3 scripts during the build itself? Please suggest.

Dawood
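
I'm not aware of an official Cassandra plugin for Gradle; one minimal sketch is an Exec task that shells out to cqlsh (the script path, host, and port below are placeholders, and the wiring into `test` assumes the java plugin is applied):

```groovy
// build.gradle -- run a CQL3 script during the build by shelling out to cqlsh.
// Assumes cqlsh is on the PATH and the target cluster is reachable.
task executeCql(type: Exec) {
    commandLine 'cqlsh', '-f', 'src/main/cql/schema.cql', 'localhost', '9160'
}

// Hook it into the build lifecycle, e.g. so the schema exists before tests run:
test.dependsOn executeCql
```

Running `gradle test` would then execute the CQL script first; `gradle executeCql` runs it on its own.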


read ?

2013-09-03 Thread Langston, Jim
Hi all,

Quick question

I am currently looking at a 4-node cluster and have stopped all writes to
Cassandra, with reads continuing. I'm trying to understand the utilization
of memory within the JVM. nodetool info on each of the nodes shows them all
growing in footprint, two of them at a greater rate. On restart of Cassandra
each was at about 100 MB; after 2 days, they are at:

Heap Memory (MB) : 798.41 / 3052.00

Heap Memory (MB) : 370.44 / 3052.00

Heap Memory (MB) : 549.73 / 3052.00

Heap Memory (MB) : 481.89 / 3052.00

Ring configuration:

Address  Rack  Status  State   Load     Owns    Token
                                                127605887595351923798765477786913079296
x        1d    Up      Normal  4.38 GB  25.00%  0
x        1d    Up      Normal  4.17 GB  25.00%  42535295865117307932921825928971026432
x        1d    Up      Normal  4.19 GB  25.00%  85070591730234615865843651857942052864
x        1d    Up      Normal  4.14 GB  25.00%  127605887595351923798765477786913079296


What I'm not sure of is why the growth differs between nodes, and why
that growth is being created by read-only activity.

Is Cassandra caching and holding the read data ?

I currently have key and row caching turned off. Also, as part of the info
command:

Key Cache: size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 14400 save period in seconds
Row Cache: size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds



Thanks,

Jim


[RELEASE] Apache Cassandra 2.0 released

2013-09-03 Thread Sylvain Lebresne
The Cassandra team is very pleased to announce the release of Apache
Cassandra
version 2.0.0. Cassandra 2.0.0 is a new major release that adds numerous
improvements[1,2], including:
  - Lightweight transactions[4] that offer linearizable consistency.
  - Experimental Triggers Support[5].
  - Numerous enhancements to CQL as well as a new and better version of the
native protocol[6].
  - Compaction improvements[7] (including a hybrid strategy that combines
leveled and size-tiered compaction).
  - A new faster Thrift Server implementation based on LMAX Disruptor[8].
  - Eager retries: avoids query timeout by sending data requests to other
replicas if too much time passes on the original request.
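
As a sketch of the new lightweight-transaction syntax (the table and column names here are illustrative, not from the announcement):

```cql
-- Insert only if no row with this key exists yet (Paxos-backed, linearizable)
INSERT INTO users (username, email)
VALUES ('jsmith', 'jsmith@example.com')
IF NOT EXISTS;

-- Conditionally update only if the current value matches
UPDATE users SET email = 'new@example.com'
WHERE username = 'jsmith'
IF email = 'jsmith@example.com';
```

Both forms return an [applied] column indicating whether the condition held.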

See the full changelog[1] for more and please make sure to check the release
notes[2] for upgrading details.

Both source and binary distributions of Cassandra 2.0.0 can be downloaded
at:

 http://cassandra.apache.org/download/

As usual, a debian package is available from the project APT repository[3]
(you will need to use the 20x series).

The Cassandra team

[1]: http://goo.gl/zU4sWv (CHANGES.txt)
[2]: http://goo.gl/MrR6Qn (NEWS.txt)
[3]: http://wiki.apache.org/cassandra/DebianPackaging
[4]:
http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0
[5]:
http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-0-prototype-triggers-support
[6]: http://www.datastax.com/dev/blog/cql-in-cassandra-2-0
[7]: https://issues.apache.org/jira/browse/CASSANDRA-5371
[8]: https://issues.apache.org/jira/browse/CASSANDRA-5582


Re: CqlStorage creates wrong schema for Pig

2013-09-03 Thread Chad Johnston
You're trying to use FromCqlColumn on a tuple that has been flattened. The
schema still thinks it's {title: chararray}, but the flattened tuple is now
two values. I don't know how to retrieve the data values in this case.

Your code will work correctly if you do this:
*values3 = FOREACH rows GENERATE FromCqlColumn(title) AS title;*
*dump values3;*
*describe values3;*

(Use FromCqlColumn on the original data, not the flattened data.)
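
A rough Python analogy of what goes wrong (this is not Pig itself, just an illustration of why the UDF fails after FLATTEN: it expects a (name, value) tuple, but flattening leaves bare values behind):

```python
def from_cql_column(field):
    # Mimics FromCqlColumn: expects a (name, value) tuple and returns the value.
    if not isinstance(field, tuple):
        raise TypeError("expected a tuple, got %s" % type(field).__name__)
    return field[1]

row = {"title": ("title", "QA")}

# Works on the original data: the field is still a (name, value) tuple.
print(from_cql_column(row["title"]))  # QA

# FLATTEN replaces the tuple with its bare values; the UDF now receives a
# plain string and fails, analogous to the Pig ClassCastException
# (java.lang.String cannot be cast to org.apache.pig.data.Tuple).
flattened = "QA"
try:
    from_cql_column(flattened)
except TypeError as e:
    print("error:", e)
```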

Chad


On Mon, Sep 2, 2013 at 8:45 AM, Miguel Angel Martin junquera 
mianmarjun.mailingl...@gmail.com wrote:

 Hi


 1.-

 May be?

 -- Register the UDF
 REGISTER /path/to/cqlstorageudf-1.0-SNAPSHOT

 -- FromCqlColumn will convert chararray, int, long, float, double
 DEFINE FromCqlColumn com.megatome.pig.piggybank.tuple.FromCqlColumn();

 -- Load data as normal
 data_raw = LOAD 'cql://bookcrossing/books' USING CqlStorage();

 -- Use the UDF
 data = FOREACH data_raw GENERATE
 *FromCqlColumn*(isbn) AS ISBN,
 *FromCqlColumn*(bookauthor) AS BookAuthor,

 *FromCqlColumn*(booktitle) AS BookTitle,
 *FromCqlColumn*(publisher) AS Publisher,

 *FromCqlColumn*(yearofpublication) AS YearOfPublication;





 and  2.:

 with  the data in cql cassandra 1.2.8, pig 0.11.11 and cql3:

 *CREATE KEYSPACE keyspace1*

 *  WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor'
 : 1 }*

 *  AND durable_writes = true;*

 *
 *

 *use keyspace1;*

 *
 *

 *  CREATE TABLE test (*

 *id text PRIMARY KEY,*

 *title text,*

 *age int*

 *  )  WITH COMPACT STORAGE;*

 *
 *

 *
 *

 *  insert into test (id, title, age) values('1', 'child', 21);*

 *  insert into test (id, title, age) values('2', 'support', 21);*

 *  insert into test (id, title, age) values('3', 'manager', 31);*

 *  insert into test (id, title, age) values('4', 'QA', 41);*

 *  insert into test (id, title, age) values('5', 'QA', 30);*

 *  insert into test (id, title, age) values('6', 'QA', 30);*





 and script:

 *
 *
 *register './libs/cqlstorageudf-1.0-SNAPSHOT.jar';*
 *DEFINE FromCqlColumn com.megatome.pig.piggybank.tuple.FromCqlColumn();*
 *rows = LOAD
 'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING
 CqlStorage();*
 *dump rows;*
 *ILLUSTRATE rows;*
 *describe rows;*
 *A = FOREACH rows GENERATE FLATTEN(title);*
 *dump A;*
 *values3 = FOREACH A GENERATE FromCqlColumn(title) AS title;*
 *dump values3;*
 *describe values3;*


 --



 I have this error:




 

 ---------------------------------------------------------
 | rows | id:chararray   | age:int   | title:chararray   |
 ---------------------------------------------------------
 |      | (id, 5)        | (age, 30) | (title, QA)       |
 ---------------------------------------------------------

 rows: {id: chararray,age: int,title: chararray}


 ...

 (title,QA)
 (title,QA)
 ..
 2013-09-02 16:40:52,454 [Thread-11] WARN
  org.apache.hadoop.mapred.LocalJobRunner - job_local_0003
 *java.lang.ClassCastException: java.lang.String cannot be cast to
 org.apache.pig.data.Tuple*
 at com.megatome.pig.piggybank.tuple.ColumnBase.exec(ColumnBase.java:32)
  at
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:337)
  at
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:434)
  at
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:340)
  at
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
  at
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
 at
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
  at
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
  at
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
  at
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
 2013-09-02 16:40:52,832 [main] INFO
  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
 - HadoopJobId: job_local_0003



 8-|

 Regards

 ...


 Miguel Angel Martín Junquera
 Analyst Engineer.
 miguelangel.mar...@brainsins.com



 2013/9/2 Miguel Angel Martin junquera mianmarjun.mailingl...@gmail.com

 hi all:

 More info :

 https://issues.apache.org/jira/browse/CASSANDRA-5941



 I tried this (and generated Cassandra 1.2.9), but it does not work for me:

 git clone http://git-wip-us.apache.org/repos/asf/cassandra.git
 cd cassandra
 git checkout cassandra-1.2
 patch -p1  

Re: row cache

2013-09-03 Thread Chris Burroughs

On 09/01/2013 03:06 PM, Faraaz Sareshwala wrote:

Yes, that is correct.

The SerializingCacheProvider stores row cache contents off heap. I believe you
need JNA enabled for this though. Someone please correct me if I am wrong here.

The ConcurrentLinkedHashCacheProvider stores row cache contents on the java heap
itself.



Naming things is hard.  Both caches are in memory and are backed by a 
ConcurrentLinkedHashMap.  In the case of the SerializingCacheProvider 
the *values* are stored in off-heap buffers.  Both must store a half 
dozen or so objects (on heap) per entry 
(org.apache.cassandra.cache.RowCacheKey, 
com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$WeightedValue, 
java.util.concurrent.ConcurrentHashMap$HashEntry, etc.).  It would 
probably be better to call this a mixed-heap rather than an off-heap 
cache.  You may find the number of entries you can hold without GC 
problems to be surprisingly low (relative to, say, memcached, or physical 
memory on modern hardware).


Invalidating a column with SerializingCacheProvider invalidates the 
entire row while with ConcurrentLinkedHashCacheProvider it does not. 
SerializingCacheProvider does not require JNA.


Both also use memory estimation of the size (of the values only) to 
determine the total number of entries retained.  Estimating the size of 
the totally on-heap ConcurrentLinkedHashCacheProvider has historically 
been dicey since we switched from sizing in entries, and it has been 
removed in 2.0.0.


As said elsewhere in this thread, the utility of the row cache varies 
from absolutely essential to a source of numerous problems, depending 
on the specifics of the data model and request distribution.





RE: read ?

2013-09-03 Thread Lohfink, Chris
To get an accurate picture you should force a full GC on each node, the heap 
utilization can be misleading since there can be a lot of things in the heap 
with no strong references.

There are a number of factors that can lead to this.  For a true comparison I 
would recommend using jconsole and calling dumpHeap on 
com.sun.management:type=HotSpotDiagnostic with the 2nd param true (force GC).  
Then open the heap dump in a tool like YourKit and you will get a better 
comparison; it will also tell you what is taking the space.
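
The same dumpHeap operation can also be triggered programmatically via the platform MXBean (a sketch; works on HotSpot JVMs, and the temp-file path here is just illustrative):

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.lang.management.ManagementFactory;

public class HeapDump {
    /** Dump only live objects (which forces a full GC first) and return the dump size in bytes. */
    static long dump() throws Exception {
        File out = File.createTempFile("heap", ".hprof");
        out.delete(); // dumpHeap refuses to overwrite an existing file
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(out.getAbsolutePath(), true); // true = live objects only, i.e. full GC first
        long size = out.length();
        out.deleteOnExit();
        return size;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("heap dump bytes: " + dump());
    }
}
```

The resulting .hprof file can then be opened in YourKit, MAT, or similar.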

Chris

From: Langston, Jim [mailto:jim.langs...@compuware.com]
Sent: Tuesday, September 03, 2013 8:20 AM
To: user@cassandra.apache.org
Subject: read ?



Re: Recomended storage choice for Cassandra on Amazon m1.xlarge instance

2013-09-03 Thread Andrey Ilinykh
You benefit from putting the commit log on a separate drive only if that drive
is an isolated spinning device. EC2 ephemeral storage is a virtual device, so I
don't think it makes sense to put the commit log on a separate drive. I would
build a raid0 from the 4 drives and put everything there. But it would be
interesting to compare different configurations.

Thank you,
   Andrey


On Mon, Sep 2, 2013 at 7:11 PM, Renat Gilfanov gren...@mail.ru wrote:

 Hello,

 I'd like to ask what are the best options for separating the commit log and
 data on an Amazon m1.xlarge instance, given 4x420 GB attached storages and an
 EBS volume?

 As far as I understand, EBS is not the choice and it's recommended to
 use the attached storages instead.
 Is it better to combine the 4 ephemeral drives into 2 raid0 (or raid1?)
 arrays, and store data on the first and the commit log on the second? Or
 maybe try other combinations, like 1 attached storage for the commit log and
 the 3 others grouped in raid0 for data?

 Thank you.





Re: Versioning in cassandra

2013-09-03 Thread dawood abdullah
Jan,

The solution you gave works spot on, but there is one more requirement I
forgot to mention. Following is my table structure

CREATE TABLE file (
  id text,
  contenttype text,
  createdby text,
  createdtime timestamp,
  description text,
  name text,
  parentid text,
  version timestamp,
  PRIMARY KEY (id, version)
) WITH CLUSTERING ORDER BY (version DESC);


The query (select * from file where id = 'xxx' limit 1;) provided solves
the problem of finding the latest version file. But I have one more
requirement of finding all the latest version files having parentid say
'yyy'.

Please suggest how this query can be achieved.

Dawood



On Tue, Sep 3, 2013 at 12:43 AM, dawood abdullah
muhammed.daw...@gmail.com wrote:

 In my case version can be timestamp as well. What do you suggest version
 number to be, do you see any problems if I keep version as counter /
 timestamp ?


 On Tue, Sep 3, 2013 at 12:22 AM, Jan Algermissen 
 jan.algermis...@nordsc.com wrote:


 On 02.09.2013, at 20:44, dawood abdullah muhammed.daw...@gmail.com
 wrote:

  Requirement is like I have a column family say File
 
  create table file(id text primary key, fname text, version int,
 mimetype text, content text);
 
  Say, I have few records inserted, when I modify an existing record
 (content is modified) a new version needs to be created. As I need to have
 provision to revert to back any old version whenever required.
 

 So, can version be a timestamp? Or does it need to be an integer?

 In the former case, make use of C*'s ordering like so:

 CREATE TABLE file (
file_id text,
version timestamp,
fname text,

PRIMARY KEY (file_id,version)
 ) WITH CLUSTERING ORDER BY (version DESC);

 Get the latest file version with

 select * from file where file_id = 'xxx' limit 1;

 If it has to be an integer, use counter columns.

 Jan


  Regards,
  Dawood
 
 
  On Mon, Sep 2, 2013 at 10:47 PM, Jan Algermissen 
 jan.algermis...@nordsc.com wrote:
  Hi Dawood,
 
  On 02.09.2013, at 16:36, dawood abdullah muhammed.daw...@gmail.com
 wrote:
 
   Hi
   I have a requirement of versioning to be done in Cassandra.
  
   Following is my column family definition
  
   create table file_details(id text primary key, fname text, version
 int, mimetype text);
  
   I have a secondary index created on fname column.
  
   Whenever I do an insert for the same 'fname', the version should be
 incremented. And when I retrieve a row with fname it should return me the
 latest version row.
  
   Is there a better way to do in Cassandra? Please suggest what
 approach needs to be taken.
 
  Can you explain more about your use case?
 
  If the version need not be a small number, but could be a timestamp,
 you could make use of C*'s ordering feature , have the database set the new
 version as a timestamp and retrieve the latest one with a simple LIMIT 1
 query. (I'll explain more when this is an option for you).
 
  Jan
 
  P.S. Me being a REST/HTTP head, an alarm rings when I see 'version'
 next to 'mimetype' :-) What exactly are you versioning here? Maybe we can
 even change the situation from a functional POV?
 
 
  
   Regards,
  
   Dawood
  
  
  
  
 
 





RE: read ?

2013-09-03 Thread Lohfink, Chris
Does it actually OOM eventually? There will be a certain amount of object 
allocation for reads (or anything) which will see the heap creep up until a GC, 
but at ~500 MB or so of an 8 GB heap there is little reason for the JVM to do 
it, so it probably just ignores it to save processing.  Even the young gen 
won't require a collection at this size.

Which version of Cassandra are you running? Prior to 1.2, a lot of metadata 
about the SSTables was kept on heap, which could cause additional memory 
utilization.

Chris

From: Langston, Jim [mailto:jim.langs...@compuware.com]
Sent: Tuesday, September 03, 2013 11:33 AM
To: user@cassandra.apache.org
Subject: Re: read ?

Thanks Chris,

I have about 8 heap dumps that I have been looking at. I have been trying to 
isolate why I keep dumping heap: I started by removing the apps that write to 
Cassandra, eliminating the work that would entail. I am left with just the apps 
that read the data, and from the heap dumps it looks like Cassandra Column 
methods are being called. Because there are so many objects, it is difficult to 
ascertain exactly what the problem may be. That prompted my query: trying to 
quickly determine whether Cassandra holds objects that have been used for 
reading, and if so why, and more importantly whether something can be done.

Jim

From: Lohfink, Chris chris.lohf...@digi.com
Reply-To: user@cassandra.apache.org
Date: Tue, 3 Sep 2013 11:12:19 -0500
To: user@cassandra.apache.org
Subject: RE: read ?



Re: Upgrade from 1.0.9 to 1.2.8

2013-09-03 Thread Mike Neir
Ah. I was going by the upgrade recommendations in the NEWS.txt file in the 
Cassandra source tree, which didn't mention that version (1.0.11) at all. I 
didn't see any show-stoppers that would have prevented me from going straight 
from 1.0.9 to 1.2.x.


https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/cassandra-1.2.4

Looks like a multi-step upgrade is the way I'll be proceeding. Thanks for the 
insight, everyone.


MN

On 09/02/2013 11:04 AM, Jeremiah D Jordan wrote:

1.0.9 -> 1.0.12 -> 1.1.12 -> 1.2.x?


Because this fix in 1.0.11:
* fix 1.0.x node join to mixed version cluster, other nodes >= 1.1 
(CASSANDRA-4195)

-Jeremiah


--



Mike Neir
Liquid Web, Inc.
Infrastructure Administrator



Re: Versioning in cassandra

2013-09-03 Thread Vivek Mishra
create secondary index over parentid.
OR
make it part of clustering key

-Vivek
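
A hedged sketch of the second suggestion (make parentid part of the key) is a denormalized lookup table written alongside file; the table and column names here are illustrative, not from the thread:

```cql
CREATE TABLE file_by_parent (
  parentid text,
  id text,
  version timestamp,
  name text,
  PRIMARY KEY (parentid, id, version)
) WITH CLUSTERING ORDER BY (id ASC, version DESC);

-- All versions of all files under one parent, newest first within each file:
SELECT * FROM file_by_parent WHERE parentid = 'yyy';
```

Note that this still returns every version; CQL of this era has no per-group LIMIT, so keeping only the newest row per id would have to happen client-side.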




Re: Versioning in cassandra

2013-09-03 Thread Vivek Mishra
My bad. I missed the "latest version" part.

-Vivek


On Tue, Sep 3, 2013 at 11:20 PM, dawood abdullah
muhammed.daw...@gmail.com wrote:

 I have tried with both the options creating secondary index and also tried
 adding parentid to primary key, but I am getting all the files with
 parentid 'yyy', what I want is the latest version of file with the
 combination of parentid, fileid. Say below are the records inserted in the
 file table:

 insert into file (id, parentid, version, contenttype, description, name)
 values ('f1', 'd1', '2011-03-04', 'pdf', 'f1 file', 'file1');
 insert into file (id, parentid, version, contenttype, description, name)
 values ('f1', 'd1', '2011-03-05', 'pdf', 'f1 file', 'file1');
 insert into file (id, parentid, version, contenttype, description, name)
 values ('f2', 'd1', '2011-03-05', 'pdf', 'f1 file', 'file1');
 insert into file (id, parentid, version, contenttype, description, name)
 values ('f2', 'd1', '2011-03-06', 'pdf', 'f1 file', 'file1');

 I want to write a query which returns me second and last record and not
 the first and third record, because for the first and third record there
 exists a latest version, for the combination of id and parentid.

 I am confused whether this is achievable at all; please suggest.

 Dawood
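
The desired result (keep only the newest version per file id under a parent, i.e. the second and fourth inserts above) can be sketched as client-side filtering, since no single CQL query over this schema returns it directly:

```python
# Each row: (id, parentid, version); versions are ISO date strings, so
# lexicographic comparison matches chronological order.
rows = [
    ("f1", "d1", "2011-03-04"),
    ("f1", "d1", "2011-03-05"),
    ("f2", "d1", "2011-03-05"),
    ("f2", "d1", "2011-03-06"),
]

def latest_per_file(rows, parentid):
    """Keep only the newest version for each file id under the given parent."""
    newest = {}
    for file_id, parent, version in rows:
        if parent != parentid:
            continue
        if file_id not in newest or version > newest[file_id]:
            newest[file_id] = version
    return sorted((fid, parentid, ver) for fid, ver in newest.items())

print(latest_per_file(rows, "d1"))
# [('f1', 'd1', '2011-03-05'), ('f2', 'd1', '2011-03-06')]
```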







RE: map/reduce performance time and sstable reader…

2013-09-03 Thread java8964 java8964
I am trying to do the same thing: in our project we want to load the data 
from Cassandra into a Hadoop cluster, and SSTable is one obvious option, as you 
can get the changed data since the last batch load directly from the SSTable 
incremental backup files.

But based on my research so far (I may be wrong, as I have only done limited 
research on SSTables; I hope someone in this forum can tell me that I am 
wrong), it may NOT be a good option:

1) sstable2json does not look like a scalable solution for getting the data 
out of Cassandra, and it needs access to the data directory to get some 
metadata from the system keyspace for the column family being dumped, which 
may not be an option in your MR environment.

2) So far I am thinking of reusing the same API used by sstable2json, but I 
have to provide the metadata to the API myself, like validator types, 
partitioner, etc. I am surprised that, as a backup, the column family SSTable 
dump files DON'T contain this information themselves. Shouldn't it be possible 
to find this out from the SSTable files (ONLY)?

3) The big trouble comes if you want to parse the SSTables in your MR code. 
The API internally loads the Index/CompressionInfo data from the 
Index/Compression files, which it assumes are located in the same place as the 
data file, and it uses a file input stream internally. So if the data files 
are in a DFS (Distributed File System), I have not found a way to tell the API 
to use a stream from the DFS instead of a local file input stream. So 
basically you have 2 options: a) copy the files from the DFS to the local file 
system (same as what the Knewton guys did at 
https://github.com/Knewton/KassandraMRHelper), or b) develop your own API to 
access the SSTable files directly (my guess is that the Netflix guys did it 
this way; they have a project called Aegisthus, see 
http://techblog.netflix.com/2012/02/aegisthus-bulk-data-pipeline-out-of.html, 
but it is not open source).

4) About performance, I am not sure: sstable2json uses the same Cassandra API 
underneath, but running in MR gives us some scalability, as we can reuse the 
Hadoop framework for the many benefits it brings.

Yong

 From: dean.hil...@nrel.gov
 To: user@cassandra.apache.org
 Date: Fri, 30 Aug 2013 07:25:09 -0600
 Subject: map/reduce performance time and sstable reader…
 
 Has anyone done performance tests on sstable reading vs. M/R?  I did a quick 
 test on reading all SSTAbles in a LCS column family on 23 tables and took the 
 average time it took sstable2json(to /dev/null to make it faster) which was 7 
 seconds per table.  (reading to stdout took 16 seconds per table).  This then 
 worked out to an estimation of 12.5 hours up to 27 hours(from to stdout 
 calculation).  I am suspecting the map/reduce time may be much worse since 
 there are not as many repeated rows in LCS
 
 Ie. I am wondering if I should just read from SSTables directly instead of 
 map/reduce?   I am about to dig around in the code of M/R and sstable2json to 
 see what each is doing specifically.
 
 Thanks,
 Dean
  

Re: Cassandra cluster migration in Amazon EC2

2013-09-03 Thread Robert Coli
On Mon, Sep 2, 2013 at 4:21 PM, Renat Gilfanov gren...@mail.ru wrote:

 - Group 3 of storages into raid0 array, move data directory to the raid0,
 and commit log - to the 4th left storage.
  - As far as I understand, separation of commit log and data directory
 should make performance better - but what about separation the OS from
 those two  - is it worth doing?


Nope. Best practice for amazon is ephemeral disks, and RAID0 for data +
commit log.


  - What are the steps to perform such migration? Will it be possible to
 perform it without downtime, restarting node by node with new configuration
 applied?
  I'm especially worried about IP changes, when we'll uprade the instance
 type. What's the recomended way to handle those IP changes?


Just set auto_bootstrap: false in cassandra.yaml to change the IP address of
a node to which you have copied all the data its token had before the IP
change; such a node does not need to be bootstrapped.

=Rob


Re: Versioning in cassandra

2013-09-03 Thread dawood abdullah
I have tried with both the options creating secondary index and also tried
adding parentid to primary key, but I am getting all the files with
parentid 'yyy', what I want is the latest version of file with the
combination of parentid, fileid. Say below are the records inserted in the
file table:

insert into file (id, parentid, version, contenttype, description, name)
values ('f1', 'd1', '2011-03-04', 'pdf', 'f1 file', 'file1');
insert into file (id, parentid, version, contenttype, description, name)
values ('f1', 'd1', '2011-03-05', 'pdf', 'f1 file', 'file1');
insert into file (id, parentid, version, contenttype, description, name)
values ('f2', 'd1', '2011-03-05', 'pdf', 'f1 file', 'file1');
insert into file (id, parentid, version, contenttype, description, name)
values ('f2', 'd1', '2011-03-06', 'pdf', 'f1 file', 'file1');

I want to write a query which returns me second and last record and not the
first and third record, because for the first and third record there exists
a latest version, for the combination of id and parentid.

I am confused whether this is achievable at all; please suggest.

Dawood



On Tue, Sep 3, 2013 at 10:58 PM, Vivek Mishra mishra.v...@gmail.com wrote:

 create secondary index over parentid.
 OR
 make it part of clustering key

 -Vivek


 On Tue, Sep 3, 2013 at 10:42 PM, dawood abdullah 
 muhammed.daw...@gmail.com wrote:

 Jan,

 The solution you gave works spot on, but there is one more requirement I
 forgot to mention. Following is my table structure

 CREATE TABLE file (
   id text,
   contenttype text,
   createdby text,
   createdtime timestamp,
   description text,
   name text,
   parentid text,
   version timestamp,
   PRIMARY KEY (id, version)

 ) WITH CLUSTERING ORDER BY (version DESC);


 The query (select * from file where id = 'xxx' limit 1;) provided solves
 the problem of finding the latest version file. But I have one more
 requirement of finding all the latest version files having parentid say
 'yyy'.

 Please suggest how can this query be achieved.

 Dawood



 On Tue, Sep 3, 2013 at 12:43 AM, dawood abdullah 
 muhammed.daw...@gmail.com wrote:

 In my case version can be timestamp as well. What do you suggest version
 number to be, do you see any problems if I keep version as counter /
 timestamp ?


 On Tue, Sep 3, 2013 at 12:22 AM, Jan Algermissen 
 jan.algermis...@nordsc.com wrote:


 On 02.09.2013, at 20:44, dawood abdullah muhammed.daw...@gmail.com
 wrote:

  Requirement is like I have a column family say File
 
  create table file(id text primary key, fname text, version int,
 mimetype text, content text);
 
  Say, I have few records inserted, when I modify an existing record
 (content is modified) a new version needs to be created. As I need to have
 provision to revert to back any old version whenever required.
 

 So, can version be a timestamp? Or does it need to be an integer?

 In the former case, make use of C*'s ordering like so:

 CREATE TABLE file (
file_id text,
version timestamp,
fname text,

PRIMARY KEY (file_id,version)
 ) WITH CLUSTERING ORDER BY (version DESC);

 Get the latest file version with

 select * from file where file_id = 'xxx' limit 1;

 If it has to be an integer, use counter columns.
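The clustering-order trick above can be sketched in plain Python (an illustrative simulation, not from the thread; tuples stand in for (file_id, version, fname) rows of Jan's table):

```python
# Why "CLUSTERING ORDER BY (version DESC)" plus LIMIT 1 yields the newest
# version: rows within a partition are kept sorted by the clustering key,
# so the first row read back is the latest one.
rows = [
    ('xxx', '2013-01-01', 'old.txt'),
    ('xxx', '2013-06-01', 'new.txt'),
    ('yyy', '2013-03-01', 'other.txt'),
]

def select_latest(rows, file_id):
    """Emulate: SELECT * FROM file WHERE file_id = ? LIMIT 1."""
    partition = sorted(
        (r for r in rows if r[0] == file_id),
        key=lambda r: r[1],
        reverse=True,              # CLUSTERING ORDER BY (version DESC)
    )
    return partition[0] if partition else None  # LIMIT 1

print(select_latest(rows, 'xxx'))  # ('xxx', '2013-06-01', 'new.txt')
```

In Cassandra the sort happens at write time on disk, so the LIMIT 1 read touches only the head of the partition rather than sorting anything.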

 Jan


  Regards,
  Dawood
 
 
  On Mon, Sep 2, 2013 at 10:47 PM, Jan Algermissen 
 jan.algermis...@nordsc.com wrote:
  Hi Dawood,
 
  On 02.09.2013, at 16:36, dawood abdullah muhammed.daw...@gmail.com
 wrote:
 
   Hi
   I have a requirement of versioning to be done in Cassandra.
  
   Following is my column family definition
  
   create table file_details(id text primary key, fname text, version
 int, mimetype text);
  
   I have a secondary index created on fname column.
  
   Whenever I do an insert for the same 'fname', the version should be
 incremented. And when I retrieve a row with fname it should return me the
 latest version row.
  
   Is there a better way to do in Cassandra? Please suggest what
 approach needs to be taken.
 
  Can you explain more about your use case?
 
  If the version need not be a small number, but could be a timestamp,
 you could make use of C*'s ordering feature , have the database set the new
 version as a timestamp and retrieve the latest one with a simple LIMIT 1
 query. (I'll explain more when this is an option for you).
 
  Jan
 
  P.S. Me being a REST/HTTP head, an alarm rings when I see 'version'
 next to 'mimetype' :-) What exactly are you versioning here? Maybe we can
 even change the situation from a functional POV?
 
 
  
   Regards,
  
   Dawood
  
  
  
  
 
 







Re: read ?

2013-09-03 Thread Langston, Jim
Thanks Chris,

I have about 8 heap dumps that I have been looking at. I have been trying to 
isolate why I have been dumping heap. I've started by removing the apps that 
write to Cassandra, eliminating the work that would entail. I am left with 
just the apps that are reading the data, and from the heap dumps it looks 
like Cassandra Column methods are being called; because there are so many 
objects, it is difficult to ascertain exactly what the problem may be. That 
prompted my query: trying to quickly determine if Cassandra holds objects 
that have been used for reading, and if so, why, and more importantly whether 
something can be done.

Jim

From: Lohfink, Chris chris.lohf...@digi.com
Reply-To: user@cassandra.apache.org
Date: Tue, 3 Sep 2013 11:12:19 -0500
To: user@cassandra.apache.org
Subject: RE: read ?

To get an accurate picture you should force a full GC on each node, the heap 
utilization can be misleading since there can be a lot of things in the heap 
with no strong references.

There is a number of factors that can lead to this.  For a true comparison I 
would recommend using jconsole and call dumpHeap on 
com.sun.management:type=HotSpotDiagnostic with the 2nd param true (force GC).  
Then open the heap dump up in a tool like yourkit and you will get a better 
comparison and also it will tell you what it is that’s taking the space.

Chris

From: Langston, Jim [mailto:jim.langs...@compuware.com]
Sent: Tuesday, September 03, 2013 8:20 AM
To: user@cassandra.apache.org
Subject: read ?

Hi all,

Quick question

I am currently looking at a 4-node cluster and have stopped all writing to
Cassandra, with the reads continuing. I'm trying to understand the utilization
of memory within the JVM. nodetool info on each of the nodes shows them all
growing in footprint, two of them at a greater rate. On the restart of 
Cassandra
each were at about 100MB, after 2 days, each of the following are at:

Heap Memory (MB) : 798.41 / 3052.00

Heap Memory (MB) : 370.44 / 3052.00

Heap Memory (MB) : 549.73 / 3052.00

Heap Memory (MB) : 481.89 / 3052.00

Ring configuration:

Address  Rack  Status  State   Load     Owns    Token
                                                 127605887595351923798765477786913079296
x        1d    Up      Normal  4.38 GB  25.00%  0
x        1d    Up      Normal  4.17 GB  25.00%  42535295865117307932921825928971026432
x        1d    Up      Normal  4.19 GB  25.00%  85070591730234615865843651857942052864
x        1d    Up      Normal  4.14 GB  25.00%  127605887595351923798765477786913079296


What I'm not sure of is why the growth differs between the nodes, and why
that growth is being created by activity that is read-only.

Is Cassandra caching and holding the read data ?

I currently have caching turned off for the key/row. Also as part of the info 
command

Key Cache: size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 14400 save period in seconds
Row Cache: size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds



Thanks,

Jim


Re: map/reduce performance time and sstable reader…

2013-09-03 Thread Hiller, Dean
We are considering creating our own InputFormat for hadoop and running the 
tasktrackers on every 3rd node (ie. RF=3) such that we cover all ranges.  Our 
M/R overhead appears to be 13 days vs. 12.5 hours for just reading SSTables 
directly on our current data set.

I personally don't think parsing SSTables (using the hadoop M/R framework) is a 
big deal for us, since we run task trackers on the cassandra nodes we need it 
on.  Ie. we don't need to copy to DFS to do this, I believe (at least not in our 
situation).

I already wrote a client on the SSTableReader parsing out sstables to take a 
look at some of our data while our 13-day M/R job is running (we are 4 days in 
already with no failures and no performance degradation).

later,
Dean

From: java8964 java8964 java8...@hotmail.com
Reply-To: user@cassandra.apache.org
Date: Tuesday, September 3, 2013 12:06 PM
To: user@cassandra.apache.org
Subject: RE: map/reduce performance time and sstable reader…

I am trying to do the same thing, as in our project, we want to load the data 
from Cassandra into Hadoop cluster, and SSTable is one obvious option, as you 
can get the changed data since last batch loading directly from the SSTable 
incremental backup files.

But, based on my research so far (I may be wrong, as I have only done limited 
research about SSTables; I hope someone in this forum can tell me that I am 
wrong), it may NOT be a good option:

1) sstable2json looks like NOT a scalable solution to get the data out of 
Cassandra, and it needs access to the data directory to get some metadata 
from the system keyspace for the column family data being dumped, which maybe 
is not an option in your MR environment.
2) So far I am thinking of reusing the same API as is used in sstable2json, 
but I have to provide this metadata in the API, like validator 
types/partitioner etc. I am surprised that, as a backup, the column family 
SSTable dump files DON'T contain this information themselves. Shouldn't it be 
possible to find this out from the SSTable files (ONLY)?
3) The big trouble comes if you want to parse the SSTables in your MR code. 
The API internally will load the Index/CompressionInfo information from the 
Index/Compression files, which it assumes are located in the same place as the 
data file, but it will use a FileStream internally. So if these data files are 
in the DFS (Distributed File System), so far I didn't find a way to tell the 
API to use a stream from the DFS instead of a local file input stream. So 
basically you have 2 options: a) Copy these files from DFS to the local file 
system (same as what the Knewton guys did at 
https://github.com/Knewton/KassandraMRHelper) b) Develop your own API to 
access the SSTable files directly (my guess is that the Netflix guys probably 
did it this way. They have a project called Aegisthus (see here: 
http://techblog.netflix.com/2012/02/aegisthus-bulk-data-pipeline-out-of.html), 
but it is not open source.)
4) About the performance, I am not sure, as SSTable2Json underneath uses the 
same Cassandra API, but running in MR gives us some support in scalability, as 
we can reuse the Hadoop framework for the many benefits it can bring.

Yong

 From: dean.hil...@nrel.gov
 To: user@cassandra.apache.org
 Date: Fri, 30 Aug 2013 07:25:09 -0600
 Subject: map/reduce performance time and sstable reader…

 Has anyone done performance tests on sstable reading vs. M/R? I did a quick 
 test on reading all SSTables in an LCS column family on 23 tables and took the 
 average time it took sstable2json(to /dev/null to make it faster) which was 7 
 seconds per table. (reading to stdout took 16 seconds per table). This then 
 worked out to an estimation of 12.5 hours up to 27 hours(from to stdout 
 calculation). I am suspecting the map/reduce time may be much worse since 
 there are not as many repeated rows in LCS

 Ie. I am wondering if I should just read from SSTables directly instead of 
 map/reduce? I am about to dig around in the code of M/R and sstable2json to 
 see what each is doing specifically.

 Thanks,
 Dean


Re: Versioning in cassandra

2013-09-03 Thread Laing, Michael
try the following. -ml

-- put this in a file and run it using 'cqlsh -f file'

DROP KEYSPACE latest;

CREATE KEYSPACE latest WITH replication = {
'class': 'SimpleStrategy',
'replication_factor' : 1
};

USE latest;

CREATE TABLE file (
parentid text, -- row_key, same for each version
id text, -- column_key, same for each version
contenttype map<timestamp, text>, -- differs by version; version is the
key to the map
PRIMARY KEY (parentid, id)
);

update file set contenttype = contenttype + {'2011-03-04':'pdf1'} where
parentid = 'd1' and id = 'f1';
update file set contenttype = contenttype + {'2011-03-05':'pdf2'} where
parentid = 'd1' and id = 'f1';
update file set contenttype = contenttype + {'2011-03-04':'pdf3'} where
parentid = 'd1' and id = 'f2';
update file set contenttype = contenttype + {'2011-03-05':'pdf4'} where
parentid = 'd1' and id = 'f2';

select * from file where parentid = 'd1';

-- returns:

-- parentid | id | contenttype
-- ----------+----+--------------------------------------------------------------------------
--   d1 | f1 | {'2011-03-04 00:00:00-0500': 'pdf1', '2011-03-05 00:00:00-0500': 'pdf2'}
--   d1 | f2 | {'2011-03-04 00:00:00-0500': 'pdf3', '2011-03-05 00:00:00-0500': 'pdf4'}

-- use an app to pop off the latest version from the map

-- map other varying fields using the same technique as used for contenttype
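The "pop off the latest version from the map" step can be sketched client-side like so (a hedged illustration; the dict mirrors the map<timestamp, text> column above, and the variable names are mine, not from the post):

```python
from datetime import date

# The map column for one (parentid, id) row, keyed by version timestamp,
# with values mirroring the rows returned above.
contenttype = {
    date(2011, 3, 4): 'pdf1',
    date(2011, 3, 5): 'pdf2',
}

def latest_version(versions):
    """Return (version, value) for the newest entry: the max map key."""
    newest = max(versions)
    return newest, versions[newest]

print(latest_version(contenttype))  # (datetime.date(2011, 3, 5), 'pdf2')
```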



On Tue, Sep 3, 2013 at 2:31 PM, Vivek Mishra mishra.v...@gmail.com wrote:

 create table file(id text , parentid text,contenttype text,version
 timestamp, descr text, name text, PRIMARY KEY(id,version) ) WITH CLUSTERING
 ORDER BY (version DESC);

 insert into file (id, parentid, version, contenttype, descr, name) values
 ('f2', 'd1', '2011-03-06', 'pdf', 'f2 file', 'file1');
 insert into file (id, parentid, version, contenttype, descr, name) values
 ('f2', 'd1', '2011-03-05', 'pdf', 'f2 file', 'file1');
 insert into file (id, parentid, version, contenttype, descr, name) values
 ('f1', 'd1', '2011-03-05', 'pdf', 'f1 file', 'file1');
 insert into file (id, parentid, version, contenttype, descr, name) values
 ('f1', 'd1', '2011-03-04', 'pdf', 'f1 file', 'file1');
 create index on file(parentid);


 select * from file where id='f1' and parentid='d1' limit 1;

 select * from file where parentid='d1' limit 1;


 Will it work for you?

 -Vivek




 On Tue, Sep 3, 2013 at 11:29 PM, Vivek Mishra mishra.v...@gmail.comwrote:

 My bad. I did miss out to read latest version part.

 -Vivek


 On Tue, Sep 3, 2013 at 11:20 PM, dawood abdullah 
 muhammed.daw...@gmail.com wrote:

 I have tried with both the options creating secondary index and also
 tried adding parentid to primary key, but I am getting all the files with
 parentid 'yyy', what I want is the latest version of file with the
 combination of parentid, fileid. Say below are the records inserted in the
 file table:

 insert into file (id, parentid, version, contenttype, description, name)
 values ('f1', 'd1', '2011-03-04', 'pdf', 'f1 file', 'file1');
 insert into file (id, parentid, version, contenttype, description, name)
 values ('f1', 'd1', '2011-03-05', 'pdf', 'f1 file', 'file1');
 insert into file (id, parentid, version, contenttype, description, name)
 values ('f2', 'd1', '2011-03-05', 'pdf', 'f1 file', 'file1');
 insert into file (id, parentid, version, contenttype, description, name)
 values ('f2', 'd1', '2011-03-06', 'pdf', 'f1 file', 'file1');

 I want to write a query which returns me second and last record and not
 the first and third record, because for the first and third record there
 exists a latest version, for the combination of id and parentid.

 I am confused whether this is achievable at all; please suggest.

 Dawood



 On Tue, Sep 3, 2013 at 10:58 PM, Vivek Mishra mishra.v...@gmail.comwrote:

 create secondary index over parentid.
 OR
 make it part of clustering key

 -Vivek


 On Tue, Sep 3, 2013 at 10:42 PM, dawood abdullah 
 muhammed.daw...@gmail.com wrote:

 Jan,

 The solution you gave works spot on, but there is one more requirement
 I forgot to mention. Following is my table structure

 CREATE TABLE file (
   id text,
   contenttype text,
   createdby text,
   createdtime timestamp,
   description text,
   name text,
   parentid text,
   version timestamp,
   PRIMARY KEY (id, version)

 ) WITH CLUSTERING ORDER BY (version DESC);


 The query (select * from file where id = 'xxx' limit 1;) provided
 solves the problem of finding the latest version file. But I have one more
 requirement of finding all the latest version files having parentid say
 'yyy'.

 Please suggest how can this query be achieved.

 Dawood



 On Tue, Sep 3, 2013 at 12:43 AM, dawood abdullah 
 muhammed.daw...@gmail.com wrote:

 In my case version can be timestamp as well. What do you suggest
 version number to be, do you see any problems if I keep version as 
 counter
 / timestamp ?


 On Tue, Sep 3, 2013 at 12:22 AM, Jan Algermissen 
 jan.algermis...@nordsc.com wrote:


 On 02.09.2013, at 20:44, dawood abdullah 

Re: Versioning in cassandra

2013-09-03 Thread Vivek Mishra
create table file(id text , parentid text,contenttype text,version
timestamp, descr text, name text, PRIMARY KEY(id,version) ) WITH CLUSTERING
ORDER BY (version DESC);

insert into file (id, parentid, version, contenttype, descr, name) values
('f2', 'd1', '2011-03-06', 'pdf', 'f2 file', 'file1');
insert into file (id, parentid, version, contenttype, descr, name) values
('f2', 'd1', '2011-03-05', 'pdf', 'f2 file', 'file1');
insert into file (id, parentid, version, contenttype, descr, name) values
('f1', 'd1', '2011-03-05', 'pdf', 'f1 file', 'file1');
insert into file (id, parentid, version, contenttype, descr, name) values
('f1', 'd1', '2011-03-04', 'pdf', 'f1 file', 'file1');
create index on file(parentid);


select * from file where id='f1' and parentid='d1' limit 1;

select * from file where parentid='d1' limit 1;


Will it work for you?

-Vivek




On Tue, Sep 3, 2013 at 11:29 PM, Vivek Mishra mishra.v...@gmail.com wrote:

 My bad. I did miss out to read latest version part.

 -Vivek


 On Tue, Sep 3, 2013 at 11:20 PM, dawood abdullah 
 muhammed.daw...@gmail.com wrote:

 I have tried with both the options creating secondary index and also
 tried adding parentid to primary key, but I am getting all the files with
 parentid 'yyy', what I want is the latest version of file with the
 combination of parentid, fileid. Say below are the records inserted in the
 file table:

 insert into file (id, parentid, version, contenttype, description, name)
 values ('f1', 'd1', '2011-03-04', 'pdf', 'f1 file', 'file1');
 insert into file (id, parentid, version, contenttype, description, name)
 values ('f1', 'd1', '2011-03-05', 'pdf', 'f1 file', 'file1');
 insert into file (id, parentid, version, contenttype, description, name)
 values ('f2', 'd1', '2011-03-05', 'pdf', 'f1 file', 'file1');
 insert into file (id, parentid, version, contenttype, description, name)
 values ('f2', 'd1', '2011-03-06', 'pdf', 'f1 file', 'file1');

 I want to write a query which returns me second and last record and not
 the first and third record, because for the first and third record there
 exists a latest version, for the combination of id and parentid.

 I am confused whether this is achievable at all; please suggest.

 Dawood



 On Tue, Sep 3, 2013 at 10:58 PM, Vivek Mishra mishra.v...@gmail.comwrote:

 create secondary index over parentid.
 OR
 make it part of clustering key

 -Vivek


 On Tue, Sep 3, 2013 at 10:42 PM, dawood abdullah 
 muhammed.daw...@gmail.com wrote:

 Jan,

 The solution you gave works spot on, but there is one more requirement
 I forgot to mention. Following is my table structure

 CREATE TABLE file (
   id text,
   contenttype text,
   createdby text,
   createdtime timestamp,
   description text,
   name text,
   parentid text,
   version timestamp,
   PRIMARY KEY (id, version)

 ) WITH CLUSTERING ORDER BY (version DESC);


 The query (select * from file where id = 'xxx' limit 1;) provided
 solves the problem of finding the latest version file. But I have one more
 requirement of finding all the latest version files having parentid say
 'yyy'.

 Please suggest how can this query be achieved.

 Dawood



 On Tue, Sep 3, 2013 at 12:43 AM, dawood abdullah 
 muhammed.daw...@gmail.com wrote:

 In my case version can be timestamp as well. What do you suggest
 version number to be, do you see any problems if I keep version as counter
 / timestamp ?


 On Tue, Sep 3, 2013 at 12:22 AM, Jan Algermissen 
 jan.algermis...@nordsc.com wrote:


 On 02.09.2013, at 20:44, dawood abdullah muhammed.daw...@gmail.com
 wrote:

  Requirement is like I have a column family say File
 
  create table file(id text primary key, fname text, version int,
 mimetype text, content text);
 
  Say, I have few records inserted, when I modify an existing record
 (content is modified) a new version needs to be created. As I need to 
 have
 provision to revert to back any old version whenever required.
 

 So, can version be a timestamp? Or does it need to be an integer?

 In the former case, make use of C*'s ordering like so:

 CREATE TABLE file (
file_id text,
version timestamp,
fname text,

PRIMARY KEY (file_id,version)
 ) WITH CLUSTERING ORDER BY (version DESC);

 Get the latest file version with

 select * from file where file_id = 'xxx' limit 1;

 If it has to be an integer, use counter columns.

 Jan


  Regards,
  Dawood
 
 
  On Mon, Sep 2, 2013 at 10:47 PM, Jan Algermissen 
 jan.algermis...@nordsc.com wrote:
  Hi Dawood,
 
  On 02.09.2013, at 16:36, dawood abdullah muhammed.daw...@gmail.com
 wrote:
 
   Hi
   I have a requirement of versioning to be done in Cassandra.
  
   Following is my column family definition
  
   create table file_details(id text primary key, fname text,
 version int, mimetype text);
  
   I have a secondary index created on fname column.
  
   Whenever I do an insert for the same 'fname', the version should
 be incremented. And when I retrieve a row with fname it should return me
 the latest version 

Re: [RELEASE] Apache Cassandra 2.0 released

2013-09-03 Thread Jeremiah D Jordan
Thanks for everyone's work on this release!

-Jeremiah

On Sep 3, 2013, at 8:48 AM, Sylvain Lebresne sylv...@datastax.com wrote:

 The Cassandra team is very pleased to announce the release of Apache Cassandra
 version 2.0.0. Cassandra 2.0.0 is a new major release that adds numerous
 improvements[1,2], including:
   - Lightweight transactions[4] that offer linearizable consistency.
   - Experimental Triggers Support[5].
   - Numerous enhancements to CQL as well as a new and better version of the
 native protocol[6].
   - Compaction improvements[7] (including a hybrid strategy that combines 
 leveled and size-tiered compaction).
   - A new faster Thrift Server implementation based on LMAX Disruptor[8].
   - Eager retries: avoids query timeout by sending data requests to other
 replicas if too much time passes on the original request.
 
 See the full changelog[1] for more and please make sure to check the release
 notes[2] for upgrading details.
 
 Both source and binary distributions of Cassandra 2.0.0 can be downloaded at:
 
  http://cassandra.apache.org/download/
 
 As usual, a debian package is available from the project APT repository[3]
 (you will need to use the 20x series).
 
 The Cassandra team
 
 [1]: http://goo.gl/zU4sWv (CHANGES.txt)
 [2]: http://goo.gl/MrR6Qn (NEWS.txt)
 [3]: http://wiki.apache.org/cassandra/DebianPackaging
 [4]: 
 http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0
 [5]: 
 http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-0-prototype-triggers-support
 [6]: http://www.datastax.com/dev/blog/cql-in-cassandra-2-0
 [7]: https://issues.apache.org/jira/browse/CASSANDRA-5371
 [8]: https://issues.apache.org/jira/browse/CASSANDRA-5582
 



How to fix host ID collision?

2013-09-03 Thread Renat Gilfanov

Hello,

We have a Cassandra cluster with 5 nodes hosted in Amazon EC2, and I had to 
restart two of them, so their IPs changed.
We use NetworkTopologyStrategy, so I simply updated IPs in the 
cassandra-topology.properties file.

However, as I understood, old IPs remained somewhere in the system keyspace, 
and now I observe several different exception stacktraces in the log files, 
including:

java.lang.RuntimeException: Host ID collision between active endpoint /new IP 
and /old IP (id=ab66dd02-96b2-4504-8403-7d066f911698)
    at 
org.apache.cassandra.locator.TokenMetadata.updateHostId(TokenMetadata.java:229)
    at 
org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:1358)
    at 
org.apache.cassandra.service.StorageService.onChange(StorageService.java:1228)
    at 
org.apache.cassandra.service.StorageService.onJoin(StorageService.java:1960)
    at 
org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:837)
    at 
org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:915)
    at 
org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:50)
    at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)

and

java.lang.AssertionError: Missing host ID for old IP
    at 
org.apache.cassandra.service.StorageProxy.writeHintForMutation(StorageProxy.java:583)
    at 
org.apache.cassandra.service.StorageProxy$5.runMayThrow(StorageProxy.java:552)
    at 
org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:1658)
    at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)


nodetool status being executed on 3 old nodes, shows old ghost node:

Datacenter: DC1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load     Tokens  Owns   Host ID                               Rack
UN  10.14.128.109  2.8 GB   141     4.1%   32260392-12c2-4f1a-812e-87fd9a960d10  RAC2
UN  10.24.33.187   2.12 GB  258     42.7%  ab66dd02-96b2-4504-8403-7d066f911698  RAC3
UN  10.20.149.165  2.99 GB  251     4.5%   a0792f59-20b1-4017-a7f6-88e0c0d7f86f  RAC1
DN  10.11.73.104   1.07 GB  2       1.0%   null                                  RAC1
Datacenter: DC2
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load     Tokens  Owns   Host ID                               Rack
UN  10.34.78.23    2.21 GB  117     0.9%   2acd3766-404d-4cdc-b3e3-7b3b95856f0e  RAC1
UN  10.20.23.171   2.22 GB  255     46.8%  67421e3a-1dfc-48a0-88b3-c6dbd64dc9d8  RAC1


Is it possible to fix those IP collisions ?


Thanks.

RE: Update-Replace

2013-09-03 Thread Baskar Duraikannu
I have a similar use case but only need to update a portion of the row. We 
basically perform a single write (with old and new columns) with a very low 
TTL value for the old columns. 
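A toy in-memory model of this trick (purely illustrative, not driver code; all names are mine): one write sets the new columns with no TTL and rewrites the stale columns with a very short TTL, so the row converges to a full replace once the TTL passes:

```python
import time

def write_row(row, new_cols, ttl_seconds=0.05):
    """One 'write': new columns get no TTL; stale ones get a short TTL."""
    expiry = time.time() + ttl_seconds
    out = {k: (v, expiry) for k, (v, _) in row.items() if k not in new_cols}
    out.update({k: (v, None) for k, v in new_cols.items()})
    return out

def live_columns(row):
    """What a read sees: columns whose TTL has not yet expired."""
    now = time.time()
    return {k: v for k, (v, exp) in row.items() if exp is None or exp > now}

row = {'a': (1, None), 'b': (2, None)}
row = write_row(row, {'b': 20, 'c': 30})
time.sleep(0.1)           # wait out the short TTL
print(live_columns(row))  # {'b': 20, 'c': 30}
```

In real Cassandra this is done with `USING TTL` on the rewrite of the old columns; the expired cells are then purged by compaction rather than by the read path.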

 From: jan.algermis...@nordsc.com
 Subject: Update-Replace
 Date: Fri, 30 Aug 2013 17:35:48 +0200
 To: user@cassandra.apache.org
 
 Hi,
 
 I have a use case, where I periodically need to apply updates to a wide row 
 that should replace the whole row.
 
 The straight-forward insert/update only replace values that are present in 
 the executed statement, keeping remaining data around.
 
 Is there a smooth way to do a replace with C* or do I have to handle this by 
 the application (e.g. doing delete and then write or coming up with a more 
 clever data model)?
 
 Jan
  

RE: List<blob> retrieve performance

2013-09-03 Thread Baskar Duraikannu
I don't know of any. I would check the size of the list. If it is taking long, 
it could simply be that the disk read is taking long.  

Date: Sat, 31 Aug 2013 16:35:22 -0300
Subject: List<blob> retrieve performance
From: savio.te...@lupa.inf.ufg.br
To: user@cassandra.apache.org

I have a column family with this conf:

CREATE TABLE geoms (
  geom_key text PRIMARY KEY,
  part_geom list<blob>,
  the_geom text
) WITH
  bloom_filter_fp_chance=0.01 AND

  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=0.10 AND
  replicate_on_write='true' AND

  populate_io_cache_on_flush='false' AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};



I run this query (select geom_key, the_geom, part_geom from geoms limit 1;) in 
700 ms.

When I run the same query without part_geom attr (select geom_key, the_geom 
from geoms limit 1;), the query runs in 5 ms. 


Is there a performance problem with a list<blob> attribute?

Thanks in advance


-- 
Best regards,
Sávio S. Teles de Oliveira
voice: +55 62 9136 6996
http://br.linkedin.com/in/savioteles



Master's student in Computer Science - UFG 
Software Architect
Laboratory for Ubiquitous and Pervasive Applications (LUPA) - UFG
  

Re: Versioning in cassandra

2013-09-03 Thread Laing, Michael
I use the technique described in my previous message to handle millions of
messages and their versions.

Actually, I use timeuuids instead of timestamps, as they have more
'uniqueness'. Also I index my maps by a timeuuid that is the complement
(based on a future date) of a current timeuuid. Since maps are kept sorted
by key, this means I can just pop off the first one to get the most recent.

The downside of this approach is that you get more stuff returned to you
from Cassandra than you need. To mitigate that I queue a job to examine and
correct the situation if, upon doing a read, the number of versions for a
particular key is higher than some threshold, e.g. 50. There are many ways
to approach this problem.
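The complement-key idea can be sketched with plain integers (timeuuids simplified to epoch milliseconds; the FUTURE epoch below is an assumption for illustration, not from the post). Keys are FUTURE minus the event time, so a map sorted ascending by key lists the most recent version first:

```python
FUTURE_MS = 4102444800000  # 2100-01-01 UTC in epoch ms (illustrative choice)

def complement_key(event_ms):
    """Later event -> smaller key, so ascending key order is newest-first."""
    return FUTURE_MS - event_ms

versions = {complement_key(t): v
            for t, v in [(1000, 'v1'), (3000, 'v3'), (2000, 'v2')]}

# The first entry in ascending key order is the most recent version.
newest = versions[min(versions)]
print(newest)  # v3
```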

Our actual implementation proceeds to another level, as we also have
replicas of versions. This happens because we process important
transactions in parallel and can expect up to 9 replicas of each version.
We journal them all and use them for reporting latencies in our processing
pipelines as well as for replay when we need to recover application state.

Regards,

Michael


On Tue, Sep 3, 2013 at 3:15 PM, Laing, Michael michael.la...@nytimes.com wrote:

 try the following. -ml

 -- put this in a file and run it using 'cqlsh -f <file>'

 DROP KEYSPACE latest;

 CREATE KEYSPACE latest WITH replication = {
 'class': 'SimpleStrategy',
 'replication_factor' : 1
 };

 USE latest;

 CREATE TABLE file (
 parentid text, -- row_key, same for each version
 id text, -- column_key, same for each version
 contenttype map<timestamp, text>, -- differs by version; the version is the key to the map
 PRIMARY KEY (parentid, id)
 );

 update file set contenttype = contenttype + {'2011-03-04':'pdf1'} where
 parentid = 'd1' and id = 'f1';
 update file set contenttype = contenttype + {'2011-03-05':'pdf2'} where
 parentid = 'd1' and id = 'f1';
 update file set contenttype = contenttype + {'2011-03-04':'pdf3'} where
 parentid = 'd1' and id = 'f2';
 update file set contenttype = contenttype + {'2011-03-05':'pdf4'} where
 parentid = 'd1' and id = 'f2';

 select * from file where parentid = 'd1';

 -- returns:

 -- parentid | id | contenttype

 -- ----------+----+--------------------------------------------------------------------------
 --        d1 | f1 | {'2011-03-04 00:00:00-0500': 'pdf1', '2011-03-05 00:00:00-0500': 'pdf2'}
 --        d1 | f2 | {'2011-03-04 00:00:00-0500': 'pdf3', '2011-03-05 00:00:00-0500': 'pdf4'}

 -- use an app to pop off the latest version from the map

 -- map other varying fields using the same technique as used for
 contenttype
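That "pop off the latest version" step is a one-liner client-side; a minimal Python sketch, assuming the driver returns the contenttype map as a dict keyed by version:

```python
def latest_version(contenttype):
    """Return the (version, value) pair with the newest version key."""
    version = max(contenttype)
    return version, contenttype[version]

# Shape of one row's contenttype map from the example above.
row = {'2011-03-04': 'pdf1', '2011-03-05': 'pdf2'}
assert latest_version(row) == ('2011-03-05', 'pdf2')
```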



 On Tue, Sep 3, 2013 at 2:31 PM, Vivek Mishra mishra.v...@gmail.com wrote:

 create table file(id text , parentid text,contenttype text,version
 timestamp, descr text, name text, PRIMARY KEY(id,version) ) WITH CLUSTERING
 ORDER BY (version DESC);

 insert into file (id, parentid, version, contenttype, descr, name) values
 ('f2', 'd1', '2011-03-06', 'pdf', 'f2 file', 'file1');
 insert into file (id, parentid, version, contenttype, descr, name) values
 ('f2', 'd1', '2011-03-05', 'pdf', 'f2 file', 'file1');
 insert into file (id, parentid, version, contenttype, descr, name) values
 ('f1', 'd1', '2011-03-05', 'pdf', 'f1 file', 'file1');
 insert into file (id, parentid, version, contenttype, descr, name) values
 ('f1', 'd1', '2011-03-04', 'pdf', 'f1 file', 'file1');
 create index on file(parentid);


 select * from file where id='f1' and parentid='d1' limit 1;

 select * from file where parentid='d1' limit 1;


 Will it work for you?

 -Vivek




 On Tue, Sep 3, 2013 at 11:29 PM, Vivek Mishra mishra.v...@gmail.com wrote:

 My bad. I missed the 'latest version' part.

 -Vivek


 On Tue, Sep 3, 2013 at 11:20 PM, dawood abdullah 
 muhammed.daw...@gmail.com wrote:

 I have tried with both the options creating secondary index and also
 tried adding parentid to primary key, but I am getting all the files with
 parentid 'yyy', what I want is the latest version of file with the
 combination of parentid, fileid. Say below are the records inserted in the
 file table:

 insert into file (id, parentid, version, contenttype, description,
 name) values ('f1', 'd1', '2011-03-04', 'pdf', 'f1 file', 'file1');
 insert into file (id, parentid, version, contenttype, description,
 name) values ('f1', 'd1', '2011-03-05', 'pdf', 'f1 file', 'file1');
 insert into file (id, parentid, version, contenttype, description,
 name) values ('f2', 'd1', '2011-03-05', 'pdf', 'f1 file', 'file1');
 insert into file (id, parentid, version, contenttype, description,
 name) values ('f2', 'd1', '2011-03-06', 'pdf', 'f1 file', 'file1');

 I want to write a query which returns the second and last records and not
 the first and third, because for the first and third records there
 exists a later version for the combination of id and parentid.

 I am confused whether this is achievable at all; please suggest.
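What Dawood describes is essentially a "latest per key" reduction, which can also be done client-side over the query result; a minimal Python sketch (the row layout is assumed, not from the thread):

```python
def latest_per_file(rows):
    """Keep only the newest version for each (parentid, id) pair.
    Each row is assumed to be a dict with 'id', 'parentid', 'version'."""
    latest = {}
    for row in rows:
        key = (row['parentid'], row['id'])
        if key not in latest or row['version'] > latest[key]['version']:
            latest[key] = row
    return sorted(latest.values(), key=lambda r: r['id'])

# The four inserts from the question:
rows = [
    {'id': 'f1', 'parentid': 'd1', 'version': '2011-03-04'},
    {'id': 'f1', 'parentid': 'd1', 'version': '2011-03-05'},
    {'id': 'f2', 'parentid': 'd1', 'version': '2011-03-05'},
    {'id': 'f2', 'parentid': 'd1', 'version': '2011-03-06'},
]
# Keeps the second and fourth inserts, as the question asks.
assert [r['version'] for r in latest_per_file(rows)] == \
    ['2011-03-05', '2011-03-06']
```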

 Dawood



 On Tue, Sep 3, 2013 at 10:58 PM, Vivek Mishra mishra.v...@gmail.com wrote:

 

Re: Update-Replace

2013-09-03 Thread Jan Algermissen
Baskar,

On 03.09.2013, at 23:11, Baskar Duraikannu baskar.duraika...@outlook.com 
wrote:

 I have a similar use case but only need to update portion of the row. We 
 basically perform single write (with old and new columns) with very low value 
 of ttl for old columns. 

I found out that using bound statements with the java-driver works quite well for 
this case, because fields that have a ? in the prepared statement but no bound 
value are automatically set to null, and hence removed.

So this actually automagically does what you/I want.

See 
https://groups.google.com/a/lists.datastax.com/d/msg/java-driver-user/APfnKNTXuvE/gBeCk37jgRAJ
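A minimal sketch of that idea (column names are hypothetical): bind every column of a full-row prepared INSERT and let columns missing from the update fall back to None, which CQL writes as null and thereby removes:

```python
COLUMNS = ['key', 'a', 'b', 'c']  # hypothetical table columns

def bind_values(new_row):
    """Bind all columns of the prepared INSERT; anything missing from
    new_row becomes None, which the INSERT writes as null, removing
    whatever value the row held before."""
    return tuple(new_row.get(col) for col in COLUMNS)

# Columns 'b' and 'c' are absent, so they are nulled out on write.
assert bind_values({'key': 'row1', 'a': 1}) == ('row1', 1, None, None)
```

Note that each null written this way is a tombstone, so this replace-by-insert pattern trades convenience for extra tombstone load.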

Jan

 
  From: jan.algermis...@nordsc.com
  Subject: Update-Replace
  Date: Fri, 30 Aug 2013 17:35:48 +0200
  To: user@cassandra.apache.org
  
  Hi,
  
  I have a use case, where I periodically need to apply updates to a wide row 
  that should replace the whole row.
  
  The straight-forward insert/update only replace values that are present in 
  the executed statement, keeping remaining data around.
  
  Is there a smooth way to do a replace with C* or do I have to handle this 
  by the application (e.g. doing delete and then write or coming up with a 
  more clever data model)?
  
  Jan



cqlsh error after enabling encryption

2013-09-03 Thread David Laube
Hi All,

After enabling encryption on our Cassandra 1.2.8 nodes, we are receiving the error 
"Connection error: TSocket read 0 bytes" while attempting to use cqlsh to talk 
to the ring. I've followed the docs over at 
http://www.datastax.com/documentation/cassandra/1.2/webhelp/cassandra/security/secureCqlshSSL_t.html
but can't seem to figure out why this isn't working. Inter-node communication 
seems to be working properly since nodetool status shows our nodes as up, but 
the cqlsh client is unable to talk to a single node or any node in the cluster 
(specifying the IP in .cqlshrc or on the CLI) for some reason. I'm providing 
the applicable config file entries below for review. Any insight or suggestions 
would be greatly appreciated! :)



My ~/.cqlshrc file:


[connection]
hostname = 127.0.0.1
port = 9160
factory = cqlshlib.ssl.ssl_transport_factory

[ssl]
certfile = /etc/cassandra/conf/cassandra_client.crt
validate = true ## Optional, true by default.

[certfiles] ## Optional section, overrides the default certfile in the [ssl] 
section.
192.168.1.3 = ~/keys/cassandra01.cert
192.168.1.4 = ~/keys/cassandra02.cert




Our cassandra.yaml file config blocks:

…snip…

server_encryption_options:
    internode_encryption: all
    keystore: /etc/cassandra/conf/.keystore
    keystore_password: yeah-right
    truststore: /etc/cassandra/conf/.truststore
    truststore_password: yeah-right
    # More advanced defaults below:
    # protocol: TLS
    # algorithm: SunX509
    # store_type: JKS
    # cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA]
    # require_client_auth: false

# enable or disable client/server encryption.
client_encryption_options:
    enabled: true
    keystore: /etc/cassandra/conf/.keystore
    keystore_password: yeah-right
    # require_client_auth: false
    # Set truststore and truststore_password if require_client_auth is true
    # truststore: conf/.truststore
    # truststore_password: cassandra
    # More advanced defaults below:
    protocol: TLS
    algorithm: SunX509
    store_type: JKS
    cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA]

…snip...





Thanks,
-David Laube



Re[2]: How to fix host ID collision?

2013-09-03 Thread Renat Gilfanov
 Thanks a lot for the quick reply, 

Should I run the nodetool repair on all nodes before or after that? 
Also, it's mentioned in the documentation that auto_bootstrap setting is 
applied only to non-seed nodes. Currently I specified all nodes as seeds, 
should I remove nodes with new IP from seeds then?


Tuesday, September 3, 2013, 14:08 -07:00, from Robert Coli rc...@eventbrite.com:
On Tue, Sep 3, 2013 at 2:01 PM, Renat Gilfanov   gren...@mail.ru  wrote:

We have Cassandra cluster with 5 nodes hosted in the Amazon EC2, and  I had 
to restart two of them, so their IPs changed.
We use NetworkTopologyStrategy, so I simply updated IPs in the 
cassandra-topology.properties file.

Set auto_bootstrap:false in the conf file and restart the node to change IP 
address for a node.
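For reference, a sketch of the cassandra.yaml fragment Rob describes (reverting it after the node is back up is my reading, not stated in his message):

```yaml
# cassandra.yaml
# Skip bootstrap so the restarted node keeps its existing data and
# simply re-announces itself under the new IP.
auto_bootstrap: false
```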

=Rob



Re: List<blob> retrieve performance

2013-09-03 Thread Sávio Teles
The list is null.


2013/9/3 Baskar Duraikannu baskar.duraika...@outlook.com

 I don't know of any. I would check the size of LIST. If it is taking long,
 it could be just that disk read is taking long.

 --
 Date: Sat, 31 Aug 2013 16:35:22 -0300
 Subject: List<blob> retrieve performance
 From: savio.te...@lupa.inf.ufg.br
 To: user@cassandra.apache.org


 I have a column family with this conf:

 CREATE TABLE geoms (
   geom_key text PRIMARY KEY,
    part_geom list<blob>,
   the_geom text
 ) WITH
   bloom_filter_fp_chance=0.01 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   read_repair_chance=0.10 AND
   replicate_on_write='true' AND
   populate_io_cache_on_flush='false' AND
   compaction={'class': 'SizeTieredCompactionStrategy'} AND
   compression={'sstable_compression': 'SnappyCompressor'};


 I run this query *select geom_key, the_geom, part_geom from geoms limit
 1;* in 700ms.

 When I run the same query without part_geom attr (*select geom_key,
 the_geom from geoms limit 1;)*, the query runs in 5 ms.

 *Is there a performance problem with a List<blob> attribute?*

 *Thanks in advance*

 --
 Atenciosamente,
 Sávio S. Teles de Oliveira
 voice: +55 62 9136 6996
 http://br.linkedin.com/in/savioteles
  Mestrando em Ciências da Computação - UFG
 Arquiteto de Software
 Laboratory for Ubiquitous and Pervasive Applications (LUPA) - UFG




-- 
Atenciosamente,
Sávio S. Teles de Oliveira
voice: +55 62 9136 6996
http://br.linkedin.com/in/savioteles
Mestrando em Ciências da Computação - UFG
Arquiteto de Software
Laboratory for Ubiquitous and Pervasive Applications (LUPA) - UFG


Fwd: {kundera-discuss} Kundera 2.7 released

2013-09-03 Thread Vivek Mishra
FYI.

-- Forwarded message --
From: Vivek Mishra vivek.mis...@impetus.co.in
Date: Wed, Sep 4, 2013 at 6:15 AM
Subject: {kundera-discuss} Kundera 2.7 released
To: kundera-disc...@googlegroups.com kundera-disc...@googlegroups.com


Hi All,

We are happy to announce the release of Kundera 2.7 .

Kundera is a JPA 2.0 compliant, object-datastore mapping library for NoSQL
datastores. The idea behind Kundera is to make working with NoSQL databases
drop-dead simple and fun. It currently supports Cassandra, HBase, MongoDB,
Redis, OracleNoSQL, Neo4j, ElasticSearch and relational databases.


Major Changes:

1) Support for pagination over MongoDB.
2) Added ElasticSearch as a datastore and fallback indexing mechanism.

Github Bug Fixes:

https://github.com/impetus-opensource/Kundera/issues/234
https://github.com/impetus-opensource/Kundera/issues/215
https://github.com/impetus-opensource/Kundera/issues/201
https://github.com/impetus-opensource/Kundera/issues/333
https://github.com/impetus-opensource/Kundera/issues/362
https://github.com/impetus-opensource/Kundera/issues/350
https://github.com/impetus-opensource/Kundera/issues/365

How to Download:
To download, use or contribute to Kundera, visit:
http://github.com/impetus-opensource/Kundera

Latest released tag version is 2.7. Kundera maven libraries are now
available at:
https://oss.sonatype.org/content/repositories/releases/com/impetus

Sample codes and examples for using Kundera can be found here:
https://github.com/impetus-opensource/Kundera/tree/trunk/kundera-tests

Survey/Feedback:
http://www.surveymonkey.com/s/BMB9PWG

Thank you all for your contributions and using Kundera!


Sincerely,
Kundera Team








NOTE: This message may contain information that is confidential,
proprietary, privileged or otherwise protected by law. The message is
intended solely for the named addressee. If received in error, please
destroy and notify the sender. Any use of this email is prohibited when
received in error. Impetus does not represent, warrant and/or guarantee,
that the integrity of this communication has been maintained nor that the
communication is free of errors, virus, interception or interference.

--
You received this message because you are subscribed to the Google Groups
kundera-discuss group.
To unsubscribe from this group and stop receiving emails from it, send an
email to kundera-discuss+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.