Re: Which hector version is suitable for cassandra 2.0.6 ?

2014-03-28 Thread ssiv...@gmail.com

Hello,

DS JD (the DataStax Java Driver)

On 03/27/2014 01:06 PM, DE VITO Dominique wrote:

Hi,


-----Original Message-----
From: ssiv...@gmail.com [mailto:ssiv...@gmail.com]
Sent: Thursday, March 27, 2014 10:41
To: user@cassandra.apache.org
Subject: Re: Which hector version is suitable for cassandra 2.0.6 ?

On 03/27/2014 12:23 PM, user 01 wrote:

Btw both Hector & Datastax java driver are maintained by Datastax, &
both for java, this speaks for itself !

I'm not sure about the first statement. What do you mean by the second part of 
the sentence?
They are both Java-based, but have different APIs (and I find the DS and Astyanax 
APIs rather more convenient than Hector's). Both can also be configured in a 
fine-grained way.
Astyanax performance is about the same as DS (Astyanax is currently about 5-10% 
faster), which I cannot say about Hector. And the main difference is that DS 
supports C* v2 with CQL3, prepared statements, its native binary protocol and 
other features. For example, using Astyanax versus DS on C* v2 shows unstable 
results under high load.

Which one shows unstable results under high load? Astyanax? DS?

Thanks.

Dominique


Also, CQL2 is now deprecated and the move is towards CQL3, so the DataStax 
driver is recommended for future development.
Astyanax is working on a facade API so, I guess, it will be possible to change 
the low-level driver in some cases.

--
Thanks,
Serj





RE: Question about rpms from datastax

2014-03-28 Thread Romain HARDOUIN
cassandra*.noarch.rpm -> installs Cassandra only
dsc*.noarch.rpm -> DSC stands for DataStax Community; installs Cassandra + 
OpsCenter

Donald Smith  wrote on 27/03/2014 
20:36:57:

> From: Donald Smith 
> To: "'user@cassandra.apache.org'" , 
> Date: 27/03/2014 20:37
> Subject: Question about rpms from datastax
> 
> On http://rpm.riptano.com/community/noarch/ what's the difference between
> 
> cassandra20-2.0.6-1.noarch.rpm  and  dsc20-2.0.6-1.noarch.rpm ?
> 
> Thanks, Don
> 
> Donald A. Smith | Senior Software Engineer 
> P: 425.201.3900 x 3866
> C: (206) 819-5965
> F: (646) 443-2333
> dona...@audiencescience.com


Re: Installing Datastax Cassandra 1.2.15 Using Yum (Java Issue)

2014-03-28 Thread Michael Shuler

Caveat: I am not super strong on rpm-based distros.

On 03/27/2014 06:57 PM, Jon Forrest wrote:

I've done a little more research on this problem.
I'm now convinced that this is a Cassandra problem,
but not the problem I had originally thought.

For example, I downloaded cassandra12-1.2.15-1.noarch.rpm
and I then ran the following with the results shown:

# rpm -iv cassandra12-1.2.15-1.noarch.rpm
error: Failed dependencies:
 java >= 1.6.0 is needed by cassandra12-1.2.15-1.noarch


This properly indicates the missing dependency, which is not installed, 
nor provided by a package in your 'rpm -i ...' command.   This is a 
*binary* dependency: the binary 'java' is installed by a package that is 
in the rpm database. (rpm knows nothing about your tar install..)



This clearly shows that Cassandra is dependent on java, but
notice that it says nothing about openjdk. However, if I now
run

# yum install java


There is no package in the yum repository *named* 'java'.


I see (edited)

Resolving Dependencies
--> Running transaction check
---> Package java-1.7.0-openjdk.x86_64 1:1.7.0.51-2.4.4.1.el6_5 will be
installed

This is the crux of the problem. By making Cassandra dependent on
"java" instead of "jdk" or "jre", installing Cassandra will
always require openjdk.


The best (for some value of best) package in the given yum repositories 
that satisfies the requirement for the 'java' *binary* is the openjdk 
package.



One way of proving this is what I see when I run

# yum install jdk

which is

Package 2000:jdk-1.7.0_51-fcs.x86_64 already installed and latest version


Is this from some yum repo? I guess this is your custom oracle jdk 
package from some custom location?



That's what we want the Cassandra rpm to see. So, I'm going to try
repacking the Cassandra rpm to depend on jdk. If all goes well, this
will work, or at least let the installation get farther.


No, C* does not depend on a JDK - it does depend on a JRE, which 
provides a /usr/bin/java binary.


So, yes.. Oracle has screwed up making use of their 
once-freely-redistributable JRE and distributions have no choice but to 
only provide OpenJDK packages.  This is not really a huge issue, as C* 
*does* run fine on OpenJDK. It is not suggested for production, but it 
does run, so satisfying the dependency via OpenJDK is 100% correct - 
that's the only option in the yum repositories.


I don't know yum/rpm as well as apt/dpkg, but this is a solvable 
problem, most times, by setting up your own yum/apt repository, if you 
want to provide your system with custom, non-distribution-provided packages.


I could be convinced that I'm wrong and there could be improvements made 
for yum/rpm to work better, but a binary dependency of 'java >= 
$somever' is about as simple as it gets - how that is met for yum is the 
problem to solve, and Oracle made that a "figure it out yourself" 
problem for sysadmins.


--
Kind regards,
Michael


Re: Installing Datastax Cassandra 1.2.15 Using Yum (Java Issue)

2014-03-28 Thread Jon Forrest



On 3/28/2014 8:20 AM, Michael Shuler wrote:


# rpm -iv cassandra12-1.2.15-1.noarch.rpm
error: Failed dependencies:
 java >= 1.6.0 is needed by cassandra12-1.2.15-1.noarch


This properly indicates the missing dependency, which is not installed,
nor provided by a package in your 'rpm -i ...' command.   This is a
*binary* dependency: the binary 'java' is installed by a package that is
in the rpm database. (rpm knows nothing about your tar install..)


I don't think so. I believe this is saying that the 'java' package
is missing, not the 'java' binary.


One way of proving this is what I see when I run

# yum install jdk

which is

Package 2000:jdk-1.7.0_51-fcs.x86_64 already installed and latest version


Is this from some yum repo? I guess this is your custom oracle jdk
package from some custom location?


This is the Oracle JDK rpm.


That's what we want the Cassandra rpm to see. So, I'm going to try
repacking the Cassandra rpm to depend on jdk. If all goes well, this
will work, or at least let the installation get farther.


No, C* does not depend on a JDK - it does depend on a JRE, which
provides a /usr/bin/java binary.


I respectfully disagree.


So, yes.. Oracle has screwed up making use of their
once-freely-redistributable JRE and distributions have no choice but to
only provide OpenJDK packages.  This is not really a huge issue, as C*
*does* run fine on OpenJDK.


I'm not doubting what you say, but why would the DataStax
website say to use Oracle Java?


I don't know yum/rpm as well as apt/dpkg, but this is a solvable
problem, most times, by setting up your own yum/apt repository, if you
want to provide your system with custom, non-distribution-provided
packages.


True, but this is a good idea for reasons having nothing to do
with the Cassandra issue.


I could be convinced that I'm wrong and there could be improvements made
for yum/rpm to work better, but a binary dependency of 'java >=
$somever' is about as simple as it gets - how that is met for yum is the
problem to solve, and Oracle made that a "figure it out yourself"
problem for sysadmins.


I think I've solved this problem. I'm going to post another message
in which I describe what I've done.

Jon Forrest


The information transmitted in this email is intended only for the person or 
entity to which it is addressed, and may contain material confidential to Xoom 
Corporation, and/or its subsidiary, buyindiaonline.com Inc. Any review, 
retransmission, dissemination or other use of, or taking of any action in 
reliance upon, this information by persons or entities other than the intended 
recipient(s) is prohibited. If you received this email in error, please contact 
the sender and delete the material from your files.


Re: (SOLVED) Installing Datastax Cassandra 1.2.15 Using Yum (Java Issue)

2014-03-28 Thread Tyler Hobbs
On Fri, Mar 28, 2014 at 11:48 AM, Colin  wrote:

> OpenJDK will crash under load whilst running Cassandra.


That's definitely the case for OpenJDK 6, but 7 *should* be okay.  However,
most people are running the Oracle JRE (even for 7), so there's not a ton
of evidence out there for OpenJDK 7.


-- 
Tyler Hobbs
DataStax 


How to tear down an EmbeddedCassandraService in unit tests?

2014-03-28 Thread Clint Kelly
All,

I have a question about how to use the EmbeddedCassandraService in unit
tests.  I wrote a short collection of unit tests here:

https://github.com/wibiclint/cassandra-java-driver-keyspaces

I'm trying to start up a new EmbeddedCassandraService for each unit test.
I looked at the Cassandra source code to try to see how that happens there
and replicated it as well as I could here.  My first unit test works great,
but in subsequent unit tests I get this error:

java.lang.RuntimeException: java.io.FileNotFoundException:
target/cassandra/data/system/schema_keyspaces/system-schema_keyspaces-jb-2-Data.db
(No such file or directory)

I assume that this is because I am not shutting down the
EmbeddedCassandraService in the first unit test correctly (I do not have
any @After method).

Does anyone have any advice on how to clean up the EmbeddedCassandraService
between unit tests?  I can instead create the EmbeddedCassandraService in a
static @BeforeClass method and then have every unit test use a different
keyspace, but that strikes me as somewhat sloppy and I'd rather understand
what I'm doing well enough to be able to have one service per test if
necessary.

Thanks!

Best regards,
Clint


(SOLVED) Installing Datastax Cassandra 1.2.15 Using Yum (Java Issue)

2014-03-28 Thread Jon Forrest

In a previous message I described my guess at
what was causing the Datastax Cassandra installation
to require OpenJDK. Using the method I describe below,
I'm now able to install the Datastax Cassandra rpm.
Note that I have no idea (yet) whether Cassandra actually
runs, but at least it installs.

There's a wonderful opensource program out there called
rpmrebuild. It lets you examine and modify the metadata
in an rpm, including the dependencies. So, I ran

rpmrebuild -e -p cassandra12-1.2.15-1.noarch.rpm

This puts me in an editor with the spec file loaded.
I searched for 'java' and found the line

Requires:  java >= 1.6.0

I changed this line to

Requires:  jdk >= 1.6.0

I wrote out the file and exited the editor. This created
/root/rpmbuild/RPMS/noarch/cassandra12-1.2.15-1.noarch.rpm
which I put in a local yum repo. I was then able to install
this using yum and I was able to start Cassandra. Problem solved!
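The interactive spec edit above can also be scripted. Below is a rough sketch of the same Requires swap done with sed; the spec snippet is a made-up stand-in for the real cassandra12 spec, not the actual file:

```shell
#!/bin/sh
# Sketch: the same "java -> jdk" Requires edit, done non-interactively.
# The spec content below is a made-up stand-in for the real spec file.
set -e

workdir=$(mktemp -d)
spec="$workdir/cassandra12.spec"

# Minimal fake spec preamble (illustrative only)
cat > "$spec" <<'EOF'
Name: cassandra12
Version: 1.2.15
Requires:  java >= 1.6.0
EOF

# Swap the package-level dependency from 'java' to 'jdk'
sed -i 's/^Requires:\( *\)java /Requires:\1jdk /' "$spec"

grep '^Requires:' "$spec"   # Requires:  jdk >= 1.6.0
```

With rpmrebuild itself the edit happens in the editor it opens; the sed line just shows how small the change actually is.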

Now I'm on to test whether this installation really works.

Jon Forrest





Re: (SOLVED) Installing Datastax Cassandra 1.2.15 Using Yum (Java Issue)

2014-03-28 Thread Colin
OpenJDK will crash under load whilst running Cassandra.  

--
Colin 
+1 320 221 9531

 

> On Mar 28, 2014, at 4:11 PM, Jon Forrest  wrote:
> 
> In a previous message I described my guess at
> what was causing the Datastax Cassandra installation
> to require OpenJDK. Using the method I describe below,
> I'm now able to install the Datastax Cassandra rpm.
> Note that I have no idea (yet) whether Cassandra actually
> runs, but at least it installs.
> 
> There's a wonderful opensource program out there called
> rpmrebuild. It lets you examine and modify the metadata
> in an rpm, including the dependencies. So, I ran
> 
> rpmrebuild -e -p cassandra12-1.2.15-1.noarch.rpm
> 
> This puts me in an editor with the spec file loaded.
> I searched for 'java' and found the line
> 
> Requires:  java >= 1.6.0
> 
> I changed this line to
> 
> Requires:  jdk >= 1.6.0
> 
> I wrote out the file and exited the editor. This created
> /root/rpmbuild/RPMS/noarch/cassandra12-1.2.15-1.noarch.rpm
> which I put in a local yum repo. I was then able to install
> this using yum and I was able to start Cassandra. Problem solved!
> 
> Now I'm on to test whether this installation really works.
> 
> Jon Forrest
> 
> 
> 


DataStax Devcenter CQL3 version vs Cassandra vrs

2014-03-28 Thread Philip G
Within DevCenter, I'm getting an error: [feature] is introduced in CQL
3.1.0, you're running CQL 3.0.5.

This doesn't make sense as I'm using the absolute latest version of
Cassandra (2.0.6); connecting through cqlsh shows it's using 3.1.1:

Connected to Test Cluster at localhost:9160.
[cqlsh 4.1.1 | Cassandra 2.0.6 | CQL spec 3.1.1 | Thrift protocol
19.39.0]
Use HELP for help.

What can I do to make DevCenter understand it should be connecting using
3.1.1?

---
Philip
g...@gpcentre.net
http://www.gpcentre.net/


Cassandra Snapshots giving me corrupted SSTables in the logs

2014-03-28 Thread Russ Lavoie
We are using Cassandra 1.2.10 (with JNA installed) on Ubuntu 12.04.3 and are 
running our instances in Amazon Web Services.

What I am trying to do:

Our Cassandra system's data is on an EBS volume, so we can take snapshots of the 
data, create volumes based on those snapshots, and restore them where we want 
to.

The snapshot process 

Step 1
Login to  the cassandra node.

Step 2
Run nodetool clearsnapshot

Step 3
Run nodetool snapshot

Step 4
Take EBS snapshot

Each of the above steps is performed only after the previous command has returned.

Restore Process

Step 1
Remove data/system, commit_log and saved_caches, plus 
data/<keyspace>/<columnfamily>/* (excluding the snapshot directory)

Step 2
Move all snapshot files into their respective KS/CF locations

Step 3
Start Cassandra

Step 4 
Create the schema

Step 5
Look at the log.  This is where I find a corrupted sstable in our keyspace (not 
system).

Troubleshooting

I suspected a race condition so I did the following:

I inserted a sleep for 60 seconds after issuing “nodetool clearsnapshot” 
I inserted a sleep for 60 seconds after issuing “nodetool snapshot”

Took the snapshot
Restored the snapshot as stated above following those same steps.
It worked with no problem at all.

So my assumption is that Cassandra is doing a few more things after the 
“nodetool snapshot” returns.

Now that you know what is going on, I have my question.

How can I tell when a snapshot is fully complete so I do not have corrupted 
SSTables?

I can reproduce this 100% of the time.

Thanks for your help


Re: Installing Datastax Cassandra 1.2.15 Using Yum (Java Issue)

2014-03-28 Thread Blair Zajac

On Mar 27, 2014, at 2:16 PM, Michael Dykman  wrote:

> Java on linux has *always* been a hassle. Recently, installing ant via
> apt-get on an active Ubuntu still wants to yank in components of GCJ.
> Back to the tarball.

For Ubuntu and Debian, I use the webupd8team packages; they download the 
tarball from Oracle and install it:

https://launchpad.net/~webupd8team/+archive/java
http://community.linuxmint.com/tutorial/view/1414

Blair



Re: Cassandra Snapshots giving me corrupted SSTables in the logs

2014-03-28 Thread Robert Coli
On Fri, Mar 28, 2014 at 11:15 AM, Russ Lavoie  wrote:

> We are using cassandra 1.2.10 (With JNA installed) on ubuntu 12.04.3 and
> are running our instances in Amazon Web Services.
>


> Our cassandra systems data is on an EBS volume
>

Best practice for Cassandra on AWS is to run on ephemeral stripe, not EBS.


> so we can take snapshots of the data and create volumes based on those
> snapshots and restore them where we want to.
>

https://github.com/synack/tablesnap

?


> How can I tell when a snapshot is fully complete so I do not have
> corrupted SSTables?
>

SSTables are immutable after they are created. I'm not sure how you're
getting a snapshot that has corrupted SSTables in it. If you can repro
reliably, file a JIRA on issues.apache.org.

=Rob


Re: DataStax Devcenter CQL3 version vs Cassandra vrs

2014-03-28 Thread Alex Popescu
DevCenter 1.0 uses the Java driver 1.0 to connect to your cluster. This
version of the driver doesn't support C* 2.0 with its latest CQL version.
You'll still be able to connect to a C* 2.0 cluster, but your queries
will need to be compatible with C* 1.2 (basically, none of the
new CQL features will work).

On the bright side, we're working on updating DevCenter to use the Java
driver 2.0 which supports both C* 1.2 and 2.0 (and their corresponding
CQL versions).



On Fri, Mar 28, 2014 at 10:41 AM, Philip G  wrote:

> Within DevCenter, I'm getting an error: [feature] is introduced in CQL
> 3.1.0, you're running CQL 3.0.5.
>
> This doesn't make sense as I'm using the absolute latest version of
> Cassandra (2.0.6); connecting through cqlsh shows it's using 3.1.1:
>
> Connected to Test Cluster at localhost:9160.
> [cqlsh 4.1.1 | Cassandra 2.0.6 | CQL spec 3.1.1 | Thrift protocol
> 19.39.0]
> Use HELP for help.
>
> What can I do to make DevCenter understand it should be connecting using
> 3.1.1?
>
> ---
> Philip
> g...@gpcentre.net
> http://www.gpcentre.net/
>



-- 

:- a)


Alex Popescu
Sen. Product Manager @ DataStax
@al3xandru


Re: How to tear down an EmbeddedCassandraService in unit tests?

2014-03-28 Thread Andre Sprenger
We are using version 1.2.4 and it is difficult to shut down the embedded
version. But you don't have to. Just check in each test's setup method whether
embedded Cassandra is already running and start it if necessary. Then
create keyspaces/tables in setup methods and drop them in teardown methods.
For us this is also faster.

A nice alternative is: https://github.com/edwardcapriolo/farsandra

We also use Farsandra and it works pretty well.


2014-03-28 18:10 GMT+01:00 Clint Kelly :

> All,
>
> I have a question about how to use the EmbeddedCassandraService in unit
> tests.  I wrote a short collection of unit tests here:
>
> https://github.com/wibiclint/cassandra-java-driver-keyspaces
>
> I'm trying to start up a new EmbeddedCassandraService for each unit test.
> I looked at the Cassandra source code to try to see how that happens there
> and replicated it as well as I could here.  My first unit test works great,
> but in subsequent unit tests I get this error:
>
> java.lang.RuntimeException: java.io.FileNotFoundException:
> target/cassandra/data/system/schema_keyspaces/system-schema_keyspaces-jb-2-Data.db
> (No such file or directory)
>
> I assume that this is because I am not shutting down the
> EmbeddedCassandraService in the first unit test correctly (I do not have
> any @After method).
>
> Does anyone have any advice on how to clean up the
> EmbeddedCassandraService between unit tests?  I can instead create the
> EmbeddedCassandraService in a static @BeforeClass method and then have
> every unit test use a different keyspace, but that strikes me as somewhat
> sloppy and I'd rather understand what I'm doing well enough to be able to
> have one service per test if necessary.
>
> Thanks!
>
> Best regards,
> Clint
>


Re: Cassandra Snapshots giving me corrupted SSTables in the logs

2014-03-28 Thread Russ Lavoie
Thank you for your quick response.

Is there a way to tell when a snapshot is completely done?



On Friday, March 28, 2014 1:30 PM, Robert Coli  wrote:
 
On Fri, Mar 28, 2014 at 11:15 AM, Russ Lavoie  wrote:

We are using cassandra 1.2.10 (With JNA installed) on ubuntu 12.04.3 and are 
running our instances in Amazon Web Services.
 
Our cassandra systems data is on an EBS volume

Best practice for Cassandra on AWS is to run on ephemeral stripe, not EBS.
 
so we can take snapshots of the data and create volumes based on those 
snapshots and restore them where we want to.

https://github.com/synack/tablesnap



?
 
How can I tell when a snapshot is fully complete so I do not have corrupted 
SSTables?

SSTables are immutable after they are created. I'm not sure how you're getting 
a snapshot that has corrupted SSTables in it. If you can repro reliably, file a 
JIRA on issues.apache.org.

=Rob

Re: Cassandra Snapshots giving me corrupted SSTables in the logs

2014-03-28 Thread Laing, Michael
In your step 4, be sure you create a consistent EBS snapshot. You may have
pieces of your sstables that have not actually been flushed all the way to
EBS.

See https://github.com/alestic/ec2-consistent-snapshot
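For reference, here is a dry-run sketch of the flush/freeze/snapshot/unfreeze sequence such a tool performs. The commands are only printed, not executed, and the volume ID and mount point are hypothetical:

```shell
#!/bin/sh
# Dry-run sketch of a consistent EBS snapshot of a Cassandra data volume.
# A real run would drop the echo wrapper; names below are hypothetical.
set -e
run() { echo "+ $*"; }            # dry-run wrapper: print instead of execute

VOLUME_ID="vol-xxxxxxxx"          # hypothetical EBS volume ID
MOUNT_POINT="/var/lib/cassandra"  # hypothetical data mount point

run nodetool flush                                   # flush memtables to disk
run nodetool snapshot                                # hard-link a C* snapshot
run xfs_freeze -f "$MOUNT_POINT"                     # quiesce the filesystem
run aws ec2 create-snapshot --volume-id "$VOLUME_ID" # cut the EBS snapshot
run xfs_freeze -u "$MOUNT_POINT"                     # unfreeze promptly
```

The freeze window should be as short as possible, since writes block while the filesystem is frozen.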

ml


On Fri, Mar 28, 2014 at 3:21 PM, Russ Lavoie  wrote:

> Thank you for your quick response.
>
> Is there a way to tell when a snapshot is completely done?
>
>
>   On Friday, March 28, 2014 1:30 PM, Robert Coli 
> wrote:
>  On Fri, Mar 28, 2014 at 11:15 AM, Russ Lavoie wrote:
>
> We are using cassandra 1.2.10 (With JNA installed) on ubuntu 12.04.3 and
> are running our instances in Amazon Web Services.
>
>
>
>  Our cassandra systems data is on an EBS volume
>
>
> Best practice for Cassandra on AWS is to run on ephemeral stripe, not EBS.
>
>
>  so we can take snapshots of the data and create volumes based on those
> snapshots and restore them where we want to.
>
>
> https://github.com/synack/tablesnap
>
>
> ?
>
>
>  How can I tell when a snapshot is fully complete so I do not have
> corrupted SSTables?
>
>
> SStables are immutable after they are created. I'm not sure how you're
> getting a snapshot that has corrupted SSTables in it. If you can repro
> reliably, file a JIRA on issues.apache.org.
>
> =Rob
>
>
>
>


Re: Cassandra Snapshots giving me corrupted SSTables in the logs

2014-03-28 Thread Robert Coli
On Fri, Mar 28, 2014 at 12:21 PM, Russ Lavoie  wrote:

> Thank you for your quick response.
>
> Is there a way to tell when a snapshot is completely done?
>

IIRC, the JMX call blocks until the snapshot completes. It should be done
when nodetool returns.

=Rob


Re: Cassandra Snapshots giving me corrupted SSTables in the logs

2014-03-28 Thread Russ Lavoie
Robert,

That is what I thought as well.  But apparently something is happening.  The 
only way I can get away with doing this is adding a sleep 60 right after the 
nodetool snapshot is executed.  I can reproduce this 100% of the time by not 
issuing a sleep after nodetool snapshot.
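An alternative to a fixed sleep is to poll until the snapshot directory stops changing before cutting the EBS snapshot. A rough sketch, demonstrated against a temporary directory (in practice the real snapshots path would be substituted):

```shell
#!/bin/sh
# Sketch: wait until a snapshot directory has stopped changing before
# taking the EBS snapshot. Demonstrated on a temp dir; point SNAP_DIR
# at the real .../snapshots/<tag> path to use it for real.
set -e

wait_until_stable() {
    dir=$1
    prev=""
    while :; do
        # Fingerprint the directory: file sizes and names
        cur=$(find "$dir" -type f -exec ls -l {} + 2>/dev/null | awk '{print $5, $9}')
        [ -n "$prev" ] && [ "$cur" = "$prev" ] && break
        prev=$cur
        sleep 1
    done
}

SNAP_DIR=$(mktemp -d)        # stand-in for the snapshot directory
echo data > "$SNAP_DIR/f1"
wait_until_stable "$SNAP_DIR"
echo "stable"
```

This is only a belt-and-braces check; it does not explain why files would still be changing after nodetool returns.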

This is the error.

ERROR [SSTableBatchOpen:1] 2014-03-28 17:08:14,290 CassandraDaemon.java (line 
191) Exception in thread Thread[SSTableBatchOpen:1,5,main]
org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException
at 
org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:108)
at 
org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:63)
at 
org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:42)
at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:407)
at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:198)
at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:157)
at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:262)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.io.EOFException
at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340)
at java.io.DataInputStream.readUTF(DataInputStream.java:589)
at java.io.DataInputStream.readUTF(DataInputStream.java:564)
at 
org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:83)
... 11 more



On Friday, March 28, 2014 2:38 PM, Robert Coli  wrote:
 
On Fri, Mar 28, 2014 at 12:21 PM, Russ Lavoie  wrote:

Thank you for your quick response.
>
>
>Is there a way to tell when a snapshot is completely done?

IIRC, the JMX call blocks until the snapshot completes. It should be done when 
nodetool returns.


=Rob

StatusLogger output help

2014-03-28 Thread Tom van den Berge
Hi,

In my cassandra logs, I see a lot of "StatusLogger" output lines. I'm
trying to understand why this is logged, and how to interpret the output.
Maybe someone can point me to some documentation on this particular logging
aspect?

I would like to know what triggers StatusLogger.java to start
logging. Sometimes it logs every few seconds, and sometimes it won't log
for hours.

Also, about the lines that log "Memtable ops, data" per ColumnFamily: what
do these figures mean? Are they the number of operations and the data size (bytes, MB,
...)? Are the ops counters reset every time they are logged, or e.g. every
x minutes?


Any help is greatly appreciated!
Thanks,
Tom


Re: Cassandra Snapshots giving me corrupted SSTables in the logs

2014-03-28 Thread Jonathan Haddad
I have a nagging memory of reading about issues with virtualization and not
actually having durable versions of your data even after an fsync (within
the VM).  Googling around led me to this post:
http://petercai.com/virtualization-is-bad-for-database-integrity/

It's possible you're hitting this issue, either with the virtualization
layer or with EBS itself.  Just a shot in the dark though; other people
would likely know much more than I.



On Fri, Mar 28, 2014 at 12:50 PM, Russ Lavoie  wrote:

> Robert,
>
> That is what I thought as well.  But apparently something is happening.
>  The only way I can get away with doing this is adding a sleep 60 right
> after the nodetool snapshot is executed.  I can reproduce this 100% of the
> time by not issuing a sleep after nodetool snapshot.
>
> This is the error.
>
> ERROR [SSTableBatchOpen:1] 2014-03-28 17:08:14,290 CassandraDaemon.java
> (line 191) Exception in thread Thread[SSTableBatchOpen:1,5,main]
> org.apache.cassandra.io.sstable.CorruptSSTableException:
> java.io.EOFException
> at
> org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:108)
> at
> org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:63)
> at
> org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:42)
> at
> org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:407)
> at
> org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:198)
> at
> org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:157)
> at
> org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:262)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> Caused by: java.io.EOFException
> at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340)
> at java.io.DataInputStream.readUTF(DataInputStream.java:589)
> at java.io.DataInputStream.readUTF(DataInputStream.java:564)
> at
> org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:83)
>  ... 11 more
>
>
>   On Friday, March 28, 2014 2:38 PM, Robert Coli 
> wrote:
>  On Fri, Mar 28, 2014 at 12:21 PM, Russ Lavoie wrote:
>
> Thank you for your quick response.
>
> Is there a way to tell when a snapshot is completely done?
>
>
> IIRC, the JMX call blocks until the snapshot completes. It should be done
> when nodetool returns.
>
> =Rob
>
>
>


-- 
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade


Re: Cassandra Snapshots giving me corrupted SSTables in the logs

2014-03-28 Thread Jonathan Haddad
I will +1 the recommendation on using tablesnap over EBS.  S3 is at least
predictable.

Additionally, from a practical standpoint, you may want to back up your
sstables somewhere. If you use S3, it's easy to pull just the new tables
out via the aws-cli tools (s3 sync) to your remote, non-AWS server, and not
incur the overhead of routinely backing up the entire dataset. For a
non-trivial database, this matters quite a bit.
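Because sstables are immutable once written, an incremental backup reduces to copying only files the destination hasn't seen yet. A toy sketch of that selection, with local directories and made-up file names standing in for S3:

```shell
#!/bin/sh
# Toy sketch: copy only sstable files not already present in the backup,
# relying on sstable immutability (a file, once written, never changes).
# Local directories stand in for an S3 bucket; file names are made up.
set -e

DATA=$(mktemp -d)    # stand-in for the C* data directory
BACKUP=$(mktemp -d)  # stand-in for the backup destination

echo a > "$DATA/ks-cf-jb-1-Data.db"
echo b > "$DATA/ks-cf-jb-2-Data.db"
cp "$DATA/ks-cf-jb-1-Data.db" "$BACKUP/"   # already backed up earlier

# Incremental pass: copy only the files missing from the backup
for f in "$DATA"/*; do
    name=$(basename "$f")
    [ -e "$BACKUP/$name" ] || cp "$f" "$BACKUP/"
done

ls "$BACKUP"   # both sstables present; only one was copied this pass
```

Tools like tablesnap and `aws s3 sync` apply the same name-based skip logic at scale.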


On Fri, Mar 28, 2014 at 1:21 PM, Laing, Michael
wrote:

> As I tried to say, EBS snapshots require much care or you get corruption
> such as you have encountered.
>
> Does Cassandra quiesce the file system after a snapshot using fsfreeze or
> xfs_freeze? Somehow I doubt it...
>
>
> On Fri, Mar 28, 2014 at 4:17 PM, Jonathan Haddad wrote:
>
>> I have a nagging memory of reading about issues with virtualization and
>> not actually having durable versions of your data even after an fsync
>> (within the VM).  Googling around lead me to this post:
>> http://petercai.com/virtualization-is-bad-for-database-integrity/
>>
>> It's possible you're hitting this issue, either with the virtualization
>> layer or with EBS itself.  Just a shot in the dark though, other people
>> would likely know much more than I.
>>
>>
>>
>> On Fri, Mar 28, 2014 at 12:50 PM, Russ Lavoie wrote:
>>
>>> Robert,
>>>
>>> That is what I thought as well.  But apparently something is happening.
>>>  The only way I can get away with doing this is adding a sleep 60 right
>>> after the nodetool snapshot is executed.  I can reproduce this 100% of the
>>> time by not issuing a sleep after nodetool snapshot.
>>>
>>> This is the error.
>>>
>>> ERROR [SSTableBatchOpen:1] 2014-03-28 17:08:14,290 CassandraDaemon.java
>>> (line 191) Exception in thread Thread[SSTableBatchOpen:1,5,main]
>>> org.apache.cassandra.io.sstable.CorruptSSTableException:
>>> java.io.EOFException
>>> at
>>> org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:108)
>>> at
>>> org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:63)
>>>  at
>>> org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:42)
>>> at
>>> org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:407)
>>>  at
>>> org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:198)
>>> at
>>> org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:157)
>>> at
>>> org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:262)
>>> at
>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>  at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> at java.lang.Thread.run(Thread.java:744)
>>> Caused by: java.io.EOFException
>>> at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340)
>>> at java.io.DataInputStream.readUTF(DataInputStream.java:589)
>>> at java.io.DataInputStream.readUTF(DataInputStream.java:564)
>>> at
>>> org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:83)
>>>  ... 11 more
>>>
>>>
>>>   On Friday, March 28, 2014 2:38 PM, Robert Coli 
>>> wrote:
>>>  On Fri, Mar 28, 2014 at 12:21 PM, Russ Lavoie wrote:
>>>
>>> Thank you for your quick response.
>>>
>>> Is there a way to tell when a snapshot is completely done?
>>>
>>>
>>> IIRC, the JMX call blocks until the snapshot completes. It should be
>>> done when nodetool returns.
>>>
>>> =Rob
>>>
>>>
>>>
>>
>>
>> --
>> Jon Haddad
>> http://www.rustyrazorblade.com
>> skype: rustyrazorblade
>>
>
>


-- 
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade


Re: Cassandra Snapshots giving me corrupted SSTables in the logs

2014-03-28 Thread Laing, Michael
As I tried to say, EBS snapshots require much care or you get corruption
such as you have encountered.

Does Cassandra quiesce the file system after a snapshot using fsfreeze or
xfs_freeze? Somehow I doubt it...


On Fri, Mar 28, 2014 at 4:17 PM, Jonathan Haddad  wrote:

> I have a nagging memory of reading about issues with virtualization and
> not actually having durable versions of your data even after an fsync
> (within the VM).  Googling around led me to this post:
> http://petercai.com/virtualization-is-bad-for-database-integrity/
>
> It's possible you're hitting this issue, either with the virtualization
> layer, or with EBS itself.  Just a shot in the dark though, other people
> would likely know much more than I.
>
>
>
> On Fri, Mar 28, 2014 at 12:50 PM, Russ Lavoie  wrote:
>
>> Robert,
>>
>> That is what I thought as well.  But apparently something is happening.
>>  The only way I can get away with doing this is adding a sleep 60 right
>> after the nodetool snapshot is executed.  I can reproduce this 100% of the
>> time by not issuing a sleep after nodetool snapshot.
>>
>> This is the error.
>>
>> [stack trace elided; identical to the trace quoted earlier in this digest]
>>
>>
>>   On Friday, March 28, 2014 2:38 PM, Robert Coli 
>> wrote:
>>  On Fri, Mar 28, 2014 at 12:21 PM, Russ Lavoie wrote:
>>
>> Thank you for your quick response.
>>
>> Is there a way to tell when a snapshot is completely done?
>>
>>
>> IIRC, the JMX call blocks until the snapshot completes. It should be done
>> when nodetool returns.
>>
>> =Rob
>>
>>
>>
>
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> skype: rustyrazorblade
>


Re: Cassandra Snapshots giving me corrupted SSTables in the logs

2014-03-28 Thread Laing, Michael
+1 for tablesnap


On Fri, Mar 28, 2014 at 4:28 PM, Jonathan Haddad  wrote:

> I will +1 the recommendation on using tablesnap over EBS.  S3 is at least
> predictable.
>
> Additionally, from a practical standpoint, you may want to back up your
> sstables somewhere.  If you use S3, it's easy to pull just the new tables
> out via aws-cli tools (s3 sync), to your remote, non-aws server, and not
> incur the overhead of routinely backing up the entire dataset.  For a
> non-trivial database, this matters quite a bit.
>
>
> On Fri, Mar 28, 2014 at 1:21 PM, Laing, Michael  > wrote:
>
>> As I tried to say, EBS snapshots require much care or you get corruption
>> such as you have encountered.
>>
>> Does Cassandra quiesce the file system after a snapshot using fsfreeze or
>> xfs_freeze? Somehow I doubt it...
>>
>>
>> On Fri, Mar 28, 2014 at 4:17 PM, Jonathan Haddad wrote:
>>
>>> I have a nagging memory of reading about issues with virtualization and
>>> not actually having durable versions of your data even after an fsync
>>> (within the VM).  Googling around led me to this post:
>>> http://petercai.com/virtualization-is-bad-for-database-integrity/
>>>
>>> It's possible you're hitting this issue, either with the virtualization
>>> layer, or with EBS itself.  Just a shot in the dark though, other people
>>> would likely know much more than I.
>>>
>>>
>>>
>>> On Fri, Mar 28, 2014 at 12:50 PM, Russ Lavoie wrote:
>>>
 Robert,

 That is what I thought as well.  But apparently something is happening.
  The only way I can get away with doing this is adding a sleep 60 right
 after the nodetool snapshot is executed.  I can reproduce this 100% of the
 time by not issuing a sleep after nodetool snapshot.

 This is the error.

 [stack trace elided; identical to the trace quoted earlier in this digest]


   On Friday, March 28, 2014 2:38 PM, Robert Coli 
 wrote:
  On Fri, Mar 28, 2014 at 12:21 PM, Russ Lavoie wrote:

 Thank you for your quick response.

 Is there a way to tell when a snapshot is completely done?


 IIRC, the JMX call blocks until the snapshot completes. It should be
 done when nodetool returns.

 =Rob



>>>
>>>
>>> --
>>> Jon Haddad
>>> http://www.rustyrazorblade.com
>>> skype: rustyrazorblade
>>>
>>
>>
>
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> skype: rustyrazorblade
>


Re: Cassandra Snapshots giving me corrupted SSTables in the logs

2014-03-28 Thread Jonathan Haddad
Another thing to keep in mind: if you are hitting the issue I described,
waiting 60 seconds will not reliably solve your problem; it will only make it
less likely to occur.  If a memtable has been partially flushed at the
60-second mark, you will end up with the same corrupt sstable.
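The fixed sleep above is a race; a slightly safer variant waits until the snapshot directory stops changing before taking the EBS snapshot. A minimal stdlib-only sketch, not part of nodetool: the directory path, poll interval, and "two identical scans" stability criterion are all assumptions, and this still does not replace freezing the filesystem before an EBS snapshot.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

public class SnapshotWait {
    // Poll a snapshot directory until the set of files and their sizes are
    // identical across two consecutive scans, then return.
    static void waitUntilStable(Path snapshotDir, long pollMillis)
            throws IOException, InterruptedException {
        Map<Path, Long> previous = null;
        while (true) {
            Map<Path, Long> current = new HashMap<>();
            try (DirectoryStream<Path> files = Files.newDirectoryStream(snapshotDir)) {
                for (Path f : files) current.put(f, Files.size(f));
            }
            if (current.equals(previous)) return; // two identical scans: stable
            previous = current;
            Thread.sleep(pollMillis);
        }
    }

    public static void main(String[] args) throws Exception {
        // Demo against a temp directory standing in for a snapshot directory.
        Path dir = Files.createTempDirectory("snapshot-demo");
        Files.write(dir.resolve("demo-Data.db"), new byte[]{1, 2, 3});
        waitUntilStable(dir, 100);
        System.out.println("snapshot directory stable: " + dir);
    }
}
```

Since nodetool snapshot should already block until completion, this only guards against filesystem-level lag underneath the VM; the fsfreeze/xfs_freeze point raised earlier in the thread still applies for EBS.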


On Fri, Mar 28, 2014 at 1:32 PM, Laing, Michael
wrote:

> +1 for tablesnap
>
>
> On Fri, Mar 28, 2014 at 4:28 PM, Jonathan Haddad wrote:
>
>> I will +1 the recommendation on using tablesnap over EBS.  S3 is at least
>> predictable.
>>
>> Additionally, from a practical standpoint, you may want to back up your
>> sstables somewhere.  If you use S3, it's easy to pull just the new tables
>> out via aws-cli tools (s3 sync), to your remote, non-aws server, and not
>> incur the overhead of routinely backing up the entire dataset.  For a non
>> trivial database, this matters quite a bit.
>>
>>
>> On Fri, Mar 28, 2014 at 1:21 PM, Laing, Michael <
>> michael.la...@nytimes.com> wrote:
>>
>>> As I tried to say, EBS snapshots require much care or you get corruption
>>> such as you have encountered.
>>>
>>> Does Cassandra quiesce the file system after a snapshot using fsfreeze
>>> or xfs_freeze? Somehow I doubt it...
>>>
>>>
>>> On Fri, Mar 28, 2014 at 4:17 PM, Jonathan Haddad wrote:
>>>
 I have a nagging memory of reading about issues with virtualization and
 not actually having durable versions of your data even after an fsync
 (within the VM).  Googling around led me to this post:
 http://petercai.com/virtualization-is-bad-for-database-integrity/

 It's possible you're hitting this issue, either with the virtualization
 layer, or with EBS itself.  Just a shot in the dark though, other people
 would likely know much more than I.



 On Fri, Mar 28, 2014 at 12:50 PM, Russ Lavoie wrote:

> Robert,
>
> That is what I thought as well.  But apparently something is
> happening.  The only way I can get away with doing this is adding a sleep
> 60 right after the nodetool snapshot is executed.  I can reproduce this
> 100% of the time by not issuing a sleep after nodetool snapshot.
>
> This is the error.
>
> [stack trace elided; identical to the trace quoted earlier in this digest]
>
>
>   On Friday, March 28, 2014 2:38 PM, Robert Coli 
> wrote:
>  On Fri, Mar 28, 2014 at 12:21 PM, Russ Lavoie wrote:
>
> Thank you for your quick response.
>
> Is there a way to tell when a snapshot is completely done?
>
>
> IIRC, the JMX call blocks until the snapshot completes. It should be
> done when nodetool returns.
>
> =Rob
>
>
>


 --
 Jon Haddad
 http://www.rustyrazorblade.com
 skype: rustyrazorblade

>>>
>>>
>>
>>
>> --
>> Jon Haddad
>> http://www.rustyrazorblade.com
>> skype: rustyrazorblade
>>
>
>


-- 
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade


Read performance in map data type

2014-03-28 Thread Apoorva Gaurav
Hello All,

We've a schema which can be modeled as (studentID, subjectID, marks), where
the combination of studentID and subjectID is unique. The number of studentIDs
can go up to 100 million, and for each studentID we can have up to 10k
subjectIDs.

We are using Apache Cassandra 2.0.4 and DataStax Java driver 1.0.4. We are
using a four-node cluster, each node having 24 cores and 32GB memory. I'm sure
the machines are not underpowered, as on the same test bed we've consistently
received <5ms response times for ~1b documents when queried via primary key.

I've tried three approaches, all of which show significant deterioration in
read query performance (>500 ms response time) once the number of subjectIDs
for a studentID goes past ~100. The approaches are:

1. model as (studentID int PRIMARY KEY, subjectID_marks_map map<int, int>)
and query by subjectID

2. model as (studentID int, subjectID int, marks int, PRIMARY
KEY(studentID, subjectID)) and query as select * from marks_table where
studentID = ?

3. model as (studentID int, subjectID int, marks int, PRIMARY
KEY(studentID, subjectID)) and query as select * from marks_table where
studentID = ? and subjectID in (?, ?, ??)  the number of subjectIDs in
the query being ~1K.

What could the bottlenecks be? Would it be better to model as (studentID int,
subject_marks_json text) and query by studentID?

-- 
Thanks & Regards,
Apoorva


Re: Read performance in map data type

2014-03-28 Thread Shrikar archak
Hi Apoorva,

I assume this is the table, with studentId and subjectId as the primary key
and no other columns (like marks) in the key:

create table marks_table(studentId int, subjectId int, marks int, PRIMARY
KEY(studentId, subjectId));

Also could you give the cfhistogram stats?

nodetool cfhistograms  marks_table;



Thanks,
Shrikar


On Fri, Mar 28, 2014 at 3:53 PM, Apoorva Gaurav
wrote:

> [quoted message elided; identical to the original message above]


Timeuuid inserted with now(), how to get the value back in Java client?

2014-03-28 Thread Andy Atj2
I'm writing a Java client to a Cassandra db.

One of the main primary keys is a timeuuid.

I plan to do INSERTs using now() and have Cassandra generate the value of
the timeuuid.

After the INSERT, I need the Cassandra-generated timeuuid value. Is there
an easy way to get it, without having to re-query for the record I just
inserted and hope to get only one record back? Remember, I don't have the PK.

E.g., in every other db there's a way to get the generated PK back. In SQL
Server it's @@identity, in Oracle it's... etc.

I know Cassandra is not an RDBMS. All I want is the value Cassandra just
generated.

Thanks,
Andy
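There is no server-side way to read back a value generated by now(); the usual answer is to generate the timeuuid on the client before the INSERT (the DataStax Java driver exposes com.datastax.driver.core.utils.UUIDs.timeBased() for this), so the application already holds the PK. For illustration only, a stdlib sketch of how a version-1 (time-based) UUID is assembled; the clock-sequence and node fields here are made-up placeholders, not what a real generator would use.

```java
import java.util.UUID;

public class TimeUuidSketch {
    // 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
    private static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;

    // Build a version-1 (time-based) UUID from a Unix timestamp in millis.
    static UUID timeBased(long unixMillis) {
        long ts = unixMillis * 10_000 + UUID_EPOCH_OFFSET; // 60-bit timestamp
        long msb = (ts << 32)                  // time_low  -> bits 63..32
                 | ((ts >>> 16) & 0xFFFF0000L) // time_mid  -> bits 31..16
                 | 0x1000L                     // version 1 -> bits 15..12
                 | ((ts >>> 48) & 0x0FFFL);    // time_hi   -> bits 11..0
        long lsb = 0x8000000000000000L         // RFC 4122 variant
                 | 0x7F7F7F7F7F7FL;            // placeholder node id
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        UUID id = timeBased(System.currentTimeMillis());
        // The client now knows the PK before issuing the INSERT.
        System.out.println(id + "  version=" + id.version());
    }
}
```

Generating the value client-side and binding it into the INSERT sidesteps the re-query entirely, at the cost of trusting the client's clock instead of the coordinator's.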


Re: Read performance in map data type

2014-03-28 Thread Apoorva Gaurav
Hello Shrikar,

Yes, the primary key is (studentID, subjectID). I had dropped the test table;
I'm recreating and populating it, after which I'll share the cfhistograms. In
such a case, is there any practical limit on the number of rows I should
fetch? For e.g., should I do
   select * from marks_table where studentID = ? limit 500;
instead of
   select * from marks_table where studentID = ?;
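Bounding each read and paging by the last subjectID seen keeps result sets small regardless of partition width, i.e. repeated queries of the form select * from marks_table where studentID = ? and subjectID > ? limit 500. A sketch of the iteration pattern only, against an in-memory map standing in for the partition; the page size and all names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class ClusteringPaging {
    // Fetch one page of (subjectID, marks) pairs strictly after a given
    // subjectID, mimicking "WHERE studentID = ? AND subjectID > ? LIMIT n".
    static List<int[]> page(NavigableMap<Integer, Integer> marksBySubject,
                            int afterSubjectId, int limit) {
        List<int[]> out = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e
                : marksBySubject.tailMap(afterSubjectId, false).entrySet()) {
            out.add(new int[]{e.getKey(), e.getValue()});
            if (out.size() == limit) break;
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<Integer, Integer> marks = new TreeMap<>();
        for (int subjectId = 1; subjectId <= 7; subjectId++) {
            marks.put(subjectId, subjectId * 10);
        }
        int last = 0;
        List<int[]> page;
        while (!(page = page(marks, last, 3)).isEmpty()) {
            System.out.println("fetched page of " + page.size() + " rows");
            last = page.get(page.size() - 1)[0]; // resume after the last row seen
        }
    }
}
```

Each page is a single-partition slice bounded by the clustering column, so the server never materializes the whole wide row for one query.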


On Sat, Mar 29, 2014 at 5:20 AM, Shrikar archak  wrote:

> [quoted messages elided; identical to the thread above]


-- 
Thanks & Regards,
Apoorva