OrderPreservingPartitioner in 1.2

2013-08-23 Thread Takenori Sato
Hi,

I know it has been deprecated, but does OrderPreservingPartitioner still work
with 1.2?

Just wanted to know how it works, but I got a couple of exceptions as below:

ERROR [GossipStage:2] 2013-08-23 07:03:57,171 CassandraDaemon.java (line 175) Exception in thread Thread[GossipStage:2,5,main]
java.lang.RuntimeException: The provided key was not UTF8 encoded.
at org.apache.cassandra.dht.OrderPreservingPartitioner.getToken(OrderPreservingPartitioner.java:233)
at org.apache.cassandra.dht.OrderPreservingPartitioner.decorateKey(OrderPreservingPartitioner.java:53)
at org.apache.cassandra.db.Table.apply(Table.java:379)
at org.apache.cassandra.db.Table.apply(Table.java:353)
at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:258)
at org.apache.cassandra.cql3.statements.ModificationStatement.executeInternal(ModificationStatement.java:117)
at org.apache.cassandra.cql3.QueryProcessor.processInternal(QueryProcessor.java:172)
at org.apache.cassandra.db.SystemTable.updatePeerInfo(SystemTable.java:258)
at org.apache.cassandra.service.StorageService.onChange(StorageService.java:1228)
at org.apache.cassandra.gms.Gossiper.doNotifications(Gossiper.java:935)
at org.apache.cassandra.gms.Gossiper.applyNewStates(Gossiper.java:926)
at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:884)
at org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:57)
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.nio.charset.MalformedInputException: Input length = 1
at java.nio.charset.CoderResult.throwException(CoderResult.java:260)
at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:781)
at org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:167)
at org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:124)
at org.apache.cassandra.dht.OrderPreservingPartitioner.getToken(OrderPreservingPartitioner.java:229)
... 16 more

The key was 0ab68145 in hex, which contains some control characters.
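
For illustration, here is a minimal standalone sketch (my own test snippet, not
Cassandra code) of why such a key fails. The trace shows
OrderPreservingPartitioner decoding the key as UTF-8 via ByteBufferUtil.string(),
and a strict CharsetDecoder rejects these bytes:

import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;

public class Utf8KeyCheck {
    public static void main(String[] args) {
        // The key from the log above: 0x0a 0xb6 0x81 0x45 (0xb6 cannot start a UTF-8 sequence).
        byte[] key = { 0x0a, (byte) 0xb6, (byte) 0x81, 0x45 };
        try {
            // newDecoder() defaults to REPORT on malformed input, so this throws
            // MalformedInputException, which is what surfaces in the stack trace.
            Charset.forName("UTF-8").newDecoder().decode(ByteBuffer.wrap(key));
        } catch (CharacterCodingException e) {
            System.out.println("Key is not valid UTF-8: " + e);
        }
    }
}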

Another exception is this:

 INFO [main] 2013-08-23 07:04:27,659 StorageService.java (line 891) JOINING: Starting to bootstrap...
DEBUG [main] 2013-08-23 07:04:27,659 BootStrapper.java (line 73) Beginning bootstrap process
ERROR [main] 2013-08-23 07:04:27,666 CassandraDaemon.java (line 430) Exception encountered during startup
java.lang.IllegalStateException: No sources found for (H,H]
at org.apache.cassandra.dht.RangeStreamer.getAllRangesWithSourcesFor(RangeStreamer.java:163)
at org.apache.cassandra.dht.RangeStreamer.addRanges(RangeStreamer.java:121)
at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:81)
at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:924)
at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:693)
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:548)
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:445)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:325)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:413)
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:456)
ERROR [StorageServiceShutdownHook] 2013-08-23 07:04:27,672 CassandraDaemon.java (line 175) Exception in thread Thread[StorageServiceShutdownHook,5,main]
java.lang.NullPointerException
at org.apache.cassandra.service.StorageService.stopRPCServer(StorageService.java:321)
at org.apache.cassandra.service.StorageService.shutdownClientServers(StorageService.java:362)
at org.apache.cassandra.service.StorageService.access$000(StorageService.java:88)
at org.apache.cassandra.service.StorageService$1.runMayThrow(StorageService.java:513)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at java.lang.Thread.run(Thread.java:662)

I tried to set up a 3-node cluster with the tokens A, H, and P. This error
was raised by the second node, whose token is H.
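
For reference, the node configuration was roughly this (a sketch of the
relevant cassandra.yaml lines only):

# cassandra.yaml on the second node (sketch)
partitioner: org.apache.cassandra.dht.OrderPreservingPartitioner
initial_token: H        # first node: A, third node: P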

Thanks,
Takenori


Re: Random Distribution, yet Order Preserving Partitioner

2013-08-23 Thread Nikolay Mihaylov
It can handle a few million columns, but not more, say 10M. I mean, a
request for such a row concentrates on a particular node, so the
performance degrades.

 I also had an idea for a semi-ordered partitioner - instead of a single MD5,
have two MD5s.

A wide row with about 40-50M columns works for us, but with lots of problems.

My testing with get_count() shows the first minor problems at 14-15K columns
in a row, and then it just gets worse.




On Fri, Aug 23, 2013 at 2:47 AM, Takenori Sato ts...@cloudian.com wrote:

 Hi Nick,

  token and key are not the same. It was like this a long time ago (a single
 MD5 assumed a single key)

 True. That reminds me to run a test with the latest 1.2 instead of our
 current 1.0!

  if you want it ordered, you can probably arrange your data in a way that
 lets you get it in an ordered fashion.

 Yeah, we have done that for a long time. That's called a wide row, right? Or
 a compound primary key.
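
 (For reference, a minimal CQL3 sketch of what I mean by a wide row / compound
 primary key; the table and column names are made up:)

 CREATE TABLE events_by_day (
     day      text,
     event_ts timestamp,
     payload  text,
     PRIMARY KEY (day, event_ts)   -- one partition (wide row) per day, ordered by event_ts
 );

 -- ordered results, but only within a single partition:
 SELECT * FROM events_by_day WHERE day = '2013-08-23';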

 It can handle a few million columns, but not more, say 10M. I mean, a
 request for such a row concentrates on a particular node, so the
 performance degrades.

  I also had an idea for a semi-ordered partitioner - instead of a single
 MD5, have two MD5s.

 Sounds interesting. But, we need a fully ordered result.

 Anyway, I will try with the latest version.

 Thanks,
 Takenori


 On Thu, Aug 22, 2013 at 6:12 PM, Nikolay Mihaylov n...@nmmm.nu wrote:

 my five cents -
 token and key are not the same. It was like this a long time ago (a single
 MD5 assumed a single key)

 if you want it ordered, you can probably arrange your data in a way that
 lets you get it in an ordered fashion.
 For example, long ago I had a single column family with a single key and
 about 2-3M columns - I do not suggest you do it this way, because it is the
 wrong way, but it makes the idea easy to understand.

 I also had an idea for a semi-ordered partitioner - instead of a single MD5,
 have two MD5s.
 then you can get semi-ordered ranges, e.g. you get all the cities in Canada,
 then all the cities in the US, and so on (see the sketch below).
 however, this way things may get pretty unbalanced
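
 Roughly what I mean, as an untested sketch (class and method names are mine):
 the token is MD5(country) followed by MD5(city), so each country gets one
 contiguous token range on the ring, while the countries themselves are still
 placed randomly:

 import java.math.BigInteger;
 import java.security.MessageDigest;

 public class TwoMd5Token {
     // Token = MD5(group) ++ MD5(item): groups are spread randomly over the
     // ring, but every item of a group lands in that group's contiguous range.
     static BigInteger token(String group, String item) throws Exception {
         MessageDigest md = MessageDigest.getInstance("MD5");
         byte[] g = md.digest(group.getBytes("UTF-8"));
         byte[] i = md.digest(item.getBytes("UTF-8"));
         byte[] both = new byte[g.length + i.length];
         System.arraycopy(g, 0, both, 0, g.length);
         System.arraycopy(i, 0, both, g.length, i.length);
         return new BigInteger(1, both);
     }

     public static void main(String[] args) throws Exception {
         // Both share the same 16-byte prefix, so they fall in the same range.
         System.out.println(token("Canada", "Toronto"));
         System.out.println(token("Canada", "Vancouver"));
     }
 }

 (If one group is much bigger than the others, its range gets hot - that is
 the balance problem I mention above.)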

 Nick





 On Thu, Aug 22, 2013 at 11:19 AM, Takenori Sato ts...@cloudian.comwrote:

 Hi,

 I am trying to implement a custom partitioner that evenly distributes,
 yet preserves order.

 The partitioner returns a BigInteger token as RandomPartitioner does, while
 it builds the decorated key from a string as OrderPreservingPartitioner does.
 * for now, since IPartitioner<T> does not support different types for the
 token and the key, the BigInteger is simply converted to a string
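
 As a very rough standalone sketch of the token side (my illustration of the
 general shape only, not the actual implementation; how evenly it distributes
 then depends entirely on how the keys themselves are distributed):

 import java.math.BigInteger;
 import java.nio.ByteBuffer;

 public class NumericOrderPreservingToken {
     private static final int TOKEN_BYTES = 16;

     // Interpret the first 16 bytes of the key as an unsigned big-endian
     // integer, padding short keys with trailing zeros. Comparing these
     // BigIntegers preserves the byte-wise order of the keys.
     static BigInteger token(ByteBuffer key) {
         byte[] prefix = new byte[TOKEN_BYTES];
         ByteBuffer dup = key.duplicate();
         dup.get(prefix, 0, Math.min(TOKEN_BYTES, dup.remaining()));
         return new BigInteger(1, prefix);
     }
 }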

 Then I played around with cassandra-cli. As expected, in my 3-node test
 cluster, get/set worked, but list (get_range_slices) didn't.

 This came from a challenge to overcome the scalability limits of a wide row.
 So I want to make it work!

 I am aware that some effort is required to make get_range_slices work. But
 are there any other critical problems? For example, it seems there is an
 assumption that the token and the key are the same. If that assumption runs
 throughout the whole C* code base, this partitioner is not practical.

 Or have you tried something similar?

 I would appreciate your feedback!

 Thanks,
 Takenori






Re: OrderPreservingPartitioner in 1.2

2013-08-23 Thread Vara Kumar
Regarding the first exception: OPP was not working in 1.2. It has been fixed,
but the fix is not yet in the latest 1.2.8 release.

Jira issue about it: https://issues.apache.org/jira/browse/CASSANDRA-5793


On Fri, Aug 23, 2013 at 12:51 PM, Takenori Sato ts...@cloudian.com wrote:

 Hi,

 I know it has been deprecated, but does OrderPreservingPartitioner still
 work with 1.2?

 Thanks,
 Takenori



How to perform range queries efficiently?

2013-08-23 Thread Sávio Teles
I need to perform range queries efficiently. I have a table like:

users
---
user_id | age | gender | salary | ...

The attribute user_id is the PRIMARY KEY.
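
In CQL terms the table looks roughly like this (the column types are assumed):

CREATE TABLE users (
    user_id text PRIMARY KEY,
    age     int,
    gender  text,
    salary  int
    -- ... more columns
);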

Example of querying:

select * from users where user_id = 'x' and age > y and age < z and
salary > a and salary < b and gender = 'M';

This query takes a long time to run. Any ideas to perform it efficiently?

Tks in advance.


-- 
Atenciosamente,
Sávio S. Teles de Oliveira
voice: +55 62 9136 6996
http://br.linkedin.com/in/savioteles
Mestrando em Ciências da Computação - UFG
Arquiteto de Software
Laboratory for Ubiquitous and Pervasive Applications (LUPA) - UFG


CqlStorage creates wrong schema for Pig

2013-08-23 Thread Chad Johnston
(I'm using Cassandra 1.2.8 and Pig 0.11.1)

I'm loading some simple data from Cassandra into Pig using CqlStorage. The
CqlStorage loader defines a Pig schema based on the Cassandra schema, but
it seems to be wrong.

If I do:

data = LOAD 'cql://bookdata/books' USING CqlStorage();
DESCRIBE data;

I get this:

data: {isbn: chararray,bookauthor: chararray,booktitle:
chararray,publisher: chararray,yearofpublication: int}

However, if I DUMP data, I get results like these:

((isbn,0425093387),(bookauthor,Georgette Heyer),(booktitle,Death in the
Stocks),(publisher,Berkley Pub Group),(yearofpublication,1986))

Clearly the results from Cassandra are key/value pairs, as would be
expected. I don't know why the schema generated by CqlStorage() would be so
different.
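
For what it's worth, given the schema DESCRIBE reports, I expected plain field
access to just work, something like this (untested, given the mismatch):

-- what the reported schema suggests should work
titles = FOREACH data GENERATE isbn, booktitle, yearofpublication;
DUMP titles;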

This is really causing me problems trying to access the column values. I
tried a naive approach of FLATTENing each tuple, then trying to access the
values that way:

flattened = FOREACH data GENERATE
  FLATTEN(isbn),
  FLATTEN(booktitle),
  ...
values = FOREACH flattened GENERATE
  $1 AS ISBN,
  $3 AS BookTitle,
  ...

As soon as I try to access field $5, Pig complains about the index being
out of bounds.

Is there a way to solve the schema/reality mismatch? Am I doing something
wrong, or have I stumbled across a defect?

Thanks,
Chad


Re: Continue running major compaction after switching to LeveledCompactionStrategy

2013-08-23 Thread Nate McCall
Take a look at the following article:
http://www.datastax.com/dev/blog/when-to-use-leveled-compaction

You'll want to monitor your IOPS for a while to make sure you can spare the
overhead before you try it. Certainly switch column families one at a time,
and only where the use case makes sense given the above.
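
For reference, the switch itself is just a per-column-family schema change,
along these lines (the keyspace/table names and the size value are
placeholders, not recommendations):

-- cqlsh, Cassandra 1.2.x
ALTER TABLE my_keyspace.my_table
  WITH compaction = { 'class' : 'LeveledCompactionStrategy',
                      'sstable_size_in_mb' : 160 };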


On Thu, Aug 22, 2013 at 9:58 PM, Lucas Fernandes Brunialti 
lbrunia...@igcorp.com.br wrote:

 Hello,

 I also have some doubts about changing to leveled compaction:

 1) Is this change computationally expensive? My sstables have around 7 GB
 of data; I'm afraid the nodes won't handle the pressure of the compactions,
 maybe dying from OOM or showing extremely high latency during the
 compactions...

 2) How long does this transition take? I mean, to finish splitting these
 sstables and all the compactions needed... I want to know this to make a fair
 comparison of which compaction algorithm is better for my data.

 3) And finally, what would be an optimal size for the sstables (the LCS
 sstable size parameter)?

 I'm running an 8-node cluster on AWS (EC2 m1.xlarge), using ephemeral drives
 and Cassandra 1.2.3.

 I will really appreciate the help! :)

 Lucas Brunialti.
 Thanks much Rob!

 Brian






Re: row cache

2013-08-23 Thread Robert Coli
On Thu, Aug 22, 2013 at 7:53 PM, Faraaz Sareshwala 
fsareshw...@quantcast.com wrote:

 According to the datastax documentation [1], there are two types of row
 cache providers:

...

 The off-heap row cache provider does indeed invalidate rows. We're going
 to look into using the ConcurrentLinkedHashCacheProvider. Time to read
 some source code! :)


Thanks for the follow up... I'm used to thinking of the
ConcurrentLinkedHashCacheProvider as the row cache and forgot that
SerializingCacheProvider
might have different invalidation behavior. Invalidating the whole row on
write seems highly likely to reduce the overall performance of such a row
cache. :)

The criteria for use of row cache mentioned up-thread remain relevant. In
most cases, you probably don't actually want to use the row cache.
Especially if you're using ConcurrentLinkedHashCacheProvider and creating
long lived, on heap objects.
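
If you do decide to test it anyway, the knobs involved are roughly these (a
1.2 sketch; the size and table names are placeholders, not recommendations):

# cassandra.yaml
row_cache_size_in_mb: 200                      # 0 (the default) leaves the row cache off
row_cache_provider: SerializingCacheProvider   # or ConcurrentLinkedHashCacheProvider

plus the per-column-family setting in cqlsh:

ALTER TABLE my_keyspace.my_table WITH caching = 'rows_only';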

=Rob


Re: Moving a cluster between networks.

2013-08-23 Thread Tim Wintle
On Wed, 2013-08-21 at 10:42 -0700, Robert Coli wrote:
 On Wed, Aug 21, 2013 at 3:58 AM, Tim Wintle timwin...@gmail.com wrote:
 
  What would be the best way to achieve this? (We can tolerate a fairly short
  period of downtime).
 
 
 I think this would work, but may require a full cluster shutdown.
 
 1) stop nodes on old network
 2) set auto_bootstrap to false in the conf file (it's not in there, you
 will have to add it to set it to false)
 3) change the listen_address/seed lists/etc. in cassandra.yaml to be the
 new ips
 4) start nodes, seed nodes first
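
 For (2) and (3), the per-node changes amount to something like this (the
 addresses are placeholders):

 # cassandra.yaml on each node, after the move
 auto_bootstrap: false        # add this line (not present by default); remove it again afterwards
 listen_address: 10.1.0.11    # this node's new IP
 rpc_address: 10.1.0.11
 # ...and update the seed list (seed_provider -> parameters -> seeds) to the new seed IPs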

Thank you,

I tried a quick test on local VMs before and it appeared to work, but I'm
still a little worried that the old IP addresses could resurface through some
process that only kicks in under realistic use.

I'll try to set up a more realistic simulation to test before going
ahead.

Tim

 
 Basically I think the nodes will join, announce their new ip, not
 bootstrap, and eventually the entire cluster will coalesce on new ips.
 
 If I were you, I would probably try to set up a QA or test cluster with
 similar setup.
 
 =Rob




Memtable flush blocking writes

2013-08-23 Thread Ken Hancock
I appear to have a problem illustrated by
https://issues.apache.org/jira/browse/CASSANDRA-1955. At low data
rates, I'm seeing mutation messages dropped because writers are
blocked as I get a storm of memtables being flushed. OpsCenter
memtables seem to also contribute to this:

INFO [OptionalTasks:1] 2013-08-23 01:53:58,522 ColumnFamilyStore.java (line 630) Enqueuing flush of Memtable-runratecountforiczone@1281182121(14976/120803 serialized/live bytes, 360 ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:58,523 ColumnFamilyStore.java (line 630) Enqueuing flush of Memtable-runratecountforchannel@705923070(278200/1048576 serialized/live bytes, 6832 ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:58,525 ColumnFamilyStore.java (line 630) Enqueuing flush of Memtable-solr_resources@1615459594(66362/66362 serialized/live bytes, 4 ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:58,525 ColumnFamilyStore.java (line 630) Enqueuing flush of Memtable-scheduleddaychannelie@393647337(33203968/36700160 serialized/live bytes, 865620 ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:58,530 ColumnFamilyStore.java (line 630) Enqueuing flush of Memtable-failediecountfornetwork@1781160199(8680/124903 serialized/live bytes, 273 ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:58,530 ColumnFamilyStore.java (line 630) Enqueuing flush of Memtable-rollups7200@37425413(6504/23 serialized/live bytes, 271 ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:58,531 ColumnFamilyStore.java (line 630) Enqueuing flush of Memtable-rollups60@1943691367(638176/1048576 serialized/live bytes, 39894 ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:58,531 ColumnFamilyStore.java (line 630) Enqueuing flush of Memtable-events@99567005(1133/1133 serialized/live bytes, 39 ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:58,532 ColumnFamilyStore.java (line 630) Enqueuing flush of Memtable-rollups300@532892022(184296/1048576 serialized/live bytes, 7679 ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:58,532 ColumnFamilyStore.java (line 630) Enqueuing flush of Memtable-ie@1309405764(457390051/152043520 serialized/live bytes, 16956160 ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:58,823 ColumnFamilyStore.java (line 630) Enqueuing flush of Memtable-videoexpectedformat@1530999508(684/24557 serialized/live bytes, 12453 ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:58,929 ColumnFamilyStore.java (line 630) Enqueuing flush of Memtable-failediecountforzone@411870848(9200/95294 serialized/live bytes, 284 ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:59,012 ColumnFamilyStore.java (line 630) Enqueuing flush of Memtable-rollups86400@744253892(456/456 serialized/live bytes, 19 ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:59,364 ColumnFamilyStore.java (line 630) Enqueuing flush of Memtable-peers@2024878954(2006/40629 serialized/live bytes, 452 ops)

I had tpstats running across all the nodes in my cluster every 5 seconds or
so and observed the following for the FlushWriter pool (the columns are
active, pending, completed, blocked, and all-time blocked):

2013-08-23T01:53:47 192.168.131.227 FlushWriter 0 0 33 0 0
2013-08-23T01:53:55 192.168.131.227 FlushWriter 0 0 33 0 0
2013-08-23T01:54:00 192.168.131.227 FlushWriter 2 10 37 1 5
2013-08-23T01:54:07 192.168.131.227 FlushWriter 1 1 53 0 11
2013-08-23T01:54:12 192.168.131.227 FlushWriter 1 1 53 0 11

Now I can increase memtable_flush_queue_size, but it seems based on
the above that in order to solve the problem, I need to set this to
count(CF). What's the downside of this approach? It seems a backwards
solution to the real problem...
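
For reference, the two settings involved live in cassandra.yaml (the values
shown are just the 1.2 defaults, not a suggestion):

# cassandra.yaml
memtable_flush_writers: 1        # defaults to one per data directory
memtable_flush_queue_size: 4     # memtables allowed to wait for a writer before writes block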


Re: row cache

2013-08-23 Thread Bill de hÓra
I can't emphasise enough: test row caching against your workload for
sustained periods and compare the results to just leveraging the filesystem
cache and/or SSDs. That said, the default off-heap cache can work for
structures that don't mutate frequently and whose rows are not very wide,
such that the in-and-out-of-heap serialization overhead is minimised (I've
seen the off-heap cache slow a system down because of serialization costs).
The on-heap cache can do updates in place, which is nice for more frequently
changing structures, and for larger structures, because it dodges the
off-heap cache's serialization overhead. One problem I've experienced with
the on-heap cache is the cache working set exceeding the allocated space,
resulting in GC pressure from sustained thrash/evictions.


Neither cache seems suitable for wide-row and slicing use cases, e.g. time
series data or CQL tables whose compound keys create wide rows under the
hood.


Bill






Cassandra JVM heap sizes on EC2

2013-08-23 Thread David Laube
Hi All,

We are evaluating our JVM heap size configuration on Cassandra 1.2.8 and would
like to get some feedback from the community as to what the proper JVM heap
size should be for Cassandra nodes deployed on Amazon EC2. We are running
m2.4xlarge EC2 instances (64 GB RAM, 8 cores, 2 x 840 GB disks), so we will
have plenty of RAM. I've already consulted the docs at
http://www.datastax.com/documentation/cassandra/1.2/mobile/cassandra/operations/ops_tune_jvm_c.html
but would love to hear what is working or not working for you in the wild.
Since DataStax cautions against using more than 8 GB, I'm wondering whether it
is even advantageous to use slightly more.
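
For context, what we would be tuning is just the two variables in
conf/cassandra-env.sh (8G shown here because it is the commonly cited cap, not
a conclusion):

# conf/cassandra-env.sh
MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="800M"    # often sized at roughly 100 MB per physical core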

Thanks,
-David Laube



Re: Commitlog files not getting deleted

2013-08-23 Thread Robert Coli
On Thu, Aug 22, 2013 at 10:40 AM, Jay Svc jaytechg...@gmail.com wrote:

 its DSE 3.1 Cassandra 2.1


Not 2.1... 1.2.1? Web search is sorta inconclusive on this topic, you'd
think it'd be more easily referenced?

=Rob


Re: Cassandra JVM heap sizes on EC2

2013-08-23 Thread Brian Tarbox
The advice I heard at the New York C* conference (which we follow) is to use
the m2.2xlarge and give it about 8 GB. The m2.4xlarge seems overkill (or at
least overpriced).

Brian

