Re: Invalid argument

2012-11-20 Thread Alain RODRIGUEZ
Hi Aaron.

Here is my java -version

java version 1.6.0_35
Java(TM) SE Runtime Environment (build 1.6.0_35-b10)
Java HotSpot(TM) 64-Bit Server VM (build 20.10-b01, mixed mode)

Thanks for the workaround; setting disk_access_mode: standard worked.
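
For reference, the change amounts to a single line in cassandra.yaml (a minimal
sketch; the option may need to be added if it is not present in the default
file, and the other accepted values are, as far as I know, auto, mmap and
mmap_index_only):

    # cassandra.yaml -- fall back to standard (non-mmap) I/O for sstable reads
    disk_access_mode: standard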

Alain


2012/11/19 aaron morton aa...@thelastpickle.com

 Are you running a 32 bit JVM ? What is the full JVM version ?

 As a workaround you can try disabling memory mapped access: set
 disk_access_mode to standard.

 Cheers

-
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 20/11/2012, at 6:27 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 I have backed up production sstables from one of my 3 production nodes
 (RF=3) and I want to use them on my dev environment.(C* 1.1.6 on both
 environments)

 My dev server is 4-core, 4 GB RAM hardware running on Ubuntu.

 I have applied the production schema on my dev node, copied all sstables
 into the appropriate folder and restarted my node like I always do.

 But this time I had the following error (many times and only for ):

  INFO [SSTableBatchOpen:4] 2012-11-19 17:52:52,980 SSTableReader.java
 (line 169) Opening
 /var/lib/cassandra/data/cassa_teads/data_action/cassa_teads-data_action-hf-660
 (7015417424 bytes)
 ERROR [SSTableBatchOpen:3] 2012-11-19 17:53:17,259
 AbstractCassandraDaemon.java (line 135) Exception in thread
 Thread[SSTableBatchOpen:3,5,main]
 java.io.IOError: java.io.IOException: Invalid argument
  at
 org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.createSegments(MmappedSegmentedFile.java:202)
 at
 org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.complete(MmappedSegmentedFile.java:179)
 at
 org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:429)
 at
 org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:200)
 at
 org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:153)
 at
 org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:242)
 at java.util.concurrent.Executors$RunnableAdapter.call(Unknown
 Source)
 at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
 at java.util.concurrent.FutureTask.run(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
 Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
 Source)
 at java.lang.Thread.run(Unknown Source)
 Caused by: java.io.IOException: Invalid argument
 at sun.nio.ch.FileChannelImpl.truncate0(Native Method)
 at sun.nio.ch.FileChannelImpl.map(Unknown Source)
 at
 org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.createSegments(MmappedSegmentedFile.java:194)
 ... 11 more

 If I try with nodetool refresh I have the following error :

 Exception in thread main java.io.IOError: java.io.IOException: Invalid
 argument
 at
 org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.createSegments(MmappedSegmentedFile.java:202)
 at
 org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.complete(MmappedSegmentedFile.java:179)
 at
 org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:429)
 at
 org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:200)
 at
 org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:153)
 at
 org.apache.cassandra.db.ColumnFamilyStore.loadNewSSTables(ColumnFamilyStore.java:510)
 at
 org.apache.cassandra.db.ColumnFamilyStore.loadNewSSTables(ColumnFamilyStore.java:468)
 at
 org.apache.cassandra.service.StorageService.loadNewSSTables(StorageService.java:3089)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
 at java.lang.reflect.Method.invoke(Unknown Source)
 at
 com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown Source)
 at
 com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown Source)
 at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(Unknown
 Source)
 at com.sun.jmx.mbeanserver.PerInterface.invoke(Unknown Source)
 at com.sun.jmx.mbeanserver.MBeanSupport.invoke(Unknown Source)
 at
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(Unknown Source)
 at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(Unknown Source)
 at
 javax.management.remote.rmi.RMIConnectionImpl.doOperation(Unknown Source)
 at
 javax.management.remote.rmi.RMIConnectionImpl.access$200(Unknown Source)
 at
 javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(Unknown
 Source)
 at
 javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(Unknown
 Source)
 at 

Re: Looking for a good Ruby client

2012-11-20 Thread Mat Brown
As the author of Cequel, I can assure you it is excellent ; )

We use it in production at Brewster and it is quite stable. If you try
it out and find any bugs, we'll fix 'em  quickly.

I'm planning a big overhaul of the model layer over the holidays to
expose all the
new data modeling goodness in CQL3 (while still retaining
compatibility with CQL2 structures).

On Thu, Nov 15, 2012 at 3:42 PM, Harry Wilkinson hwilkin...@mdsol.com wrote:
 Update on this: someone just pointed me towards the Cequel gem:
 https://github.com/brewster/cequel

 The way it's described in the readme it looks like exactly what I was
 looking for - a modern, CQL-based gem that is in active development and also
 follows the ActiveModel pattern.  I'd be very interested to hear if anybody
 has used this, whether it's stable/reliable, etc.

 Thanks.

 Harry

 On 2 August 2012 00:31, Thorsten von Eicken t...@rightscale.com wrote:

 Harry, we're in a similar situation and are starting to work out our own
 ruby client. The biggest issue is that it doesn't make much sense to build a
 higher level abstraction on anything other than CQL3, given where things are
 headed. At least this is our opinion.
 At the same time, CQL3 is just barely becoming usable and still seems
 rather deficient in wide-row usage. The tricky part is that with the current
 CQL3 you have to construct quite complex iterators to retrieve a large
 result set. Which means that you end up having to either parse CQL3 coming
 in to insert the iteration stuff, or you have to pass CQL3 fragments in and
 compose them together with iterator clauses. Not fun stuff either way.
 The only good solution I see is to switch to a streaming protocol (or
 build some form of continue on top of thrift) such that the client can ask
 for a huge result set and the cassandra coordinator can break it into
 sub-queries as it sees fit and return results chunk-by-chunk. If this is
 really the path forward then all abstractions built above CQL3 before that
 will either have a good piece of complex code that can be deleted or worse,
 will have an interface that is no longer best practice.
 Good luck!
 Thorsten



 On 8/1/2012 1:47 PM, Harry Wilkinson wrote:

 Hi,

 I'm looking for a Ruby client for Cassandra that is pretty high-level.  I
 am really hoping to find a Ruby gem of high quality that allows a developer
 to create models like you would with ActiveModel.

 So far I have figured out that the canonical Ruby client for Cassandra is
 Twitter's Cassandra gem of the same name.  It looks great - mature, still in
 active development, etc.  No stated support for Ruby 1.9.3 that I can see,
 but I can probably live with that for now.

 What I'm looking for is a higher-level gem built on that gem that works
 like ActiveModel in that you just include a module in your model class and
 that gives you methods to declare your model's serialized attributes and
 also the usual ActiveModel methods like 'save!', 'valid?', 'find', etc.

 I've been trying out some different NoSQL databases recently, and for
 example there is an official Ruby client for Riak with a domain model that
 is close to Riak's, but then there's also a gem called 'Ripple' that uses a
 domain model that is closer to what most Ruby developers are used to.  So it
 looks like Twitter's Cassandra gem is the one that stays close to the domain
 model of Cassandra, and what I'm looking for is a gem that's a Cassandra
 equivalent of Ripple.

 From some searching I found cassandra_object, which has been inactive for
 a couple of years, but there's a fork that looks like it's being maintained,
 but I have not found any kind of information to suggest the maintained fork
 is in general use yet.  I have found quite a lot of gems of a similar style
 that people have started and then not really got very far with.

 So, does anybody know of a suitable gem?  Would you recommend it?  Or
 perhaps you would recommend not using such a gem and sticking with the
 lower-level client gem?

 Thanks in advance for your advice.

 Harry





Re: Looking for a good Ruby client

2012-11-20 Thread Alain RODRIGUEZ
@Mat

Well I guess you could add your Ruby client to this list since there are not
many of them yet.

http://wiki.apache.org/cassandra/ClientOptions

Alain


2012/11/20 Mat Brown m...@brewster.com

 As the author of Cequel, I can assure you it is excellent ; )

 We use it in production at Brewster and it is quite stable. If you try
 it out and find any bugs, we'll fix 'em  quickly.

 I'm planning a big overhaul of the model layer over the holidays to
 expose all the
 new data modeling goodness in CQL3 (while still retaining
 compatibility with CQL2 structures).

 On Thu, Nov 15, 2012 at 3:42 PM, Harry Wilkinson hwilkin...@mdsol.com
 wrote:
  Update on this: someone just pointed me towards the Cequel gem:
  https://github.com/brewster/cequel
 
  The way it's described in the readme it looks like exactly what I was
  looking for - a modern, CQL-based gem that is in active development and
 also
  follows the ActiveModel pattern.  I'd be very interested to hear if
 anybody
  has used this, whether it's stable/reliable, etc.
 
  Thanks.
 
  Harry
 
  On 2 August 2012 00:31, Thorsten von Eicken t...@rightscale.com wrote:
 
  Harry, we're in a similar situation and are starting to work out our own
  ruby client. The biggest issue is that it doesn't make much sense to
 build a
  higher level abstraction on anything other than CQL3, given where
 things are
  headed. At least this is our opinion.
  At the same time, CQL3 is just barely becoming usable and still seems
  rather deficient in wide-row usage. The tricky part is that with the
 current
  CQL3 you have to construct quite complex iterators to retrieve a large
  result set. Which means that you end up having to either parse CQL3
 coming
  in to insert the iteration stuff, or you have to pass CQL3 fragments in
 and
  compose them together with iterator clauses. Not fun stuff either way.
  The only good solution I see is to switch to a streaming protocol (or
  build some form of continue on top of thrift) such that the client
 can ask
  for a huge result set and the cassandra coordinator can break it into
  sub-queries as it sees fit and return results chunk-by-chunk. If this is
  really the path forward then all abstractions built above CQL3 before
 that
  will either have a good piece of complex code that can be deleted or
 worse,
  will have an interface that is no longer best practice.
  Good luck!
  Thorsten
 
 
 
  On 8/1/2012 1:47 PM, Harry Wilkinson wrote:
 
  Hi,
 
  I'm looking for a Ruby client for Cassandra that is pretty high-level.
  I
  am really hoping to find a Ruby gem of high quality that allows a
 developer
  to create models like you would with ActiveModel.
 
  So far I have figured out that the canonical Ruby client for Cassandra
 is
  Twitter's Cassandra gem of the same name.  It looks great - mature,
 still in
  active development, etc.  No stated support for Ruby 1.9.3 that I can
 see,
  but I can probably live with that for now.
 
  What I'm looking for is a higher-level gem built on that gem that works
  like ActiveModel in that you just include a module in your model class
 and
  that gives you methods to declare your model's serialized attributes and
  also the usual ActiveModel methods like 'save!', 'valid?', 'find', etc.
 
  I've been trying out some different NoSQL databases recently, and for
  example there is an official Ruby client for Riak with a domain model
 that
  is close to Riak's, but then there's also a gem called 'Ripple' that
 uses a
  domain model that is closer to what most Ruby developers are used to.
  So it
  looks like Twitter's Cassandra gem is the one that stays close to the
 domain
  model of Cassandra, and what I'm looking for is a gem that's a Cassandra
  equivalent of RIpple.
 
  From some searching I found cassandra_object, which has been inactive
 for
  a couple of years, but there's a fork that looks like it's being
 maintained,
  but I have not found any kind of information to suggest the maintained
 fork
  is in general use yet.  I have found quite a lot of gems of a similar
 style
  that people have started and then not really got very far with.
 
  So, does anybody know of a suitable gem?  Would you recommend it?  Or
  perhaps you would recommend not using such a gem and sticking with the
  lower-level client gem?
 
  Thanks in advance for your advice.
 
  Harry
 
 
 



Re: Looking for a good Ruby client

2012-11-20 Thread Timmy Turner
@Mat Brown:

 (while still retaining compatibility with CQL2 structures).

Do you mean going beyond what Cassandra itself provides in terms of CQL2/3
interoperability?

I'm looking into something similar currently (however in Java not in Ruby)
and would be interested in your experiences, if you follow through with the
plan. Do you have a blog?


Thanks!


2012/11/20 Alain RODRIGUEZ arodr...@gmail.com

 @Mat

 Well I guess you could add your Ruby client to this list since there is
 not a lot of them yet.

 http://wiki.apache.org/cassandra/ClientOptions

 Alain


 2012/11/20 Mat Brown m...@brewster.com

 As the author of Cequel, I can assure you it is excellent ; )

 We use it in production at Brewster and it is quite stable. If you try
 it out and find any bugs, we'll fix 'em  quickly.

 I'm planning a big overhaul of the model layer over the holidays to
 expose all the
 new data modeling goodness in CQL3 (while still retaining
 compatibility with CQL2 structures).

 On Thu, Nov 15, 2012 at 3:42 PM, Harry Wilkinson hwilkin...@mdsol.com
 wrote:
  Update on this: someone just pointed me towards the Cequel gem:
  https://github.com/brewster/cequel
 
  The way it's described in the readme it looks like exactly what I was
  looking for - a modern, CQL-based gem that is in active development and
 also
  follows the ActiveModel pattern.  I'd be very interested to hear if
 anybody
  has used this, whether it's stable/reliable, etc.
 
  Thanks.
 
  Harry
 
  On 2 August 2012 00:31, Thorsten von Eicken t...@rightscale.com wrote:
 
  Harry, we're in a similar situation and are starting to work out our
 own
  ruby client. The biggest issue is that it doesn't make much sense to
 build a
  higher level abstraction on anything other than CQL3, given where
 things are
  headed. At least this is our opinion.
  At the same time, CQL3 is just barely becoming usable and still seems
  rather deficient in wide-row usage. The tricky part is that with the
 current
  CQL3 you have to construct quite complex iterators to retrieve a large
  result set. Which means that you end up having to either parse CQL3
 coming
  in to insert the iteration stuff, or you have to pass CQL3 fragments
 in and
  compose them together with iterator clauses. Not fun stuff either way.
  The only good solution I see is to switch to a streaming protocol (or
  build some form of continue on top of thrift) such that the client
 can ask
  for a huge result set and the cassandra coordinator can break it into
  sub-queries as it sees fit and return results chunk-by-chunk. If this
 is
  really the path forward then all abstractions built above CQL3 before
 that
  will either have a good piece of complex code that can be deleted or
 worse,
  will have an interface that is no longer best practice.
  Good luck!
  Thorsten
 
 
 
  On 8/1/2012 1:47 PM, Harry Wilkinson wrote:
 
  Hi,
 
  I'm looking for a Ruby client for Cassandra that is pretty high-level.
  I
  am really hoping to find a Ruby gem of high quality that allows a
 developer
  to create models like you would with ActiveModel.
 
  So far I have figured out that the canonical Ruby client for Cassandra
 is
  Twitter's Cassandra gem of the same name.  It looks great - mature,
 still in
  active development, etc.  No stated support for Ruby 1.9.3 that I can
 see,
  but I can probably live with that for now.
 
  What I'm looking for is a higher-level gem built on that gem that works
  like ActiveModel in that you just include a module in your model class
 and
  that gives you methods to declare your model's serialized attributes
 and
  also the usual ActiveModel methods like 'save!', 'valid?', 'find', etc.
 
  I've been trying out some different NoSQL databases recently, and for
  example there is an official Ruby client for Riak with a domain model
 that
  is close to Riak's, but then there's also a gem called 'Ripple' that
 uses a
  domain model that is closer to what most Ruby developers are used to.
  So it
  looks like Twitter's Cassandra gem is the one that stays close to the
 domain
  model of Cassandra, and what I'm looking for is a gem that's a
 Cassandra
  equivalent of RIpple.
 
  From some searching I found cassandra_object, which has been inactive
 for
  a couple of years, but there's a fork that looks like it's being
 maintained,
  but I have not found any kind of information to suggest the maintained
 fork
  is in general use yet.  I have found quite a lot of gems of a similar
 style
  that people have started and then not really got very far with.
 
  So, does anybody know of a suitable gem?  Would you recommend it?  Or
  perhaps you would recommend not using such a gem and sticking with the
  lower-level client gem?
 
  Thanks in advance for your advice.
 
  Harry
 
 
 





Re: Looking for a good Ruby client

2012-11-20 Thread Mat Brown
Hi Timmy,

I haven't done a lot of playing with CQL3 yet, mostly just reading the
blog posts, so the following is subject to change : )

Right now, the Cequel model layer has a skinny row model (which is
designed to follow common patterns of Ruby ORMs) and a wide row model
(which is designed to behave more or less like a Hash, the equivalent
of Java's HashMap). The two don't integrate with each other in any
meaningful way, but as far as I understand it, they do pretty much
cover the data modeling possibilities in CQL2.

The big idea I've got for the overhaul of Cequel for CQL3 is to allow
building a rich, nested data model by integrating different flavors of
CQL3 table, most notably multi-column primary keys, as well as
collections. The core data types I have in mind are:

1) Skinny row with simple primary key (e.g. blogs, with blog_id key)
2) Skinny row with complex primary key (e.g. blog_posts, with
(blog_id, post_id) key)
3) Wide row with simple primary key (e.g. blog_languages -- kind of a
weak example but i can't think of anything better for a blog : )
4) Wide row with complex primary key (e.g. blog_post_tags)
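
To make that a bit more concrete, here is a rough CQL3 sketch of the four
flavors (the table and column names are only illustrative, and the exact DDL
may well change once I actually start playing with 1.2):

    -- 1) Skinny row, simple primary key
    CREATE TABLE blogs (
      blog_id     uuid PRIMARY KEY,
      name        text,
      description text
    );

    -- 2) Skinny row, compound primary key; blog_posts shares the blog_id
    --    prefix with blogs, which is what gives the one-many relationship
    CREATE TABLE blog_posts (
      blog_id uuid,
      post_id timeuuid,
      title   text,
      body    text,
      PRIMARY KEY (blog_id, post_id)
    );

    -- 3) Wide row, simple primary key; behaves like a Hash per blog_id
    CREATE TABLE blog_languages (
      blog_id      uuid,
      locale       text,
      display_name text,
      PRIMARY KEY (blog_id, locale)
    );

    -- 4) Wide row, compound primary key
    CREATE TABLE blog_post_tags (
      blog_id uuid,
      post_id timeuuid,
      tag     text,
      PRIMARY KEY (blog_id, post_id, tag)
    );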

My goal is to make it easy to model one-one relationships via a shared
primary key, and one-many via a shared prefix of the primary key. So,
for instance, blogs and blog_languages rows would be one-one (both
with a blog_id primary key) and blogs and blog_posts would be one-many
(sharing the blog_id prefix in the primary key).

From what I've read, it seems fairly clear that the actual CQL used to
interact with #1 will be the same for CQL2 column families and CQL3
tables, so no explicit backward compatibility would be needed. #2 and
#4 are, of course, CQL3-only, so backward compatibility isn't an issue
there either. What I'm not entirely clear on is #3 -- this is
straightforward in CQL2, and presumably a CQL3 table with compact
storage would behave in the same way. However, my understanding so far
is that a non-compact CQL3 table would treat this structure
differently, in that both the key and value of the map would
correspond to columns in a CQL3 table. It may make more sense to just
target compact storage tables with this data structure, but I'm going
to need to play around with it more to figure that out. Otherwise,
Cequel will need to provide two flavors of that structure.

There's also some tension between CQL3 collections and just using
traditional wide-row structures to achieve the same thing. For
instance, blog_tags could also just be a tags collection in the blogs
table. My plan at this point is to offer both options, since each has
its advantages (collections don't require the creation of a separate
table; but a separate table gives you access to slices of the
collection).
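
Again just as a sketch, the two options for tags would look something like:

    -- Option 1: a CQL3 collection column on the blogs table itself
    ALTER TABLE blogs ADD tags set<text>;

    -- Option 2: a separate wide-row table, which lets you slice over tags
    CREATE TABLE blog_tags (
      blog_id uuid,
      tag     text,
      PRIMARY KEY (blog_id, tag)
    );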

Anyway, that's probably a lot more of an answer than you needed, but
hopefully the context helps. Definitely interested to hear about the
direction you take your client in as well.

Finally, regarding a blog, we've got one set up, but it's not live
yet. I'll ping you with a link when it is; I'll certainly be posting
on the development of the next Cequel release.

Cheers,
Mat

On Tue, Nov 20, 2012 at 9:23 AM, Timmy Turner timm.t...@gmail.com wrote:
 @Mat Brown:

 (while still retaining compatibility with CQL2 structures).

 Do you mean by exceeding what Cassandra itself provides in terms of CQL2/3
 interoperability?

 I'm looking into something similar currently (however in Java not in Ruby)
 and would be interested in your experiences, if you follow through with the
 plan. Do you have a blog?


 Thanks!


 2012/11/20 Alain RODRIGUEZ arodr...@gmail.com

 @Mat

 Well I guess you could add your Ruby client to this list since there is
 not a lot of them yet.

 http://wiki.apache.org/cassandra/ClientOptions

 Alain


 2012/11/20 Mat Brown m...@brewster.com

 As the author of Cequel, I can assure you it is excellent ; )

 We use it in production at Brewster and it is quite stable. If you try
 it out and find any bugs, we'll fix 'em  quickly.

 I'm planning a big overhaul of the model layer over the holidays to
 expose all the
 new data modeling goodness in CQL3 (while still retaining
 compatibility with CQL2 structures).

 On Thu, Nov 15, 2012 at 3:42 PM, Harry Wilkinson hwilkin...@mdsol.com
 wrote:
  Update on this: someone just pointed me towards the Cequel gem:
  https://github.com/brewster/cequel
 
  The way it's described in the readme it looks like exactly what I was
  looking for - a modern, CQL-based gem that is in active development and
  also
  follows the ActiveModel pattern.  I'd be very interested to hear if
  anybody
  has used this, whether it's stable/reliable, etc.
 
  Thanks.
 
  Harry
 
  On 2 August 2012 00:31, Thorsten von Eicken t...@rightscale.com wrote:
 
  Harry, we're in a similar situation and are starting to work out our
  own
  ruby client. The biggest issue is that it doesn't make much sense to
  build a
  higher level abstraction on anything other than CQL3, given where
  things are
  

Re: Upgrade 1.1.2 - 1.1.6

2012-11-20 Thread Mike Heffner
Alain,

My understanding is that drain ensures that all memtables are flushed, so
that there is no data in the commitlog that isn't in an sstable. A
marker is saved that indicates the commit logs should not be replayed.
Commitlogs are only removed from disk periodically
(after commitlog_total_space_in_mb is exceeded?).
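
For reference, the knob in cassandra.yaml looks something like the following
(4096 is, I believe, the 64-bit default):

    commitlog_total_space_in_mb: 4096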

With 1.1.5/6, all nanotime commitlogs are replayed on startup regardless of
whether they've been flushed. So in our case manually removing all the
commitlogs after a drain was the only way to prevent their replay.

Mike




On Tue, Nov 20, 2012 at 5:19 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 @Mike

 I am glad to see I am not the only one with this issue (even if I am sorry
 it happened to you of course.).

 Isn't drain supposed to clear the commit logs? Did removing them work
 properly?

 In his warning to C* users, Jonathan Ellis said that a drain would avoid
 this issue. It seems like it doesn't.

 @Rob

 You understood precisely the 2 issues I met during the upgrade. I am sad
 to see that neither of them is yet resolved, and probably won't be.


 2012/11/20 Mike Heffner m...@librato.com

 Alain,

 We performed a 1.1.3 - 1.1.6 upgrade and found that all the logs
 replayed regardless of the drain. After noticing this on the first node, we
 did the following:

 * nodetool flush
 * nodetool drain
 * service cassandra stop
 * mv /path/to/logs/*.log /backup/
 * apt-get install cassandra
 restarts automatically

 I also agree that starting C* after an upgrade/install seems quite broken
 if it was already stopped before the install. However annoying, I have
 found this to be the default for most Ubuntu daemon packages.

 Mike


 On Thu, Nov 15, 2012 at 9:21 AM, Alain RODRIGUEZ arodr...@gmail.comwrote:

 We had an issue with counters over-counting even using the nodetool
 drain command before upgrading...

 Here is my bash history

69  cp /etc/cassandra/cassandra.yaml /etc/cassandra/cassandra.yaml.bak
70  cp /etc/cassandra/cassandra-env.sh
 /etc/cassandra/cassandra-env.sh.bak
71  sudo apt-get install cassandra
72  nodetool disablethrift
73  nodetool drain
74  service cassandra stop
75  cat /etc/cassandra/cassandra-env.sh
 /etc/cassandra/cassandra-env.sh.bak
76  vim /etc/cassandra/cassandra-env.sh
77  cat /etc/cassandra/cassandra.yaml
 /etc/cassandra/cassandra.yaml.bak
78  vim /etc/cassandra/cassandra.yaml
79  service cassandra start

 So I think I followed these steps
 http://www.datastax.com/docs/1.1/install/upgrading#upgrade-steps

 I merged my conf files with an external tool so consider I merged my
 conf files on steps 76 and 78.

 I saw that sudo apt-get install cassandra stops the server and
 restarts it automatically. So it upgraded without draining and restarted
 before I had time to reconfigure the conf files. Is this normal? Is there a
 way to avoid it?

 So for the second node I decided to try to stop C* before the upgrade.

   125  cp /etc/cassandra/cassandra.yaml /etc/cassandra/cassandra.yaml.bak
   126  cp /etc/cassandra/cassandra-env.sh
 /etc/cassandra/cassandra-env.sh.bak
   127  nodetool disablegossip
   128  nodetool disablethrift
   129  nodetool drain
   130  service cassandra stop
   131  sudo apt-get install cassandra

 //131 : This restarted cassandra

   132  nodetool disablethrift
   133  nodetool disablegossip
   134  nodetool drain
   135  service cassandra stop
   136  cat /etc/cassandra/cassandra-env.sh
 /etc/cassandra/cassandra-env.sh.bak
   137  cim /etc/cassandra/cassandra-env.sh
   138  vim /etc/cassandra/cassandra-env.sh
   139  cat /etc/cassandra/cassandra.yaml
 /etc/cassandra/cassandra.yaml.bak
   140  vim /etc/cassandra/cassandra.yaml
   141  service cassandra start

 After both of these updates I saw my current counters increase without
 any reason.

 Did I do anything wrong ?

 Alain




 --

   Mike Heffner m...@librato.com
   Librato, Inc.






-- 

  Mike Heffner m...@librato.com
  Librato, Inc.


Re: Datastax Java Driver

2012-11-20 Thread Jérémy SEVELLEC
Great!


2012/11/20 michael.figui...@gmail.com michael.figui...@gmail.com

 The Apache Cassandra project has traditionally not focused on the client side.
 Rather than modifying the scope of the project and jeopardizing the current
 driver ecosystem, we've preferred to open source it this way. Note that this
 driver's license is Apache License 2 and it will remain so, making it easy
 to integrate it into most open source projects with no license issues.


 On Tue, Nov 20, 2012 at 1:19 AM, Timmy Turner timm.t...@gmail.com wrote:

 Why is this being released as a separate project, instead of being
 bundled up with Cassandra? Is it not a part of Cassandra?


 2012/11/19 John Sanda john.sa...@gmail.com

 Fantastic! As for the object mapping API, has there been any
 discussion/consideration of
 http://www.hibernate.org/subprojects/ogm.html?


 On Mon, Nov 19, 2012 at 1:50 PM, Sylvain Lebresne 
 sylv...@datastax.comwrote:

 Everyone,

 We've just open-sourced a new Java driver we have been working on here
 at
 DataStax. This driver is CQL3 only and is built to use the new binary
 protocol
 that will be introduced with Cassandra 1.2. It will thus only work with
 Cassandra 1.2 onwards. Currently, it means that testing it requires
 1.2.0-beta2. This is also alpha software at this point. You are welcome
 to try
 and play with it and we would very much welcome feedback, but be sure
 that
 "break, it will". The driver is accessible at:
   http://github.com/datastax/java-driver

 Today we're open-sourcing the core part of this driver. This main goal
 of this
 core module is to handle connections to the Cassandra cluster with all
 the
 features that one would expect. The currently supported features are:
   - Asynchronous: the driver uses the new CQL binary protocol
 asynchronous
 capabilities.
   - Nodes discovery.
   - Configurable load balancing/routing.
   - Transparent fail-over.
   - C* tracing handling.
   - Convenient schema access.
   - Configurable retry policy.

 This core module provides a simple low-level API (that works directly
 with
 query strings). We plan to release a higher-level, thin object mapping
 API
 based on top of this core shortly.

 Please refer to the project README for more information.

 --
 The DataStax Team




 --

 - John






-- 
Jérémy


Re: Upgrade 1.1.2 - 1.1.6

2012-11-20 Thread Rob Coli
On Mon, Nov 19, 2012 at 7:18 PM, Mike Heffner m...@librato.com wrote:
 We performed a 1.1.3 - 1.1.6 upgrade and found that all the logs replayed
 regardless of the drain.

Your experience and desire for different (expected) behavior is welcomed on :

https://issues.apache.org/jira/browse/CASSANDRA-4446

nodetool drain sometimes doesn't mark commitlog fully flushed

If every production operator who experiences this issue shares their
experience on this bug, perhaps the project will acknowledge and
address it.

=Rob

-- 
=Robert Coli
AIMGTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb


Re: Upgrade 1.1.2 - 1.1.6

2012-11-20 Thread Mike Heffner
On Tue, Nov 20, 2012 at 2:49 PM, Rob Coli rc...@palominodb.com wrote:

 On Mon, Nov 19, 2012 at 7:18 PM, Mike Heffner m...@librato.com wrote:
  We performed a 1.1.3 - 1.1.6 upgrade and found that all the logs
 replayed
  regardless of the drain.

 Your experience and desire for different (expected) behavior is welcomed
 on :

 https://issues.apache.org/jira/browse/CASSANDRA-4446

 nodetool drain sometimes doesn't mark commitlog fully flushed

 If every production operator who experiences this issue shares their
 experience on this bug, perhaps the project will acknowledge and
 address it.


Well in this case I think our issue was that upgrading from nanotime to epoch
seconds, by definition, replays all commit logs. That's not due to any
specific problem with nodetool drain not marking commitlogs flushed, but a
safety to ensure data is not lost due to buggy nanotime implementations.

For us, it was that the upgrade instructions for pre-1.1.5 to 1.1.6 didn't
mention that commitlogs should be removed if successfully drained. On the
other hand, we do not use counters, so replaying them merely meant a much
longer MTT-Return after restarting with 1.1.6.

Mike

-- 

  Mike Heffner m...@librato.com
  Librato, Inc.


Re: Query regarding SSTable timestamps and counts

2012-11-20 Thread aaron morton
 My understanding of the compaction process was that since data files keep 
 continuously merging we should not have data files with very old last 
 modified timestamps 
It is perfectly OK to have very old SSTables. 

 But performing an upgradesstables did decrease the number of data files and 
 removed all the data files with the old timestamps. 
upgradesstables re-writes every sstable to have the same contents in the newest 
format. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 19/11/2012, at 4:57 PM, Ananth Gundabattula agundabatt...@gmail.com wrote:

 Hello Aaron,
 
 Thanks a lot for the reply. 
 
 Looks like the documentation is confusing. Here is the link I am referring 
 to:  http://www.datastax.com/docs/1.1/operations/tuning#tuning-compaction
 
 
  It does not disable compaction. 
 As per the above url,  After running a major compaction, automatic minor 
 compactions are no longer triggered, frequently requiring you to manually run 
 major compactions on a routine basis. ( Just before the heading Tuning 
 Column Family compression in the above link) 
 
 With respect to the replies below : 
 
 
  it creates one big file, which will not be compacted until there are (by 
  default) 3 other very big files. 
 This is for the minor compaction and major compaction should theoretically 
 result in one large file irrespective of the number of data files initially? 
 
 This is not something you have to worry about. Unless you are seeing 1,000's 
 of files using the default compaction.
 
 Well my worry has been because of the large amount of node movements we have 
 done in the ring. We started off with 6 nodes and increased the capacity to 
 12 with disproportionate increases every time which resulted in a lot of 
 clean of data folders except system, run repair and then a cleanup with an 
 aborted attempt in between.  
 
 There were some data.db files older by more than 2 weeks and were not 
 modified since then. My understanding of the compaction process was that 
 since data files keep continuously merging we should not have data files with 
 very old last modified timestamps (assuming there is a good amount of writes 
 to the table continuously) I did not have a for sure way of telling if 
 everything is alright with the compaction looking at the last modified 
 timestamps of all the data.db files.
 
 What are the compaction issues you are having ? 
 Your replies confirm that the timestamps should not be an issue to worry 
 about. So I guess I should not be calling them as issues any more.  But 
 performing an upgradesstables did decrease the number of data files and 
 removed all the data files with the old timestamps. 
 
 
 
 Regards,
 Ananth  
 
 
 On Mon, Nov 19, 2012 at 6:54 AM, aaron morton aa...@thelastpickle.com wrote:
 As per datastax documentation, a manual compaction forces the admin to start 
 compaction manually and disables the automated compaction (atleast for major 
 compactions but not minor compactions )
 It does not disable compaction. 
 it creates one big file, which will not be compacted until there are (by 
 default) 3 other very big files. 
 
 
 1. Does a nodetool stop compaction also force the admin to manually run 
 major compaction ( I.e. disable automated major compactions ? ) 
 No. 
 Stop just stops the current compaction. 
 Nothing is disabled. 
 
 2. Can a node restart reset the automated major compaction if a node gets 
 into a manual mode compaction for whatever reason ? 
 Major compaction is not automatic. It is the manual nodetool compact command. 
 Automatic (minor) compaction is controlled by min_compaction_threshold and 
 max_compaction_threshold (for the default compaction strategy).
 
 3. What is the ideal  number of SSTables for a table in a keyspace ( I mean 
 are there any indicators as to whether my compaction is alright or not ? )  
 This is not something you have to worry about. 
 Unless you are seeing 1,000's of files using the default compaction. 
 
  For example, I have seen SSTables on the disk more than 10 days old wherein 
 there were other SSTables belonging to the same table but much younger than 
 the older SSTables (
 No problems. 
 
 4. Does a upgradesstables fix any compaction issues ? 
 What are the compaction issues you are having ? 
 
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 18/11/2012, at 1:18 AM, Ananth Gundabattula agundabatt...@gmail.com 
 wrote:
 
 
 We have a cluster  running cassandra 1.1.4. On this cluster, 
 
 1. We had to move the nodes around a bit  when we were adding new nodes 
 (there was quite a good amount of node movement ) 
 
 2. We had to stop compactions during some of the days to save some disk  
 space on some of the nodes when they were running very very low on disk 
 spaces. (via nodetool stop COMPACTION)  
 
 
 As per datastax documentation, 

Re: Query regarding SSTable timestamps and counts

2012-11-20 Thread Edward Capriolo
On Tue, Nov 20, 2012 at 5:23 PM, aaron morton aa...@thelastpickle.com wrote:
 My understanding of the compaction process was that since data files keep
 continuously merging we should not have data files with very old last
 modified timestamps

 It is perfectly OK to have very old SSTables.

 But performing an upgradesstables did decrease the number of data files and
 removed all the data files with the old timestamps.

 upgradetables re-writes every sstable to have the same contents in the
 newest format.

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 19/11/2012, at 4:57 PM, Ananth Gundabattula agundabatt...@gmail.com
 wrote:

 Hello Aaron,

 Thanks a lot for the reply.

 Looks like the documentation is confusing. Here is the link I am referring
 to:  http://www.datastax.com/docs/1.1/operations/tuning#tuning-compaction


 It does not disable compaction.
 As per the above url,  After running a major compaction, automatic minor
 compactions are no longer triggered, frequently requiring you to manually
 run major compactions on a routine basis. ( Just before the heading Tuning
 Column Family compression in the above link)

 With respect to the replies below :


 it creates one big file, which will not be compacted until there are (by
 default) 3 other very big files.
 This is for the minor compaction and major compaction should theoretically
 result in one large file irrespective of the number of data files initially?

This is not something you have to worry about. Unless you are seeing
 1,000's of files using the default compaction.

 Well my worry has been because of the large amount of node movements we have
 done in the ring. We started off with 6 nodes and increased the capacity to
 12 with disproportionate increases every time which resulted in a lot of
 clean of data folders except system, run repair and then a cleanup with an
 aborted attempt in between.

 There were some data.db files older by more than 2 weeks and were not
 modified since then. My understanding of the compaction process was that
 since data files keep continuously merging we should not have data files
 with very old last modified timestamps (assuming there is a good amount of
 writes to the table continuously) I did not have a for sure way of telling
 if everything is alright with the compaction looking at the last modified
 timestamps of all the data.db files.

What are the compaction issues you are having ?
 Your replies confirm that the timestamps should not be an issue to worry
 about. So I guess I should not be calling them as issues any more.  But
 performing an upgradesstables did decrease the number of data files and
 removed all the data files with the old timestamps.



 Regards,
 Ananth


 On Mon, Nov 19, 2012 at 6:54 AM, aaron morton aa...@thelastpickle.com
 wrote:

 As per datastax documentation, a manual compaction forces the admin to
 start compaction manually and disables the automated compaction (atleast for
 major compactions but not minor compactions )

 It does not disable compaction.
 it creates one big file, which will not be compacted until there are (by
 default) 3 other very big files.


 1. Does a nodetool stop compaction also force the admin to manually run
 major compaction ( I.e. disable automated major compactions ? )

 No.
 Stop just stops the current compaction.
 Nothing is disabled.

 2. Can a node restart reset the automated major compaction if a node gets
 into a manual mode compaction for whatever reason ?

 Major compaction is not automatic. It is the manual nodetool compact
 command.
 Automatic (minor) compaction is controlled by min_compaction_threshold and
 max_compaction_threshold (for the default compaction strategy).

 3. What is the ideal  number of SSTables for a table in a keyspace ( I
 mean are there any indicators as to whether my compaction is alright or not
 ? )

 This is not something you have to worry about.
 Unless you are seeing 1,000's of files using the default compaction.

  For example, I have seen SSTables on the disk more than 10 days old
 wherein there were other SSTables belonging to the same table but much
 younger than the older SSTables (

 No problems.

 4. Does a upgradesstables fix any compaction issues ?

 What are the compaction issues you are having ?


 Cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 18/11/2012, at 1:18 AM, Ananth Gundabattula agundabatt...@gmail.com
 wrote:


 We have a cluster  running cassandra 1.1.4. On this cluster,

 1. We had to move the nodes around a bit  when we were adding new nodes
 (there was quite a good amount of node movement )

 2. We had to stop compactions during some of the days to save some disk
 space on some of the nodes when they were running very very low on disk
 spaces. (via nodetool stop COMPACTION)


 As per datastax documentation, a manual 

Re: Query regarding SSTable timestamps and counts

2012-11-20 Thread Ananth Gundabattula
Thanks a lot Aaron and Edward.

The mail thread clarifies some things for me.

To let others on this thread know, running upgradesstables did
decrease our bloom filter false positive ratios a lot. (upgradesstables
was run not to upgrade from one Cassandra version to a higher one,
but because of all the node movement we had done to upgrade our
cluster in a staggered way with aborted attempts in between; I
understand that upgradesstables was not necessarily required for the high
bloom filter false positive rates we were seeing.)


Regards,
Ananth


On Wed, Nov 21, 2012 at 9:45 AM, Edward Capriolo edlinuxg...@gmail.comwrote:

 On Tue, Nov 20, 2012 at 5:23 PM, aaron morton aa...@thelastpickle.com
 wrote:
  My understanding of the compaction process was that since data files keep
  continuously merging we should not have data files with very old last
  modified timestamps
 
  It is perfectly OK to have very old SSTables.
 
  But performing an upgradesstables did decrease the number of data files
 and
  removed all the data files with the old timestamps.
 
  upgradetables re-writes every sstable to have the same contents in the
  newest format.
 
  Cheers
 
  -
  Aaron Morton
  Freelance Cassandra Developer
  New Zealand
 
  @aaronmorton
  http://www.thelastpickle.com
 
  On 19/11/2012, at 4:57 PM, Ananth Gundabattula agundabatt...@gmail.com
  wrote:
 
  Hello Aaron,
 
  Thanks a lot for the reply.
 
  Looks like the documentation is confusing. Here is the link I am
 referring
  to:
 http://www.datastax.com/docs/1.1/operations/tuning#tuning-compaction
 
 
  It does not disable compaction.
  As per the above url,  After running a major compaction, automatic minor
  compactions are no longer triggered, frequently requiring you to manually
  run major compactions on a routine basis. ( Just before the heading
 Tuning
  Column Family compression in the above link)
 
  With respect to the replies below :
 
 
  it creates one big file, which will not be compacted until there are (by
  default) 3 other very big files.
  This is for the minor compaction and major compaction should
 theoretically
  result in one large file irrespective of the number of data files
 initially?
 
 This is not something you have to worry about. Unless you are seeing
  1,000's of files using the default compaction.
 
  Well my worry has been because of the large amount of node movements we
 have
  done in the ring. We started off with 6 nodes and increased the capacity
 to
  12 with disproportionate increases every time which resulted in a lot of
  clean of data folders except system, run repair and then a cleanup with
 an
  aborted attempt in between.
 
  There were some data.db files older by more than 2 weeks and were not
  modified since then. My understanding of the compaction process was that
  since data files keep continuously merging we should not have data files
  with very old last modified timestamps (assuming there is a good amount
 of
  writes to the table continuously) I did not have a for sure way of
 telling
  if everything is alright with the compaction looking at the last modified
  timestamps of all the data.db files.
 
 What are the compaction issues you are having ?
  Your replies confirm that the timestamps should not be an issue to worry
  about. So I guess I should not be calling them as issues any more.  But
  performing an upgradesstables did decrease the number of data files and
  removed all the data files with the old timestamps.
 
 
 
  Regards,
  Ananth
 
 
  On Mon, Nov 19, 2012 at 6:54 AM, aaron morton aa...@thelastpickle.com
  wrote:
 
  As per datastax documentation, a manual compaction forces the admin to
  start compaction manually and disables the automated compaction
 (atleast for
  major compactions but not minor compactions )
 
  It does not disable compaction.
  it creates one big file, which will not be compacted until there are (by
  default) 3 other very big files.
 
 
  1. Does a nodetool stop compaction also force the admin to manually run
  major compaction ( I.e. disable automated major compactions ? )
 
  No.
  Stop just stops the current compaction.
  Nothing is disabled.
 
  2. Can a node restart reset the automated major compaction if a node
 gets
  into a manual mode compaction for whatever reason ?
 
  Major compaction is not automatic. It is the manual nodetool compact
  command.
  Automatic (minor) compaction is controlled by min_compaction_threshold
 and
  max_compaction_threshold (for the default compaction strategy).
 
  3. What is the ideal  number of SSTables for a table in a keyspace ( I
  mean are there any indicators as to whether my compaction is alright or
 not
  ? )
 
  This is not something you have to worry about.
  Unless you are seeing 1,000's of files using the default compaction.
 
   For example, I have seen SSTables on the disk more than 10 days old
  wherein there were other SSTables belonging to the same table but much
  

Re: row cache re-fill very slow

2012-11-20 Thread aaron morton
 INFO [OptionalTasks:1] 2012-11-19 13:08:58,868 ColumnFamilyStore.java (line 
 451) completed loading (5175655 ms; 13259976 keys) row cache
So it was reading 2,562 rows per second during startup. I'd say that's not 
unreasonable performance for 13 million rows. It will get faster in 1.2, but 
for now perhaps just have the cache save fewer keys.
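
e.g. something like this in cassandra.yaml (the number is only an example; if
I remember correctly, all keys are saved when the setting is left commented
out):

    row_cache_keys_to_save: 100000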

 Would something like iterating over SSTables instead, and throwing rows at 
 the cache that need to be in there feasible ? 
During startup we do not read the -Data.db component of the SSTable, only the 
-Index.db (and -Filter.db) component. Also, the SSTables are opened in parallel. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 20/11/2012, at 10:39 AM, Andras Szerdahelyi 
andras.szerdahe...@ignitionone.com wrote:

 Aaron,
 
 What version are you on ? 
 
 
 1.1.5 
 
 Do you know how many rows were loaded ?
 
 INFO [OptionalTasks:1] 2012-11-19 13:08:58,868 ColumnFamilyStore.java (line 
 451) completed loading (5175655 ms; 13259976 keys) row cache
 
 In both cases I do not believe the cache is stored in token (or key) order. 
 
 Am i getting this right:  the row keys are read and rows are retrieved from 
 SSTables in the order their keys are in the cache file..
 Would something like iterating over SSTables instead, and throwing rows at 
 the cache that need to be in there feasible ? If the SSTables themselves are 
 written sequentially at compaction time , which is how i remember they are 
 written, SSTable-sized sequential reads with a filter ( bloom filter for the 
 row cache? :-) ) must be faster than reading from all across the column 
 family ( i have HDDs and about 1k SSTables )
 
 row_cache_keys_to_save in yaml may help you find a happy half way point. 
 
 
 If i can keep that high enough, with my data retention requirements, save for 
 the absolute first get on a row, i can operate entirely out of memory.
 
 thanks!
 Andras
 
 Andras Szerdahelyi
 Solutions Architect, IgnitionOne | 1831 Diegem E.Mommaertslaan 20A
 M: +32 493 05 50 88 | Skype: sandrew84
 
 
 
 
 On 19 Nov 2012, at 22:00, aaron morton aa...@thelastpickle.com
  wrote:
 
 i was just wondering if anyone else is experiencing very slow ( ~ 3.5 
 MB/sec ) re-fill of the row cache at start up.
 It was mentioned the other day.  
 
 What version are you on ? 
 Do you know how many rows were loaded ? When complete it will log a message 
 with the pattern 
 
 completed loading (%d ms; %d keys) row cache for %s.%s
 
 How is the saved row cache file processed?
 
 In Version 1.1, after the SSTables have been opened the keys in the saved 
 row cache are read one at a time and the whole row read into memory. This is 
 a single threaded operation. 
 
 In 1.2 reading the saved cache is still single threaded, but reading the 
 rows goes through the read thread pool so is in parallel.
 
 In both cases I do not believe the cache is stored in token (or key) order. 
 
 ( Admittedly whatever is going on is still much more preferable to starting 
 with a cold row cache )
 
 row_cache_keys_to_save in yaml may help you find a happy half way point. 
 
 Cheers
 
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 20/11/2012, at 3:17 AM, Andras Szerdahelyi 
 andras.szerdahe...@ignitionone.com wrote:
 
 Hey list,
 
 i was just wondering if anyone else is experiencing very slow ( ~ 3.5 
 MB/sec ) re-fill of the row cache at start up. We operate with a large row 
 cache ( 10-15GB currently ) and we already measure startup times in hours 
 :-)
 
 How is the saved row cache file processed? Are the cached row keys simply 
 iterated over and their respective rows read from SSTables - possibly 
 creating random reads with small enough sstable files, if the keys were not 
 stored in a manner optimised for a quick re-fill ? -  or is there a smarter 
 algorithm ( i.e. scan through one sstable at a time, filter rows that 
 should be in row cache )  at work and this operation is purely disk i/o 
 bound ?
 
 ( Admittedly whatever is going on is still much more preferable to starting 
 with a cold row cache )
 
 thanks!
 Andras
 
 
 
 Andras Szerdahelyi
 Solutions Architect, IgnitionOne | 1831 Diegem E.Mommaertslaan 20A
 M: +32 493 05 50 88 | Skype: sandrew84
 
 
 
 
 
 



Re: Invalid argument

2012-11-20 Thread aaron morton
 Thanks for the work around, setting disk_access_mode: standard worked.
Hmmm, it's only a workaround. 

If you can reproduce the fault could you report it on 
https://issues.apache.org/jira/browse/CASSANDRA

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 20/11/2012, at 11:03 PM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 Hi Aaron.
 
 Here is my java -version
 
 java version 1.6.0_35
 Java(TM) SE Runtime Environment (build 1.6.0_35-b10)
 Java HotSpot(TM) 64-Bit Server VM (build 20.10-b01, mixed mode)
 
 Thanks for the work around, setting disk_access_mode: standard worked.
 
 Alain
 
 
 2012/11/19 aaron morton aa...@thelastpickle.com
 Are you running a 32 bit JVM ? What is the full JVM version ? 
 
 As a work around you can try disabling memory mapped access set 
 disk_access_mode to standard. 
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 20/11/2012, at 6:27 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:
 
 I have backed up production sstables from one of my 3 production nodes 
 (RF=3) and I want to use them on my dev environment.(C* 1.1.6 on both 
 environments)
 
 My dev server is a 4 core, 4 GB RAM hardware runing on ubuntu.
 
 I have applied the production schema in my dev node and copied all sstable 
 in the appropriated folder and restart my node like I always do.
 
 But this time have had the following error (many times and only for ) :
 
  INFO [SSTableBatchOpen:4] 2012-11-19 17:52:52,980 SSTableReader.java (line 
 169) Opening 
 /var/lib/cassandra/data/cassa_teads/data_action/cassa_teads-data_action-hf-660
  (7015417424 bytes)
 ERROR [SSTableBatchOpen:3] 2012-11-19 17:53:17,259 
 AbstractCassandraDaemon.java (line 135) Exception in thread 
 Thread[SSTableBatchOpen:3,5,main]
 java.io.IOError: java.io.IOException: Invalid argument
 at 
 org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.createSegments(MmappedSegmentedFile.java:202)
 at 
 org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.complete(MmappedSegmentedFile.java:179)
 at 
 org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:429)
 at 
 org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:200)
 at 
 org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:153)
 at 
 org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:242)
 at java.util.concurrent.Executors$RunnableAdapter.call(Unknown 
 Source)
 at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
 at java.util.concurrent.FutureTask.run(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown 
 Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.lang.Thread.run(Unknown Source)
 Caused by: java.io.IOException: Invalid argument
 at sun.nio.ch.FileChannelImpl.truncate0(Native Method)
 at sun.nio.ch.FileChannelImpl.map(Unknown Source)
 at 
 org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.createSegments(MmappedSegmentedFile.java:194)
 ... 11 more
 
 If I try with nodetool refresh I have the following error :
 
 Exception in thread main java.io.IOError: java.io.IOException: Invalid 
 argument
 at 
 org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.createSegments(MmappedSegmentedFile.java:202)
 at 
 org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.complete(MmappedSegmentedFile.java:179)
 at 
 org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:429)
 at 
 org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:200)
 at 
 org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:153)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.loadNewSSTables(ColumnFamilyStore.java:510)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.loadNewSSTables(ColumnFamilyStore.java:468)
 at 
 org.apache.cassandra.service.StorageService.loadNewSSTables(StorageService.java:3089)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
 at java.lang.reflect.Method.invoke(Unknown Source)
 at 
 com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown Source)
 at 
 com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown Source)
 at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(Unknown Source)
 at com.sun.jmx.mbeanserver.PerInterface.invoke(Unknown Source)
 at com.sun.jmx.mbeanserver.MBeanSupport.invoke(Unknown Source)
 at 
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(Unknown Source)
  

Re: Query regarding SSTable timestamps and counts

2012-11-20 Thread aaron morton
 upgradesstables re-writes every sstable to have the same contents in the
 newest format.
Agree. 
 In the world of compaction, and excluding upgrades, having older sstables is
expected.

Cheers
 
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 21/11/2012, at 11:45 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

 On Tue, Nov 20, 2012 at 5:23 PM, aaron morton aa...@thelastpickle.com wrote:
 My understanding of the compaction process was that since data files keep
 continuously merging we should not have data files with very old last
 modified timestamps
 
 It is perfectly OK to have very old SSTables.
 
 But performing an upgradesstables did decrease the number of data files and
 removed all the data files with the old timestamps.
 
 upgradetables re-writes every sstable to have the same contents in the
 newest format.
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 19/11/2012, at 4:57 PM, Ananth Gundabattula agundabatt...@gmail.com
 wrote:
 
 Hello Aaron,
 
 Thanks a lot for the reply.
 
 Looks like the documentation is confusing. Here is the link I am referring
 to:  http://www.datastax.com/docs/1.1/operations/tuning#tuning-compaction
 
 
 It does not disable compaction.
 As per the above url, "After running a major compaction, automatic minor
 compactions are no longer triggered, frequently requiring you to manually
 run major compactions on a routine basis." (Just before the heading "Tuning
 Column Family compression" in the above link.)
 
 With respect to the replies below :
 
 
 it creates one big file, which will not be compacted until there are (by
 default) 3 other very big files.
 This is for minor compaction, and a major compaction should theoretically
 result in one large file irrespective of the number of data files initially?
 
 This is not something you have to worry about. Unless you are seeing
 1,000's of files using the default compaction.
 
 Well, my worry has been because of the large number of node movements we have
 done in the ring. We started off with 6 nodes and increased the capacity to
 12, with disproportionate increases every time, which resulted in a lot of
 cleaning of data folders (except system), running repair, and then a cleanup,
 with an aborted attempt in between.
 
 There were some Data.db files more than 2 weeks old that had not been
 modified since. My understanding of the compaction process was that, since
 data files keep continuously merging, we should not have data files with
 very old last-modified timestamps (assuming there is a good amount of
 writes to the table continuously). I did not have a sure way of telling
 whether everything was alright with compaction by looking at the
 last-modified timestamps of all the Data.db files.
 
 What are the compaction issues you are having ?
 Your replies confirm that the timestamps should not be an issue to worry
 about, so I guess I should not be calling them issues any more. But
 performing an upgradesstables did decrease the number of data files and
 removed all the data files with the old timestamps.
 
 
 
 Regards,
 Ananth
 
 
 On Mon, Nov 19, 2012 at 6:54 AM, aaron morton aa...@thelastpickle.com
 wrote:
 
 As per datastax documentation, a manual compaction forces the admin to
 start compaction manually and disables the automated compaction (at least
 for major compactions but not minor compactions).
 
 It does not disable compaction.
 It creates one big file, which will not be compacted until there are (by
 default) 3 other very big files.
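
 For illustration, the manual command in question (a sketch; keyspace and
 column family names are placeholders, assumes Cassandra 1.1's nodetool):

     nodetool -h 127.0.0.1 compact MyKeyspace MyColumnFamily

 With the default size-tiered strategy, that single large file only takes part
 in a minor compaction again once min_compaction_threshold (4 by default)
 similarly sized files exist in its bucket, i.e. 3 more very big files.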
 
 
 1. Does a nodetool stop compaction also force the admin to manually run
 major compaction (i.e. disable automated major compactions)?
 
 No.
 Stop just stops the current compaction.
 Nothing is disabled.
 
 2. Can a node restart reset the automated major compaction if a node gets
 into a manual mode compaction for whatever reason ?
 
 Major compaction is not automatic. It is the manual nodetool compact
 command.
 Automatic (minor) compaction is controlled by min_compaction_threshold and
 max_compaction_threshold (for the default compaction strategy).
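
 A quick way to inspect or change those thresholds per column family (a
 sketch; names are placeholders, assumes Cassandra 1.1's nodetool):

     nodetool -h 127.0.0.1 getcompactionthreshold MyKeyspace MyColumnFamily
     nodetool -h 127.0.0.1 setcompactionthreshold MyKeyspace MyColumnFamily 4 32

 The values shown (4 and 32) are the defaults for the size-tiered strategy.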
 
 3. What is the ideal number of SSTables for a table in a keyspace (I mean,
 are there any indicators as to whether my compaction is alright or not)?
 
 This is not something you have to worry about.
 Unless you are seeing 1,000's of files using the default compaction.
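
 As a quick sanity check, counting the live Data.db files is usually enough
 (a sketch; the path assumes a default install, and the keyspace and column
 family names are placeholders):

     ls /var/lib/cassandra/data/MyKeyspace/MyColumnFamily/*-Data.db | wc -l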
 
 For example, I have seen SSTables on the disk more than 10 days old
 wherein there were other SSTables belonging to the same table but much
 younger than the older SSTables (
 
 No problems.
 
 4. Does an upgradesstables fix any compaction issues?
 
 What are the compaction issues you are having ?
 
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 18/11/2012, at 1:18 AM, Ananth Gundabattula agundabatt...@gmail.com

Re: Invalid argument

2012-11-20 Thread Rob Coli
On Tue, Nov 20, 2012 at 2:03 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:
] Thanks for the work around, setting disk_access_mode: standard worked.

Do you have working JNA, for reference?
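
For anyone following along, one way to check is to look for a JNA jar on the
classpath and for JNA-related messages at startup (a sketch only; the jar
location and log path are assumptions for a default package install):

    ls /usr/share/cassandra/lib/jna*.jar
    grep -i jna /var/log/cassandra/system.log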

=Rob

-- 
=Robert Coli
AIMGTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb


Re: Upgrade 1.1.2 - 1.1.6

2012-11-20 Thread Tamar Fraenkel
Hi!
I had the same problem (over counting due to replay of commit log, which
ignored drain) after upgrading my cluster from 1.0.9 to 1.0.11.
I updated the Cassandra tickets mentioned in this thread.
Regards,
*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956





On Tue, Nov 20, 2012 at 11:03 PM, Mike Heffner m...@librato.com wrote:


 On Tue, Nov 20, 2012 at 2:49 PM, Rob Coli rc...@palominodb.com wrote:

 On Mon, Nov 19, 2012 at 7:18 PM, Mike Heffner m...@librato.com wrote:
  We performed a 1.1.3 to 1.1.6 upgrade and found that all the logs replayed
  regardless of the drain.

 Your experience and desire for different (expected) behavior is welcomed
 on :

 https://issues.apache.org/jira/browse/CASSANDRA-4446

 nodetool drain sometimes doesn't mark commitlog fully flushed

 If every production operator who experiences this issue shares their
 experience on this bug, perhaps the project will acknowledge and
 address it.


 Well in this case I think our issue was that upgrading from nanotime to
 epoch seconds, by definition, replays all commit logs. That's not due to any
 specific problem with nodetool drain not marking commitlogs flushed, but a
 safety to ensure data is not lost due to buggy nanotime implementations.

 For us, it was that the upgrade instructions for pre-1.1.5 to 1.1.6 didn't
 mention that commit logs should be removed if successfully drained. On the
 other hand, we do not use counters, so replaying them was merely a much
 longer MTT-Return after restarting with 1.1.6.
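
 To make that concrete, the sequence being described per node looks roughly
 like this (a sketch only; the service name and paths assume a default package
 install, and the commit logs should only be removed if the drain completed
 cleanly):

     nodetool -h 127.0.0.1 drain        # flush memtables, stop accepting writes
     sudo service cassandra stop
     ls /var/lib/cassandra/commitlog/   # inspect (or archive) before removing
     rm /var/lib/cassandra/commitlog/CommitLog-*.log
     # upgrade the package, then restart on the new version
     sudo service cassandra start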

 Mike

 --

   Mike Heffner m...@librato.com
   Librato, Inc.




Re: Looking for a good Ruby client

2012-11-20 Thread Timmy Turner
Thanks Mat!

I thought you were going to expose the internals of CQL3 features like
(wide rows with) complex keys and collections to CQL2 clients, which should
generally be possible if DataStax's blog posts are accurate, i.e. an actual
description of how things were implemented and not just a conceptual one.

I'm still negotiating with my project lead on what features will ultimately
be implemented, so I'm not sure whether CQL2/3 interoperability will
actually make it into the final 'product'. It isn't very high up on the
priority list, so it will most likely be implemented towards the end, and
thus I guess it will also depend on how much CQL2 support Cassandra itself
still provides when the time comes.


2012/11/20 Mat Brown m...@brewster.com

 Hi Timmy,

 I haven't done a lot of playing with CQL3 yet, mostly just reading the
 blog posts, so the following is subject to change : )

 Right now, the Cequel model layer has a skinny row model (which is
 designed to follow common patterns of Ruby ORMs) and a wide row model
 (which is designed to behave more or less like a Hash, the equivalent
 of Java's HashMap). The two don't integrate with each other in any
 meaningful way, but as far as I understand it, they do pretty much
 cover the data modeling possibilities in CQL2.

 The big idea I've got for the overhaul of Cequel for CQL3 is to allow
 building a rich, nested data model by integrating different flavors of
 CQL3 table, most notably multi-column primary keys, as well as
 collections. The core data types I have in mind are:

 1) Skinny row with simple primary key (e.g. blogs, with blog_id key)
 2) Skinny row with complex primary key (e.g. blog_posts, with
 (blog_id, post_id) key)
 3) Wide row with simple primary key (e.g. blog_languages -- kind of a
 weak example but i can't think of anything better for a blog : )
 4) Wide row with complex primary key (e.g. blog_post_tags)
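
 For readers less familiar with CQL3, a hypothetical sketch of those four
 shapes (table and column names are invented for illustration):

     -- 1) skinny row, simple primary key
     CREATE TABLE blogs (blog_id uuid PRIMARY KEY, name text, description text);

     -- 2) skinny row, compound primary key
     CREATE TABLE blog_posts (blog_id uuid, post_id timeuuid, title text,
       body text, PRIMARY KEY (blog_id, post_id));

     -- 3) wide row, simple partition key (hash-like: one value per map key)
     CREATE TABLE blog_languages (blog_id uuid, language text, enabled boolean,
       PRIMARY KEY (blog_id, language)) WITH COMPACT STORAGE;

     -- 4) wide row, compound key (partition key plus two clustering columns)
     CREATE TABLE blog_post_tags (blog_id uuid, post_id timeuuid, tag text,
       added timestamp, PRIMARY KEY (blog_id, post_id, tag));

 Whether 3) really needs COMPACT STORAGE to match the CQL2 wide-row layout is
 exactly the open question discussed below.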

 My goal is to make it easy to model one-one relationships via a shared
 primary key, and one-many via a shared prefix of the primary key. So,
 for instance, blogs and blog_languages rows would be one-one (both
 with a blog_id primary key) and blogs and blog_posts would be one-many
 (sharing the blog_id prefix in the primary key).

 From what I've read, it seems fairly clear that the actual CQL used to
 interact with #1 will be the same for CQL2 column families and CQL3
 tables, so no explicit backward compatibility would be needed. #2 and
 #4 are, of course, CQL3-only, so backward compatibility isn't an issue
 there either. What I'm not entirely clear on is #3 -- this is
 straightforward in CQL2, and presumably a CQL3 table with compact
 storage would behave in the same way. However, my understanding so far
 is that a non-compact CQL3 table would treat this structure
 differently, in that both the key and value of the map would
 correspond to columns in a CQL3 table. It may make more sense to just
 target compact storage tables with this data structure, but I'm going
 to need to play around with it more to figure that out. Otherwise,
 Cequel will need to provide two flavors of that structure.

 There's also some tension between CQL3 collections and just using
 traditional wide-row structures to achieve the same thing. For
 instance, blog_tags could also just be a tags collection in the blogs
 table. My plan at this point is to offer both options, since each has
 its advantages (collections don't require the creation of a separate
 table; but a separate table gives you access to slices of the
 collection).
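
 As a concrete (again hypothetical) illustration of that trade-off, assuming
 CQL3 collections as described for Cassandra 1.2 and the blogs table from the
 sketch above:

     -- collection flavour: no extra table, but the whole set is read at once
     ALTER TABLE blogs ADD tags set<text>;
     UPDATE blogs SET tags = tags + {'cassandra', 'ruby'}
       WHERE blog_id = 62c36092-82a1-3a00-93d1-46196ee77204;

     -- separate-table flavour: one more table, but tag slices are queryable
     CREATE TABLE blog_tags (blog_id uuid, tag text, PRIMARY KEY (blog_id, tag));
     SELECT tag FROM blog_tags
       WHERE blog_id = 62c36092-82a1-3a00-93d1-46196ee77204
       AND tag >= 'c' AND tag < 'd';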

 Anyway, that's probably a lot more of an answer than you needed, but
 hopefully the context helps. Definitely interested to hear about the
 direction you take your client in as well.

 Finally, regarding a blog, we've got one set up, but it's not live
 yet. I'll ping you with a link when it is; I'll certainly be posting
 on the development of the next Cequel release.

 Cheers,
 Mat

 On Tue, Nov 20, 2012 at 9:23 AM, Timmy Turner timm.t...@gmail.com wrote:
  @Mat Brown:
 
  (while still retaining compatibility with CQL2 structures).
 
  Do you mean by exceeding what Cassandra itself provides in terms of
 CQL2/3
  interoperability?
 
  I'm looking into something similar currently (however in Java not in
 Ruby)
  and would be interested in your experiences, if you follow through with
 the
  plan. Do you have a blog?
 
 
  Thanks!
 
 
  2012/11/20 Alain RODRIGUEZ arodr...@gmail.com
 
  @Mat
 
  Well, I guess you could add your Ruby client to this list, since there are
  not a lot of them yet.
 
  http://wiki.apache.org/cassandra/ClientOptions
 
  Alain
 
 
  2012/11/20 Mat Brown m...@brewster.com
 
  As the author of Cequel, I can assure you it is excellent ; )
 
  We use it in production at Brewster and it is quite stable. If you try
  it out and find any bugs, we'll fix 'em  quickly.
 
  I'm planning a big overhaul of the model layer over the holidays to
  expose all the
  new data modeling goodness