RE: Booting Cassandra v0.7.0 on Windows: rename failed

2010-11-30 Thread Ramon Rockx
Hi,

The bug report can be found at:
https://issues.apache.org/jira/browse/CASSANDRA-1790

Regards,
Ramon



From: Jonathan Ellis [mailto:jbel...@gmail.com] 
Sent: Monday, 29 November 2010 16:09
To: user
Subject: Re: Booting Cassandra v0.7.0 on Windows: rename failed



Please report a bug at https://issues.apache.org/jira/browse/CASSANDRA

On Mon, Nov 29, 2010 at 2:49 AM, Ramon Rockx r.ro...@asknow.nl wrote:
 Hi,

 Recently I downloaded Cassandra v0.7.0 rc1. When I try to run Cassandra,
 it ends with the following logging:

  INFO 09:17:30,044 Enqueuing flush of memtable-locationi...@839514767(643 bytes, 12 operations)
  INFO 09:17:30,045 Writing memtable-locationi...@839514767(643 bytes, 12 operations)
 ERROR 09:17:30,233 Fatal exception in thread Thread[FlushWriter:1,5,main]
 java.io.IOError: java.io.IOException: rename failed of d:\cassandra\data\system\LocationInfo-e-1-Data.db
  at org.apache.cassandra.io.sstable.SSTableWriter.rename(SSTableWriter.java:214)
  at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:184)
  at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:167)
  at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:161)
  at org.apache.cassandra.db.Memtable.access$000(Memtable.java:49)
  at org.apache.cassandra.db.Memtable$1.runMayThrow(Memtable.java:174)
  at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:619)
 Caused by: java.io.IOException: rename failed of d:\cassandra\data\system\LocationInfo-e-1-Data.db
  at org.apache.cassandra.utils.FBUtilities.renameWithConfirm(FBUtilities.java:359)
  at org.apache.cassandra.io.sstable.SSTableWriter.rename(SSTableWriter.java:210)
  ... 12 more
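For context on this class of failure: on Windows, java.io.File.renameTo() fails whenever any handle to the file is still open, e.g. an unreleased memory-mapped region of the SSTable. The usual workaround pattern is retry-then-copy. A minimal, hypothetical sketch of that pattern follows (this is not Cassandra's actual code; the class and method names are made up for illustration):

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch: retry an atomic rename, then fall back to copy+delete.
public class RenameWithRetry {
    public static void rename(File src, File dst, int attempts) throws IOException {
        for (int i = 0; i < attempts; i++) {
            if (src.renameTo(dst))
                return; // atomic rename succeeded
            // Hint the JVM to drop unreferenced mapped buffers, which is what
            // usually keeps the handle open on Windows, then retry.
            System.gc();
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        // Last resort: copy then delete. Not atomic, so crash-safety is weaker.
        Files.copy(src.toPath(), dst.toPath(), StandardCopyOption.REPLACE_EXISTING);
        if (!src.delete())
            throw new IOException("rename failed of " + src);
    }
}
```

The copy+delete fallback loses the atomicity a rename gives, which matters for crash safety, so it is only a last resort.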

 Operating system is Windows 7. Tried it also on Windows 2003 server.
 I only modified a few (necessary) path settings in cassandra.yaml:

 commitlog_directory: d:/cassandra/commitlog
 data_file_directories:
 - d:/cassandra/data
 saved_caches_directory: d:/cassandra/saved_caches

 Does anybody know what I'm doing wrong?

 Regards,
 Ramon




--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com







Re: Introduction to Cassandra

2010-11-30 Thread aaron morton
Jim, Jonathan thanks for the feedback. 

I was trying something different and rattled through the whole deck in about 
35-40 minutes. 

I'll be doing the talk again at the Wellington Python Users Group this Thursday 
http://nzpug.org/MeetingsWellington

Aaron

On 30 Nov 2010, at 20:18, Jim Morrison wrote:

 Really great introduction, thanks Aaron. Bookmarked for the team. 
 
 J. 
 
 Sent from my iPhone
 
 On 29 Nov 2010, at 21:11, Aaron Morton aa...@thelastpickle.com wrote:
 
 I did a talk last week at the Wellington Rails User Group as a basic 
 introduction to Cassandra. The slides are here 
 http://www.slideshare.net/aaronmorton/well-railedcassandra24112010-5901169 
 if anyone is interested. 
 
 Cheers
 Aaron
 



Re: Achieving isolation on single row modifications with batch_mutate

2010-11-30 Thread E S
I'm chunking up a larger blob.  Basically the size of each row can vary
(averages around 500K - 1MB), with some outliers in the 50 MB range.  However,
when I do an update, I can usually just read/update a portion of that blob.  A
lot of my read operations can also work on a smaller chunk.  The number of
columns is going to depend on the size of the blob itself.  I'm also
considering using supercolumns to have higher save granularity.

My biggest problem is that I will have to update these rows a lot (several
times a day) and often very quickly (process 15 thousand in 2-3 minutes).
While I think I could probably scale up with a lot of hardware to meet that
load, it seems like I'm doing much more work than I need to (processing 15 GB
of data in 2-3 minutes as opposed to 100 MB).  I also worry about handling our
future data size needs.

I can split the blob up without a lot of extra complexity, but am worried about
how to have readers read a non-corrupted version of the object, since sometimes
I'll have to update multiple chunks as one unit.





From: Tyler Hobbs ty...@riptano.com
To: user@cassandra.apache.org
Sent: Tue, November 30, 2010 12:57:07 AM
Subject: Re: Achieving isolation on single row modifications with batch_mutate

In this case, it sounds like you should combine columns A and B if you
are writing them both at the same time, reading them both at the same
time, and need them to be consistent.

Obviously, you're probably dealing with more than two columns here, but
there's generally not any value in splitting something into multiple columns
if you're always writing and reading all of them at the same time.

Or are you talking about chunking huge blobs across a row?

- Tyler


On Sat, Nov 27, 2010 at 10:12 AM, E S tr1skl...@yahoo.com wrote:

I'm trying to figure out the best way to achieve single row modification
isolation for readers.

As an example, I have 2 rows (1,2) with 2 columns (a,b).  If I modify both
rows, I don't care if the user sees the write operations completed on 1 and not
on 2 for a short time period (seconds).  I also don't care if, when reading row
1, the user gets the new value and then on a re-read gets the old value (within
a few seconds).  Because of this, I have been planning on using a consistency
level of one.

However, if I modify both columns A,B on a single row, I need both changes on
the row to be visible/invisible atomically.  It doesn't matter if they both
become visible and then both invisible as the data propagates across nodes, but
a half-completed state on an initial read will basically be returning corrupt
data given my app's consistency requirements.  My understanding from the FAQ is
that this single-row multicolumn change provides no read isolation, so I will
have this problem.  Is this correct?  If so:

Question 1:  Is there a way to get this type of isolation without using a
distributed locking mechanism like cages?

Question 2:  Are there any plans to implement this type of isolation within
Cassandra?

Question 3:  If I went with a distributed locking mechanism, what consistency
level would I need to use with Cassandra?  Could I still get away with a
consistency level of one?  It seems that if the initial write is done in a
non-isolated way, but cross-node row synchronizations are done all or nothing,
I could still use one.

Question 4:  Does anyone know of a good C# alternative to cages/zookeeper?

Question 4:  Does anyone know of a good c# alternative to cages/zookeeper?

Thanks for any help with this!








  

Re: Updating Cascal

2010-11-30 Thread Michael Fortin
Hi Tyler,

Thanks for the response.  I decided to give up on it, and start my own Scala 
based api modeled on Cascal since it's no longer supported. 

_M!ke

On Nov 30, 2010, at 1:06 AM, Tyler Hobbs wrote:

 Are you sure you're using the same key for batch_mutate() and get_slice()?  
 They appear different in the logs.
 
 - Tyler
 
 On Thu, Nov 25, 2010 at 10:14 AM, Michael Fortin mi...@m410.us wrote:
 Hello,
 I forked Cascal  (Scala based client for cassandra) and I'm attempting to 
 update it to cassandra 0.7.  I have it partially working, but I'm getting 
 stuck on a few areas.
 
 I have most of the unit tests working from the original code, but I'm having 
 an issue with batch_mutate(keyToFamilyMutations, consistency).  Does the log 
 output mean anything?  I can't figure out why the columns are not getting 
 inserted.  If I change the code from a batch_mutate to an insert(family, 
 parent, column, consistency) it works.
 
 ### keyToFamilyMutations: {java.nio.HeapByteBuffer[pos=0 lim=16 
 cap=16]={Standard=[Mutation(column_or_supercolumn:ColumnOrSuperColumn(column:Column(name:43
  6F 6C 75 6D 6E 2D 61 2D 31, value:56 61 6C 75 65 2D 31, 
 timestamp:1290662894466035))), 
 Mutation(column_or_supercolumn:ColumnOrSuperColumn(column:Column(name:43 6F 
 6C 75 6D 6E 2D 61 2D 33, value:56 61 6C 75 65 2D 33, 
 timestamp:1290662894467942))), 
 Mutation(column_or_supercolumn:ColumnOrSuperColumn(column:Column(name:43 6F 
 6C 75 6D 6E 2D 61 2D 32, value:56 61 6C 75 65 2D 32, 
 timestamp:1290662894467915)))]}}
 DEBUG 2010-11-25 00:28:14,534 [org.apache.cassandra.thrift.CassandraServer 
 pool-1-thread-2] batch_mutate
 DEBUG 2010-11-25 00:28:14,583 [org.apache.cassandra.service.StorageProxy 
 pool-1-thread-2] insert writing local RowMutation(keyspace='Test', 
 key='ccfd5520f85411df858a001c4209', modifications=[Standard])
 
 DEBUG 2010-11-25 00:28:14,599 [org.apache.cassandra.thrift.CassandraServer 
 pool-1-thread-2] get_slice
 DEBUG 2010-11-25 00:28:14,605 [org.apache.cassandra.service.StorageProxy 
 pool-1-thread-2] weakread reading SliceFromReadCommand(table='Test', 
 key='5374616e64617264', column_parent='QueryPath(columnFamilyName='Standard', 
 superColumnName='null', columnName='null')', start='', finish='', 
 reversed=false, count=2147483647) locally
 DEBUG 2010-11-25 00:28:14,608 [org.apache.cassandra.service.StorageProxy 
 ReadStage:2] weakreadlocal reading SliceFromReadCommand(table='Test', 
 key='5374616e64617264', column_parent='QueryPath(columnFamilyName='Standard', 
 superColumnName='null', columnName='null')', start='', finish='', 
 reversed=false, count=2147483647)
 ### get_slice: []
 
 
 The code looks like:
  println("keyToFamilyMutations: %s".format(keyToFamilyMutations))
  client.batch_mutate(keyToFamilyMutations, consistency)
  …
  client.client.get_slice(…)
 
 keyspaces:
     - name: Test
       replica_placement_strategy: org.apache.cassandra.locator.SimpleStrategy
       replication_factor: 1
       column_families:
         - {name: Standard, compare_with: BytesType}
 
 
 
 Thanks,
 Mike
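Tyler's observation about the keys can be checked directly: the row key in the get_slice log line, '5374616e64617264', is hex-encoded ASCII, and decoding it yields the string "Standard" — the column family name — which suggests the CF name was accidentally passed as the row key. A small self-contained decoder (the class name here is made up for illustration):

```java
// Decode a hex string like the one in the DEBUG log into its ASCII form.
public class HexKey {
    public static String decode(String hex) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < hex.length(); i += 2)
            sb.append((char) Integer.parseInt(hex.substring(i, i + 2), 16));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("5374616e64617264")); // prints: Standard
    }
}
```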
 



Re: Achieving isolation on single row modifications with batch_mutate

2010-11-30 Thread Jonathan Ellis
On Sat, Nov 27, 2010 at 10:12 AM, E S tr1skl...@yahoo.com wrote:
 I'm trying to figure out the best way to achieve single row modification
 isolation for readers.

I have a lot of No's for you. :)

 As an example, I have 2 rows (1,2) with 2 columns (a,b).  If I modify both 
 rows,
 I don't care if the user sees the write operations completed on 1 and not on 2
 for a short time period (seconds).  I also don't care if when reading row 1 
 the
 user gets the new value, and then on a re-read gets the old value (within a 
 few
 seconds).  Because of this, I have been planning on using a consistency level 
 of
 one.

 However, if I modify both columns A,B on a single row, I need both changes on
 the row to be visible/invisible atomically.  It doesn't matter if they both
 become visible and then both invisible as the data propagates across nodes, 
 but
 a half-completed state on an initial read will basically be returning corrupt
 data given my app's consistency requirements.  My understanding from the FAQ is
 that this single-row multicolumn change provides no read isolation, so I will have
 this problem.  Is this correct?  If so:

 Question 1:  Is there a way to get this type of isolation without using a
 distributed locking mechanism like cages?

No.

 Question 2:  Are there any plans to implement this type of isolation within
 Cassandra?

No.

 Question 3:  If I went with a distributed locking mechanism, what consistency
 level would I need to use with Cassandra?  Could I still get away with a
 consistency level of one?

Maybe.  If you want to guarantee that you see the most recent write,
then ONE will not be high enough. But if all you care about is seeing
all of the update or none of it, then ONE + locking will be fine.

 Question 4:  Does anyone know of a good c# alternative to cages/zookeeper?

No.
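The "ONE + locking" pattern Jonathan describes for question 3 amounts to: writers take a per-row lock, apply the whole batch, then release; readers take the same lock around each read. Below is a hedged, self-contained sketch where a local ReentrantLock and an in-memory map stand in for the distributed lock (cages/ZooKeeper in practice) and for the Cassandra row; all names are made up for illustration:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the "CL.ONE + external locking" pattern: writers and readers
// serialize on a per-row lock, so no reader observes a half-applied batch.
public class LockedRow {
    private final ReentrantLock lock = new ReentrantLock(); // distributed in practice
    private final Map<String, String> columns = new ConcurrentHashMap<>();

    public void batchMutate(Map<String, String> changes) {
        lock.lock();
        try {
            columns.putAll(changes); // all columns change under the lock
        } finally {
            lock.unlock();
        }
    }

    public Map<String, String> readAll() {
        lock.lock();
        try {
            return new HashMap<>(columns); // consistent snapshot of the row
        } finally {
            lock.unlock();
        }
    }
}
```

The design trade-off is exactly as stated in the reply: this gives all-or-nothing visibility, not read-your-writes; readers may still see a stale but internally consistent row at CL.ONE.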

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Updating Cascal

2010-11-30 Thread Jonathan Ellis
Did you look at Scromium?

On Tue, Nov 30, 2010 at 8:27 AM, Michael Fortin mi...@m410.us wrote:
 Hi Tyler,
 Thanks for the response.  I decided to give up on it, and start my own Scala
 based api modeled on Cascal since it's no longer supported.
 _M!ke
 On Nov 30, 2010, at 1:06 AM, Tyler Hobbs wrote:

 Are you sure you're using the same key for batch_mutate() and get_slice()?
 They appear different in the logs.

 - Tyler

 On Thu, Nov 25, 2010 at 10:14 AM, Michael Fortin mi...@m410.us wrote:

 Hello,
 I forked Cascal  (Scala based client for cassandra) and I'm attempting to
 update it to cassandra 0.7.  I have it partially working, but I'm getting
 stuck on a few areas.

 I have most of the unit tests working from the original code, but I'm
 having an issue with batch_mutate(keyToFamilyMutations, consistency) .  Does
 the log output mean anything?  I can't figure out why the columns are not
 getting inserted.  If I change th code from a batch_mutate to an
 insert(family, parent, column, consistency) it works.

 ### keyToFamilyMutations: {java.nio.HeapByteBuffer[pos=0 lim=16
 cap=16]={Standard=[Mutation(column_or_supercolumn:ColumnOrSuperColumn(column:Column(name:43
 6F 6C 75 6D 6E 2D 61 2D 31, value:56 61 6C 75 65 2D 31,
 timestamp:1290662894466035))),
 Mutation(column_or_supercolumn:ColumnOrSuperColumn(column:Column(name:43 6F
 6C 75 6D 6E 2D 61 2D 33, value:56 61 6C 75 65 2D 33,
 timestamp:1290662894467942))),
 Mutation(column_or_supercolumn:ColumnOrSuperColumn(column:Column(name:43 6F
 6C 75 6D 6E 2D 61 2D 32, value:56 61 6C 75 65 2D 32,
 timestamp:1290662894467915)))]}}
 DEBUG 2010-11-25 00:28:14,534 [org.apache.cassandra.thrift.CassandraServer
 pool-1-thread-2] batch_mutate
 DEBUG 2010-11-25 00:28:14,583 [org.apache.cassandra.service.StorageProxy
 pool-1-thread-2] insert writing local RowMutation(keyspace='Test',
 key='ccfd5520f85411df858a001c4209', modifications=[Standard])

 DEBUG 2010-11-25 00:28:14,599 [org.apache.cassandra.thrift.CassandraServer
 pool-1-thread-2] get_slice
 DEBUG 2010-11-25 00:28:14,605 [org.apache.cassandra.service.StorageProxy
 pool-1-thread-2] weakread reading SliceFromReadCommand(table='Test',
 key='5374616e64617264',
 column_parent='QueryPath(columnFamilyName='Standard',
 superColumnName='null', columnName='null')', start='', finish='',
 reversed=false, count=2147483647) locally
 DEBUG 2010-11-25 00:28:14,608 [org.apache.cassandra.service.StorageProxy
 ReadStage:2] weakreadlocal reading SliceFromReadCommand(table='Test',
 key='5374616e64617264',
 column_parent='QueryPath(columnFamilyName='Standard',
 superColumnName='null', columnName='null')', start='', finish='',
 reversed=false, count=2147483647)
 ### get_slice: []


 The code looks like:
      println(keyToFamilyMutations: %s.format(keyToFamilyMutations))
      client.batch_mutate(keyToFamilyMutations, consistency)
      …
      client.client.get_slice(…)

 keyspaces:
    - name: Test
      replica_placement_strategy:
 org.apache.cassandra.locator.SimpleStrategy
      replication_factor: 1
      column_families:
        - {name: Standard, compare_with: BytesType}



 Thanks,
 Mike





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Updating Cascal

2010-11-30 Thread Michael Fortin
You're referring to this: https://github.com/cliffmoon/scromium right?  Thanks 
for the tip, I'll give it a try.

_Mike


On Nov 30, 2010, at 9:51 AM, Jonathan Ellis wrote:

 Did you look at Scromium?
 
 On Tue, Nov 30, 2010 at 8:27 AM, Michael Fortin mi...@m410.us wrote:
 Hi Tyler,
 Thanks for the response.  I decided to give up on it, and start my own Scala
 based api modeled on Cascal since it's no longer supported.
 _M!ke
 On Nov 30, 2010, at 1:06 AM, Tyler Hobbs wrote:
 
 Are you sure you're using the same key for batch_mutate() and get_slice()?
 They appear different in the logs.
 
 - Tyler
 
 On Thu, Nov 25, 2010 at 10:14 AM, Michael Fortin mi...@m410.us wrote:
 
 Hello,
 I forked Cascal  (Scala based client for cassandra) and I'm attempting to
 update it to cassandra 0.7.  I have it partially working, but I'm getting
 stuck on a few areas.
 
 I have most of the unit tests working from the original code, but I'm
 having an issue with batch_mutate(keyToFamilyMutations, consistency) .  Does
 the log output mean anything?  I can't figure out why the columns are not
 getting inserted.  If I change th code from a batch_mutate to an
 insert(family, parent, column, consistency) it works.
 
 ### keyToFamilyMutations: {java.nio.HeapByteBuffer[pos=0 lim=16
 cap=16]={Standard=[Mutation(column_or_supercolumn:ColumnOrSuperColumn(column:Column(name:43
 6F 6C 75 6D 6E 2D 61 2D 31, value:56 61 6C 75 65 2D 31,
 timestamp:1290662894466035))),
 Mutation(column_or_supercolumn:ColumnOrSuperColumn(column:Column(name:43 6F
 6C 75 6D 6E 2D 61 2D 33, value:56 61 6C 75 65 2D 33,
 timestamp:1290662894467942))),
 Mutation(column_or_supercolumn:ColumnOrSuperColumn(column:Column(name:43 6F
 6C 75 6D 6E 2D 61 2D 32, value:56 61 6C 75 65 2D 32,
 timestamp:1290662894467915)))]}}
 DEBUG 2010-11-25 00:28:14,534 [org.apache.cassandra.thrift.CassandraServer
 pool-1-thread-2] batch_mutate
 DEBUG 2010-11-25 00:28:14,583 [org.apache.cassandra.service.StorageProxy
 pool-1-thread-2] insert writing local RowMutation(keyspace='Test',
 key='ccfd5520f85411df858a001c4209', modifications=[Standard])
 
 DEBUG 2010-11-25 00:28:14,599 [org.apache.cassandra.thrift.CassandraServer
 pool-1-thread-2] get_slice
 DEBUG 2010-11-25 00:28:14,605 [org.apache.cassandra.service.StorageProxy
 pool-1-thread-2] weakread reading SliceFromReadCommand(table='Test',
 key='5374616e64617264',
 column_parent='QueryPath(columnFamilyName='Standard',
 superColumnName='null', columnName='null')', start='', finish='',
 reversed=false, count=2147483647) locally
 DEBUG 2010-11-25 00:28:14,608 [org.apache.cassandra.service.StorageProxy
 ReadStage:2] weakreadlocal reading SliceFromReadCommand(table='Test',
 key='5374616e64617264',
 column_parent='QueryPath(columnFamilyName='Standard',
 superColumnName='null', columnName='null')', start='', finish='',
 reversed=false, count=2147483647)
 ### get_slice: []
 
 
 The code looks like:
  println(keyToFamilyMutations: %s.format(keyToFamilyMutations))
  client.batch_mutate(keyToFamilyMutations, consistency)
  …
  client.client.get_slice(…)
 
 keyspaces:
     - name: Test
       replica_placement_strategy: org.apache.cassandra.locator.SimpleStrategy
       replication_factor: 1
       column_families:
         - {name: Standard, compare_with: BytesType}
 
 
 
 Thanks,
 Mike
 
 
 
 
 
 -- 
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of Riptano, the source for professional Cassandra support
 http://riptano.com



Re: Updating Cascal

2010-11-30 Thread Daniel Lundin
I'd highly recommend looking at Hector (v2) as well. It's very nice.
I'm using it from Scala without any issues.

Rather than duplicating the effort of Scromium, Cascal, Scalandra, not to
mention Hector itself, perhaps it'd be worthwhile taking a stab at a Scala
interface wrapping Hector?

Connection/failover strategies, test helpers and client metrics aren't
THAT much fun reinventing.

/d

 On Nov 30, 2010, at 9:51 AM, Jonathan Ellis wrote:

 Did you look at Scromium?

 On Tue, Nov 30, 2010 at 8:27 AM, Michael Fortin mi...@m410.us wrote:
 Hi Tyler,
 Thanks for the response.  I decided to give up on it, and start my own Scala
 based api modeled on Cascal since it's no longer supported.


Re: Achieving isolation on single row modifications with batch_mutate

2010-11-30 Thread Ed Anuff
It's hard to tell without knowing the nature of the data you're writing, but
you might want to think about whether you can embed some sort of version number
and/or checksum into the column names of the chunk columns.  That way, you
could easily determine that the data you wanted to retrieve was not yet
available for reading.  Are you able to do your partial blob updates on an
entire chunk at a time, or do you need to read the blob chunk, modify a portion
of it, and then write it back?  If it's the former, then it might be possible
to accomplish this without a locking solution.

Ed
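The versioned-chunk idea above can be sketched as: write all chunk columns under names that embed a version, then commit by writing a "current" pointer column last; readers resolve the pointer first and fetch only that version's chunks, so a half-written newer version stays invisible. A minimal in-memory sketch, with a TreeMap standing in for a row's sorted columns (all names are illustrative, not a real client API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: version-qualified chunk columns plus a "current" pointer column.
public class VersionedChunks {
    private final TreeMap<String, byte[]> row = new TreeMap<>();

    // Write every chunk under a version-qualified name (index zero-padded so
    // names sort numerically), then commit by writing the pointer last.
    public void writeBlob(long version, List<byte[]> chunks) {
        for (int i = 0; i < chunks.size(); i++)
            row.put(String.format("chunk:%d:%08d", version, i), chunks.get(i));
        row.put("current", Long.toString(version).getBytes()); // pointer last
    }

    // Resolve the pointer first, then slice only that version's chunks;
    // a newer, half-written version is simply never referenced.
    public List<byte[]> readBlob() {
        byte[] ptr = row.get("current");
        if (ptr == null) return List.of();
        String prefix = "chunk:" + new String(ptr) + ":";
        List<byte[]> out = new ArrayList<>();
        for (Map.Entry<String, byte[]> e : row.tailMap(prefix).entrySet()) {
            if (!e.getKey().startsWith(prefix)) break; // past this version
            out.add(e.getValue());
        }
        return out;
    }
}
```

In Cassandra the pointer write itself is a single-column write, so a reader that slices between the chunk writes and the pointer write just sees the old version, which is the desired behavior; stale versions still need eventual deletion.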

On Sat, Nov 27, 2010 at 8:12 AM, E S tr1skl...@yahoo.com wrote:

 I'm trying to figure out the best way to achieve single row modification
 isolation for readers.

 As an example, I have 2 rows (1,2) with 2 columns (a,b).  If I modify both
 rows,
 I don't care if the user sees the write operations completed on 1 and not
 on 2
 for a short time period (seconds).  I also don't care if when reading row 1
 the
 user gets the new value, and then on a re-read gets the old value (within a
 few
 seconds).  Because of this, I have been planning on using a consistency
 level of
 one.

 However, if I modify both columns A,B on a single row, I need both changes
 on
 the row to be visible/invisible atomically.  It doesn't matter if they both
 become visible and then both invisible as the data propagates across nodes,
 but
 a half-completed state on an initial read will basically be returning
 corrupt
 data given my app's consistency requirements.  My understanding from the FAQ is
 that this single-row multicolumn change provides no read isolation, so I will
 have
 this problem.  Is this correct?  If so:

 Question 1:  Is there a way to get this type of isolation without using a
 distributed locking mechanism like cages?

 Question 2:  Are there any plans to implement this type of isolation within
 Cassandra?

 Question 3:  If I went with a distributed locking mechanism, what
 consistency
 level would I need to use with Cassandra?  Could I still get away with a
 consistency level of one?  It seems that if the initial write is done in a
 non-isolated way, but if cross-node row synchronizations are done all or
 nothing, I could still use one.

 Question 4:  Does anyone know of a good c# alternative to cages/zookeeper?

 Thanks for any help with this!







Re: get_count - cassandra 0.7.x predicate limit bug?

2010-11-30 Thread Edward Capriolo
On Tue, Nov 30, 2010 at 1:00 AM, Tyler Hobbs ty...@riptano.com wrote:
 What error are you getting?

 Remember, get_count() is still just about as much work for cassandra as
 getting the whole row; the only advantage is it doesn't have to send the
 whole row back to the client.

 If you're counting 3+ million columns frequently, it's time to take a look
 at counters.

 - Tyler

 On Fri, Nov 26, 2010 at 10:33 AM, Marcin mar...@33concept.com wrote:

 Hi guys,

 I have a key with 3million+ columns but when I am trying to run get_count
 on it its getting me error if setting limit more than 46000+ any ideas?

 In previous API there was no predicate at all so it was simply counting
 number of columns now its not so simple any more.

 Please let me know if that is a bug or I do something wrong.


 cheers,
 /Marcin



+1 Tyler. The problem is likely a timeout: you can increase the client's socket
timeout as high as you like; if socketTimeout < rpcTimeout you should see
SocketTimeoutExceptions, and if socketTimeout >= rpcTimeout you start seeing
Cassandra TimedOutExceptions. Raising the RPC timeout is done on the server. In
any case you may have to range_slice to get through a row this big and count.
Also, in my experience rows this large do not work well. They are particularly
dangerous when combined with RowCache, as bringing them into memory and
evicting them is both disk- and memory-intensive.
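Edward's range_slice suggestion amounts to counting the row in pages: fetch up to N column names at a time, restart the next slice just past the last name seen, and sum. A self-contained sketch with a TreeMap standing in for the sorted row (in a real Thrift client each page would be one get_slice call with the start column set accordingly; names here are illustrative):

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch: count a huge row page by page instead of one giant get_count.
public class PagedCount {
    public static long count(TreeMap<String, byte[]> row, int pageSize) {
        long total = 0;
        String start = ""; // exclusive lower bound of the next page
        while (true) {
            SortedMap<String, byte[]> rest = row.tailMap(start, false);
            if (rest.isEmpty())
                break;
            int n = 0;
            String last = start;
            for (String name : rest.keySet()) { // take at most pageSize names
                last = name;
                if (++n == pageSize)
                    break;
            }
            total += n;
            start = last; // next page begins just after the last name seen
        }
        return total;
    }
}
```

Each page is bounded work for the server, which avoids the timeout that a single count over millions of columns runs into.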


JVM OOM on node startup

2010-11-30 Thread Brayton Thompson
Hello again.
	We have 3 nodes and were testing what happens when a node goes down. There is
roughly 10 GB of data on each node. The node we "simulated" dying was working
just fine under the load. Then we killed it. The ring performed admirably, but
upon restarting, the node dies every time with JVM OOM errors. I have forced a
JVM heap size of 1024 MB in the startup file (did this because the adaptive
heap size was causing OOM errors under normal usage). The machines are 2-core,
4 GB RAM VMs.

I've read the Riptano troubleshooting guide...
http://www.riptano.com/docs/0.6/troubleshooting/index#nodes-are-dying-with-oom-errors
But I'm not sure if these apply in this case, since it only dies on startup.

Here is a link to the startup logs as it dies.
http://pastebin.com/BEXeVvCX

Thank you for any help you can provide.

Re: JVM OOM on node startup

2010-11-30 Thread Aaron Morton
Looks like it's trying to load your row cache and running out of memory, probably because you reduced the memory. The cassandra-env.sh script would have been giving it 2 GB; a 1 GB heap is probably going to be too small.

Was this the same error you were getting before you reduced the memory?

Try deleting the caches; the path is specified by the saved_caches_directory setting in cassandra.yaml.

Also, what version are you using? The error "Caused by: javax.management.AttributeNotFoundException: No such attribute: ActiveCount" reminds me of a problem in beta 1.

Hope that helps.
Aaron

On 01 Dec, 2010, at 09:28 AM, Brayton Thompson thomp...@grnoc.iu.edu wrote:

Hello again.
	We have 3 nodes and were testing what happens when a node goes down. There is roughly 10 GB of data on each node. The node we "simulated" dying was working just fine under the load. Then we killed it. The ring performed admirably, but upon restarting, the node dies every time with JVM OOM errors. I have forced a JVM heap size of 1024 MB in the startup file (did this because the adaptive heap size was causing OOM errors under normal usage). The machines are 2-core, 4 GB RAM VMs.

I've read the Riptano troubleshooting guide... http://www.riptano.com/docs/0.6/troubleshooting/index#nodes-are-dying-with-oom-errors But I'm not sure if these apply in this case, since it only dies on startup.

Here is a link to the startup logs as it dies.
http://pastebin.com/BEXeVvCX

Thank you for any help you can provide.

Re: JVM OOM on node startup

2010-11-30 Thread Jonathan Ellis
If you're getting OOM with an adaptive heap size of > 1GB, reducing it to
1GB is not going to make things better. :)

On Tue, Nov 30, 2010 at 2:28 PM, Brayton Thompson thomp...@grnoc.iu.edu wrote:
 Hello again.
        We have 3 nodes and were testing what happens when a node goes down. 
 There is roughly 10gb of data on each node. The node we simulated dieing 
 was working just fine under the load. Then we killed it. The ring performed 
 admirably, But upon restarting the node it dies every time of JVM OOM errors. 
  I have forced a JVM heap size of 1024mb in the startup file. (did this 
 because adaptive heap size was causing oom errors with normal usage.) The 
 machines are 2 core 4gb ram vm's.

 I've read the Riptano troubleshooting guide... 
 http://www.riptano.com/docs/0.6/troubleshooting/index#nodes-are-dying-with-oom-errors
  But im not sure if these apply in this case since it is only dieing on 
 startup.

 Here is a link to the startup logs as it dies.
 http://pastebin.com/BEXeVvCX

 Thank you for any help you can provide.



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Achieving isolation on single row modifications with batch_mutate

2010-11-30 Thread E S
I'm a little confused about #3.  Hopefully this clarifying question won't turn
the one maybe into a no :).

I'm fine not reading the latest data, as long as on each individual read I see
all or none of the operations that occurred for a single one-row batch_mutate.

My concern is whether I have to lock the reads until the changes have
propagated to all nodes.  If I do a batch_mutate with a consistency of ONE onto
one row, during the write operation to the one node a reader can see partial
changes.  Once the batch_mutate completes, the change has not been propagated
to the other nodes.  On a per-row basis, are the changes to other nodes pushed
in an isolated manner?  If not, it seems like I would have to write with a
consistency of ALL and lock around that.




- Original Message 
From: Jonathan Ellis jbel...@gmail.com
To: user user@cassandra.apache.org
Sent: Tue, November 30, 2010 9:50:51 AM
Subject: Re: Achieving isolation on single row modifications with batch_mutate

On Sat, Nov 27, 2010 at 10:12 AM, E S tr1skl...@yahoo.com wrote:
 I'm trying to figure out the best way to achieve single row modification
 isolation for readers.

I have a lot of No's for you. :)

 As an example, I have 2 rows (1,2) with 2 columns (a,b).  If I modify both 
rows,
 I don't care if the user sees the write operations completed on 1 and not on 2
 for a short time period (seconds).  I also don't care if when reading row 1 
the
 user gets the new value, and then on a re-read gets the old value (within a 
few
 seconds).  Because of this, I have been planning on using a consistency level 
of
 one.

 However, if I modify both columns A,B on a single row, I need both changes on
 the row to be visible/invisible atomically.  It doesn't matter if they both
 become visible and then both invisible as the data propagates across nodes, 
but
 a half-completed state on an initial read will basically be returning corrupt
 data given my app's consistency requirements.  My understanding from the FAQ is
 that this single-row multicolumn change provides no read isolation, so I will have
 this problem.  Is this correct?  If so:

 Question 1:  Is there a way to get this type of isolation without using a
 distributed locking mechanism like cages?

No.

 Question 2:  Are there any plans to implement this type of isolation within
 Cassandra?

No.

 Question 3:  If I went with a distributed locking mechanism, what consistency
 level would I need to use with Cassandra?  Could I still get away with a
 consistency level of one?

Maybe.  If you want to guarantee that you see the most recent write,
then ONE will not be high enough. But if all you care about is seeing
all of the update or none of it, then ONE + locking will be fine.

 Question 4:  Does anyone know of a good c# alternative to cages/zookeeper?

No.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com



  


pid file is not created in Windows Environment....

2010-11-30 Thread rambabu pakala
Hi,
 
Can you please help me: why is the pid file not created in a Windows
environment when I try C:\apache-cassandra-0.6.6\bin\cassandra.bat -p c.pid ?
 
Is there a better way to shut down the Cassandra server instead of killing the pid?
 
Thanks,
-Ram.


  

C++ client for Cassandra

2010-11-30 Thread Narendra Sharma
Are there any C++ clients out there similar to Hector (in terms of features)
for Cassandra? I am looking for C++ Client for Cassandra 0.7.

Thanks,
Naren


Re: C++ client for Cassandra

2010-11-30 Thread sharanabasava raddi
Thrift is there..

On Wed, Dec 1, 2010 at 11:43 AM, Narendra Sharma
narendra.sha...@gmail.comwrote:

 Are there any C++ clients out there similar to Hector (in terms of
 features) for Cassandra? I am looking for C++ Client for Cassandra 0.7.

 Thanks,
 Naren





When to call the major compaction ?

2010-11-30 Thread Ying Tang
Every time Cassandra creates a new sstable, it calls
CompactionManager.submitMinorIfNeeded. And if the number of sstables is
beyond MinimumCompactionThreshold, a minor compaction will be triggered.
There is also a method named CompactionManager.submitMajor, and the
call relationships are:

NodeCmd -> NodeProbe -> StorageService.forceTableCompaction ->
Table.forceCompaction -> CompactionManager.performMajor ->
CompactionManager.submitMajor

ColumnFamilyStore.forceMajorCompaction -> CompactionManager.performMajor
-> CompactionManager.submitMajor

HintedHandOffManager -> CompactionManager.submitMajor

So I have 3 questions:
1. Once a new sstable has been created, CompactionManager.submitMinorIfNeeded
will be called, and a minor compaction may run.
But when will a major compaction be called? Only via NodeCmd?
2. What work do a minor compaction and a major compaction each do?
Will a minor compaction delete data that has been marked as deleted?
And how about a major compaction?
3. When is GC called? Every time a compaction runs?



-- 
Best regards,

Ivy Tang