hector - cassandra versions compatibility

2012-11-18 Thread Tamar Fraenkel
Hi!
I posted this question on the hector users list but no one answered, so I am
trying here as well.

I have a production cluster running Cassandra 1.0.8 and a test cluster with
Cassandra 1.1.6.
In my Java app I do not use Maven, but rather have my lib directory with
the jar files I use.

When I ran my client code, currently using
cassandra-all-1.0.8.jar
cassandra-clientutil-1.0.8.jar
cassandra-thrift-1.0.9.jar
hector-core-1.0-5.jar
*it worked fine with both Cassandra 1.0.8 and 1.1.6.*

When I changed only hector to hector-core-1.1-2.jar, *it also worked fine
with both Cassandra 1.0.8 and 1.1.6.*

When I switched to
cassandra-all-1.1.5.jar
cassandra-clientutil-1.1.5.jar
cassandra-thrift-1.1.5.jar
hector-core-1.1-2.jar
*it didn't work, WITH EITHER Cassandra version...*

I got the exceptions below.
Can anyone help or have an idea?

Thanks,
Tamar

java.lang.IncompatibleClassChangeError:
org/apache/cassandra/thrift/Cassandra$Client
at
me.prettyprint.cassandra.connection.client.HThriftClient.getCassandra(HThriftClient.java:88)
at
me.prettyprint.cassandra.connection.client.HThriftClient.getCassandra(HThriftClient.java:97)
at
me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:251)
at
me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:132)
at
me.prettyprint.cassandra.service.KeyspaceServiceImpl.getColumn(KeyspaceServiceImpl.java:858)
at
me.prettyprint.cassandra.model.thrift.ThriftColumnQuery$1.doInKeyspace(ThriftColumnQuery.java:57)
at
me.prettyprint.cassandra.model.thrift.ThriftColumnQuery$1.doInKeyspace(ThriftColumnQuery.java:52)
at
me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
at
me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:101)
at
me.prettyprint.cassandra.model.thrift.ThriftColumnQuery.execute(ThriftColumnQuery.java:51)
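
For what it's worth, one way to check which cassandra-thrift version each jar expects
(a sketch; the jar names match the combinations above, and the embedded pom path
assumes standard Maven packaging of the hector-core jar):

  # List the client jars actually on the classpath (names as in the combinations above).
  ls lib/ | grep -E 'cassandra|hector'

  # Check which cassandra-thrift version hector-core declares in its embedded Maven pom
  # (the META-INF path is a guess based on standard Maven packaging).
  unzip -p lib/hector-core-1.1-2.jar META-INF/maven/me.prettyprint/hector-core/pom.xml \
    | grep -A 2 'cassandra-thrift'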



*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956

Re: Datatype Conversion in CQL-Client?

2012-11-18 Thread Brian O'Neill

If you are talking about the CQL-client that comes with Cassandra (cqlsh), it 
is actually written in Python:
https://github.com/apache/cassandra/blob/trunk/bin/cqlsh

For information on datatypes (and conversion) take a look at the CQL definition:
http://www.datastax.com/docs/1.0/references/cql/index
(Look at the CQL Data Types section)

If that's not the client you are referencing, let us know which one you mean:
http://brianoneill.blogspot.com/2012/08/cassandra-apis-laundry-list.html

-brian

On Nov 17, 2012, at 9:54 PM, Timmy Turner wrote:

 Thanks for the links, however I'm interested in the functionality that the 
 official Cassandra client/API (which is in Java) offers.
 
 
 2012/11/17 aaron morton aa...@thelastpickle.com
 Does the official/built-in Cassandra CQL client (in 1.2) 
 What language ? 
 
 Check the Java http://code.google.com/a/apache-extras.org/p/cassandra-jdbc/ 
 and python http://code.google.com/a/apache-extras.org/p/cassandra-dbapi2/ 
 drivers.
 
 Cheers
 
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 16/11/2012, at 11:21 AM, Timmy Turner timm.t...@gmail.com wrote:
 
 Does the official/built-in Cassandra CQL client (in 1.2) offer any built-in 
 option to get direct values/objects when reading a field, instead of just a 
 byte array?
 
 

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/



Re: huge commitlog

2012-11-18 Thread aaron morton
 I am wondering whether the huge commitlog size is the expected behavior or 
 not?
Nope. 

Did you notice the large log size during or after the inserts ? 
If after did the size settle ?
Are you using commit log archiving ? (in commitlog_archiving.properties)

 and around 700 mini column family (around 10M in data_file_directories)
Can you describe how you created the 700 CF's ? 

 and how can we reduce the size of commitlog?
As a workaround, nodetool flush should checkpoint the log. 
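
For example (a sketch; paths and the default value are from a stock 1.1 Debian/Ubuntu
package install, adjust to your layout):

  nodetool flush                        # checkpoint the commit log by flushing memtables
  du -sh /var/lib/cassandra/commitlog   # see whether the segments get recycled afterwards

  # The overall cap lives in cassandra.yaml:
  grep commitlog_total_space_in_mb /etc/cassandra/cassandra.yaml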

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/11/2012, at 2:30 PM, Chuan-Heng Hsiao hsiao.chuanh...@gmail.com wrote:

 hi Cassandra Developers,
 
 I am experiencing huge commitlog size (200+G) after inserting huge
 amount of data.
 It is a 4-node cluster with RF= 3, and currently each has 200+G commit
 log (so there are around 1T commit log in total)
 
 The setting of commitlog_total_space_in_mb is default.
 
 I am using 1.1.6.
 
 I did not do nodetool cleanup and nodetool flush yet, but
 I did nodetool repair -pr for each column family.
 
 There is 1 huge column family (around 68G in data_file_directories),
 and 18 mid-huge column family (around 1G in data_file_directories)
 and around 700 mini column family (around 10M in data_file_directories)
 
 I am wondering whether the huge commitlog size is the expected behavior or 
 not?
 and how can we reduce the size of commitlog?
 
 Sincerely,
 Hsiao



Re: unable to read saved rowcache from disk

2012-11-18 Thread aaron morton
 . But what is the upper bound? And rules of thumb?
If you are using the off heap cache the upper bound is memory. If you are using 
the on heap cache it's the JVM heap. 
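
If you do go the route of deleting the saved cache and restarting, a rough sketch
(the path assumes the default saved_caches_directory, and the yaml names are the 1.1
cache settings):

  # Stop the node, remove the saved row cache so it is not read back at startup,
  # then start it again.
  service cassandra stop
  rm /var/lib/cassandra/saved_caches/*-RowCache*
  service cassandra start

  # The size and the on-heap / off-heap choice live in cassandra.yaml (1.1):
  #   row_cache_size_in_mb: 0                        # 0 disables the row cache
  #   row_cache_provider: SerializingCacheProvider   # off heap (needs JNA);
  #                                                  # ConcurrentLinkedHashCacheProvider is on heap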

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/11/2012, at 2:35 PM, Manu Zhang owenzhang1...@gmail.com wrote:

 Did that take into account the token, the row key, and the row payload, and 
 the java memory overhead ?
 how could I watch the heap usage then since jconsole is not able to connect 
 Cassandra at that time?
 
  Try deleting the saved cache and restarting.
 there is no problem for me to do so. But what is the upper bound? And rules 
 of thumb?
 
 
 On Sat, Nov 17, 2012 at 9:15 AM, aaron morton aa...@thelastpickle.com wrote:
 Just curious why do you think row key will take 300 byte? 
 That's what I thought it said earlier in the email thread. 
 
  If the row key is Long type, doesn't it take 8 bytes?
 Yes, 8 bytes on disk. 
  
 In his case, the rowCache was 500M with 1.6M rows, so the row data is 300B. 
 Did I miss something?
 
 
 Did that take into account the token, the row key, and the row payload, and 
 the java memory overhead ?
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 16/11/2012, at 9:35 AM, Wei Zhu wz1...@yahoo.com wrote:
 
 Just curious why do you think row key will take 300 byte? If the row key is 
 Long type, doesn't it take 8 bytes?
 In his case, the rowCache was 500M with 1.6M rows, so the row data is 300B. 
 Did I miss something?
 
 Thanks.
 -Wei
 
 From: aaron morton aa...@thelastpickle.com
 To: user@cassandra.apache.org 
 Sent: Thursday, November 15, 2012 12:15 PM
 Subject: Re: unable to read saved rowcache from disk
 
 For a row cache of 1,650,000:
 
 16 byte token
 300 byte row key ? 
 and row data ? 
 multiply by a java fudge factor of 5 or 10. 
 
 Try deleting the saved cache and restarting.
 
 Cheers
  
 
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 15/11/2012, at 8:20 PM, Wz1975 wz1...@yahoo.com wrote:
 
 Before shut down, you saw the rowcache had 500m and 1.6m rows, each row averaging 
 300B, so 700k rows should be a little over 200m, unless it is reading more,  
 maybe tombstones? Or the rows on disk have grown for some reason, but the row 
 cache was not updated? Could be something else eating up the memory. You 
 may profile memory and see who consumes the memory. 
 
 
 Thanks.
 -Wei
 
 Sent from my Samsung smartphone on ATT 
 
 
  Original message 
 Subject: Re: unable to read saved rowcache from disk 
 From: Manu Zhang owenzhang1...@gmail.com 
 To: user@cassandra.apache.org 
 CC: 
 
 
 3G, other jvm parameters are unchanged. 
 
 
 On Thu, Nov 15, 2012 at 2:40 PM, Wz1975 wz1...@yahoo.com wrote:
 How big is your heap?  Did you change the jvm parameter? 
 
 
 
 Thanks.
 -Wei
 
 Sent from my Samsung smartphone on ATT 
 
 
  Original message 
 Subject: Re: unable to read saved rowcache from disk 
 From: Manu Zhang owenzhang1...@gmail.com 
 To: user@cassandra.apache.org 
 CC: 
 
 
 add a counter and print out myself
 
 
 On Thu, Nov 15, 2012 at 1:51 PM, Wz1975 wz1...@yahoo.com wrote:
 Curious where did you see this? 
 
 
 Thanks.
 -Wei
 
 Sent from my Samsung smartphone on ATT 
 
 
  Original message 
 Subject: Re: unable to read saved rowcache from disk 
 From: Manu Zhang owenzhang1...@gmail.com 
 To: user@cassandra.apache.org 
 CC: 
 
 
 OOM at deserializing 747321th row
 
 
 On Thu, Nov 15, 2012 at 9:08 AM, Manu Zhang owenzhang1...@gmail.com wrote:
 oh, as for the number of rows, it's 1,650,000. How long would you expect it 
 to take to be read back?
 
 
 On Thu, Nov 15, 2012 at 3:57 AM, Wei Zhu wz1...@yahoo.com wrote:
 Good information Edward. 
 For my case, we have a good amount of RAM (76G) and the heap is 8G. So I set 
 the row cache to be 800M as recommended. Our columns are kind of big, so the 
 hit ratio for the row cache is around 20%, so according to datastax, we might just 
 turn off the row cache altogether. 
 Anyway, for restart, it took about 2 minutes to load the row cache
 
  INFO [main] 2012-11-14 11:43:29,810 AutoSavingCache.java (line 108) 
 reading saved cache /var/lib/cassandra/saved_caches/XXX-f2-RowCache
  INFO [main] 2012-11-14 11:45:12,612 ColumnFamilyStore.java (line 451) 
 completed loading (102801 ms; 21125 keys) row cache for XXX.f2 
 
 Just for comparison, our key is long, and the disk usage for the row cache is 253K. 
 (It only stores the keys when the row cache is saved to disk, so 253KB / 8 bytes = 
 31625 keys.) It's about right...
 So for 15MB, there could be a lot of narrow rows. (If the key is Long, there 
 could be more than 1M rows.)
   
 Thanks.
 -Wei
 From: Edward Capriolo edlinuxg...@gmail.com
 To: user@cassandra.apache.org 
 Sent: Tuesday, November 13, 2012 11:13 PM
 Subject: Re: unable to read saved 

Re: Cassandra nodes failing with OOM

2012-11-18 Thread aaron morton
 1. How much GCInspector warnings per hour are considered 'normal'?
None. 
A couple during compaction or repair is not the end of the world. But if you 
have enough per hour to think about, it's too many. 

 2. What should be the next thing to check?
Try to determine if the GC activity correlates to application workload, 
compaction or repair. 

Try to determine what the working set of the server is. Watch the GC activity 
(via gc logs or JMX) and see what the size of the tenured heap is after a CMS. 
Or try to calculate it 
http://www.mail-archive.com/user@cassandra.apache.org/msg25762.html
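
For example, GC logging can be turned on in cassandra-env.sh (a sketch; the flags are
standard HotSpot options and the log path is arbitrary):

  # Append to cassandra-env.sh, then watch the old gen (CMS) occupancy after each
  # collection in the resulting gc.log.
  JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
  JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
  JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"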

Look at your data model and query patterns for places where very large queries 
are being made. Or rows that are very long lived with a lot of deletes (probably 
not as much of an issue with LDB). 
 

 3. What are the possible failure reasons and how to prevent those?

As above. 
As a work around sometimes drastically slowing down compaction can help. For 
LDB try reducing in_memory_compaction_limit_in_mb and 
compaction_throughput_mb_per_sec
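
A sketch of what that could look like (the values are only examples, not recommendations):

  # Throttle compaction at runtime (MB/s); takes effect without a restart.
  nodetool setcompactionthroughput 8

  # Or make the equivalent change persistent in cassandra.yaml and restart:
  #   compaction_throughput_mb_per_sec: 8
  #   in_memory_compaction_limit_in_mb: 32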


Hope that helps. 

 
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/11/2012, at 7:07 PM, Ивaн Cобoлeв sobol...@gmail.com wrote:

 Dear Community, 
 
 advice from you needed. 
 
 We have a cluster, 1/6 of whose nodes died for various reasons (3 had OOM 
 messages). 
 Nodes died in groups of 3, 1, 2. No adjacent nodes died, though we use SimpleSnitch.
 
 Version: 1.1.6
 Hardware:  12Gb RAM / 8 cores(virtual)
 Data:  40Gb/node
 Nodes:   36 nodes
 
 Keyspaces:2(RF=3, R=W=2) + 1(OpsCenter)
 CFs:36, 2 indexes
 Partitioner:  Random
 Compaction:   Leveled(we don't want 2x space for housekeeping)
 Caching:  Keys only
 
 All is pretty much standard apart from the one CF receiving writes in 64K 
 chunks and having sstable_size_in_mb=100.
 No JNA installed - this is to be fixed soon.
 
 Checking sysstat/sar I can see 80-90% CPU idle, no anomalies in io and the 
 only change - network activity spiking. 
 All the nodes before dying had the following on logs:
  INFO [ScheduledTasks:1] 2012-11-15 21:35:05,512 StatusLogger.java (line 72) 
  MemtablePostFlusher   1 4 0
  INFO [ScheduledTasks:1] 2012-11-15 21:35:13,540 StatusLogger.java (line 72) 
  FlushWriter   1 3 0
  INFO [ScheduledTasks:1] 2012-11-15 21:36:32,162 StatusLogger.java (line 72) 
  HintedHandoff 1 6 0
  INFO [ScheduledTasks:1] 2012-11-15 21:36:32,162 StatusLogger.java (line 77) 
  CompactionManager 5 9
 
 GCInspector warnings were there too, they went from ~0.8 to 3Gb heap in 
 5-10mins.
 
 So, could you please give me a hint on:
 1. How much GCInspector warnings per hour are considered 'normal'?
 2. What should be the next thing to check?
 3. What are the possible failure reasons and how to prevent those?
 
 Thank you very much in advance,
 Ivan



Re: Query regarding SSTable timestamps and counts

2012-11-18 Thread aaron morton
 As per datastax documentation, a manual compaction forces the admin to start 
 compaction manually and disables the automated compaction (at least for major 
 compactions but not minor compactions)
It does not disable compaction. 
It creates one big file, which will not be compacted until there are (by 
default) 3 other very big files. 


 1. Does a nodetool stop compaction also force the admin to manually run major 
 compaction ( I.e. disable automated major compactions ? ) 
No. 
Stop just stops the current compaction. 
Nothing is disabled. 

 2. Can a node restart reset the automated major compaction if a node gets 
 into a manual mode compaction for whatever reason ? 
Major compaction is not automatic. It is the manual nodetool compact command. 
Automatic (minor) compaction is controlled by min_compaction_threshold and 
max_compaction_threshold (for the default compaction strategy).

 3. What is the ideal  number of SSTables for a table in a keyspace ( I mean 
 are there any indicators as to whether my compaction is alright or not ? )  
This is not something you have to worry about. 
Unless you are seeing 1,000's of files using the default compaction. 
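
One rough way to check (a sketch; the data path assumes the default 1.1 package layout,
and Keyspace/ColumnFamily are placeholders):

  # Count the data files for one CF; thousands of *-Data.db files under the default
  # compaction strategy would be worth looking into (adjust the path to your layout).
  ls /var/lib/cassandra/data/Keyspace/ColumnFamily/*-Data.db | wc -l

  # Inspect or adjust the minor compaction thresholds for a CF:
  nodetool getcompactionthreshold Keyspace ColumnFamily
  nodetool setcompactionthreshold Keyspace ColumnFamily 4 32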

  For example, I have seen SSTables on the disk more than 10 days old wherein 
 there were other SSTables belonging to the same table but much younger than 
 the older SSTables (
No problems. 

 4. Does a upgradesstables fix any compaction issues ? 
What are the compaction issues you are having ? 


Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/11/2012, at 1:18 AM, Ananth Gundabattula agundabatt...@gmail.com wrote:

 
 We have a cluster  running cassandra 1.1.4. On this cluster, 
 
 1. We had to move the nodes around a bit  when we were adding new nodes 
 (there was quite a good amount of node movement ) 
 
 2. We had to stop compactions during some of the days to save some disk 
 space on some of the nodes when they were running very low on disk 
 space (via nodetool stop COMPACTION).  
 
 
 As per datastax documentation, a manual compaction forces the admin to start 
 compaction manually and disables the automated compaction (at least for major 
 compactions but not minor compactions)
 
 
 Here are the questions I have regarding compaction: 
 
 1. Does a nodetool stop compaction also force the admin to manually run major 
 compaction ( I.e. disable automated major compactions ? ) 
 
 2. Can a node restart reset the automated major compaction if a node gets 
 into a manual mode compaction for whatever reason ? 
 
 3. What is the ideal  number of SSTables for a table in a keyspace ( I mean 
 are there any indicators as to whether my compaction is alright or not ? )  . 
 For example, I have seen SSTables on the disk more than 10 days old wherein 
 there were other SSTables belonging to the same table but much younger than 
 the older SSTables ( The node movement and repair and cleanup happened 
 between the older SSTables and the new SSTables being touched/modified)
 
 4. Does a upgradesstables fix any compaction issues ? 
 
 Regards,
 Ananth



Re: Upgrade 1.1.2 - 1.1.6

2012-11-18 Thread aaron morton
 time (UTC)    0    1   2   3   4    5    6    7    8    9   10   11   12   13
 Good value   88   44  26  35  26   86  187  251  455  389  473  367  453  373
 C* counter  149   82  45  68  38  146  329  414  746  566  473  377  453  373
 I finished my Cassandra 1.1.6 upgrades at 9:30 UTC.


This looks like the counters were more out of sync before the upgrade than 
after?
Do you know if your client is retrying counter operations ? (I saw some dropped 
messages in the S1 log). 

S1 shows a lot of Commit Log replay going on. Reading your timeline below, this 
sounds like the auto restart catching you out. 
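
If it is the Debian/Ubuntu package starting the service during apt-get install, one
possible workaround is the standard policy-rc.d mechanism (a sketch, nothing Cassandra
specific, so verify it against your setup):

  # Temporarily tell the packaging scripts not to start services, do the upgrade,
  # merge the config files, then remove the block and start Cassandra yourself.
  printf '#!/bin/sh\nexit 101\n' | sudo tee /usr/sbin/policy-rc.d
  sudo chmod +x /usr/sbin/policy-rc.d

  nodetool drain
  service cassandra stop
  sudo apt-get install cassandra      # no automatic restart this time
  # ... merge cassandra.yaml / cassandra-env.sh here ...

  sudo rm /usr/sbin/policy-rc.d
  service cassandra start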

Cheers
 
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 16/11/2012, at 10:22 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 Here is an example of the increase for some counter (counting events per hour)
 
 time (UTC)    0    1   2   3   4    5    6    7    8    9   10   11   12   13
 Good value   88   44  26  35  26   86  187  251  455  389  473  367  453  373
 C* counter  149   82  45  68  38  146  329  414  746  566  473  377  453  373
 
 I finished my Cassandra 1.1.6 upgrades at 9:30 UTC.
 
 I found wrong values since the day before at 20:00 UTC (counters from hours 
 before are good)
 
 Here are the logs from the output:
 Server 1: http://pastebin.com/WyCm6Ef5 (This one is from the same server as 
 the first bash history on my first mail)
 Server 2: http://pastebin.com/gBe2KL2b  (This one is from the same server as 
 the second bash history on my first mail)
 
 Alain
 
 2012/11/15 aaron morton aa...@thelastpickle.com
 Can you provide an example of the increase ? 
 
 Can you provide the log from startup ?
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 16/11/2012, at 3:21 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:
 
 We had an issue with counters over-counting even using the nodetool drain 
 command before upgrading...
 
 Here is my bash history
 
69  cp /etc/cassandra/cassandra.yaml /etc/cassandra/cassandra.yaml.bak
70  cp /etc/cassandra/cassandra-env.sh /etc/cassandra/cassandra-env.sh.bak
71  sudo apt-get install cassandra
72  nodetool disablethrift
73  nodetool drain
74  service cassandra stop
   75  cat /etc/cassandra/cassandra-env.sh /etc/cassandra/cassandra-env.sh.bak
76  vim /etc/cassandra/cassandra-env.sh
77  cat /etc/cassandra/cassandra.yaml /etc/cassandra/cassandra.yaml.bak
78  vim /etc/cassandra/cassandra.yaml
79  service cassandra start
 
 So I think I followed these steps 
 http://www.datastax.com/docs/1.1/install/upgrading#upgrade-steps
 
 I merged my conf files with an external tool, so consider that I merged my conf 
 files at steps 76 and 78.
 
 I saw that the sudo apt-get install cassandra stops the server and restarts 
 it automatically. So it updated without draining and restarting before I had 
 the time to reconfigure the conf files. Is this normal ? Is there a way to 
 avoid it ?
 
 So for the second node I decided to try to stop C* before the upgrade.
 
   125  cp /etc/cassandra/cassandra.yaml /etc/cassandra/cassandra.yaml.bak
   126  cp /etc/cassandra/cassandra-env.sh /etc/cassandra/cassandra-env.sh.bak
   127  nodetool disablegossip
   128  nodetool disablethrift
   129  nodetool drain
   130  service cassandra stop
   131  sudo apt-get install cassandra
 
 //131 : This restarted cassandra
 
   132  nodetool disablethrift
   133  nodetool disablegossip
   134  nodetool drain
   135  service cassandra stop
   136  cat /etc/cassandra/cassandra-env.sh /etc/cassandra/cassandra-env.sh.bak
   137  cim /etc/cassandra/cassandra-env.sh
   138  vim /etc/cassandra/cassandra-env.sh
   139  cat /etc/cassandra/cassandra.yaml /etc/cassandra/cassandra.yaml.bak
   140  vim /etc/cassandra/cassandra.yaml
   141  service cassandra start
 
 After both of these updates I saw my current counters increase without any 
 reason.
 
 Did I do anything wrong ?
 
 Alain
 
 
 



Re: unable to read saved rowcache from disk

2012-11-18 Thread Manu Zhang

 If you are using the off heap cache the upper bound is memory. If you are
 using the on heap cache it's the JVM heap.


But as I said earlier, I could not watch the JVM heap usage while
reading the saved caches.


Re: huge commitlog

2012-11-18 Thread Chuan-Heng Hsiao
Hi Aaron,

Thank you very much for the reply.

The 700 CFs were created in the beginning (before any insertion.)

I did not do anything with commitlog_archiving.properties, so I guess
I was not using commit log archiving.

What I did was a lot of insertions (and some deletions)
using another 4 machines with 32 processes in total.
(There are 4 nodes in my setting, so there are 8 machines in total.)

I did see huge logs in /var/log/cassandra after such a huge amount of insertions.
Right now I can't tell whether a single insertion also causes huge logs.

nodetool flush hung (maybe because of the 200G+ commitlog)

Because these machines are not in production (guaranteed no more
insertions/deletions), I ended up restarting Cassandra one node at a time,
and the commitlog shrank back to 4G. I am running repair on each node now.

I'll try to re-import and keep logs when the commitlog increases insanely again.

Sincerely,
Hsiao


On Mon, Nov 19, 2012 at 3:19 AM, aaron morton aa...@thelastpickle.com wrote:
 I am wondering whether the huge commitlog size is the expected behavior or
 not?

 Nope.

 Did you notice the large log size during or after the inserts ?
 If after did the size settle ?
 Are you using commit log archiving ? (in commitlog_archiving.properties)

 and around 700 mini column family (around 10M in data_file_directories)

 Can you describe how you created the 700 CF's ?

 and how can we reduce the size of commitlog?

 As a work around nodetool flush should checkpoint the log.

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 17/11/2012, at 2:30 PM, Chuan-Heng Hsiao hsiao.chuanh...@gmail.com
 wrote:

 hi Cassandra Developers,

 I am experiencing huge commitlog size (200+G) after inserting huge
 amount of data.
 It is a 4-node cluster with RF= 3, and currently each has 200+G commit
 log (so there are around 1T commit log in total)

 The setting of commitlog_total_space_in_mb is default.

 I am using 1.1.6.

 I did not do nodetool cleanup and nodetool flush yet, but
 I did nodetool repair -pr for each column family.

 There is 1 huge column family (around 68G in data_file_directories),
 and 18 mid-huge column family (around 1G in data_file_directories)
 and around 700 mini column family (around 10M in data_file_directories)

 I am wondering whether the huge commitlog size is the expected behavior or
 not?
 and how can we reduce the size of commitlog?

 Sincerely,
 Hsiao




Re: huge commitlog

2012-11-18 Thread Tupshin Harper
What consistency level are you writing with? If you were writing with ANY,
try writing with a higher consistency level.

-Tupshin
On Nov 18, 2012 9:05 PM, Chuan-Heng Hsiao hsiao.chuanh...@gmail.com
wrote:

 Hi Aaron,

 Thank you very much for the replying.

 The 700 CFs were created in the beginning (before any insertion.)

 I did not do anything with commitlog_archiving.properties, so I guess
 I was not using commit log archiving.

 What I did was doing a lot of insertions (and some deletions)
 using another 4 machines with 32 processes in total.
 (There are 4 nodes in my setting, so there are 8 machines in total)

 I did see huge logs in /var/log/cassandra after such huge amount of
 insertions.
 Right now I  can't distinguish whether single insertion also cause huge
 logs.

 nodetool flush hanged (maybe because of 200G+ commitlog)

 Because these machines are not in production (guaranteed no more
 insertion/deletion)
 I ended up restarting cassandra one node each time, the commitlog
 shrinked back to
 4G. I am doing repair on each node now.

 I'll try to re-import and keep logs when the commitlog increases insanely
 again.

 Sincerely,
 Hsiao


 On Mon, Nov 19, 2012 at 3:19 AM, aaron morton aa...@thelastpickle.com
 wrote:
  I am wondering whether the huge commitlog size is the expected behavior
 or
  not?
 
  Nope.
 
  Did you notice the large log size during or after the inserts ?
  If after did the size settle ?
  Are you using commit log archiving ? (in commitlog_archiving.properties)
 
  and around 700 mini column family (around 10M in data_file_directories)
 
  Can you describe how you created the 700 CF's ?
 
  and how can we reduce the size of commitlog?
 
  As a work around nodetool flush should checkpoint the log.
 
  Cheers
 
  -
  Aaron Morton
  Freelance Cassandra Developer
  New Zealand
 
  @aaronmorton
  http://www.thelastpickle.com
 
  On 17/11/2012, at 2:30 PM, Chuan-Heng Hsiao hsiao.chuanh...@gmail.com
  wrote:
 
  hi Cassandra Developers,
 
  I am experiencing huge commitlog size (200+G) after inserting huge
  amount of data.
  It is a 4-node cluster with RF= 3, and currently each has 200+G commit
  log (so there are around 1T commit log in total)
 
  The setting of commitlog_total_space_in_mb is default.
 
  I am using 1.1.6.
 
  I did not do nodetool cleanup and nodetool flush yet, but
  I did nodetool repair -pr for each column family.
 
  There is 1 huge column family (around 68G in data_file_directories),
  and 18 mid-huge column family (around 1G in data_file_directories)
  and around 700 mini column family (around 10M in data_file_directories)
 
  I am wondering whether the huge commitlog size is the expected behavior
 or
  not?
  and how can we reduce the size of commitlog?
 
  Sincerely,
  Hsiao
 
 



Re: huge commitlog

2012-11-18 Thread Chuan-Heng Hsiao
I have RF = 3. Read/write consistency has already been set to TWO.

It did seem that the data were not consistent yet.
(There are some CFs that I expected to be empty after the operations, but I still
got some data, and the amount of data was decreasing as I retried getting all
the data from those CFs.)

Sincerely,
Hsiao


On Mon, Nov 19, 2012 at 11:14 AM, Tupshin Harper tups...@tupshin.com wrote:
 What consistency level are you writing with? If you were writing with ANY,
 try writing with a higher consistency level.

 -Tupshin

 On Nov 18, 2012 9:05 PM, Chuan-Heng Hsiao hsiao.chuanh...@gmail.com
 wrote:

 Hi Aaron,

 Thank you very much for the replying.

 The 700 CFs were created in the beginning (before any insertion.)

 I did not do anything with commitlog_archiving.properties, so I guess
 I was not using commit log archiving.

 What I did was doing a lot of insertions (and some deletions)
 using another 4 machines with 32 processes in total.
 (There are 4 nodes in my setting, so there are 8 machines in total)

 I did see huge logs in /var/log/cassandra after such huge amount of
 insertions.
 Right now I  can't distinguish whether single insertion also cause huge
 logs.

 nodetool flush hanged (maybe because of 200G+ commitlog)

 Because these machines are not in production (guaranteed no more
 insertion/deletion)
 I ended up restarting cassandra one node each time, the commitlog
 shrinked back to
 4G. I am doing repair on each node now.

 I'll try to re-import and keep logs when the commitlog increases insanely
 again.

 Sincerely,
 Hsiao


 On Mon, Nov 19, 2012 at 3:19 AM, aaron morton aa...@thelastpickle.com
 wrote:
  I am wondering whether the huge commitlog size is the expected behavior
  or
  not?
 
  Nope.
 
  Did you notice the large log size during or after the inserts ?
  If after did the size settle ?
  Are you using commit log archiving ? (in commitlog_archiving.properties)
 
  and around 700 mini column family (around 10M in data_file_directories)
 
  Can you describe how you created the 700 CF's ?
 
  and how can we reduce the size of commitlog?
 
  As a work around nodetool flush should checkpoint the log.
 
  Cheers
 
  -
  Aaron Morton
  Freelance Cassandra Developer
  New Zealand
 
  @aaronmorton
  http://www.thelastpickle.com
 
  On 17/11/2012, at 2:30 PM, Chuan-Heng Hsiao hsiao.chuanh...@gmail.com
  wrote:
 
  hi Cassandra Developers,
 
  I am experiencing huge commitlog size (200+G) after inserting huge
  amount of data.
  It is a 4-node cluster with RF= 3, and currently each has 200+G commit
  log (so there are around 1T commit log in total)
 
  The setting of commitlog_total_space_in_mb is default.
 
  I am using 1.1.6.
 
  I did not do nodetool cleanup and nodetool flush yet, but
  I did nodetool repair -pr for each column family.
 
  There is 1 huge column family (around 68G in data_file_directories),
  and 18 mid-huge column family (around 1G in data_file_directories)
  and around 700 mini column family (around 10M in data_file_directories)
 
  I am wondering whether the huge commitlog size is the expected behavior
  or
  not?
  and how can we reduce the size of commitlog?
 
  Sincerely,
  Hsiao
 
 


Re: Query regarding SSTable timestamps and counts

2012-11-18 Thread Ananth Gundabattula
Hello Aaron,

Thanks a lot for the reply.

Looks like the documentation is confusing. Here is the link I am referring
to:  http://www.datastax.com/docs/1.1/operations/tuning#tuning-compaction


 It does not disable compaction.
As per the above URL: "After running a major compaction, automatic minor
compactions are no longer triggered, frequently requiring you to manually
run major compactions on a routine basis." (Just before the heading "Tuning
Column Family Compression" in the above link.)

With respect to the replies below :


 it creates one big file, which will not be compacted until there are (by
default) 3 other very big files.
Is this about minor compaction? A major compaction should theoretically result
in one large file irrespective of the number of data files there were initially?

This is not something you have to worry about. Unless you are seeing
1,000's of files using the default compaction.

Well, my worry has been because of the large amount of node movement we
have done in the ring. We started off with 6 nodes and increased the
capacity to 12, with disproportionate increases every time, which resulted in
a lot of cleaning of data folders (except system), running repair, and then a
cleanup, with an aborted attempt in between.

There were some data.db files more than 2 weeks old that had not been
modified since then. My understanding of the compaction process was that,
since data files keep continuously merging, we should not have data files
with very old last-modified timestamps (assuming there is a good amount of
writes to the table continuously). I did not have a sure way of telling
whether everything was alright with the compaction by looking at the
last-modified timestamps of all the data.db files.

What are the compaction issues you are having ?
Your replies confirm that the timestamps should not be an issue to worry
about, so I guess I should not be calling them issues any more. But
performing an upgradesstables did decrease the number of data files and
removed all the data files with the old timestamps.
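
For reference, the commands involved were along these lines (a sketch; keyspace and
column family names are placeholders):

  # Rewrite the sstables for one CF to the current version (merging away old files)
  # and watch progress while it runs.
  nodetool upgradesstables MyKeyspace MyColumnFamily
  nodetool compactionstats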



Regards,
Ananth


On Mon, Nov 19, 2012 at 6:54 AM, aaron morton aa...@thelastpickle.comwrote:

 As per datastax documentation, a manual compaction forces the admin to
 start compaction manually and disables the automated compaction (atleast
 for major compactions but not minor compactions )

 It does not disable compaction.
 it creates one big file, which will not be compacted until there are (by
 default) 3 other very big files.


 1. Does a nodetool stop compaction also force the admin to manually run
 major compaction ( I.e. disable automated major compactions ? )

 No.
 Stop just stops the current compaction.
 Nothing is disabled.

 2. Can a node restart reset the automated major compaction if a node gets
 into a manual mode compaction for whatever reason ?

 Major compaction is not automatic. It is the manual nodetool compact
 command.
 Automatic (minor) compaction is controlled by min_compaction_threshold and
 max_compaction_threshold (for the default compaction strategy).

 3. What is the ideal  number of SSTables for a table in a keyspace ( I
 mean are there any indicators as to whether my compaction is alright or not
 ? )

 This is not something you have to worry about.
 Unless you are seeing 1,000's of files using the default compaction.

  For example, I have seen SSTables on the disk more than 10 days old
 wherein there were other SSTables belonging to the same table but much
 younger than the older SSTables (

 No problems.

 4. Does a upgradesstables fix any compaction issues ?

 What are the compaction issues you are having ?


 Cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 18/11/2012, at 1:18 AM, Ananth Gundabattula agundabatt...@gmail.com
 wrote:


 We have a cluster  running cassandra 1.1.4. On this cluster,

 1. We had to move the nodes around a bit  when we were adding new nodes
 (there was quite a good amount of node movement )

 2. We had to stop compactions during some of the days to save some disk
  space on some of the nodes when they were running very very low on disk
 spaces. (via nodetool stop COMPACTION)


 As per datastax documentation, a manual compaction forces the admin to
 start compaction manually and disables the automated compaction (atleast
 for major compactions but not minor compactions )


 Here are the questions I have regarding compaction:

 1. Does a nodetool stop compaction also force the admin to manually run
 major compaction ( I.e. disable automated major compactions ? )

 2. Can a node restart reset the automated major compaction if a node gets
 into a manual mode compaction for whatever reason ?

 3. What is the ideal  number of SSTables for a table in a keyspace ( I
 mean are there any indicators as to whether my compaction is alright or not
 ? )  . For example, I have seen SSTables on the disk more than 10 days old
 wherein there were other SSTables belonging to the 

Re: get_range_slice gets no rowcache support?

2012-11-18 Thread Manu Zhang
yes, https://issues.apache.org/jira/browse/CASSANDRA-1302
thanks


On Wed, Nov 14, 2012 at 2:04 AM, Tyler Hobbs ty...@datastax.com wrote:

 As far as I know, the row cache has never been populated by
 get_range_slices(), only normal gets/multigets.  The behavior is this way
 because get_range_slices() is almost exclusively used to page over an
 entire column family, which generally would not fit into the cache and
 would simply (a) ruin your cache if used for gets, (b) generate a lot of
 extra garbage, and (c) result in nothing but cache misses.

 With that said, I'm sure there are still a few use cases where using the
 cache would be beneficial, so I'm sure there's a ticket out there somewhere
 that presents a few options for supporting this.


 On Thu, Nov 8, 2012 at 8:39 PM, Manu Zhang owenzhang1...@gmail.comwrote:

 I did overlook something. get_range_slice will invoke cfs.getRawCachedRow
 instead of cfs.getThroughCache. Hence, no row will be cached if it's not
 present in the row cache. Well, this puzzles me further as to that how the
 range of rows is expected to get stored into the row cache in the first
 place.

 Would someone please clarify it for me? Thanks in advance.


 On Thu, Nov 8, 2012 at 3:23 PM, Manu Zhang owenzhang1...@gmail.comwrote:

 I've asked this question before. And after reading the source codes, I
 find that get_range_slice doesn't query rowcache before reading from
 Memtable and SSTable. I just want to make sure whether I've overlooked
 something. If my observation is correct, what's the consideration here?





 --
 Tyler Hobbs
 DataStax http://datastax.com/




Re: Datatype Conversion in CQL-Client?

2012-11-18 Thread Tommi Laukkanen
I think Timmy might be referring to the upcoming native CQL Java driver
that might be coming with 1.2 - It was mentioned here:
http://www.datastax.com/wp-content/uploads/2012/08/7_Datastax_Upcoming_Changes_in_Drivers.pdf

I would also be interested in testing that but I can't find it in the
repositories. Any hints?

Regards,
Tommi L.

*From:* Brian O'Neill [mailto:boneil...@gmail.com] *On Behalf Of *Brian
O'Neill

 *Sent:* 18. marraskuuta 2012 17:47
 *To:* user@cassandra.apache.org
 *Subject:* Re: Datatype Conversion in CQL-Client?
 *Importance:* Low

 If you are talking about the CQL-client that comes with Cassandra (cqlsh),
 it is actually written in Python:

 https://github.com/apache/cassandra/blob/trunk/bin/cqlsh

 For information on datatypes (and conversion) take a look at the CQL
 definition:

 http://www.datastax.com/docs/1.0/references/cql/index

 (Look at the CQL Data Types section)

 If that's not the client you are referencing, let us know which one you
 mean:

 http://brianoneill.blogspot.com/2012/08/cassandra-apis-laundry-list.html

 -brian

 On Nov 17, 2012, at 9:54 PM, Timmy Turner wrote:

 Thanks for the links, however I'm interested in the functionality that the
 official Cassandra client/API (which is in Java) offers.

 2012/11/17 aaron morton aa...@thelastpickle.com

 Does the official/built-in Cassandra CQL client (in 1.2) 

 What language ? 

 Check the Java
 http://code.google.com/a/apache-extras.org/p/cassandra-jdbc/ and python
 http://code.google.com/a/apache-extras.org/p/cassandra-dbapi2/ drivers.

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 16/11/2012, at 11:21 AM, Timmy Turner timm.t...@gmail.com wrote:

 Does the official/built-in Cassandra CQL client (in 1.2) offer any
 built-in option to get direct values/objects when reading a field, instead
 of just a byte array?

 --
 Brian ONeill
 Lead Architect, Health Market Science (http://healthmarketscience.com)
 mobile:215.588.6024
 blog: http://weblogs.java.net/blog/boneill42/
 blog: http://brianoneill.blogspot.com/