Re: Sorting & pagination in apache cassandra 2.1

2016-01-13 Thread Narendra Sharma
In the example you gave, the primary key user_name is the row key. Since
the default partitioner is random, you are getting rows back in random order.

Since each row has no clustering column, there is no further grouping of
data. In simple terms, each row holds one record, and the columns within it
are returned ordered by column name.

To see some meaningful ordering, a clustering column should be defined.
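For illustration, a minimal CQL sketch of what a clustering column gives you
(table and column names are invented here, not taken from the original post):

  -- Rows within a single user_name partition are stored and returned
  -- ordered by the clustering column signup_ts.
  CREATE TABLE user_events (
      user_name text,
      signup_ts timestamp,
      event     text,
      PRIMARY KEY (user_name, signup_ts)
  ) WITH CLUSTERING ORDER BY (signup_ts DESC);

  -- This query returns the partition's rows in clustering order:
  SELECT * FROM user_events WHERE user_name = 'alice';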

You can create additional column families to maintain ordering, or use
external solutions like Elasticsearch.
On Jan 12, 2016 10:07 PM, "anuja jain"  wrote:

> I understand the meaning of SSTable, but what's the reason behind sorting
> the table on the basis of int columns first?
> Is there any data type preference in Cassandra?
> Also, what is the alternative to creating materialized views if my
> Cassandra version is prior to 3.0 (specifically 2.1) and is already
> in production?
>
>
> On Wed, Jan 13, 2016 at 12:17 AM, Robert Coli 
> wrote:
>
>> On Mon, Jan 11, 2016 at 11:30 PM, anuja jain 
>> wrote:
>>
>>> 1 more question, what does it mean by "cassandra inherently sorts data"?
>>>
>>
>> SSTable = Sorted Strings Table.
>>
>> It doesn't contain "Strings" anymore, really, but that's a hint.. :)
>>
>> =Rob
>>
>
>


Re: Cassandra table as Queue

2015-08-20 Thread Narendra Sharma
I would suggest looking at Comcast Message Bus schema definition.

https://github.com/Comcast/cmb

-Naren

On Thu, Aug 20, 2015 at 10:30 AM, algermissen1971 
algermissen1...@icloud.com wrote:

 Hi Rado,

 On 20 Aug 2015, at 15:05, Radoslav Smilyanov 
 radoslav.smilya...@novarto.com wrote:

  Hello,
 
  I need to have a table that acts as a queue for specific data. The data
 stored in this table are unique ids that are predefined; whenever an id is
 requested, one has to be taken from the queue and a new one added. This
 queue table will have a fixed size of 50,000 entries.
 
  I see that it is not recommended at all to use cassandra table for a
 queue, but I need to find a design for my data that will not cause
 performance issues caused by tombstones.
 
  I am using cassandra 2.1.6 with the java driver and I am afraid that at
 some point I will start experiencing performance issues caused by many
 tombstones.
  The current design of my table with one column is not good enough for
 querying the data, since now I am using:
  1. select * from table limit 1, which returns the first id in the table
  2. delete from table where id = (the id from step 1)
 
  Has someone tried to implement a queue with a cassandra table that is
 working in production now without any performance issues? I would appreciate
 some hints on how I can achieve good performance in cassandra for a queue
 table.
 

 I came up with a design last year that I have been using without problems,
 with a java-driver-based implementation, in production for several months.

 Two caveats:

 - Our environment is not high-volume or high-frequency. Message counts per
 minute come in dozens, at most. So the design is not tested in heavy
 scenarios. We merely needed something based on the existing tech-stack.
 - The Ruby version has a logical bug, mentioned in the README.

 https://github.com/algermissen/cassandra-ruby-sharded-workers

 Given what I know by now about the tombstone problem, I'd rather not use a
 TTL on the messages but remove outdated time shards completely after e.g. a
 week. But since reads never really go to an outdated shard, the tombstones
 do not slow down the reads.
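 A rough CQL sketch of the time-sharded approach described above (modern
 syntax, invented names; the actual schema is in the linked repository):

   -- Each time shard (e.g. one per hour) is its own partition.
   CREATE TABLE queue_shard (
       shard   text,        -- e.g. '2015-08-20-10'
       id      timeuuid,    -- queued item id, ordered within the shard
       payload text,
       PRIMARY KEY (shard, id)
   );

   -- Consumers only read the current shard, so they never scan old tombstones:
   SELECT id, payload FROM queue_shard WHERE shard = '2015-08-20-10' LIMIT 1;
   DELETE FROM queue_shard WHERE shard = '2015-08-20-10' AND id = ?;

 Outdated shards can later be deleted as whole partitions; since reads never
 touch them, their tombstones do not hurt read latency.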

 Hope that helps.

 Jan






  Thanks,
  Rado




-- 
Narendra Sharma
Software Engineer
*http://www.aeris.com*
*http://narendrasharma.blogspot.com/*


Re: Cassandra leap second

2015-07-01 Thread Narendra Sharma
We also experienced the same, i.e. high CPU on a Cassandra 1.1.4 node running
in AWS. Restarting the VM worked.

On Wed, Jul 1, 2015 at 4:58 AM, Jason Wee peich...@gmail.com wrote:

 same here too, on branch 1.1 and have not seen any high cpu usage.

 On Wed, Jul 1, 2015 at 2:52 PM, John Wong gokoproj...@gmail.com wrote:

 Which version are you running and what's your kernel version? We are
 still running on 1.2 branch but we have not seen any high cpu usage yet...

 On Tue, Jun 30, 2015 at 11:10 PM, snair123 . nair...@outlook.com wrote:

 reboot of the machine worked

 --
 From: nair...@outlook.com
 To: user@cassandra.apache.org
 Subject: Cassandra leap second
 Date: Wed, 1 Jul 2015 02:54:53 +

 Is it ok to run this


 https://blog.mozilla.org/it/2012/06/30/mysql-and-the-leap-second-high-cpu-and-the-fix/

 Seeing high cpu consumption for cassandra process





 --
 Sent from Jeff Dean's printf() mobile console





-- 
Narendra Sharma
Software Engineer
*http://www.aeris.com*
*http://narendrasharma.blogspot.com/*


Re: Data model suggestions

2015-04-23 Thread Narendra Sharma
I think one table, say record, should be good. The primary key is the record
id. This will ensure good distribution.
Just update the active attribute to true or false.
For range queries on active vs. archived records, maintain two index tables or
try a secondary index.
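As a hedged illustration (all names invented, not from this thread), the
single-table idea plus a manually maintained index of active records could
look roughly like this in CQL:

  CREATE TABLE record (
      record_id text PRIMARY KEY,   -- row key; gives good distribution
      active    boolean,
      payload   text
  );

  -- Manually maintained index of currently active record ids.
  CREATE TABLE active_records (
      bucket    text,               -- one or a few buckets, since the active set is small
      record_id text,
      PRIMARY KEY (bucket, record_id)
  );

When a record is archived, set record.active = false and delete its row from
active_records; point lookups by record_id keep going to the main table.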
On Apr 23, 2015 1:32 PM, Ali Akhtar ali.rac...@gmail.com wrote:

 Good point about the range selects. I think they can be made to work with
 limits, though. Or, since the active records will usually never be more than
 500k, the ids may just be cached in memory.

 Most of the time, during reads, the queries will just consist of select *
 where primaryKey = someValue . One row at a time.

 The question is just, whether to keep all records in one table (including
 archived records which wont be queried 99% of the time), or to keep active
 records in their own table, and delete them when they're no longer active.
 Will that produce tombstone issues?

 On Fri, Apr 24, 2015 at 12:56 AM, Manoj Khangaonkar khangaon...@gmail.com
  wrote:

 Hi,

 If your external API returns active records, that means I am guessing you
 need to do a select * on the active table to figure out which records in
 the table are no longer active.

 You might be aware that range selects based on the partition key will time
 out in cassandra. They can however be made to work using the clustering
 key.

 To comment more, We would need to see your proposed cassandra tables and
 queries that you might need to run.

 regards




 On Thu, Apr 23, 2015 at 9:45 AM, Ali Akhtar ali.rac...@gmail.com wrote:

 That's returned by the external API we're querying. We query them for
 active records, if a previous active record isn't included in the results,
 that means its time to archive that record.

 On Thu, Apr 23, 2015 at 9:20 PM, Manoj Khangaonkar 
 khangaon...@gmail.com wrote:

 Hi,

 How do you determine if the record is no longer active? Is it a
 periodic process that goes through every record and checks when the last
 update happened?

 regards

 On Thu, Apr 23, 2015 at 8:09 AM, Ali Akhtar ali.rac...@gmail.com
 wrote:

 Hey all,

 We are working on moving a mysql based application to Cassandra.

 The workflow in mysql is this: We have two tables: active and archive.
 Every hour, we pull in data from an external API. The records which are
 active are kept in the 'active' table. Once a record is no longer active, it's
 deleted from 'active' and re-inserted into 'archive'.

 The purpose for that is that most of the time, queries are only
 done against the active records rather than the archived ones. Therefore
 keeping the active table small may help with faster queries, if it only has
 to search 200k records vs 3 million or more.

 Is it advisable to keep the same data model in Cassandra? I'm
 concerned about tombstone issues when records are deleted from active.

 Thanks.




 --
 http://khangaonkar.blogspot.com/





 --
 http://khangaonkar.blogspot.com/





Re: question about secondary index or not

2014-01-30 Thread Narendra Sharma
I am sure there will be other attributes associated with an employee. Reading
and throwing away records on the client is not good.

Better to maintain another column family that holds references to only the
male employees. This will make your pagination logic simple on the client side
without wasting resources on the server or the client.

My experience with secondary indexes was also not good. My own index CF
gave 100% better performance than a secondary index for the same use case and
result.
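As a minimal sketch of such an index column family in CQL (assuming the
people table quoted later in this thread; the table name here is invented):

  -- One partition per (company_id, gender); employee_id is the clustering
  -- column, so paging through the male employees of a company is one slice.
  CREATE TABLE people_by_gender (
      company_id  text,
      gender      text,
      employee_id text,
      PRIMARY KEY ((company_id, gender), employee_id)
  );

  SELECT employee_id FROM people_by_gender
   WHERE company_id = 'xxx' AND gender = 'male';

The application writes to both people and people_by_gender on every insert or
update.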




On Thu, Jan 30, 2014 at 6:41 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

 There is a subtle difference between 'works well' and 'efficient design'.

 Say you add this index, that is a huge cost on disk just because cql may
 not allow the where clause you want.

 Shameless plug but this is why i worked on intravert...server side paging
 may be the right answer here. I plan on opening that work all up again and
 finding a way to get it merged into cassandra.


 On Wednesday, January 29, 2014, Mullen, Robert robert.mul...@pearson.com
 wrote:
  Thanks for that info ondrej, I've never tested out secondary indexes as
 I've avoided them because of all the uncertainty around them, and your
 statement just adds to the uncertainty.  Everything I had read said that
 secondary indexes were supposed to work well for columns with low
 cardinality, but I guess that's not always the case.
  peace,
  Rob
 
  On Wed, Jan 29, 2014 at 2:21 AM, Ondřej Černoš cern...@gmail.com
 wrote:
 
  Hi,
  we had a similar use case. Just do the filtering client-side; the #2
 example performs horribly. Secondary indexes on something dividing the set
 into two roughly equal-sized subsets just don't work.
  Give it a try on localhost with just a modest number of records (150,000),
 you will see.
  regards,
  ondrej
 
  On Wed, Jan 29, 2014 at 5:17 AM, Jimmy Lin y2klyf+w...@gmail.com
 wrote:
 
  in my #2 example:
  select * from people where company_id='xxx' and gender='male'
  I already specify the first part of the primary key (row key) in my
 where clause, so how does the secondary indexed column gender='male' help
 determine which row to return? It is more like filtering a list of columns
 from a row (which is exactly what I can do in the #1 example).
  But then if I don't create the index first, the cql statement runs
 into a syntax error.
 
 
 
  On Tue, Jan 28, 2014 at 11:37 AM, Mullen, Robert 
 robert.mul...@pearson.com wrote:
 
  I would do #2.   Take a look at this blog which talks about secondary
 indexes, cardinality, and what it means for cassandra.   Secondary indexes
 in cassandra are a different beast, so often old rules of thumb about
 indexes don't apply.   http://www.wentnet.com/blog/?p=77
 
  On Tue, Jan 28, 2014 at 10:41 AM, Edward Capriolo 
 edlinuxg...@gmail.com wrote:
 
  Generally indexes on binary fields true/false male/female are not
 terribly effective.
 
 
  On Tue, Jan 28, 2014 at 12:40 PM, Jimmy Lin y2klyf+w...@gmail.com
 wrote:
 
  I have a simple column family like the following
  create table people(
  company_id text,
  employee_id text,
  gender text,
  primary key(company_id, employee_id)
  );
  if I want to find out all the male employees given a company id, I
 can do
  1/
  select * from people where company_id='xxx'
  and loop through the result efficiently to pick the employees whose
 gender column value equals male
  2/
  add a secondary index
  create index gender_index on people(gender)
  select * from people where company_id='xxx' and gender='male'
 
  I thought #2 seems more appropriate, but I also thought the
 secondary index only helps locate the primary row key. With the
 select clause in #2, is it more efficient than #1, where the application
 is responsible for looping through the result and filtering the right content?
  (
  It totally makes sense if I only need to find out all the male
 employees (and not within a company) by using
  select * from people where gender='male'
  )
  thanks
 
 
 
 
 

 --
 Sorry this was sent from mobile. Will do less grammar and spell check than
 usual.




-- 
Narendra Sharma
Software Engineer
*http://www.aeris.com*
*http://narendrasharma.blogspot.com/*


Re: Cassandra ring not behaving like a ring

2014-01-16 Thread Narendra Sharma
Any pointers? I am planning to do rolling restart of the cluster nodes to
see if it will help.
On Jan 15, 2014 2:59 PM, Narendra Sharma narendra.sha...@gmail.com
wrote:

 RF=3.
 On Jan 15, 2014 1:18 PM, Andrey Ilinykh ailin...@gmail.com wrote:

 what is the RF? What does nodetool ring show?


 On Wed, Jan 15, 2014 at 1:03 PM, Narendra Sharma 
 narendra.sha...@gmail.com wrote:

 Sorry for the odd subject but something is wrong with our cassandra
 ring. We have a 9 node ring as below.

 N1 - UP/NORMAL
 N2 - UP/NORMAL
 N3 - UP/NORMAL
 N4 - UP/NORMAL
 N5 - UP/NORMAL
 N6 - UP/NORMAL
 N7 - UP/NORMAL
 N8 - UP/NORMAL
 N9 - UP/NORMAL

 Using random partitioner and simple snitch. Cassandra 1.1.6 in AWS.

 I added a new node with token that is exactly in middle of N6 and N7. So
 the ring displayed as following
 N1 - UP/NORMAL
 N2 - UP/NORMAL
 N3 - UP/NORMAL
 N4 - UP/NORMAL
 N5 - UP/NORMAL
 N6 - UP/NORMAL
 N6.5 - UP/JOINING
 N7 - UP/NORMAL
 N8 - UP/NORMAL
 N9 - UP/NORMAL


 I noticed that N6.5 is streaming from N1, N2, N6 and N7. I expect it to
 stream from (worst case) N5, N6, N7, N8. What could potentially cause the
 node to get confused about the ring?

 --
 Narendra Sharma
 Software Engineer
  *http://www.aeris.com*
  *http://narendrasharma.blogspot.com/*





Re: Cassandra ring not behaving like a ring

2014-01-16 Thread Narendra Sharma
Here is the nodetool ring output.

Address     DC          Rack    Status  State    Load       Effective-Ownership  Token
                                                                                 148873535527910577765226390751398592512
10.3.1.179  datacenter1 rack1   Up      Normal   752.53 GB  37.50%               0
10.3.1.29   datacenter1 rack1   Up      Normal   704.36 GB  37.50%               21267647932558653966460912964485513215
10.3.1.206  datacenter1 rack1   Up      Normal   561.68 GB  31.25%               31901471898837980949691369446728269825
10.3.1.175  datacenter1 rack1   Up      Normal   1.33 TB    25.00%               42535295865117307932921825928971026431
10.3.1.239  datacenter1 rack1   Up      Normal   784.91 GB  18.75%               53169119831396634916152282411213783039
10.3.1.24   datacenter1 rack1   Up      Normal   1.06 TB    18.75%               63802943797675961899382738893456539648
*I tried to add a new node with token 7443676776395522613195375699296255*
10.3.1.177  datacenter1 rack1   Up      Normal   1.01 TB    25.00%               85070591730234615865843651857942052863
10.3.1.135  datacenter1 rack1   Up      Normal   702.56 GB  31.25%               106338239662793269832304564822427566080
10.3.1.178  datacenter1 rack1   Up      Normal   783.75 GB  37.50%               127605887595351923798765477786913079295
10.3.1.30   datacenter1 rack1   Up      Normal   630.09 GB  37.50%               148873535527910577765226390751398592512


After looking at the nodes it was streaming from, I stopped the node.


On Thu, Jan 16, 2014 at 12:49 PM, Jonathan Haddad j...@jonhaddad.com wrote:

 Please include the output of nodetool ring, otherwise no one can help
 you.


 On Thu, Jan 16, 2014 at 12:45 PM, Narendra Sharma 
 narendra.sha...@gmail.com wrote:

 Any pointers? I am planning to do rolling restart of the cluster nodes to
 see if it will help.
 On Jan 15, 2014 2:59 PM, Narendra Sharma narendra.sha...@gmail.com
 wrote:

 RF=3.
 On Jan 15, 2014 1:18 PM, Andrey Ilinykh ailin...@gmail.com wrote:

 what is the RF? What does nodetool ring show?


 On Wed, Jan 15, 2014 at 1:03 PM, Narendra Sharma 
 narendra.sha...@gmail.com wrote:

 Sorry for the odd subject but something is wrong with our cassandra
 ring. We have a 9 node ring as below.

 N1 - UP/NORMAL
 N2 - UP/NORMAL
 N3 - UP/NORMAL
 N4 - UP/NORMAL
 N5 - UP/NORMAL
 N6 - UP/NORMAL
 N7 - UP/NORMAL
 N8 - UP/NORMAL
 N9 - UP/NORMAL

 Using random partitioner and simple snitch. Cassandra 1.1.6 in AWS.

 I added a new node with token that is exactly in middle of N6 and N7.
 So the ring displayed as following
 N1 - UP/NORMAL
 N2 - UP/NORMAL
 N3 - UP/NORMAL
 N4 - UP/NORMAL
 N5 - UP/NORMAL
 N6 - UP/NORMAL
 N6.5 - UP/JOINING
 N7 - UP/NORMAL
 N8 - UP/NORMAL
 N9 - UP/NORMAL


 I noticed that N6.5 is streaming from N1, N2, N6 and N7. I expect it
  to stream from (worst case) N5, N6, N7, N8. What could potentially cause
 the
 node to get confused about the ring?

 --
 Narendra Sharma
 Software Engineer
  *http://www.aeris.com*
  *http://narendrasharma.blogspot.com/*





 --
 Jon Haddad
 http://www.rustyrazorblade.com
 skype: rustyrazorblade




-- 
Narendra Sharma
Software Engineer
*http://www.aeris.com*
*http://narendrasharma.blogspot.com/*


Cassandra ring not behaving like a ring

2014-01-15 Thread Narendra Sharma
Sorry for the odd subject but something is wrong with our cassandra ring.
We have a 9 node ring as below.

N1 - UP/NORMAL
N2 - UP/NORMAL
N3 - UP/NORMAL
N4 - UP/NORMAL
N5 - UP/NORMAL
N6 - UP/NORMAL
N7 - UP/NORMAL
N8 - UP/NORMAL
N9 - UP/NORMAL

Using random partitioner and simple snitch. Cassandra 1.1.6 in AWS.

I added a new node with token that is exactly in middle of N6 and N7. So
the ring displayed as following
N1 - UP/NORMAL
N2 - UP/NORMAL
N3 - UP/NORMAL
N4 - UP/NORMAL
N5 - UP/NORMAL
N6 - UP/NORMAL
N6.5 - UP/JOINING
N7 - UP/NORMAL
N8 - UP/NORMAL
N9 - UP/NORMAL


I noticed that N6.5 is streaming from N1, N2, N6 and N7. I expect it to
stream from (worst case) N5, N6, N7, N8. What could potentially cause the
node to get confused about the ring?

-- 
Narendra Sharma
Software Engineer
*http://www.aeris.com*
*http://narendrasharma.blogspot.com/*


Re: Cassandra ring not behaving like a ring

2014-01-15 Thread Narendra Sharma
RF=3.
On Jan 15, 2014 1:18 PM, Andrey Ilinykh ailin...@gmail.com wrote:

 what is the RF? What does nodetool ring show?


 On Wed, Jan 15, 2014 at 1:03 PM, Narendra Sharma 
 narendra.sha...@gmail.com wrote:

 Sorry for the odd subject but something is wrong with our cassandra ring.
 We have a 9 node ring as below.

 N1 - UP/NORMAL
 N2 - UP/NORMAL
 N3 - UP/NORMAL
 N4 - UP/NORMAL
 N5 - UP/NORMAL
 N6 - UP/NORMAL
 N7 - UP/NORMAL
 N8 - UP/NORMAL
 N9 - UP/NORMAL

 Using random partitioner and simple snitch. Cassandra 1.1.6 in AWS.

 I added a new node with token that is exactly in middle of N6 and N7. So
 the ring displayed as following
 N1 - UP/NORMAL
 N2 - UP/NORMAL
 N3 - UP/NORMAL
 N4 - UP/NORMAL
 N5 - UP/NORMAL
 N6 - UP/NORMAL
 N6.5 - UP/JOINING
 N7 - UP/NORMAL
 N8 - UP/NORMAL
 N9 - UP/NORMAL


 I noticed that N6.5 is streaming from N1, N2, N6 and N7. I expect it to
  stream from (worst case) N5, N6, N7, N8. What could potentially cause the
 node to get confused about the ring?

 --
 Narendra Sharma
 Software Engineer
  *http://www.aeris.com*
  *http://narendrasharma.blogspot.com/*





Cassandra 1.1.6 crash without any exception or error in log

2014-01-02 Thread Narendra Sharma
8 node cluster running in aws. Any pointers where I should start looking?
No kill -9 in history.


Re: Cassandra 1.1.6 crash without any exception or error in log

2014-01-02 Thread Narendra Sharma
The root cause turned out to be a high heap. The Linux OOM Killer (
http://linux-mm.org/OOM_Killer) killed the process. It took some time to
figure out, but it was very interesting. We knew a high heap is a problem, but
we had no clue because the actual heap usage was well within its limit when the
process disappeared. syslog helped figure this out.

About the Linux OOM Killer:
"It is the job of the linux 'oom killer' to *sacrifice* one or more
processes in order to free up memory for the system when all else fails."


On Thu, Jan 2, 2014 at 10:38 AM, Robert Coli rc...@eventbrite.com wrote:

 On Thu, Jan 2, 2014 at 8:13 AM, Narendra Sharma narendra.sha...@gmail.com
  wrote:

 8 node cluster running in aws. Any pointers where I should start looking?
 No kill -9 in history.

 You should start looking at instructions as to how to upgrade to at least
 the top of the 1.1 line... :D

 =Rob




-- 
Narendra Sharma
Software Engineer
*http://www.aeris.com*
*http://narendrasharma.blogspot.com/*


Re: Cassandra 1.1.6 crash without any exception or error in log

2014-01-02 Thread Narendra Sharma
In this case the Java/Cassandra process never ran out of memory. Rather, it
had 20% of the heap free. It is the OS that ran out of memory. This is a side
effect of running with a large heap. I was aware of Java's inefficiency with
large heaps but had to keep it due to a large bloom filter. Note we are
still on 1.1.x.




On Thu, Jan 2, 2014 at 10:03 PM, Nitin Sharma
nitin.sha...@bloomreach.com wrote:

 I would recommend always running cassandra with
 -XX:+HeapDumpOnOutOfMemoryError. This dumps out a *.hprof file if the
 process dies due to an OOM.

 You can later analyze the hprof files using Eclipse Memory Analyzer (Eclipse
 MAT http://www.eclipse.org/mat) to figure out root causes and potential
 leaks

 Hope this helps
 -- Nitin


 On Thu, Jan 2, 2014 at 9:00 PM, Narendra Sharma narendra.sha...@gmail.com
  wrote:

 The root cause turned out to be high heap. The Linux OOM Killer (
 http://linux-mm.org/OOM_Killer) killed the process. It took some time to
 figure out but very interesting. We knew high heap is a problem but had no
 clue when the actual heap usage was well within limit and the process
 disappeared. syslog helped figure this out.

 About Linux OOM Killer
 It is the job of the linux 'oom killer' to *sacrifice* one or more
 processes in order to free up memory for the system when all else fails


 On Thu, Jan 2, 2014 at 10:38 AM, Robert Coli rc...@eventbrite.com wrote:

 On Thu, Jan 2, 2014 at 8:13 AM, Narendra Sharma 
 narendra.sha...@gmail.com wrote:

 8 node cluster running in aws. Any pointers where I should start
 looking?
 No kill -9 in history.

 You should start looking at instructions as to how to upgrade to at
 least the top of the 1.1 line... :D

 =Rob




 --
 Narendra Sharma
 Software Engineer
  *http://www.aeris.com*
  *http://narendrasharma.blogspot.com/*




 --
 -- Nitin




-- 
Narendra Sharma
Software Engineer
*http://www.aeris.com*
*http://narendrasharma.blogspot.com/*


Re: Cassandra 1.1.6 - Disk usage and Load displayed in ring doesn't match

2013-12-18 Thread Narendra Sharma
Thanks Aaron. No tmp files and not even a single exception in the
system.log.

If the file was last modified on 20-Nov then there must be an entry for
that in the log (either completed streaming or compacted).


On Tue, Dec 17, 2013 at 7:23 PM, Aaron Morton aa...@thelastpickle.com wrote:

 -tmp- files will sit in the data dir, if there was an error creating them
 during compaction or flushing to disk they will sit around until a restart.

 Check the logs for errors to see if compaction was failing on something.

 Cheers

 -
 Aaron Morton
 New Zealand
 @aaronmorton

 Co-Founder & Principal Consultant
 Apache Cassandra Consulting
 http://www.thelastpickle.com

 On 17/12/2013, at 12:28 pm, Narendra Sharma narendra.sha...@gmail.com
 wrote:

 No snapshots.

 I restarted the node and now the Load in ring is in sync with the disk
 usage. Not sure what caused it to go out of sync. However, the Live SStable
 count doesn't match exactly with the number of data files on disk.

 I am going through the Cassandra code to understand what could be the
 reason for the mismatch in the sstable count and also why there is no
 reference of some of the data files in system.log.




 On Mon, Dec 16, 2013 at 2:45 PM, Arindam Barua aba...@247-inc.com wrote:



 Do you have any snapshots on the nodes where you are seeing this issue?

 Snapshots will link to sstables, which will cause them not to be deleted.



 -Arindam



 *From:* Narendra Sharma [mailto:narendra.sha...@gmail.com]
 *Sent:* Sunday, December 15, 2013 1:15 PM
 *To:* user@cassandra.apache.org
 *Subject:* Cassandra 1.1.6 - Disk usage and Load displayed in ring
 doesn't match



 We have 8 node cluster. Replication factor is 3.



 For some of the nodes, the disk usage (du -ksh .) in the data directory
 for the CF doesn't match the Load reported by the nodetool ring command. When
 we expanded the cluster from 4 nodes to 8 nodes (4 weeks back), everything was
 okay. Over the last 2-3 weeks the disk usage has gone up. We
 increased the RF from 2 to 3 two weeks ago.



 I am not sure if increasing the RF is causing this issue.



 For one of the nodes that I analyzed:

 1. nodetool ring reported load as 575.38 GB



 2. nodetool cfstats for the CF reported:

 SSTable count: 28

 Space used (live): 572671381955

 Space used (total): 572671381955





 3. 'ls -1 *Data* | wc -l' in the data folder for CF returned

 46



 4. 'du -ksh .' in the data folder for CF returned

 720G



 The above numbers indicate that there are some sstables that are obsolete
 and are still occupying space on disk. What could be wrong? Will restarting
 the node help? The cassandra process is running for last 45 days with no
 downtime. However, because the disk usage is high, we are not able to run
 full compaction.



 Also, I can't find reference to each of the sstables on disk in the
 system.log file. For eg I have one data file on disk as (ls -lth):

 86G Nov 20 06:14



 I have system.log file with first line:

 INFO [main] 2013-11-18 09:41:56,120 AbstractCassandraDaemon.java (line
 101) Logging initialized



 The 86G file must be a result of some compaction. I see no reference of
 data file in system.log file between 11/18 to 11/25. What could be the
 reason for that? The only reference is dated 11/29 when the file was being
 streamed to another node (new node).



 How can I identify the obsolete files and remove them? I am thinking
 about the following. Let me know if it makes sense.

 1. Restart the node and check the state.

 2. Move the oldest data files to another location (to another mount point)

 3. Restart the node again

 4. Run repair on the node so that it can get the missing data from its
 peers.





 I compared the numbers of a healthy node for the same CF:

 1. nodetool ring reported load as 662.95 GB



 2. nodetool cfstats for the CF reported:

 SSTable count: 16

 Space used (live): 670524321067

 Space used (total): 670524321067



 3. 'ls -1 *Data* | wc -l' in the data folder for CF returned

 16



 4. 'du -ksh .' in the data folder for CF returned

 625G





 -Naren






 --
 Narendra Sharma

 Software Engineer

 *http://www.aeris.com*

 *http://narendrasharma.blogspot.com/*






 --
 Narendra Sharma
 Software Engineer
 *http://www.aeris.com*
 *http://narendrasharma.blogspot.com/*





-- 
Narendra Sharma
Software Engineer
*http://www.aeris.com*
*http://narendrasharma.blogspot.com/*


Re: Cassandra 1.1.6 - Disk usage and Load displayed in ring doesn't match

2013-12-16 Thread Narendra Sharma
No snapshots.

I restarted the node and now the Load in ring is in sync with the disk
usage. Not sure what caused it to go out of sync. However, the Live SStable
count doesn't match exactly with the number of data files on disk.

I am going through the Cassandra code to understand what could be the
reason for the mismatch in the sstable count and also why there is no
reference of some of the data files in system.log.




On Mon, Dec 16, 2013 at 2:45 PM, Arindam Barua aba...@247-inc.com wrote:



 Do you have any snapshots on the nodes where you are seeing this issue?

 Snapshots will link to sstables, which will cause them not to be deleted.



 -Arindam



 *From:* Narendra Sharma [mailto:narendra.sha...@gmail.com]
 *Sent:* Sunday, December 15, 2013 1:15 PM
 *To:* user@cassandra.apache.org
 *Subject:* Cassandra 1.1.6 - Disk usage and Load displayed in ring
 doesn't match



 We have 8 node cluster. Replication factor is 3.



 For some of the nodes, the disk usage (du -ksh .) in the data directory for
 the CF doesn't match the Load reported by the nodetool ring command. When we
 expanded the cluster from 4 nodes to 8 nodes (4 weeks back), everything was
 okay. Over the last 2-3 weeks the disk usage has gone up. We
 increased the RF from 2 to 3 two weeks ago.



 I am not sure if increasing the RF is causing this issue.



 For one of the nodes that I analyzed:

 1. nodetool ring reported load as 575.38 GB



 2. nodetool cfstats for the CF reported:

 SSTable count: 28

 Space used (live): 572671381955

 Space used (total): 572671381955





 3. 'ls -1 *Data* | wc -l' in the data folder for CF returned

 46



 4. 'du -ksh .' in the data folder for CF returned

 720G



 The above numbers indicate that there are some sstables that are obsolete
 and are still occupying space on disk. What could be wrong? Will restarting
 the node help? The cassandra process is running for last 45 days with no
 downtime. However, because the disk usage is high, we are not able to run
 full compaction.



 Also, I can't find reference to each of the sstables on disk in the
 system.log file. For eg I have one data file on disk as (ls -lth):

 86G Nov 20 06:14



 I have system.log file with first line:

 INFO [main] 2013-11-18 09:41:56,120 AbstractCassandraDaemon.java (line
 101) Logging initialized



 The 86G file must be a result of some compaction. I see no reference of
 data file in system.log file between 11/18 to 11/25. What could be the
 reason for that? The only reference is dated 11/29 when the file was being
 streamed to another node (new node).



 How can I identify the obsolete files and remove them? I am thinking about
 the following. Let me know if it makes sense.

 1. Restart the node and check the state.

 2. Move the oldest data files to another location (to another mount point)

 3. Restart the node again

 4. Run repair on the node so that it can get the missing data from its
 peers.





 I compared the numbers of a healthy node for the same CF:

 1. nodetool ring reported load as 662.95 GB



 2. nodetool cfstats for the CF reported:

 SSTable count: 16

 Space used (live): 670524321067

 Space used (total): 670524321067



 3. 'ls -1 *Data* | wc -l' in the data folder for CF returned

 16



 4. 'du -ksh .' in the data folder for CF returned

 625G





 -Naren






 --
 Narendra Sharma

 Software Engineer

 *http://www.aeris.com*

 *http://narendrasharma.blogspot.com/*






-- 
Narendra Sharma
Software Engineer
*http://www.aeris.com*
*http://narendrasharma.blogspot.com/*


Cassandra 1.1.6 - Disk usage and Load displayed in ring doesn't match

2013-12-15 Thread Narendra Sharma
We have 8 node cluster. Replication factor is 3.

For some of the nodes, the disk usage (du -ksh .) in the data directory for
the CF doesn't match the Load reported by the nodetool ring command. When we
expanded the cluster from 4 nodes to 8 nodes (4 weeks back), everything was
okay. Over the last 2-3 weeks the disk usage has gone up. We increased the RF
from 2 to 3 two weeks ago.

I am not sure if increasing the RF is causing this issue.

For one of the nodes that I analyzed:
1. nodetool ring reported load as 575.38 GB

2. nodetool cfstats for the CF reported:
SSTable count: 28
Space used (live): 572671381955
Space used (total): 572671381955


3. 'ls -1 *Data* | wc -l' in the data folder for CF returned
46

4. 'du -ksh .' in the data folder for CF returned
720G

The above numbers indicate that there are some sstables that are obsolete
and are still occupying space on disk. What could be wrong? Will restarting
the node help? The cassandra process has been running for the last 45 days
with no downtime. However, because the disk usage is high, we are not able to
run a full compaction.

Also, I can't find a reference to each of the sstables on disk in the
system.log file. E.g., I have one data file on disk as (ls -lth):
86G Nov 20 06:14

I have system.log file with first line:
INFO [main] 2013-11-18 09:41:56,120 AbstractCassandraDaemon.java (line 101)
Logging initialized

The 86G file must be the result of some compaction. I see no reference to the
data file in the system.log file between 11/18 and 11/25. What could be the
reason for that? The only reference is dated 11/29, when the file was being
streamed to another node (a new node).

How can I identify the obsolete files and remove them? I am thinking about
the following. Let me know if it makes sense.
1. Restart the node and check the state.
2. Move the oldest data files to another location (to another mount point)
3. Restart the node again
4. Run repair on the node so that it can get the missing data from its
peers.


I compared the numbers of a healthy node for the same CF:
1. nodetool ring reported load as 662.95 GB

2. nodetool cfstats for the CF reported:
SSTable count: 16
Space used (live): 670524321067
Space used (total): 670524321067

3. 'ls -1 *Data* | wc -l' in the data folder for CF returned
16

4. 'du -ksh .' in the data folder for CF returned
625G


-Naren



-- 
Narendra Sharma
Software Engineer
*http://www.aeris.com*
*http://narendrasharma.blogspot.com/*


Re: Cassandra 1.1.6 - New node bootstrap not completing

2013-11-01 Thread Narendra Sharma
I was successfully able to bootstrap the node. The issue was RF > 2. Thanks
again Robert.


On Wed, Oct 30, 2013 at 10:29 AM, Narendra Sharma narendra.sha...@gmail.com
 wrote:

 Thanks Robert.

 I didn't realize that some of the keyspaces (not all and esp. the biggest
 one I was focusing on) had RF > 2. I wasted 3 days on it. Thanks again for
 the pointers. I will try again and share the results.


 On Wed, Oct 30, 2013 at 12:28 AM, Robert Coli rc...@eventbrite.com wrote:

 On Tue, Oct 29, 2013 at 11:45 AM, Narendra Sharma 
 narendra.sha...@gmail.com wrote:

 We had a cluster of 4 nodes in AWS. The average load on each node was
 approx 750GB. We added 4 new nodes. It is now more than 30 hours and the
 node is still in JOINING mode.
 Specifically I am analyzing the one with IP 10.3.1.29. There is no
 compaction or streaming or index building happening.


 If your cluster has RF > 2, you are bootstrapping two nodes into the same
 range simultaneously. That is not supported. [1,2] The node you are having
 the problem with is in the range that is probably overlapping.

 If I were you I would :

 1) stop all Joining nodes and wipe their state including system keyspace
 2) optionally removetoken any nodes which remain in cluster gossip
 state after stopping
 3) re-start/bootstrap them one at a time, waiting for each to complete
 bootstrapping before starting the next  one
 4) (unrelated) Upgrade from 1.1.6 to the head of 1.1.x ASAP.

 =Rob
 [1] https://issues.apache.org/jira/browse/CASSANDRA-2434
 [2]
 https://issues.apache.org/jira/browse/CASSANDRA-2434?focusedCommentId=13091851page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13091851




 --
 Narendra Sharma
 Software Engineer
 *http://www.aeris.com*
 *http://narendrasharma.blogspot.com/*




-- 
Narendra Sharma
Software Engineer
*http://www.aeris.com*
*http://narendrasharma.blogspot.com/*


Cassandra 1.1.6 - New node bootstrap not completing

2013-10-29 Thread Narendra Sharma
We had a cluster of 4 nodes in AWS. The average load on each node was
approx 750GB. We added 4 new nodes. It is now more than 30 hours and the
node is still in JOINING mode.
Specifically I am analyzing the one with IP 10.3.1.29. There is no
compaction or streaming or index building happening.

$ ./nodetool ring
Note: Ownership information does not include topology, please specify a
keyspace.
Address     DC          Rack    Status  State    Load       Owns    Token
                                                                    148873535527910577765226390751398592512
10.3.1.179  datacenter1 rack1   Up      Normal   740.41 GB  25.00%  0
10.3.1.29   datacenter1 rack1   Up      Joining  562.49 GB  0.00%   21267647932558653966460912964485513215
10.3.1.175  datacenter1 rack1   Up      Normal   755.7 GB   25.00%  42535295865117307932921825928971026431
10.3.1.30   datacenter1 rack1   Up      Joining  565.68 GB  0.00%   63802943797675961899382738893456539648
10.3.1.177  datacenter1 rack1   Up      Normal   754.18 GB  25.00%  85070591730234615865843651857942052863
10.3.1.135  datacenter1 rack1   Up      Normal   95.97 GB   20.87%  120580289963820081458352857409882669785
10.3.1.178  datacenter1 rack1   Up      Normal   747.53 GB  4.13%   127605887595351923798765477786913079295
10.3.1.24   datacenter1 rack1   Up      Joining  522.09 GB  0.00%   148873535527910577765226390751398592512
$ ./nodetool netstats
Mode: JOINING
Not sending any streams.
 Nothing streaming from /10.3.1.177
 Nothing streaming from /10.3.1.179
Pool Name     Active   Pending  Completed
Commands      n/a      0        82
Responses     n/a      0        40135123
$ ./nodetool compactionStats
pending tasks: 0
Active compaction remaining time :n/a
$ ./nodetool info
Token: 21267647932558653966460912964485513215
Gossip active: true
Thrift active: false
Load : 562.49 GB
Generation No: 1382981644
Uptime (seconds) : 90340
Heap Memory (MB) : 9298.59 / 13272.00
Data Center  : datacenter1
Rack : rack1
Exceptions   : 2
Key Cache: size 104857584 (bytes), capacity 104857584 (bytes),
187373 hits, 94709046 requests, 0.002 recent hit rate, 14400 save period in
seconds
Row Cache: size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests,
NaN recent hit rate, 0 save period in seconds


The 2 Exceptions in info output are the ones that were logged when I
stopped index build to let bootstrap complete faster.

Any clue what's wrong and where I should look to further analyze the
issue? I haven't restarted the Cassandra process. I am afraid the node will
start bootstrapping again if I restart it.

Thanks,
Naren



-- 
Narendra Sharma
Software Engineer
*http://www.aeris.com*
*http://narendrasharma.blogspot.com/*


Re: Cassandra 1.1.6 - New node bootstrap not completing

2013-10-29 Thread Narendra Sharma
Thanks Robert.

I didn't realize that some of the keyspaces (not all and esp. the biggest
one I was focusing on) had RF > 2. I wasted 3 days on it. Thanks again for
the pointers. I will try again and share the results.


On Wed, Oct 30, 2013 at 12:28 AM, Robert Coli rc...@eventbrite.com wrote:

 On Tue, Oct 29, 2013 at 11:45 AM, Narendra Sharma 
 narendra.sha...@gmail.com wrote:

 We had a cluster of 4 nodes in AWS. The average load on each node was
 approx 750GB. We added 4 new nodes. It is now more than 30 hours and the
 node is still in JOINING mode.
 Specifically I am analyzing the one with IP 10.3.1.29. There is no
 compaction or streaming or index building happening.


 If your cluster has RF > 2, you are bootstrapping two nodes into the same
 range simultaneously. That is not supported. [1,2] The node you are having
 the problem with is in the range that is probably overlapping.

 If I were you I would :

 1) stop all Joining nodes and wipe their state including system keyspace
 2) optionally removetoken any nodes which remain in cluster gossip state
 after stopping
 3) re-start/bootstrap them one at a time, waiting for each to complete
 bootstrapping before starting the next  one
 4) (unrelated) Upgrade from 1.1.6 to the head of 1.1.x ASAP.

 =Rob
 [1] https://issues.apache.org/jira/browse/CASSANDRA-2434
 [2]
 https://issues.apache.org/jira/browse/CASSANDRA-2434?focusedCommentId=13091851page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13091851




-- 
Narendra Sharma
Software Engineer
*http://www.aeris.com*
*http://narendrasharma.blogspot.com/*


Re: Querying for rows without a particular column

2012-02-14 Thread Narendra Sharma
This is an interesting use case. If you implement it correctly then you may
end up getting all the rows in your cluster for certain bad queries :)... so
be careful.

I would ask why do you want to know such rows and what will you do with
them?

-Naren


On Mon, Feb 13, 2012 at 12:16 PM, Asankha C. Perera asan...@apache.org wrote:

 Hi All

 I am using expiring columns in my column family, and need to search for
 the rows where a particular column has expired (and no longer exists). I am
 using the Hector client. How can I make a query to find the rows of interest?

 thanks
 asankha

 --
 Asankha C. Perera
 AdroitLogic, http://adroitlogic.org

 http://esbmagic.blogspot.com







-- 
Narendra Sharma
Software Engineer
*http://www.aeris.com*
*http://narendrasharma.blogspot.com/*


Re: Implications of length of column names

2012-02-10 Thread Narendra Sharma
It is good to have short column names. They save space all the way from
network transfer to in-memory usage to storage. It is also a good idea to
club immutable columns that are read together and store them as a single
column. We gained significant overall performance benefits from this.
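As a hedged sketch of the "club immutable columns" idea (schema invented for
illustration, not from this thread):

  CREATE TABLE users (
      id      uuid PRIMARY KEY,
      fn      text,   -- short name for 'first_name'
      ln      text,   -- short name for 'last_name'
      profile blob    -- immutable attributes serialized once, stored as one column
  );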

-Naren

On Fri, Feb 10, 2012 at 12:20 PM, Drew Kutcharian d...@venarc.com wrote:

 What are the implications of using short vs long column names? Is it
 better to use short column names or longer ones?

 I know for MongoDB you are better off using short field names
 http://www.mongodb.org/display/DOCS/Optimizing+Storage+of+Small+Objects
  Does this apply to Cassandra column names?


 -- Drew




-- 
Narendra Sharma
Software Engineer
*http://www.aeris.com*
*http://narendrasharma.blogspot.com/*


Re: Deleting a column vs setting it's value to empty

2012-02-10 Thread Narendra Sharma
IMO deleting is always better. It is better to not store the column if
there is no value associated.
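For example, with a CQL-style users table (hypothetical names), the delete
approach is simply:

  DELETE description FROM users WHERE id = 123e4567-e89b-12d3-a456-426614174000;

Deleting a column that was never written is harmless; Cassandra just records a
tombstone for it, so the client does not need to know the previous value.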

-Naren

On Fri, Feb 10, 2012 at 12:15 PM, Drew Kutcharian d...@venarc.com wrote:

 Hi Everyone,

 Let's say I have the following object which I would like to save in
 Cassandra:

 class User {
  UUID id; //row key
  String name; //columnKey: name, columnValue: the name of the user
  String description; //columnKey: description, columnValue: the
 description of the user
 }

 Description can be nullable. What's the best approach when a user updates
 her description and sets it to null? Should I delete the description column
 or set it to an empty string?

 In addition, if I go with the delete-column strategy, since I don't know
 what the previous value of description was (the column might not even
 exist), what would happen when I delete a non-existent column?

 Thanks,

 Drew




-- 
Narendra Sharma
Software Engineer
*http://www.aeris.com*
*http://narendrasharma.blogspot.com/*


Re: Unbalanced cluster with RandomPartitioner

2012-01-19 Thread Narendra Sharma
I believe you need to move the nodes on the ring. What was the load on the
nodes before you added the 5 new nodes? It's just that you are getting more
data in certain token ranges than in others.

-Naren

On Thu, Jan 19, 2012 at 3:22 AM, Marcel Steinbach marcel.steinb...@chors.de
 wrote:

 On 18.01.2012, at 02:19, Maki Watanabe wrote:

 Are there any significant difference of number of sstables on each nodes?

 No, no significant difference there. Actually, node 8 is among those with
 more sstables but with the least load (20GB)

 On 17.01.2012, at 20:14, Jeremiah Jordan wrote:

 Are you deleting data or using TTL's?  Expired/deleted data won't go away
 until the sstable holding it is compacted.  So if compaction has happened
 on some nodes, but not on others, you will see this.  The disparity is
 pretty big 400Gb to 20GB, so this probably isn't the issue, but with our
 data using TTL's if I run major compactions a couple times on that column
 family it can shrink ~30%-40%.

 Yes, we do delete data. But I agree, the disparity is too big to blame
 only the deletions.

 Also, initially, we started out with 3 nodes and upgraded to 8 a few weeks
 ago. After adding the node, we did
 compactions and cleanups and didn't have a balanced cluster. So that
 should have removed outdated data, right?

 2012/1/18 Marcel Steinbach marcel.steinb...@chors.de:

 We are running regular repairs, so I don't think that's the problem.

 And the data dir sizes match approx. the load from the nodetool.

 Thanks for the advise, though.


 Our keys are digits only, and all contain a few zeros at the same

 offsets. I'm not that familiar with the md5 algorithm, but I doubt that it

 would generate 'hotspots' for those kind of keys, right?


 On 17.01.2012, at 17:34, Mohit Anchlia wrote:


 Have you tried running repair first on each node? Also, verify using

 df -h on the data dirs


 On Tue, Jan 17, 2012 at 7:34 AM, Marcel Steinbach

 marcel.steinb...@chors.de wrote:


 Hi,



 we're using RP and have each node assigned the same amount of the token

 space. The cluster looks like that:



 Address  Status  State   Load       Owns    Token
                                             205648943402372032879374446248852460236
 1        Up      Normal  310.83 GB  12.50%  56775407874461455114148055497453867724
 2        Up      Normal  470.24 GB  12.50%  78043055807020109080608968461939380940
 3        Up      Normal  271.57 GB  12.50%  99310703739578763047069881426424894156
 4        Up      Normal  282.61 GB  12.50%  120578351672137417013530794390910407372
 5        Up      Normal  248.76 GB  12.50%  141845999604696070979991707355395920588
 6        Up      Normal  164.12 GB  12.50%  163113647537254724946452620319881433804
 7        Up      Normal  76.23 GB   12.50%  184381295469813378912913533284366947020
 8        Up      Normal  19.79 GB   12.50%  205648943402372032879374446248852460236



 I was under the impression, the RP would distribute the load more evenly.


 Our row sizes are 0.5-1 KB; hence, we don't store huge rows on a single
 node. Should we just move the nodes so that the load is more evenly
 distributed, or is there something off that needs to be fixed first?



 Thanks


 Marcel


 chors GmbH
 specialists in digital and direct marketing solutions
 Haid-und-Neu-Straße 7
 76131 Karlsruhe, Germany
 www.chors.com
 Managing Directors: Dr. Volker Hatz, Markus Plattner
 Amtsgericht Montabaur, HRB 15029







 --
 w3m





-- 
Narendra Sharma
Software Engineer
*http://www.aeris.com*
*http://narendrasharma.blogspot.com/*


Re: How to reliably achieve unique constraints with Cassandra?

2012-01-06 Thread Narendra Sharma
"It's very surprising that no one seems to have solved such a common use
case."

I would say people have solved it using the RIGHT tools for the task.



On Fri, Jan 6, 2012 at 2:35 PM, Drew Kutcharian d...@venarc.com wrote:

 Thanks everyone for the replies. Seems like there is no easy way to handle
 this. It's very surprising that no one seems to have solved such a common
 use case.

 -- Drew

 On Jan 6, 2012, at 2:11 PM, Bryce Allen wrote:

  That's a good question, and I'm not sure - I'm fairly new to both ZK
  and Cassandra. I found this wiki page:
  http://wiki.apache.org/hadoop/ZooKeeper/FailureScenarios
  and I think the lock recipe still works, even if a stale read happens.
  Assuming that wiki page is correct.
 
  There is still subtlety to locking with ZK though, see (Locks based
  on ephemeral nodes) from the zk mailing list in October:
 
 http://mail-archives.apache.org/mod_mbox/zookeeper-user/201110.mbox/thread?0
 
  -Bryce
 
  On Fri, 6 Jan 2012 13:36:52 -0800
  Drew Kutcharian d...@venarc.com wrote:
  Bryce,
 
  I'm not sure about ZooKeeper, but I know if you have a partition
   between HazelCast nodes, then the nodes can acquire the same lock
  independently in each divided partition. How does ZooKeeper handle
  this situation?
 
  -- Drew
 
 
  On Jan 6, 2012, at 12:48 PM, Bryce Allen wrote:
 
  On Fri, 6 Jan 2012 10:03:38 -0800
  Drew Kutcharian d...@venarc.com wrote:
  I know that this can be done using a lock manager such as ZooKeeper
  or HazelCast, but the issue with using either of them is that if
  ZooKeeper or HazelCast is down, then you can't be sure about the
  reliability of the lock. So this potentially, in the very rare
  instance where the lock manager is down and two users are
  registering with the same email, can cause major issues.
 
  For most applications, if the lock managers is down, you don't
  acquire the lock, so you don't enter the critical section. Rather
  than allowing inconsistency, you become unavailable (at least to
  writes that require a lock).
 
  -Bryce
 




-- 
Narendra Sharma
Software Engineer
*http://www.aeris.com*
*http://narendrasharma.blogspot.com/*


Re: How to reliably achieve unique constraints with Cassandra?

2012-01-06 Thread Narendra Sharma
Instead of trying to solve the generic problem of uniqueness, I would focus
on the specific problem.

For example, let's consider your use case of user registration with the email
address as the key. You can do the following:
1. Create a CF (Users) where the row key is a UUID and which has user-info
specific columns.
2. Whenever a user registers, create a row in this CF with the user status
flag set to "waiting for confirmation".
3. Send an email to the user's email address with a link that contains the
UUID (or encrypted UUID).
4. When the user clicks on the link, use the UUID (or decrypted UUID) to look
up the user.
5. If the user exists with the given UUID and status "waiting for
confirmation", then update the status and create an entry in another CF
(EmailUUIDIndex) representing the email address to UUID mapping.
6. For authentication you can look up the index to get the UUID and proceed.
7. If a malicious user registers with someone else's email id, then he will
never be able to confirm and will never have an entry in EmailUUIDIndex. As
an additional check, if an entry for the email id already exists in
EmailUUIDIndex, the request for registration can be rejected right away.
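A rough CQL sketch of the two column families described in the steps above
(modern syntax; the details are illustrative only):

  CREATE TABLE users (
      id     uuid PRIMARY KEY,   -- row key from step 1
      email  text,
      status text                -- 'waiting_for_confirmation' or 'confirmed'
  );

  -- Step 5: written only after the user confirms via the emailed link.
  CREATE TABLE email_uuid_index (
      email text PRIMARY KEY,
      id    uuid
  );

  -- Step 7: cheap pre-check before accepting a new registration.
  SELECT id FROM email_uuid_index WHERE email = 'someone@example.com';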

Make sense?

-Naren

On Fri, Jan 6, 2012 at 4:00 PM, Drew Kutcharian d...@venarc.com wrote:

 So what are the common RIGHT solutions/tools for this?


 On Jan 6, 2012, at 2:46 PM, Narendra Sharma wrote:

 It's very surprising that no one seems to have solved such a common use
 case.
 I would say people have solved it using RIGHT tools for the task.



 On Fri, Jan 6, 2012 at 2:35 PM, Drew Kutcharian d...@venarc.com wrote:

 Thanks everyone for the replies. Seems like there is no easy way to
 handle this. It's very surprising that no one seems to have solved such a
 common use case.

 -- Drew

 On Jan 6, 2012, at 2:11 PM, Bryce Allen wrote:

  That's a good question, and I'm not sure - I'm fairly new to both ZK
  and Cassandra. I found this wiki page:
  http://wiki.apache.org/hadoop/ZooKeeper/FailureScenarios
  and I think the lock recipe still works, even if a stale read happens.
  Assuming that wiki page is correct.
 
  There is still subtlety to locking with ZK though, see (Locks based
  on ephemeral nodes) from the zk mailing list in October:
 
 http://mail-archives.apache.org/mod_mbox/zookeeper-user/201110.mbox/thread?0
 
  -Bryce
 
  On Fri, 6 Jan 2012 13:36:52 -0800
  Drew Kutcharian d...@venarc.com wrote:
  Bryce,
 
  I'm not sure about ZooKeeper, but I know if you have a partition
   between HazelCast nodes, then the nodes can acquire the same lock
  independently in each divided partition. How does ZooKeeper handle
  this situation?
 
  -- Drew
 
 
  On Jan 6, 2012, at 12:48 PM, Bryce Allen wrote:
 
  On Fri, 6 Jan 2012 10:03:38 -0800
  Drew Kutcharian d...@venarc.com wrote:
  I know that this can be done using a lock manager such as ZooKeeper
  or HazelCast, but the issue with using either of them is that if
  ZooKeeper or HazelCast is down, then you can't be sure about the
  reliability of the lock. So this potentially, in the very rare
  instance where the lock manager is down and two users are
  registering with the same email, can cause major issues.
 
  For most applications, if the lock managers is down, you don't
  acquire the lock, so you don't enter the critical section. Rather
  than allowing inconsistency, you become unavailable (at least to
  writes that require a lock).
 
  -Bryce
 




 --
 Narendra Sharma
 Software Engineer
 *http://www.aeris.com*
 *http://narendrasharma.blogspot.com/*






-- 
Narendra Sharma
Software Engineer
*http://www.aeris.com*
*http://narendrasharma.blogspot.com/*


Re: Cassandra memory usage

2012-01-03 Thread Narendra Sharma
See http://wiki.apache.org/cassandra/FAQ#mmap

Also, the discussion on
http://comments.gmane.org/gmane.comp.db.cassandra.user/14080

Hopefully these will answer your question.

-Naren

On Tue, Jan 3, 2012 at 12:53 PM, Daning Wang dan...@netseer.com wrote:

 I have a Cassandra server with the JVM settings -Xms4G -Xmx4G, but why does
 top report 15G RES memory and 11G SHR memory usage? I understand that -Xmx4G
 is only for the heap size, but it is strange that the OS reports 2.5 times
 the memory usage. Is there a lot of memory used by JNI? Please help to
 explain this.

  cassy 2549 39.7 66.1 163805536 16324648 ?  Sl   Jan02 338:48
 /usr/local/cassy/java/current/bin/java -ea
 -javaagent:./../lib/jamm-0.2.2.jar -XX:+UseThreadPriorities
 -XX:ThreadPriorityPolicy=42* -Xms4G -Xmx4G 
 -Xmn1G*-XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC
 -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
 -XX:MaxTenuringThreshold=10 -XX:CMSInitiatingOccupancyFraction=75
 -XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true
 -Dcom.sun.management.jmxremote.port=7199
 -Dcom.sun.management.jmxremote.ssl=false
 -Dcom.sun.management.jmxremote.authenticate=false -Dmx4jport=8085
 -Djava.rmi.server.hostname=10.210.101.106
 -Dlog4j.configuration=log4j-server.properties
 -Dlog4j.defaultInitOverride=true
 -Dpasswd.properties=./../conf/passwd.properties -cp
 ./../conf:./../build/classes/main:./../build/classes/thrift:./../lib/antlr-3.2.jar:./../lib/apache-cassandra-0.8.6.jar:./../lib/apache-cassandra-thrift-0.8.6.jar:./../lib/avro-1.4.0-fixes.jar:./../lib/avro-1.4.0-sources-fixes.jar:./../lib/commons-cli-1.1.jar:./../lib/commons-codec-1.2.jar:./../lib/commons-collections-3.2.1.jar:./../lib/commons-lang-2.4.jar:./../lib/concurrentlinkedhashmap-lru-1.1.jar:./../lib/guava-r08.jar:./../lib/high-scale-lib-1.1.2.jar:./../lib/jackson-core-asl-1.4.0.jar:./../lib/jackson-mapper-asl-1.4.0.jar:./../lib/jamm-0.2.2.jar:./../lib/jline-0.9.94.jar:./../lib/jna.jar:./../lib/json-simple-1.1.jar:./../lib/libthrift-0.6.jar:./../lib/log4j-1.2.16.jar:./../lib/mx4j-tools.jar:./../lib/servlet-api-2.5-20081211.jar:./../lib/slf4j-api-1.6.1.jar:./../lib/slf4j-log4j12-1.6.1.jar:./../lib/snakeyaml-1.6.jar
 org.apache.cassandra.thrift.CassandraDaemon


 Top

   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+
 COMMAND

   2549 cassy 21   0  156g * 15g  11g *S 66.9 65.5 338:02.72 java


 Thank you in advance,


 Daning




-- 
Narendra Sharma
Software Engineer
*http://www.aeris.com*
*http://narendrasharma.blogspot.com/*


Re: How to convert start_token,end_token to real key value?

2012-01-01 Thread Narendra Sharma
A token is an MD5 hash (a one-way hash). You cannot compute the key given a
token. You can, however, compute the MD5 hash of your keys and compare it with
the tokens.

-Naren

On Sat, Dec 31, 2011 at 2:07 PM, ravikumar visweswara talk2had...@gmail.com
 wrote:

 Hello All,

 I have a requirement to copy data from cassandra to hadoop from/to a
 specific key. This is supported in 1.0.0, but I am using cassandra version
 0.7.1 and hadoop version 20.2.

 In my mapreduce job (InputFormat class) I have an object of TokenRange. I
 need to filter certain ranges based on some exclusion rules.
 I have a readable key range to include. Could someone help me on how to
 convert start_token and end_token to a readable format and compare them with
 my input keys (range)?

 I know that 1.0.0 have better capabilities to specify keyRanges in hadoop
 mapreduce. But for now, i will have to work with 0.7.1

 Thanks and Regards
 Ravi





-- 
Narendra Sharma
Software Engineer
*http://www.aeris.com*
*http://narendrasharma.blogspot.com/*


Re: Unable to add columns to empty row in Column family: Cassandra

2011-05-12 Thread Narendra Sharma
Can u share the code?

On Mon, May 2, 2011 at 11:34 PM, anuya joshi anu...@gmail.com wrote:

 Hello,

 I am using Cassandra for my application.My Cassandra client uses Thrift
 APIs directly. The problem I am facing currently is as follows:

 1) I added a row and columns in it dynamically via Thrift API Client
 2) Next, I used command line client to delete row which actually deleted
 all the columns in it, leaving empty row with original row id.
 3) Now, I am trying to add columns dynamically using client program into
 this empty row with same row key
 However, columns are not being inserted.
 But, when tried from command line client, it worked correctly.

 Any pointer on this would be of great use

 Thanks in  advance,

 Regards,
 Anuya




-- 
Narendra Sharma
Solution Architect
*http://www.persistentsys.com*
*http://narendrasharma.blogspot.com/*


Re: network topology issue

2011-05-11 Thread Narendra Sharma
My understanding is that the replication factor is for the entire ring. Even
if you have 2 DCs the nodes are part of the same ring. What you get
additionally from NTS is that you can specify how many replicas to place in
each DC.

So RF = 1 and DC1:1, DC2:1 looks incorrect to me.

What is possible with NTS is following:
RF=3, DC1=1, DC2=2
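For example, in 0.7/0.8-era cassandra-cli syntax that would look roughly like
the following (keyspace and DC names here are made up):

create keyspace MyKS
    with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
    and strategy_options = [{DC1:1, DC2:2}];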

I'll wait for others' comments to see if my understanding is correct.

-Naren

On Wed, May 11, 2011 at 5:41 PM, Anurag Gujral anurag.guj...@gmail.comwrote:

 Thanks Sameer for your answer.
 I am using two DCs, DC1 and DC2, each having one node; my
 strategy_options values are DC1:1, DC2:1. I am not sure what my RF should be:
 should it be 1 or 2?
 Please Advise
 Thanks
 Anurag


 On Wed, May 11, 2011 at 5:27 PM, Sameer Farooqui 
 cassandral...@gmail.comwrote:

 Anurag,

 The Cassandra ring spans datacenters, so you can't use token 0 on both
 nodes. Cassandra’s ring is from 0 to 2**127 in size.

 Try assigning one node the token of 0 and the second node 8.50705917 ×
 10^37 (input this as a single long number).

 To add a new keyspace in 0.8, run this from the CLI:
 create keyspace KEYSPACENAME with placement_strategy =
 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options =
 [{replication_factor:2}];

 If using 0.7, run help create keyspace; from the CLI and it'll show you
 the correct syntax.


 More info on tokens:

 http://journal.paul.querna.org/articles/2010/09/24/cassandra-token-selection/
 http://wiki.apache.org/cassandra/Operations#Token_selection


 On Wed, May 11, 2011 at 4:58 PM, Anurag Gujral 
 anurag.guj...@gmail.comwrote:

 Hi All,
  I am testing the network topology strategy in Cassandra. I am
 using two nodes, one node each in a different data center.
 Since the nodes are in different DCs I assigned token 0 to both the nodes.
 I added both the nodes as seeds in cassandra.yaml and I am using
 PropertyFileSnitch as the endpoint snitch, where I have specified the colo
 details.

 I started the first node; then when I started the second node I got an error
 that token 0 is already being used. Why am I getting this error?

 Second question: I already have Cassandra running in two different data
 centers. I want to add a new keyspace which uses NetworkTopologyStrategy;
 in light of the above errors, how can I accomplish this?


 Thanks
 Anurag






-- 
Narendra Sharma
Solution Architect
*http://www.persistentsys.com*
*http://narendrasharma.blogspot.com/*


Re: Newbie question

2011-05-10 Thread Narendra Sharma
You can have only one ordering defined in a CF. Super CF will allow you to
have nested ordering i.e. SC can have one ordering whereas columns within SC
can have other ordering. Note this is defined at CF level and cannot be
defined at SC level.

To model what you are trying to do, you can check whether secondary indexes will
be useful (assuming you have a standard CF). If not, you can create another CF
that just keeps NAME as the column name and ID as the column value. This will
ensure ordering by NAME and a pointer back to the original column (or SC, depending
on your schema). The downside is that you will need to run 2 queries to get the
data.
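A rough 0.7-era cassandra-cli sketch of that second approach (CF and column
names are hypothetical): the first CF holds the real data keyed by ID and
carries a secondary index on the name column, while the second CF stores NAME
as the column name (so columns sort by name) with the ID as the column value,
giving the two-query read path described above.

create column family Items
    with comparator = UTF8Type
    and column_metadata = [{column_name: name, validation_class: UTF8Type, index_type: KEYS}];

create column family ItemsByName with comparator = UTF8Type;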

-Naren

On Tue, May 10, 2011 at 6:33 AM, Sam Ganesan sam.gane...@motorola.comwrote:

 All:

 A newbie question to the aficianados.  I understand that I can stipulate an
 ordering mechanism when I create a column family to reflect what I am
 querying in the long run.  Generally I need to query a particular column
 space that I am contructing based on two different columns.  The frequency
 of these queries is not that different from each other.  I query based on a
 numberical ID or a name with equal frequency.

 What is the recommended way of approaching this problem

 Regards

 Sam
 *__
 Sam Ganesan Ph.D.
 Distinguished member, Technical Staff
 Motorola Mobility - On Demand Video
 900 Chelmsford Street,
 Lowell, MA 01851
 tel:+1 978 614-3165  (changed)
 mob:+1 978 328-7132
 mailto: sam.gane...@motorola.com*




-- 
Narendra Sharma
Solution Architect
*http://www.persistentsys.com*
*http://narendrasharma.blogspot.com/*


Re: cassandra not reading keyspaces defined in cassandra.yaml

2011-05-09 Thread Narendra Sharma
Look for "Where are my keyspaces?" on the following page:
http://wiki.apache.org/cassandra/StorageConfiguration
On Mon, May 9, 2011 at 5:51 PM, Anurag Gujral anurag.guj...@gmail.comwrote:

 Hi All,
I have following in my cassandra.yaml
 keyspaces:
     - column_families:
         - column_metadata: []
           column_type: Standard
           compare_with: BytesType
           gc_grace_seconds: 86400
           key_cache_save_period_in_seconds: 14400
           keys_cached: 0.0
           max_compaction_threshold: 32
           memtable_flush_after_mins: 1440
           memtable_operations_in_millions: 100.0
           memtable_throughput_in_mb: 256
           min_compaction_threshold: 4
           name: data
           read_repair_chance: 1.0
           row_cache_save_period_in_seconds: 0
           rows_cached: 1000
       name: offline
       replica_placement_strategy: org.apache.cassandra.locator.RackUnawareStrategy
       replication_factor: 1

 Cassandra starts properly without giving any warnings/errors but does not
 create the keyspace offline
 which is defined above.

 Please suggest.

 Thanks
 Anurag




-- 
Narendra Sharma
Solution Architect
*http://www.persistentsys.com*
*http://narendrasharma.blogspot.com/*


Re: Manual Conflict Resolution in Cassandra

2011-04-25 Thread Narendra Sharma
At t8 The request would not start as the CL level of nodes is not
available, the write would not be written to node X. The client would get an
UnavailableException. In response it should connect to a new coordinator and
try again.
[Naren] There may (and most likely there will) be a window when the CL will
appear to be satisfied while the write still fails because the node is actually
down. There are a lot of possible scenarios here. I believe Milind is talking
about some extreme but likely cases.



On Sat, Apr 23, 2011 at 7:31 PM, aaron morton aa...@thelastpickle.comwrote:

 Have not read the whole thing just the time line. Couple of issues...

 At t8 The request would not start as the CL level of nodes is not
 available, the write would not be written to node X. The client would get an
 UnavailableException. In response it should connect to a new coordinator and
 try again.

 At t12 if RR is enabled for the request the read is sent to all UP
 endpoints for the key. Once CL requests have returned (including the data /
 non digest request) the responses are repaired and a synchronous (to the
 read request) RR round is initiated.

 Once all the requests have responded they are compared again and an async RR
 process is kicked off. So it seems that in a worst-case scenario two rounds
 of RR are possible: one to make sure the correct data is returned for the
 request, and another to make sure that all UP replicas agree, as it may not
 be the case that all UP replicas were involved in completing the request.

 So as written, at t8 the write would have failed and not be stored on any
 nodes. So the write at t7 would not be lost.

 I think the crux of this example is the failure mode at t8, I'm assuming
 Alice is connected to node x:

 1) if X is disconnected before the write starts, it will not start any
 write that requires Quorum CL. Write fails with Unavailable error.
 2) If X disconnects from the network *after* sending the write messages,
 and all messages are successfully actioned (including a local write), the
 request will fail with a TimedOutException as < CL nodes will respond.
 3) If X disconnects from the cluster after sending the messages, and the
 messages it sends are lost but the local write succeeds, the request will
 fail with a TimedOutException as < CL nodes will respond.

 In all these cases the request is considered to have failed. The client
 should connect to another node and try again. In the case of timeout the
 operation was not completed to the CL level you asked for. In the case of
 unavailable the operation was not started.

 It can look like the RR conflict resolution is a little naive here, but
 it's less simple when you consider another scenario. The write at t8 failed
 at Quorum, and in your deployment the client cannot connect to another node
 in the cluster, so your code drops the CL down to ONE and gets the write
 done. You are happy that any nodes in Alice's partition see her write, and
 that those in Ben's partition see his. When things get back to normal you
 want the most recent write to be what clients consistently see, not the most
 popular value. The Consistency section here
 http://wiki.apache.org/cassandra/ArchitectureOverview says the same, it's
 the most recent value.

 I tend to think of Consistency as all clients getting the same response to
 the same query.

 Not sure if I've made things clearer, feel free to poke holes in my logic
 :)

 Hope that helps.
 Aaron


 On 23 Apr 2011, at 09:02, Edward Capriolo wrote:

 On Fri, Apr 22, 2011 at 4:31 PM, Milind Parikh milindpar...@gmail.com
 wrote:

 Is there a chance of getting manual conflict resolution in Cassandra?

 Please see attachment for why this is important in some cases.


 Regards

 Milind




 I think about this often. LDAP servers like SunOne have pluggable
 conflict resolution. I could see the read-repair algorithm being
 pluggable.





-- 
Narendra Sharma
Solution Architect
*http://www.persistentsys.com*
*http://narendrasharma.blogspot.com/*


Re: seed faq

2011-04-21 Thread Narendra Sharma
Here are some more details that might help:
1. You are right that seeds are referred to on startup to learn about the ring.
2. It is a good idea to have more than 1 seed (see the sample cassandra.yaml
snippet below). A seed is not a SPoF. Remember Gossip also provides eventual
consistency, so if a seed is missing, the new node may not have the correct
view of the ring. However, after talking to other nodes it will eventually
have the up-to-date state of the ring.
3. In each iteration the Gossiper on a node sends a gossip message:
 - To a known live node (picked randomly)
 - To a known dead node (based on some probability)
 - To a seed node (based on some probability)
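The cassandra.yaml snippet referenced in point 2 would look roughly like this
in 0.7 (the addresses are placeholders):

seeds:
    - 10.1.1.1
    - 10.1.1.2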

Thanks,
Naren

On Wed, Apr 20, 2011 at 7:13 PM, Maki Watanabe watanabe.m...@gmail.comwrote:

 I made self answered faqs on seed after reading the wiki and code.
 If I misunderstand something, please point out to me.

 == What are seeds? ==

 Seeds, or seed nodes are the nodes which new nodes refer to on
 bootstrap to know ring information.
 When you add a new node to ring, you need to specify at least one live
 seed to contact. Once a node join the ring, it learns about the other
 nodes, so it doesn't need seed on subsequent boot.

 There is no special configuration for seed node itself. In stable and
 static ring, you can point non-seed node as seed on bootstrap though
 it is not recommended.
 Nodes in the ring tend to send Gossip message to seeds more often by
 design, so it is probable that seeds have most recent and updated
 information of the ring. ( Refer to [[ArchitectureGossip]] for more
 details )

 == Does single seed mean single point of failure? ==

 If you are using replicated CF on the ring, only one seed in the ring
 doesn't mean single point of failure. The ring can operate or boot
 without the seed. But it is recommended to have multiple seeds in
 production system to maintain the ring.



 Thanks
 --
 maki




-- 
Narendra Sharma
Solution Architect
*http://www.persistentsys.com*
*http://narendrasharma.blogspot.com/*


Re: Starting the Cassandra server from Java (without command line)

2011-04-14 Thread Narendra Sharma
The write-up is a year old but will still give you a fair idea of how to do it.

http://prettyprint.me/2010/02/14/running-cassandra-as-an-embedded-service/

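In outline, the approach in that post boils down to something like the sketch
below. This is a hedged approximation, not the blog's exact code: method names
on the daemon class may differ slightly between versions, and the config
locations are assumptions.

import org.apache.cassandra.thrift.CassandraDaemon;

public class EmbeddedCassandraForTests {
    private CassandraDaemon daemon;

    // Call from JUnit setUp(); assumes cassandra.yaml and log4j-server.properties
    // are reachable at these (hypothetical) locations and the data dirs are writable.
    public void start() throws Exception {
        System.setProperty("cassandra.config", "file:conf/cassandra.yaml");
        System.setProperty("log4j.configuration", "file:conf/log4j-server.properties");
        daemon = new CassandraDaemon();
        daemon.init(null);   // load config and replay the commit log
        daemon.start();      // bring up the Thrift server
    }

    public void stop() {
        daemon.stop();
    }
}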
Thanks,
Naren

On Thu, Apr 14, 2011 at 10:59 AM, sam_ amin_shar...@yahoo.com wrote:

 Hello there,

 To start the Cassandra server we can use the following command in command
 prompt:
 cassandra -f

 I am wondering if it is possible to directly start the server inside a Java
 program using thrift API or a lower level class inside Cassandra
 implementation.

 The purpose of this is to be able to run JUnit tests that need to start
 Cassandra server in SetUp(), without the need to create a process and run
 cassandra from command line.

 Thanks,
 Sam

 --
 View this message in context:
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Starting-the-Cassandra-server-from-Java-without-command-line-tp6273826p6273826.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at
 Nabble.com.




-- 
Narendra Sharma
Solution Architect
*http://www.persistentsys.com*
*http://narendrasharma.blogspot.com/*


Re: Cassandra 2 DC deployment

2011-04-12 Thread Narendra Sharma
I think this is reasonable, assuming you have enough backhaul to perform
reads across DCs when read requests hit DC2 (with one copy of the data) or one
replica in DC1 is down.

Moreover, since you clearly stated that you would prefer availability over
consistency, you should be prepared for stale reads :)
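A minimal raw-Thrift sketch of the fallback described in the quoted mail below
(it assumes a client and mutationMap built as in the other batch_mutate
examples on this list):

try {
    client.batch_mutate(mutationMap, ConsistencyLevel.QUORUM);
} catch (UnavailableException e) {
    // quorum not reachable: prefer availability, accept the weaker guarantee
    client.batch_mutate(mutationMap, ConsistencyLevel.ONE);
}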


On Tue, Apr 12, 2011 at 8:12 AM, Raj N raj.cassan...@gmail.com wrote:

 Hi experts,
  We are planning to deploy Cassandra in 2 datacenters. Let's assume there
 are 3 nodes, RF=3, 2 nodes in 1 DC and 1 node in the 2nd DC. Under normal
 operations, we would read and write at QUORUM. What we want to do though is
 if we lose a datacenter which has 2 nodes, DC1 in this case, we want to
 downgrade our consistency to ONE. Basically I am saying that whenever there
 is a partition, then prefer availability over consistency. In order to do
 this we plan to catch UnavailableException and take corrective action. So
 try QUORUM under normal circumstances, if unavailable try ONE. My questions
 -
 Do you guys see any flaws with this approach?
 What happens when DC1 comes back up and we start reading/writing at QUORUM
 again? Will we read stale data in this case?

 Thanks
 -Raj




-- 
Narendra Sharma
Solution Architect
*http://www.persistentsys.com*
*http://narendrasharma.blogspot.com/*


Re: weird error when connecting to cassandra mbean proxy

2011-04-07 Thread Narendra Sharma
The correct object name is org.apache.cassandra.db:type=StorageProxy
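For example (a hedged sketch; it assumes the StorageProxyMBean interface from
the Cassandra jar is on the client's classpath, and mbsc is the
MBeanServerConnection from your code):

ObjectName name = new ObjectName("org.apache.cassandra.db:type=StorageProxy");
StorageProxyMBean proxy = JMX.newMBeanProxy(mbsc, name, StorageProxyMBean.class);
System.out.println("read operations: " + proxy.getReadOperations());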

-Naren

On Thu, Apr 7, 2011 at 4:36 PM, Anurag Gujral anurag.guj...@gmail.comwrote:

 Hi All,
  I have written a code for connecting to mbean server runnning on cassandra
 node.
 I get the following error:
 Exception in thread main java.lang.reflect.UndeclaredThrowableException
 at $Proxy1.getReadOperations(Unknown Source)
 at
 com.smeet.cassandra.CassandraJmxHttpServerMy.init(CassandraJmxHttpServerMy.java:72)
 at
 com.smeet.cassandra.CassandraJmxHttpServerMy.main(CassandraJmxHttpServerMy.java:77)
 Caused by: javax.management.InstanceNotFoundException:
 org.apache.cassandra.service:type=StorageProxy
 at
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1118)
 at
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:679)
 at
 com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:672)
 at
 javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
 at
 javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:90)

 I have attached the code file.

 Cassandra is running on the port I am trying to connect to .

 Please Suggest
 Thanks
 Anurag




-- 
Narendra Sharma
Solution Architect
*http://www.persistentsys.com*
*http://narendrasharma.blogspot.com/*


Re: old JMX code is not working with new cassandra version

2011-04-05 Thread Narendra Sharma
I think you need to specify the port in the JMXServiceURL. The exception
indicates there is no service listening on the given host and port. Also, I
guess, based on 127.0.0.1, you are running the client on the same machine as
Cassandra. If that is not the case then fix the host as well. You might want
to look at the cassandra-env.sh file and the comments in it.
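For example, with the default JMX port from the startup options (7199), the
URL would look roughly like this:

JMXServiceURL url = new JMXServiceURL(
        "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
JMXConnector jmxc = JMXConnectorFactory.connect(url, null);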



On Tue, Apr 5, 2011 at 5:56 PM, Anurag Gujral anurag.guj...@gmail.comwrote:

 Hi All,
   I had written code for cassandra 0.6.3 using JMX to call
 compaction,when I try to use that code to connect to 0.7.3 I get the
 following
 error
 Exception in thread main java.rmi.ConnectException: Connection refused to
 host: 127.0.0.1; nested exception is:
 java.net.ConnectException: Connection refused
 at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:601)
 at
 sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:198)
 at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:184)
 at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:110)
 at javax.management.remote.rmi.RMIServerImpl_Stub.newClient(Unknown
 Source)
 at
 javax.management.remote.rmi.RMIConnector.getConnection(RMIConnector.java:2327)
 at
 javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:279)
 at
 javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:248)
 at com.bluekai.Client.doCompaction(Client.java:51)
 at com.bluekai.Client.main(Client.java:41)
 Caused by: java.net.ConnectException: Connection refused
 at java.net.PlainSocketImpl.socketConnect(Native Method)
 at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
 at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
 at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
 at java.net.Socket.connect(Socket.java:525)
 at java.net.Socket.connect(Socket.java:475)
 at java.net.Socket.init(Socket.java:372)
 at java.net.Socket.init(Socket.java:186)
 at
 sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:22)
 at
 sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:128)
 at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:595

 Any suggestions
 Thanks
 Anurag

 I am pasting below the structure of the code I am using; it is giving the above
 error when JMXConnectorFactory.connect is called.

 JMXServiceURL url =
 new JMXServiceURL("service:jmx:rmi:///jndi/rmi://" + host +
 "/jmxrmi");

 System.out.println("before connection, host: " + host);
 JMXConnector jmxc = JMXConnectorFactory.connect(url, null);
 System.out.println("After connection");
 // Get an MBeanServerConnection
 MBeanServerConnection mbsc = jmxc.getMBeanServerConnection();

 // Construct the ObjectName for the QueueSampler MXBean
 ObjectName mxbeanName =
 new ObjectName("org.apache.cassandra.db:type=ColumnFamilyStores,keyspace="
 + keyspace + ",columnfamily=" + columnfamily);

 // Create a dedicated proxy for the MXBean instead of
 // going directly through the MBean server connection
 ColumnFamilyStores mxbeanProxy =
 JMX.newMXBeanProxy(mbsc, mxbeanName, ColumnFamilyStores.class);
 mxbeanProxy.forceMajorCompaction();
 jmxc.close();




-- 
Narendra Sharma
Solution Architect
*http://www.persistentsys.com*
*http://narendrasharma.blogspot.com/*


Re: Understanding cfhistogram output

2011-04-01 Thread Narendra Sharma
There are 6 columns in the output:
- Offset
  These are the buckets, i.e. the values on the X-axis of a graph. The unit is
determined by the other columns.
- SSTables
  This represents the number of sstables accessed per read. For example, if a
read operation involved accessing 3 sstables then you will find a positive count
against offset 3. Most of the time the values will be against lower offset
values.
- Write Latency
  This represents the number of operations and their latency (in microseconds).
If 100 operations each took, say, 5 microseconds then you will find an entry
against offset 5. This shows the distribution of the number of operations across
a range of latencies.
- Read Latency
  Similar to write latency. The unit is microseconds.
- Row Size
  This represents the number of rows with a given size, i.e. how many rows of
a given size exist.
- Column Count
  Similar to row size, but for the column count, i.e. how many rows with a
given number of columns exist.

1. Note that these are estimates and not exact numbers.
2. The values of course change over time.


On Fri, Apr 1, 2011 at 10:21 AM, Anurag Gujral anurag.guj...@gmail.comwrote:

 Hi All,
 I ran nodetool with cfhistograms but I don't fully understand the
 output. Can someone please shed some light on it?
 Thanks
 Anurag




-- 
Narendra Sharma
Solution Architect
*http://www.persistentsys.com*
*http://narendrasharma.blogspot.com/*


Re: Fatal error from a cassandra node

2011-03-30 Thread Narendra Sharma
http://wiki.apache.org/cassandra/MemtableThresholds#JVM_Heap_Size
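For reference, the heap itself is set in conf/cassandra-env.sh; a hedged
example (the values are illustrative only, not a recommendation):

MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="800M"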



On Wed, Mar 30, 2011 at 11:41 AM, Peter Schuller 
peter.schul...@infidyne.com wrote:

 I have 6 node cassandra cluster all are setup with same
  configurationI am getting fatal exceptions in one of the nodes
  ERROR [Thread-604] 2011-03-29 20:19:13,218 AbstractCassandraDaemon.java
  (line 114) Fatal exception in thread Thread[Thread-604,5,main]
  java.lang.OutOfMemoryError: Java heap space
  ERROR [Thread-607] 2011-03-29 19:47:29,272 AbstractCassandraDaemon.java
  (line 114) Fatal exception in thread Thread[Thread-607,5,main]
  java.lang.OutOfMemoryError: Java heap space
  ERROR [Thread-605] 2011-03-29 19:38:09,081 AbstractCassandraDaemon.java
  (line 114) Fatal exception in thread Thread[Thread-605,5,main]
  java.lang.OutOfMemoryError: Java heap space
  ERROR [MutationStage:2] 2011-03-29 19:37:16,659
  DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
  java.lang.OutOfMemoryError: Java heap space
  ERROR [GossipStage:1] 2011-03-29 20:27:29,898
 AbstractCassandraDaemon.java
  (line 114) Fatal exception in thread Thread[GossipStage:1,5,main]
  java.lang.OutOfMemoryError: Java heap space
 
  All all the nodes have 32 G of ram.
 
  Everytime I try to restart the failed node I get the above errors.

 Unless something is outright wrong, it sounds like you need to
 increase your JVM heap size in cassandra-env.sh. That you're getting
 it on start-up sounds consistent with commit log reply filling the
 heap in the form of memtables that are sized too big for your heap.

 There's a wiki page somewhere that describes the overall rule of thumb
 for heap sizing, but I can't find it right now.

 --
 / Peter Schuller




-- 
Narendra Sharma
Solution Architect
*http://www.persistentsys.com*
*http://narendrasharma.blogspot.com/*


Re: difference between compaction, repair, clean

2011-03-30 Thread Narendra Sharma
Short answers:
- compaction - Initiates an immediate full compaction. Removes deleted data.
- clean (cleanup) - Initiates an immediate cleanup, i.e. removes data that is
deleted and data that doesn't belong to this node. Internally performs a full
compaction.
- repair - Used to make the different copies (replicas) of data consistent by
exchanging data with the other replicas.
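For reference, the corresponding nodetool invocations look roughly like this
(host, keyspace and CF names are placeholders; 7199 is the default JMX port):

nodetool -h my-host -p 7199 compact MyKeyspace MyCF
nodetool -h my-host -p 7199 cleanup MyKeyspace
nodetool -h my-host -p 7199 repair MyKeyspace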

The details on the following links should be good to understand them in detail:
http://www.datastax.com/docs/0.7/utilities/nodetool
http://wiki.apache.org/cassandra/NodeProbe
http://wiki.apache.org/cassandra/Operations

Thanks,
Naren

On Wed, Mar 30, 2011 at 12:57 PM, Jonathan Colby
jonathan.co...@gmail.comwrote:

 I'm a little unclear on the differences between the nodetool operations:

 - compaction
 - repair
 - clean

 I understand that compaction consolidates the SSTables and physically
 performs deletes by taking into account the Tombstones.  But what does clean
 and repair do then?






-- 
Narendra Sharma
Solution Architect
*http://www.persistentsys.com*
*http://narendrasharma.blogspot.com/*


Re: Fatal error from a cassandra node

2011-03-30 Thread Narendra Sharma
OOM at startup with 16GB... seems like an issue. Which version are you
using? Can you provide some details on the failed node? What exactly happened?
That might give some clue. Also, you might want to start with the log level set
to debug to find out more about what exactly Cassandra is doing that is
causing the OOM.
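For example, in the 0.7-era conf/log4j-server.properties you would raise the
root logger level; a hedged sketch (the appender names may differ in your copy):

log4j.rootLogger=DEBUG,stdout,R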

-Naren

On Wed, Mar 30, 2011 at 4:45 PM, Anurag Gujral anurag.guj...@gmail.comwrote:


 I am using 16G of heap space; how much more should I increase it?
 Please suggest

 Thanks
 Anurag

 On Wed, Mar 30, 2011 at 11:43 AM, Narendra Sharma 
 narendra.sha...@gmail.com wrote:

 http://wiki.apache.org/cassandra/MemtableThresholds#JVM_Heap_Size



 On Wed, Mar 30, 2011 at 11:41 AM, Peter Schuller 
 peter.schul...@infidyne.com wrote:

 I have 6 node cassandra cluster all are setup with same
  configurationI am getting fatal exceptions in one of the nodes
  ERROR [Thread-604] 2011-03-29 20:19:13,218 AbstractCassandraDaemon.java
  (line 114) Fatal exception in thread Thread[Thread-604,5,main]
  java.lang.OutOfMemoryError: Java heap space
  ERROR [Thread-607] 2011-03-29 19:47:29,272 AbstractCassandraDaemon.java
  (line 114) Fatal exception in thread Thread[Thread-607,5,main]
  java.lang.OutOfMemoryError: Java heap space
  ERROR [Thread-605] 2011-03-29 19:38:09,081 AbstractCassandraDaemon.java
  (line 114) Fatal exception in thread Thread[Thread-605,5,main]
  java.lang.OutOfMemoryError: Java heap space
  ERROR [MutationStage:2] 2011-03-29 19:37:16,659
  DebuggableThreadPoolExecutor.java (line 103) Error in
 ThreadPoolExecutor
  java.lang.OutOfMemoryError: Java heap space
  ERROR [GossipStage:1] 2011-03-29 20:27:29,898
 AbstractCassandraDaemon.java
  (line 114) Fatal exception in thread Thread[GossipStage:1,5,main]
  java.lang.OutOfMemoryError: Java heap space
 
  All all the nodes have 32 G of ram.
 
  Everytime I try to restart the failed node I get the above errors.

 Unless something is outright wrong, it sounds like you need to
 increase your JVM heap size in cassandra-env.sh. That you're getting
 it on start-up sounds consistent with commit log reply filling the
 heap in the form of memtables that are sized too big for your heap.

 There's a wiki page somewhere that describes the overall rule of thumb
 for heap sizing, but I can't find it right now.

 --
 / Peter Schuller




 --
 Narendra Sharma
 Solution Architect
 *http://www.persistentsys.com*
 *http://narendrasharma.blogspot.com/*






-- 
Narendra Sharma
Solution Architect
*http://www.persistentsys.com*
*http://narendrasharma.blogspot.com/*


Re: Cassandra error Insufficient space to compact

2011-03-30 Thread Narendra Sharma
The space referred to in the log message is disk space, not heap. So check
whether you are running low on disk space.
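For example (the paths below are the defaults; adjust them to your
data_file_directories and commitlog_directory settings):

df -h /var/lib/cassandra/data /var/lib/cassandra/commitlog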

Thanks,
Naren

On Wed, Mar 30, 2011 at 4:55 PM, Anurag Gujral anurag.guj...@gmail.comwrote:

 Hi All,
   I am getting following message from cassandra

  WARN [CompactionExecutor:1] 2011-03-30 18:46:33,272 CompactionManager.java
 (line 406) insufficient space to compact all requested files SSTableReader(

 I am using 16G of Java heap space; please let me know whether I should consider
 this a sign of something I need to worry about.
 Thanks
 Anurag




-- 
Narendra Sharma
Solution Architect
*http://www.persistentsys.com*
*http://narendrasharma.blogspot.com/*


Re: cassandra client sample code for 0.7.3

2011-03-25 Thread Narendra Sharma
Hope you find the following useful. It uses raw Thrift. If you have
difficulty building and/or running the code, please reply back.

private Cassandra.Client createClient(String host, int port) throws Exception {
    TTransport framedTransport = new TFramedTransport(new TSocket(host, port));
    TProtocol framedProtocol = new TBinaryProtocol(framedTransport);
    Cassandra.Client client = new Cassandra.Client(framedProtocol);
    framedTransport.open();
    client.set_keyspace("Keyspace");
    return client;
}

private Mutation getMutation(SuperColumn sc) {
    ColumnOrSuperColumn csc = new ColumnOrSuperColumn();
    csc.setSuper_column(sc);
    csc.setSuper_columnIsSet(true);
    Mutation m = new Mutation();
    m.setColumn_or_supercolumn(csc);
    m.setColumn_or_supercolumnIsSet(true);
    return m;
}

private Mutation getMutation(Column c) {
    ColumnOrSuperColumn csc = new ColumnOrSuperColumn();
    csc.setColumn(c);
    csc.setColumnIsSet(true);
    Mutation m = new Mutation();
    m.setColumn_or_supercolumn(csc);
    m.setColumn_or_supercolumnIsSet(true);
    return m;
}

private Column createColumn(String name, String value, long time) {
    Column c = new Column();
    c.setName(name.getBytes());
    c.setValue(value.getBytes());
    c.setTimestamp(time);
    return c;
}

Cassandra.Client client = createClient(host, port);
long timeStamp = System.currentTimeMillis();

// For a Standard CF
Column col1 = createColumn("name1", "value1", timeStamp);
Column col2 = createColumn("name2", "value2", timeStamp);

Map<String, List<Mutation>> mutations = new HashMap<String, List<Mutation>>();
List<Mutation> mutation = new ArrayList<Mutation>();
mutation.add(getMutation(col1));
mutation.add(getMutation(col2));

mutations.put("StandardCF", mutation);
Map<ByteBuffer, Map<String, List<Mutation>>> mutationMap =
    new HashMap<ByteBuffer, Map<String, List<Mutation>>>();
mutationMap.put(ByteBuffer.wrap("rowkey".getBytes()), mutations);
client.batch_mutate(mutationMap, ConsistencyLevel.QUORUM);

// For a Super CF
SuperColumn info = new SuperColumn();
info.setName("info".getBytes());
List<Column> cols = new ArrayList<Column>();
cols.add(createColumn("name1", "val1", timeStamp));
cols.add(createColumn("name2", "val2", timeStamp));
info.setColumns(cols);

Map<String, List<Mutation>> mutations = new HashMap<String, List<Mutation>>();
List<Mutation> mutation = new ArrayList<Mutation>();
mutation.add(getMutation(info));

mutations.put("SuperCF", mutation);
Map<ByteBuffer, Map<String, List<Mutation>>> mutationMap =
    new HashMap<ByteBuffer, Map<String, List<Mutation>>>();
mutationMap.put(ByteBuffer.wrap("row-key".getBytes()), mutations);
client.batch_mutate(mutationMap, ConsistencyLevel.QUORUM);

Thanks,
Naren


On Thu, Mar 24, 2011 at 10:01 PM, Anurag Gujral anurag.guj...@gmail.comwrote:

 I am in need of sample code (basically a Cassandra client) in Java using
 batch_mutate.
 If someone has some, please reply back.
 Thanks
 Anurag



Option for ordering columns by timestamp in CF

2011-03-24 Thread Narendra Sharma
Cassandra 0.7.4
Column names in my CF are of type byte[] but I want to order columns by
timestamp. What is the best way to achieve this? Does it make sense for
Cassandra to support ordering of columns by timestamp as an option for a column
family irrespective of the column name type?

Thanks,
Naren


Re: ParNew (promotion failed)

2011-03-23 Thread Narendra Sharma
I think it is due to fragmentation in the old gen, because of which the survivor
area cannot be promoted into the old gen. 300MB of memtable data looks high for
a 3G heap. I learned that the in-memory overhead of a memtable can be as high as
10x of the memtable data size. So either increase the heap or reduce the
memtable thresholds further so that the old gen gets freed up faster. With
16 CFs, I would do both, i.e. increase the heap to say 4GB and reduce the
memtable thresholds further.
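For reference, the per-CF thresholds in question are the ones visible in the
yaml/schema definitions elsewhere in this archive; a hedged example of
tightening them (the values are illustrative only):

memtable_throughput_in_mb: 64
memtable_operations_in_millions: 0.3
memtable_flush_after_mins: 60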

-Naren

On Wed, Mar 23, 2011 at 8:18 AM, ruslan usifov ruslan.usi...@gmail.comwrote:

 Hello

 Sometimes I see the following message in the GC log:

 2011-03-23T14:40:56.049+0300: 14897.104: [GC 14897.104: [ParNew (promotion
 failed)
 Desired survivor size 41943040 bytes, new threshold 2 (max 2)
 - age   1:5573024 bytes,5573024 total
 - age   2:5064608 bytes,   10637632 total
 : 672577K-670749K(737280K), 0.1837950 secs]14897.288: [CMS:
 1602487K-779310K(2326528K), 4.7525580 secs] 2270940K-779310K(3063808K), [
 CMS Perm : 20073K-19913K(33420K)], 4.9365810 secs] [Times: user=5.06
 sys=0.00, real=4.93 secs]
 Total time for which application threads were stopped: 4.9378750 seconds


 How can I minimize their frequency, or disable them?

 My current workload is many small objects (about 200 bytes long), and
 the sum of my memtables is about 300 MB (16 CFs). My heap is 3G.



Re: ParNew (promotion failed)

2011-03-23 Thread Narendra Sharma
I understand that. The overhead could be as high as 10x of memtable data
size. So overall the overhead for 16CF collectively in your case could be
300*10 = 3G.

Thanks,
Naren

On Wed, Mar 23, 2011 at 11:18 AM, ruslan usifov ruslan.usi...@gmail.comwrote:



 2011/3/23 Narendra Sharma narendra.sha...@gmail.com

 I think it is due to fragmentation in old gen, due to which survivor area
 cannot be moved to old gen. 300MB data size of memtable looks high for 3G
 heap. I learned that in memory overhead of memtable can be as high as 10x of
 memtable data size in memory. So either increase the heap or reduce the
 memtable thresholds further so that old gen gets freed up faster. With
 16CFs, I would do both i.e. increase the heap to say 4GB and reduce memtable
 thresholds further.


 I think that you don't understand me: 300MB is the sum of the thresholds over
 all 16 CFs, so one memtable threshold is about 18MB. Or is it still
 necessary to reduce the memtable thresholds?


Re: ParNew (promotion failed)

2011-03-23 Thread Narendra Sharma
I haven't used G1. I remember someone shared his experience in detail on G1.
The bottom line is you need to test it for your deployment and based on test
and results conclude if it will work for you. I believe for a small heap G1
will do well.

-Naren


On Wed, Mar 23, 2011 at 1:47 PM, ruslan usifov ruslan.usi...@gmail.comwrote:



 2011/3/23 Narendra Sharma narendra.sha...@gmail.com

 I understand that. The overhead could be as high as 10x of memtable data
 size. So overall the overhead for 16CF collectively in your case could be
 300*10 = 3G.


 And how about the G1 GC? It should prevent memory fragmentation, but some post
 on this mailing list said that it is not as good as described. What do you
 think about it?



Re: How to find what node a key is on

2011-03-23 Thread Narendra Sharma
The logic to find the node is not complicated. You compute the MD5 hash of
the key. Create a sorted list of the tokens assigned to the nodes in the ring.
Find the first token greater than the hash; that is the first node. Next in
the list is the replica, which depends on the RF. Now, this is simple because
it assumes SimpleStrategy for replica placement. For other strategies,
finding the replicas is more involved.
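A hedged Java sketch of that lookup for RandomPartitioner plus SimpleStrategy
(the token list would come from nodetool ring or describe_ring; wrap-around is
handled by falling back to the smallest token):

import java.math.BigInteger;
import java.security.MessageDigest;
import java.util.Map;
import java.util.TreeMap;

public class KeyLocator {
    // token -> node address, kept sorted by token
    private final TreeMap<BigInteger, String> ring = new TreeMap<BigInteger, String>();

    public void addNode(BigInteger token, String address) {
        ring.put(token, address);
    }

    // RandomPartitioner-style token: abs(MD5(key)) as a BigInteger (assumption)
    static BigInteger tokenFor(byte[] key) throws Exception {
        return new BigInteger(MessageDigest.getInstance("MD5").digest(key)).abs();
    }

    // First node whose token is >= hash(key); wrap to the smallest token otherwise.
    public String primaryReplicaFor(byte[] key) throws Exception {
        Map.Entry<BigInteger, String> e = ring.ceilingEntry(tokenFor(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }
}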

Cassandra is a distributed database. Each node is aware of the state of the
cluster and the token distribution. Moving the logic into the client is
possible, but the benefits are small compared to the pain, and doing it for a
large cluster would be even more painful.

I would discourage you from going that route.

Thanks,
Naren

On Wed, Mar 23, 2011 at 5:16 PM, Sameer Farooqui cassandral...@gmail.comwrote:

 No problems with read performance, just curious about what kind of overhead
 was being added b/c we're doing read tests.

 If it's easy to figure out where the row is stored, I'd be interested in
 trying it. If not, don't worry about it.

 - Sameer



 On Wed, Mar 23, 2011 at 4:31 PM, aaron morton aa...@thelastpickle.comwrote:

 Each row is stored on RF nodes, and your read will be sent to CL number of
 nodes. Messages only take a single hop from the coordinator to each node the
 read is performed on, so the networking overhead varies with the number of
 nodes involved in the request.  There are many factors other than networking
 that influence the speed of a read request.

 There are features available to determine which nodes holds replicas for a
 particular key. AFAIK they are not intended for use by clients.

 Are you currently having problems with read performance ?

 Hope that helps.
 Aaron


 On 24 Mar 2011, at 11:53, Sameer Farooqui wrote:

 Does anybody know if it's possible to find out what node a specific
 key/row lives on?

 We have a 30 node cluster and I'm curious how much faster it'll be to read
 data directly from the node that stores the data.

 We're using random partitioner, by the way.


 *Sameer Farooqui
 *Accenture Technology Labs






Re: getting exception when cassandra 0.7.3 is starting

2011-03-17 Thread Narendra Sharma
Is this new install or upgrade?

Thanks,
Naren

On Wed, Mar 16, 2011 at 11:15 PM, Anurag Gujral anurag.guj...@gmail.comwrote:

 I am getting exception when starting cassandra 0.7.3

 ERROR 01:10:48,321 Exception encountered during startup.
 java.lang.NegativeArraySizeException
 at
 org.apache.cassandra.db.ColumnFamilyStore.readSavedCache(ColumnFamilyStore.java:274)
 at
 org.apache.cassandra.db.ColumnFamilyStore.init(ColumnFamilyStore.java:213)
 at
 org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:466)
 at
 org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:447)
 at org.apache.cassandra.db.Table.initCf(Table.java:317)
 at org.apache.cassandra.db.Table.init(Table.java:254)
 at org.apache.cassandra.db.Table.open(Table.java:110)
 at
 org.apache.cassandra.db.SystemTable.checkHealth(SystemTable.java:207)
 at
 org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:129)
 at
 org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:316)
 at
 org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:79)
 Exception encountered during startup.
 java.lang.NegativeArraySizeException
 at
 org.apache.cassandra.db.ColumnFamilyStore.readSavedCache(ColumnFamilyStore.java:274)
 at
 org.apache.cassandra.db.ColumnFamilyStore.init(ColumnFamilyStore.java:213)
 at
 org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:466)
 at
 org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:447)
 at org.apache.cassandra.db.Table.initCf(Table.java:317)
 at org.apache.cassandra.db.Table.init(Table.java:254)
 at org.apache.cassandra.db.Table.open(Table.java:110)
 at
 org.apache.cassandra.db.SystemTable.checkHealth(SystemTable.java:207)
 at
 org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:129)
 at
 org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:316)
 at
 org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:79)





Re: Pauses of GC

2011-03-17 Thread Narendra Sharma
What heap size are you running with? and Which version of Cassandra?

Thanks,
Naren

On Thu, Mar 17, 2011 at 3:45 AM, ruslan usifov ruslan.usi...@gmail.comwrote:

 Hello

 Sometimes I have very long GC pauses:


 Total time for which application threads were stopped: 0.0303150 seconds
 2011-03-17T13:19:56.476+0300: 33295.671: [GC 33295.671: [ParNew:
 678855K-20708K(737280K), 0.0271230 secs] 1457643K-806795K(4112384K),
 0.027305
 0 secs] [Times: user=0.33 sys=0.00, real=0.03 secs]
 Total time for which application threads were stopped: 0.0291820 seconds
 2011-03-17T13:20:32.962+0300: 2.157: [GC 2.157: [ParNew:
 676068K-23527K(737280K), 0.0302180 secs] 1462155K-817599K(4112384K),
 0.030402
 0 secs] [Times: user=0.31 sys=0.00, real=0.03 secs]
 Total time for which application threads were stopped: 0.1270270 seconds
 2011-03-17T13:21:11.908+0300: 33371.103: [GC 33371.103: [ParNew:
 678887K-21564K(737280K), 0.0268160 secs] 1472959K-823191K(4112384K),
 0.027011
 0 secs] [Times: user=0.28 sys=0.00, real=0.03 secs]
 Total time for which application threads were stopped: 0.0293330 seconds
 2011-03-17T13:21:50.482+0300: 33409.677: [GC 33409.677: [ParNew:
 676924K-21115K(737280K), 0.0281720 secs] 1478551K-829900K(4112384K),
 0.028363
 0 secs] [Times: user=0.27 sys=0.00, real=0.03 secs]
 Total time for which application threads were stopped: 0.0339610 seconds
 2011-03-17T13:22:32.849+0300: 33452.044: [GC 33452.044: [ParNew:
 676475K-25948K(737280K), 0.0317600 secs] 1485260K-842061K(4112384K),
 0.031952
 0 secs] [Times: user=0.22 sys=0.00, real=0.03 secs]
 Total time for which application threads were stopped: 0.0344430 seconds
 2011-03-17T13:23:14.924+0300: 33494.119: [GC 33494.119: [ParNew:
 681308K-25087K(737280K), 0.0282600 secs] 1497421K-848300K(4112384K),
 0.028436
 0 secs] [Times: user=0.32 sys=0.00, real=0.03 secs]
 Total time for which application threads were stopped: 0.0309160 seconds
 2011-03-17T13:23:57.192+0300: 33536.387: [GC 33536.387: [ParNew:
 680447K-24805K(737280K), 0.0299910 secs] 1503660K-855829K(4112384K),
 0.030167
 0 secs] [Times: user=0.29 sys=0.01, real=0.03 secs]
 Total time for which application threads were stopped: 0.0324200 seconds
 2011-03-17T13:24:01.553+0300: 33540.748: [GC 33540.749: [ParNew:
 680165K-31886K(737280K), 0.0495620 secs] 1511189K-936503K(4112384K),
 0.049742
 0 secs] [Times: user=0.57 sys=0.00, real=0.05 secs]
 Total time for which application threads were stopped: 0.0507030 seconds
 2011-03-17T13:37:56.009+0300: 34375.204: [GC 34375.204: [ParNew:
 687246K-28727K(737280K), 0.0244720 secs] 1591863K-942459K(4112384K),
 0.024690
 0 secs] [Times: user=0.18 sys=0.00, real=0.02 secs]
 Total time for which application threads were stopped: 806.7442720 seconds
 Total time for which application threads were stopped: 0.0006590 seconds
 Total time for which application threads were stopped: 0.0004360 seconds
 Total time for which application threads were stopped: 0.0004630 seconds
 Total time for which application threads were stopped: 0.0008120 seconds
 2011-03-17T13:37:59.018+0300: 34378.213: [GC 34378.213: [ParNew:
 676678K-21640K(737280K), 0.0137740 secs] 1590410K-949991K(4112384K),
 0.013961
 0 secs] [Times: user=0.13 sys=0.02, real=0.01 secs]
 Total time for which application threads were stopped: 0.0145920 seconds
 Total time for which application threads were stopped: 0.1036080 seconds
 Total time for which application threads were stopped: 0.0585600 seconds
 Total time for which application threads were stopped: 0.0600550 seconds
 Total time for which application threads were stopped: 0.0008560 seconds
 Total time for which application threads were stopped: 0.0006770 seconds
 Total time for which application threads were stopped: 0.0005910 seconds
 Total time for which application threads were stopped: 0.0351330 seconds
 Total time for which application threads were stopped: 0.0329020 seconds
 Total time for which application threads were stopped: 0.0728490 seconds
 Total time for which application threads were stopped: 0.0480990 seconds
 Total time for which application threads were stopped: 0.0804250 seconds
 2011-03-17T13:38:04.394+0300: 34383.589: [GC 34383.589: [ParNew:
 677000K-8375K(737280K), 0.0218310 secs] 1605351K-944271K(4112384K),
 0.0220300
  secs]




 Here is the nodetool cfstats output on the hung node:

 Keyspace: fishdom_tuenti
 Read Count: 4970999
 Read Latency: 1.0267005945887335 ms.
 Write Count: 1441619
 Write Latency: 0.013146585887117193 ms.
 Pending Tasks: 0
 Column Family: decor
 SSTable count: 3
 Space used (live): 1296203532
 Space used (total): 1302520037
 Memtable Columns Count: 1066
 Memtable Data Size: 121742
 Memtable Switch Count: 11
 Read Count: 108125
 Read Latency: 2.809 ms.
 Write Count: 11261
 Write Latency: 0.006 ms.
 Pending Tasks: 0
 Key cache capacity: 30
 Key cache size: 46470
 Key cache 

Re: Pauses of GC

2011-03-17 Thread Narendra Sharma
Depending on your memtable thresholds, the heap may be too small for the
deployment. At the same time I don't see any other log statements around
the long pause that you have shown in the log snippet, which looks a little odd
to me. All the ParNew collections freed almost the same amount of heap and did
not take a lot of time.

Check if it is due to some JVM bug.
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6477891

-Naren

On Thu, Mar 17, 2011 at 9:47 AM, ruslan usifov ruslan.usi...@gmail.comwrote:



 2011/3/17 Narendra Sharma narendra.sha...@gmail.com

 What heap size are you running with? and Which version of Cassandra?

 4G with cassandra 0.7.4



Re: Cassandra c++ client

2011-03-16 Thread Narendra Sharma
libcassandra isn't very active. Since we already had an object pool library,
we went with using raw Thrift in C++ instead of any other library.

Thanks,
Naren

On Wed, Mar 16, 2011 at 10:03 PM, Primal Wijesekera 
primalwijesek...@yahoo.com wrote:

 You could try this,

 https://github.com/posulliv/libcassandra

 - primal

 --
 *From:* Anurag Gujral anurag.guj...@gmail.com
 *To:* user@cassandra.apache.org
 *Sent:* Wed, March 16, 2011 9:36:25 PM
 *Subject:* Cassandra c++ client

 Hi All,
Anyone knows about stable C++ client for cassandra?
 Thanks
 Anurag




Re: Calculate memory used for keycache

2011-03-14 Thread Narendra Sharma
Some time back I looked at the code to find that out. Following is the
result. There will be some additional overhead for the internal data structures
of ConcurrentLinkedHashMap.

Keycache size * (8 bytes for position i.e. value + X bytes for key +
16 bytes for token (RP) + 8 byte reference for DecoratedKey + 8 bytes
for descriptor reference)
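For example, with 1,000,000 cached keys of 32 bytes each (the 32-byte key
length is just an illustrative assumption), that works out to roughly
1,000,000 * (8 + 32 + 16 + 8 + 8) = 72,000,000 bytes, i.e. around 70 MB before
the ConcurrentLinkedHashMap overhead.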

Thanks,
Naren

On Mon, Mar 14, 2011 at 1:29 PM, ruslan usifov ruslan.usi...@gmail.comwrote:

 Hello


 How is it possible to calculate this value? I think that the key size, if we
 use RandomPartitioner, will be 16 bytes, so the keycache will take
 16*(number of keycache elements) bytes??



Re: calculating initial_token

2011-03-14 Thread Narendra Sharma
On the same page there is a section on Load Balance that talks about a python
script to compute tokens. I believe your question is more about assigning
new tokens than computing them.

1. nodetool loadbalance will result in recomputation of tokens. It will
pick tokens based on the load and not the ones assigned by you.
2. You can either use decommission and bootstrap with new tokens, OR use
nodetool move.
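For the computation itself, a hedged shell equivalent of the wiki's formula
(token_i = i * 2^127 / N, shown here for N = 6 nodes):

for i in {0..5}; do echo "$i * (2^127 / 6)" | bc; done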

Thanks,
Naren

On Mon, Mar 14, 2011 at 1:18 PM, Sasha Dolgy sdo...@gmail.com wrote:

 Sorry for being a bit daft ... Wanted a bit of validation or rejection ...

 If I have a 6 node cluster, replication factor 2 (don't think this is
 applicable to the token decision) is the following sufficient and
 correct for determining the tokens:

 #!/bin/bash
 for nodes in {0..5};
 do
   echo "$nodes*(2^127/5)" | bc;
 done




 Gives me a result of:

 0
 34028236692093846346337460743176821145
 68056473384187692692674921486353642290
 102084710076281539039012382229530463435
 136112946768375385385349842972707284580
 170141183460469231731687303715884105725

 My ring right now is:


 10.0.0.2  Up Normal  225 KB  40.78%
 24053088190195663439419935163232881936
 10.0.0.3Up Normal  201.21 KB   19.17%
 56667357399723182105247119364967854254
 10.0.0.4   Up Normal  213.15 KB   17.61%
 86624712919272143003828971968762407027
 10.0.0.5   Up Normal  214.54 KB   11.22%
 105714724128406151241468359303513100912
 10.0.0.6  Up Normal  206.39 KB   5.61%
 115259729732973155360288052970888447854
 10.0.0.7Up Normal  247.68 KB   5.61%
 124804735337540159479107746638263794797

 If my new tokens are correct:

 1.  cassandra.yaml is updated on each node with new token
 2.  node is restarted and a nodetool repair is run, or is a nodetool
 loadbalance run

 Thanks in advance ... been staring at
 http://wiki.apache.org/cassandra/Operations#Token_selection for too
 long

 --
 Sasha Dolgy
 sasha.do...@gmail.com



Re: calculating initial_token

2011-03-14 Thread Narendra Sharma
The %age (owns) is just the arc length in terms of %age of tokens a node
owns out of the total token space. It doesn't reflect the actual data.

The size (load) is the real current load.

-Naren


On Mon, Mar 14, 2011 at 2:59 PM, Sasha Dolgy sdo...@gmail.com wrote:

 ah, you know ... i have been reading it wrong.  the output shows a
 nice fancy column called Owns but i've only ever seen the percentage
 ... the amount of data or load is even ... doh.  thanks for the
 reply.  cheers
 -sd

 On Mon, Mar 14, 2011 at 10:47 PM, Narendra Sharma
 narendra.sha...@gmail.com wrote:
  On the same page there is a section on Load Balance that talks about
 python
  script to compute tokens. I believe your question is more about assigning
  new tokens and not compute tokens.
 
  1. nodetool loadbalance will result in recomputation of tokens. It will
  pick tokens based on the load and not the once assigned by you.
  2. You can either use decommission and bootstrap with new tokens OR Use
  nodetool move
 
  Thanks,
  Naren
 
  On Mon, Mar 14, 2011 at 1:18 PM, Sasha Dolgy sdo...@gmail.com wrote:
 
  Sorry for being a bit daft ... Wanted a bit of validation or rejection
 ...
 
  If I have a 6 node cluster, replication factor 2 (don't think this is
  applicable to the token decision) is the following sufficient and
  correct for determining the tokens:
 
  #!/bin/bash
  for nodes in {0..5};
  do
  echo "$nodes*(2^127/5)" | bc;
  done
 
 
 
 
  Gives me a result of:
 
  0
  34028236692093846346337460743176821145
  68056473384187692692674921486353642290
  102084710076281539039012382229530463435
  136112946768375385385349842972707284580
  170141183460469231731687303715884105725
 
  My ring right now is:
 
 
  10.0.0.2  Up Normal  225 KB  40.78%
  24053088190195663439419935163232881936
  10.0.0.3Up Normal  201.21 KB   19.17%
  56667357399723182105247119364967854254
  10.0.0.4   Up Normal  213.15 KB   17.61%
  86624712919272143003828971968762407027
  10.0.0.5   Up Normal  214.54 KB   11.22%
  105714724128406151241468359303513100912
  10.0.0.6  Up Normal  206.39 KB   5.61%
  115259729732973155360288052970888447854
  10.0.0.7Up Normal  247.68 KB   5.61%
  124804735337540159479107746638263794797
 
  If my new tokens are correct:
 
  1.  cassandra.yaml is updated on each node with new token
  2.  node is restarted and a nodetool repair is run, or is a nodetool
  loadbalance run
 
  Thanks in advance ... been staring at
  http://wiki.apache.org/cassandra/Operations#Token_selection for too
  long
 
  --
  Sasha Dolgy
  sasha.do...@gmail.com
 
 



 --
 Sasha Dolgy
 sasha.do...@gmail.com



Re: Does the memtable replace the old version of column with the new overwriting version or is it just a simple append ?

2011-03-08 Thread Narendra Sharma
Multiple writes for the same key and column will result in the column being
overwritten in the memtable. Basically, multiple updates for the same
(key, column) are reconciled based on the column's timestamp. This happens per
memtable, so if a memtable is flushed to an sstable, this rule applies to the
next memtable.
Note that sstables are immutable. So different sstables may have different
versions of the same (key, column), and the reconciliation of those happens
during the read (read repair). This is one reason reads are slower than writes:
conflict resolution happens during the read.

Hope this answers the question!

Thanks,
-Naren

On Tue, Mar 8, 2011 at 10:44 PM, Aditya Narayan ady...@gmail.com wrote:

 Do the overwrites of newly written columns(that are present in
 memtable) *replace the old column* or is it just a simple append.

 I am trying to understand that if I update these column very very
 frequently(while they are in memtable), does the read performance of
 these columns gets affected, since Cassandra will have to read so many
 versions of the same column. If this is just replacement with old
 column then I guess read will be much better since it needs to see
 just single existing version of column.

 Thanks
 Aditya Narayan



Re: OOM exceptions

2011-03-04 Thread Narendra Sharma
I have been through tuning for GC and OOM recently. If you can provide the
cassandra.yaml, I can help. Mostly I had to play with memtable thresholds.

Thanks,
Naren

On Fri, Mar 4, 2011 at 12:43 PM, Mark static.void@gmail.com wrote:

 We have 7 column families and we are not using the default key cache
 (20).

 These were our initial settings so it was not in response to anything.
 Would you recommend anything else? Thanks



 On 3/4/11 12:34 PM, Chris Burroughs wrote:

 - Are you using a key cache?  How many keys do you have?  Across how
 many column families

 You configuration is unusual both in terms of not setting min heap ==
 max heap and the percentage of available RAM used for the heap.  Did you
 change the heap size in response to errors or for another reason?

 On 03/04/2011 03:25 PM, Mark wrote:

 This happens during compaction and we are not using the RowsCached
 attribute.

 Our initial/max heap are 2 and 6 respectively and we have 8 gigs in
 these machines.

 Thanks

 On 3/4/11 12:05 PM, Chris Burroughs wrote:

 - Does this occur only during compaction or at seemingly random times?
 - How large is your heap?  What jvm settings are you using? How much
 physical RAM do you have?
 - Do you have the row and/or key cache enabled?  How are they
 configured?  How large are they when the OOM is thrown?

 On 03/04/2011 02:38 PM, Mark Miller wrote:

 Other than adding more memory to the machine is there a way to solve
 this? Please help. Thanks

 ERROR [COMPACTION-POOL:1] 2011-03-04 11:11:44,891 CassandraDaemon.java
 (line org.apache.cassandra.thrift.CassandraDaemon$1) Uncaught exception
 in thread Thread[COMPACTION-POOL:1,5,main]
 java.lang.OutOfMemoryError: Java heap space
  at java.util.Arrays.copyOf(Arrays.java:2798)
  at
 java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:111)
  at java.io.DataOutputStream.write(DataOutputStream.java:107)
  at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
  at

 org.apache.cassandra.utils.FBUtilities.writeByteArray(FBUtilities.java:298)

  at

 org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:66)


  at

 org.apache.cassandra.db.SuperColumnSerializer.serialize(SuperColumn.java:311)


  at

 org.apache.cassandra.db.SuperColumnSerializer.serialize(SuperColumn.java:284)


  at

 org.apache.cassandra.db.ColumnFamilySerializer.serializeForSSTable(ColumnFamilySerializer.java:87)


  at

 org.apache.cassandra.db.ColumnFamilySerializer.serializeWithIndexes(ColumnFamilySerializer.java:99)


  at

 org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:140)


  at

 org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:43)


  at

 org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:73)


  at

 com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:135)


  at

 com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:130)


  at

 org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183)


  at

 org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94)


  at

 org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:294)


  at

 org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:101)


  at

 org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:82)

  at
 java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
  at java.util.concurrent.FutureTask.run(FutureTask.java:166)
  at

 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)


  at

 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)


  at java.lang.Thread.run(Thread.java:636)




Cassandra 0.7.2 - Enable/Disable HH via JMX (Jconsole)

2011-03-03 Thread Narendra Sharma
I am unable to enable/disable HH via JMX (JConsole).

Even though load is on and reads/writes are happening, I don't see the
Operations component in JConsole. To clarify further, I see only
JConsole-MBeans-org.apache.cassandra.db.StorageProxy.Attributes. I don't
see JConsole-MBeans-org.apache.cassandra.db.StorageProxy.Operations. As a
result I cannot invoke operations like enable/disable HH.

Is this is a bug or I am missing something?

Thanks,

Naren


Re: New thread for : How does Cassandra handle failure during synchronous writes

2011-02-24 Thread Narendra Sharma
You are missing the point. The coordinator node that is handling the request
won't wait for all the nodes to return their copy/digest of the data. It just
waits for Q (RF/2 + 1) nodes to return. This is the reason I explained two
possible scenarios.
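For example, with RF = 3, Q = 3/2 + 1 = 2 (integer division), so the coordinator
considers the operation successful as soon as any two replicas acknowledge it.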

Further, on what basis will Cassandra know that the data on N1 is the result of
a failure? Think about it!!

Also, take a look at http://wiki.apache.org/cassandra/API. Following is from
Cassandra wiki:
Because the repair replication process only requires a write to reach a
single node to propagate, a write which 'fails' to meet consistency
requirements will still appear eventually so long as it was written to at
least one node. With W and R both using QUORUM, the best consistency we can
achieve is the guarantee that we will receive the same value regardless of
which nodes we read from. However, we can still peform a W=QUORUM that
fails but reaches one server, perform a R=QUORUM that reads the old value,
and then sometime later perform a R=QUORUM that reads the new value.

Hope this make things very clear!
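
For reference, a minimal sketch of the Q = RF/2 + 1 arithmetic used above
(illustration only, not Cassandra code; the RF value is just an example):

public class QuorumExample {
    // Q = RF/2 + 1 (integer division)
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf); // 2 for RF = 3
        // The coordinator only waits for q replies; the remaining rf - q
        // replicas may or may not have seen the write yet.
        System.out.println("RF=" + rf + " quorum=" + q
                + " replicas not waited for=" + (rf - q));
    }
}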



On Thu, Feb 24, 2011 at 4:47 AM, Anthony John chirayit...@gmail.com wrote:

 c. Read with CL = QUORUM. If read hits node1 and node2/node3, new data
 that was written to node1 will be returned.

 In this case - N1 will be identified as a discrepancy and the change will
 be discarded via read repair

 [Naren] How will Cassandra know this is a discrepancy?

 Because at Q - only N1 will have the new data and the other nodes
 won't. This lack of consistency on N1 will be detected and repaired. The
 value that meets Q - the values from N2-3 - will be returned.

 HTH



Re: dropped mutations, UnavailableException, and long GC

2011-02-24 Thread Narendra Sharma
1. Why 24GB of heap? Do you need such a big heap? A bigger heap can lead to
longer GC cycles, but 15 min looks too long.
2. Do you have the row cache enabled?
3. How many column families do you have?
4. Enable GC logs and monitor what GC is doing to get an idea of why it is
taking so long. You can add the following to enable the GC log:
# GC logging options -- uncomment to enable
# JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
# JVM_OPTS="$JVM_OPTS -XX:+PrintGCTimeStamps"
# JVM_OPTS="$JVM_OPTS -XX:+PrintClassHistogram"
# JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
# JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
# JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"

5. Move to Cassandra 0.7.2, if possible. It has the following nice feature:
added flush_largest_memtables_at and reduce_cache_sizes_at options to
cassandra.yaml as an escape valve for memory pressure

Thanks,
Naren


On Thu, Feb 24, 2011 at 2:21 PM, Jeffrey Wang jw...@palantir.com wrote:

 Hey all,



 Our setup is 5 machines running Cassandra 0.7.0 with 24GB of heap and 1.5TB
 disk each collocated in a DC. We’re doing bulk imports from each of the
 nodes with RF = 2 and write consistency ANY (write perf is very important).
 The behavior we’re seeing is this:



 - Nodes often see each other as dead even though none of the nodes actually
 go down. I suspect this may be due to long GCs. It seems like increasing the
 RPC timeout could help this, but I'm not convinced this is the root of the
 problem. Note that in this case writes return with the UnavailableException.

 - As mentioned, long GCs. We see the ParNew GC doing a lot of smaller
 collections (few hundred MB) which are very fast (few hundred ms), but every
 once in a while the ConcurrentMarkSweep will take a LONG time (up to 15 min!)
 to collect upwards of 15GB at once.

 - On some nodes, we see a lot of pending MutationStages build up (e.g. 500K),
 which leads to the messages “Dropped X MUTATION messages in the last 5000ms,”
 presumably meaning that Cassandra has decided to not write one of the replicas
 of the data. This is not a HUGE deal, but is less than ideal.

 - The end result is that a bunch of writes end up failing due to the
 UnavailableExceptions, so not all of our data is getting into Cassandra.



 So my question is: what is the best way to avoid this behavior? Our
 memtable thresholds are fairly low (256MB) so there should be plenty of heap
 space to work with. We may experiment with write consistency ONE or ALL to
 see if the perf hit is not too bad, but I wanted to get some opinions on why
 this might be happening. Thanks!



 -Jeffrey





Changing comparators

2011-02-23 Thread Narendra Sharma
Today it is not possible to change the comparators (compare_with and
compare_subcolumns_with). I went through the discussion on the thread
http://comments.gmane.org/gmane.comp.db.cassandra.user/12466.

Does it make sense to at least allow a one-way change, i.e. from specific types
to a generic type? For example, a change from TimeUUIDType or UTF8Type to
BytesType. This could be a manual process where users do the schema change and
then run a major compaction on all the nodes to fix the ordering.

Thanks,
Naren


Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Narendra Sharma
Remember the simple rule: the column with the highest timestamp is the one that
will EVENTUALLY be considered correct. So consider the following case:

Cluster size = 3 (say node1, node2 and node3), RF = 3, Read/Write CL =
QUORUM
a. QUORUM in this case requires 2 nodes. The write failed, with a successful
write to only 1 node, say node1.
b. Read with CL = QUORUM. If the read hits node2 and node3, old data will be
returned, with read repair triggered in the background. On the next read you
will get the data that was written to node1.
c. Read with CL = QUORUM. If the read hits node1 and node2/node3, the new data
that was written to node1 will be returned.

HTH!
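
A minimal sketch of the "highest timestamp wins" reconciliation described
above (illustration only, not Cassandra source; node names, values and
timestamps are made up):

import java.util.ArrayList;
import java.util.List;

public class TimestampReconciliation {
    static class ColumnVersion {
        final String replica; final String value; final long timestamp;
        ColumnVersion(String replica, String value, long timestamp) {
            this.replica = replica; this.value = value; this.timestamp = timestamp;
        }
    }

    public static void main(String[] args) {
        List<ColumnVersion> replies = new ArrayList<ColumnVersion>();
        replies.add(new ColumnVersion("node1", "new", 200L)); // got the "failed" write
        replies.add(new ColumnVersion("node2", "old", 100L)); // still holds the old value

        // The coordinator returns the version with the highest timestamp and
        // read repair pushes that version to the out-of-date replicas.
        ColumnVersion winner = replies.get(0);
        for (ColumnVersion v : replies) {
            if (v.timestamp > winner.timestamp) winner = v;
        }
        System.out.println("returned: " + winner.value + " (from " + winner.replica + ")");
    }
}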

Thanks,
Naren


On Wed, Feb 23, 2011 at 3:36 PM, Ritesh Tijoriwala 
tijoriwala.rit...@gmail.com wrote:

 Hi Anthony,
 I am not talking about the case of CL ANY. I am talking about the case
 where your consistency level is R + W > N and you want to write to W nodes
 but only succeed in writing to X (where X < W) nodes and hence fail the
 write to the client.

 thanks,
 Ritesh

 On Wed, Feb 23, 2011 at 2:48 PM, Anthony John chirayit...@gmail.comwrote:

 Ritesh,

 At CL ANY - if all endpoints are down - a HH is written. And it is a
 successful write - not a failed write.

 Now that does not guarantee a READ of the value just written - but that is
 a risk that you take when you use the ANY CL!

 HTH,

 -JA


 On Wed, Feb 23, 2011 at 4:40 PM, Ritesh Tijoriwala 
 tijoriwala.rit...@gmail.com wrote:

 hi Anthony,
 While you stated the facts right, I don't see how it relates to the
 question I ask. Can you elaborate specifically what happens in the case I
 mentioned above to Dave?

 thanks,
 Ritesh


 On Wed, Feb 23, 2011 at 1:57 PM, Anthony John chirayit...@gmail.comwrote:

 Seems to me that the explanations are getting incredibly complicated -
 while I submit the real issue is not!

 Salient points here:-
 1. To be guaranteed data consistency - the writes and reads have to be
 at Quorum CL or more
 2. Any W/R at lesser CL means that the application has to handle the
 inconsistency, or has to be tolerant of it
 3. Writing at ANY CL - a special case - means that writes will always
 go through (as long as any node is up), even if the destination nodes are
 not up. This is done via hinted handoff. But this can result in 
 inconsistent
 reads, and yes that is a problem but refer to pt-2 above
 4. At QUORUM CL R/W - after Quorum is met, hinted handoffs are used to
 handle that case where a particular node is down and the write needs to be
 replicated to it. But this will not cause inconsistent R as the hinted
 handoff (in this case) only applies after Quorum is met - so a Quorum R is
 not dependent on the down node being up, and having got the hint.

 Hope I state this appropriately!

 HTH,

 -JA


 On Wed, Feb 23, 2011 at 3:39 PM, Ritesh Tijoriwala 
 tijoriwala.rit...@gmail.com wrote:

  Read repair will probably occur at that point (depending on your
 config), which would cause the newest value to propagate to more replicas.

 Is the newest value the quorum value, meaning it is the old value
 that will be written back to the nodes having the newer non-quorum value, or
 is the newest value the real new value? :) If the latter, then this seems kind
 of odd to me, and I wonder how it would be useful to any application. A bug?

 Thanks,
 Ritesh


 On Wed, Feb 23, 2011 at 12:43 PM, Dave Revell d...@meebo-inc.comwrote:

 Ritesh,

 You have seen the problem. Clients may read the newly written value
 even though the client performing the write saw it as a failure. When the
 client reads, it will use the correct number of replicas for the chosen 
 CL,
 then return the newest value seen at any replica. This newest value 
 could
 be the result of a failed write.

 Read repair will probably occur at that point (depending on your
 config), which would cause the newest value to propagate to more 
 replicas.

 R+W>N guarantees serial order of operations: any read at CL=R that
 occurs after a write at CL=W will observe the write. I don't think this
 property is relevant to your current question, though.

 Cassandra has no mechanism to roll back the partial write, other
 than to simply write again. This may also fail.

 Best,
 Dave


 On Wed, Feb 23, 2011 at 10:12 AM, tijoriwala.rit...@gmail.comwrote:

 Hi Dave,
 Thanks for your input. In the steps you mention, what happens when
 client tries to read the value at step 6? Is it possible that the 
 client may
 see the new value? My understanding was if R + W > N, then the client will
 not
 see the new value as Quorum nodes will not agree on the new value. If 
 that
 is the case, then its alright to return failure to the client. However, 
 if
 not, then it is difficult to program as after every failure, you as an
 client are not sure if failure is a pseudo failure with some side 
 effects or
 real failure.

 Thanks,
 Ritesh

 quote author='Dave Revell'

 Ritesh,

 There is no commit protocol. Writes may be persisted on some replicas
 even
 though the quorum fails. Here's a sequence of 

Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Narendra Sharma
c. Read with CL = QUORUM. If read hits node1 and node2/node3, new data
that was written to node1 will be returned.

In this case - N1 will be identified as a discrepancy and the change will
be discarded via read repair

[Naren] How will Cassandra know this is a discrepancy?

On Wed, Feb 23, 2011 at 6:05 PM, Anthony John chirayit...@gmail.com wrote:

 Remember the simple rule. Column with highest timestamp is the one that
 will be considered correct EVENTUALLY. So consider following case:

 I am sorry, that will return inconsistent results even at Q. Timestamps have
 nothing to do with this. A timestamp is just an application-provided artifact
 and could be anything.

 c. Read with CL = QUORUM. If read hits node1 and node2/node3, new data
 that was written to node1 will be returned.

 In this case - N1 will be identified as a discrepancy and the change will
 be discarded via read repair

 On Wed, Feb 23, 2011 at 6:47 PM, Narendra Sharma 
 narendra.sha...@gmail.com wrote:

 Remember the simple rule. Column with highest timestamp is the one that
 will be considered correct EVENTUALLY. So consider following case:

 Cluster size = 3 (say node1, node2 and node3), RF = 3, Read/Write CL =
 QUORUM
 a. QUORUM in this case requires 2 nodes. Write failed with successful
 write to only 1 node say node1.
 b. Read with CL = QUORUM. If read hits node2 and node3, old data will be
 returned with read repair triggered in background. On next read you will get
 the data that was written to node1.
 c. Read with CL = QUORUM. If read hits node1 and node2/node3, new data
 that was written to node1 will be returned.

 HTH!

 Thanks,
 Naren



 On Wed, Feb 23, 2011 at 3:36 PM, Ritesh Tijoriwala 
 tijoriwala.rit...@gmail.com wrote:

 Hi Anthony,
 I am not talking about the case of CL ANY. I am talking about the case
 where your consistency level is R + W > N and you want to write to W nodes
 but only succeed in writing to X (where X < W) nodes and hence fail the
 write to the client.

 thanks,
 Ritesh

 On Wed, Feb 23, 2011 at 2:48 PM, Anthony John chirayit...@gmail.comwrote:

 Ritesh,

 At CL ANY - if all endpoints are down - a HH is written. And it is a
 successful write - not a failed write.

 Now that does not guarantee a READ of the value just written - but that
 is a risk that you take when you use the ANY CL!

 HTH,

 -JA


 On Wed, Feb 23, 2011 at 4:40 PM, Ritesh Tijoriwala 
 tijoriwala.rit...@gmail.com wrote:

 hi Anthony,
 While you stated the facts right, I don't see how it relates to the
 question I ask. Can you elaborate specifically what happens in the case I
 mentioned above to Dave?

 thanks,
 Ritesh


 On Wed, Feb 23, 2011 at 1:57 PM, Anthony John 
 chirayit...@gmail.comwrote:

 Seems to me that the explanations are getting incredibly complicated -
 while I submit the real issue is not!

 Salient points here:-
 1. To be guaranteed data consistency - the writes and reads have to be
 at Quorum CL or more
 2. Any W/R at lesser CL means that the application has to handle the
 inconsistency, or has to be tolerant of it
 3. Writing at ANY CL - a special case - means that writes will
 always go through (as long as any node is up), even if the destination 
 nodes
 are not up. This is done via hinted handoff. But this can result in
 inconsistent reads, and yes that is a problem but refer to pt-2 above
 4. At QUORUM CL R/W - after Quorum is met, hinted handoffs are used to
 handle that case where a particular node is down and the write needs to 
 be
 replicated to it. But this will not cause inconsistent R as the hinted
 handoff (in this case) only applies after Quorum is met - so a Quorum R 
 is
 not dependent on the down node being up, and having got the hint.

 Hope I state this appropriately!

 HTH,

 -JA


 On Wed, Feb 23, 2011 at 3:39 PM, Ritesh Tijoriwala 
 tijoriwala.rit...@gmail.com wrote:

  Read repair will probably occur at that point (depending on your
 config), which would cause the newest value to propagate to more 
 replicas.

 Is the newest value the quorum value, meaning it is the old
 value that will be written back to the nodes having the newer non-quorum
 value, or is the newest value the real new value? :) If the latter, then this
 seems kind of odd to me, and I wonder how it would be useful to any
 application. A bug?

 Thanks,
 Ritesh


 On Wed, Feb 23, 2011 at 12:43 PM, Dave Revell d...@meebo-inc.comwrote:

 Ritesh,

 You have seen the problem. Clients may read the newly written value
 even though the client performing the write saw it as a failure. When 
 the
 client reads, it will use the correct number of replicas for the 
 chosen CL,
 then return the newest value seen at any replica. This newest value 
 could
 be the result of a failed write.

 Read repair will probably occur at that point (depending on your
 config), which would cause the newest value to propagate to more 
 replicas.

 R+W>N guarantees serial order of operations: any read at CL=R that
 occurs after a write at CL=W will observe the write. I don't think

Does HH work (or make sense) for counters?

2011-02-01 Thread Narendra Sharma
Version: Cassandra 0.7.1 (build from trunk)

Setup:
- Cluster of 2 nodes (Say A and B)
- HH enabled
- Using the default Keyspace definition in cassandra.yaml
- Using SuperCounter1 CF

Client:
- Using CL of ONE

I started the two Cassandra nodes, created the schema, and then shut down one of
the instances (say B). Executed counter update and read operations on A with
CL=ONE. Everything worked fine. All counters were returned with correct
values. I then started node B and waited a couple of minutes. Executed only
counter read operations on B with CL=ONE. Initially I got no counters for any of
the rows. On the second (and subsequent) tries I got counters for only one row
(always the same one) out of ten.

After doing one read with CL=QUORUM, reads with CL=ONE started returning
correct data.

Thanks,
Naren


sstable2json for SuperCounter CF not working

2011-02-01 Thread Narendra Sharma
Version: Cassandra 0.7.1 (build from trunk)

Setup:
- Cluster of 2 nodes (Say A and B)
- HH enabled
- Using the default Keyspace definition in cassandra.yaml
- Using SuperCounter1 CF

Steps:
- Started the two nodes, loaded schema using nodetool
- Executed counter update and read operations on A with CL=ONE. Everything
worked fine. All counters were returned with correct values.
- Using nodetool flush, flushed the memtable to sstable
- Used sstable2json on the sstable and got following exception:

[root@msg-qelnx01-v14 bin]# ./sstable2json
../../cassandra071/data/Keyspace1/SuperCounter1-f-1-Data.db
 WARN 11:38:45,081 Schema definitions were defined both locally and in
cassandra.yaml. Definitions in cassandra.yaml were ignored.
{
  62626232: { 787832: {deletedAt: -9223372036854775808, subColumns:
[[616464636f756e74, Exception in thread main
org.apache.cassandra.db.marshal.MarshalException: A long is exactly 8 bytes
at
org.apache.cassandra.db.marshal.CounterColumnType.getString(CounterColumnType.java:57)
at
org.apache.cassandra.tools.SSTableExport.serializeColumns(SSTableExport.java:100)
at
org.apache.cassandra.tools.SSTableExport.serializeRow(SSTableExport.java:153)
at
org.apache.cassandra.tools.SSTableExport.export(SSTableExport.java:296)
at
org.apache.cassandra.tools.SSTableExport.export(SSTableExport.java:330)
at
org.apache.cassandra.tools.SSTableExport.export(SSTableExport.java:343)
at org.apache.cassandra.tools.SSTableExport.main(SSTableExport.java:400)

Thanks,
Naren


Question on max_hint_window_in_ms

2011-01-31 Thread Narendra Sharma
As per the config:
# this defines the maximum amount of time a dead host will have hints
# generated.  After it has been dead this long, hints will be dropped.
max_hint_window_in_ms: 3600000 # one hour

Will this result in deletion of existing hints (from memory and disk)? Or will
it just stop creating new hints?

Thanks,
Naren


EOFException in ReadStage

2011-01-30 Thread Narendra Sharma
Version: Cassandra 0.7.1

I am seeing the following exception at regular intervals (very frequently) in
Cassandra. I did a clean install of Cassandra 0.7.1 and deleted all old
data. Any idea what could be the cause? The stack is the same for all
occurrences.

Thanks,
Naren

ERROR [ReadStage:11232] 2011-01-28 20:19:09,671 AbstractCassandraDaemon.java (line 114) Fatal exception in thread Thread[ReadStage:11232,5,main]
java.io.IOError: java.io.EOFException
 at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:75)
 at org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:59)
 at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
 at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1267)
 at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1159)
 at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1088)
 at org.apache.cassandra.db.Table.getRow(Table.java:384)
 at org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:60)
 at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:69)
 at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:70)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.EOFException
 at java.io.DataInputStream.readInt(DataInputStream.java:375)
 at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:48)
 at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:30)
 at org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:108)
 at org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:106)
 at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:71)


Re: Using Cassandra for storing large objects

2011-01-27 Thread Narendra Sharma
Thanks Anand. Few questions:
- What is the size of the nodes (in terms of data)?
- How long have you been running?
- How is compaction treating you?

Thanks,
Naren

On Thu, Jan 27, 2011 at 12:13 PM, Anand Somani meatfor...@gmail.com wrote:

 Using it for storing large immutable objects; as Aaron was suggesting, we
 are splitting the blob across multiple columns. Also, we are reading it a few
 columns at a time (for memory considerations). Currently we have only gone
 up to about 300-400KB size objects.

 We do have machines with 32GB memory, with 8GB for Java. The row cache is
 disabled. There is some latency that needs to be sorted out, but overall I
 am positive. This is with 0.6.6; I am in the process of moving it to 0.7.
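
A rough sketch of the chunking approach described above, using the 0.7
Thrift-generated client. The "Blobs" column family, the 64KB chunk size and
the chunk naming are assumptions, and the exact generated setters may differ
slightly between Thrift versions.

import java.nio.ByteBuffer;
import java.util.Arrays;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.Column;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;

public class BlobChunker {
    static final int CHUNK_SIZE = 64 * 1024; // arbitrary chunk size

    // Split a blob into fixed-size chunks, one column per chunk, so it can be
    // written and later read back a few columns at a time.
    static void storeBlob(Cassandra.Client client, String rowKey, byte[] blob) throws Exception {
        ColumnParent parent = new ColumnParent("Blobs"); // made-up CF name
        long ts = System.currentTimeMillis() * 1000;
        for (int offset = 0, chunk = 0; offset < blob.length; offset += CHUNK_SIZE, chunk++) {
            byte[] part = Arrays.copyOfRange(blob, offset,
                    Math.min(offset + CHUNK_SIZE, blob.length));
            Column col = new Column();
            col.setName(String.format("chunk-%05d", chunk).getBytes("UTF-8"));
            col.setValue(part);
            col.setTimestamp(ts);
            client.insert(ByteBuffer.wrap(rowKey.getBytes("UTF-8")), parent, col,
                    ConsistencyLevel.QUORUM);
        }
    }
}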

 On Wed, Jan 26, 2011 at 11:37 PM, Narendra Sharma 
 narendra.sha...@gmail.com wrote:

 Anyone using Cassandra for storing large number (millions) of large
 (mostly immutable) objects (200KB-5MB size each)? I would like to understand
 the experience in general considering that Cassandra is not considered a
 good fit for large objects.
 https://issues.apache.org/jira/browse/CASSANDRA-265


 Thanks,
 Naren





Re: Using Cassandra for storing large objects

2011-01-27 Thread Narendra Sharma
Thanks Anand. Let's keep exchanging our experiences.

-Naren

On Thu, Jan 27, 2011 at 8:50 PM, Anand Somani meatfor...@gmail.com wrote:

 At this point we are not in production, in the lab only. The longest test
 so far has been about 2-3 days, the datasize at this point is about 2-3 TB
 per node, we have 2 nodes. We do see spikes to high response times (and
 timeouts), which seemed to be around the time GC kicks in. We were pushing
 the system as much as we can. Also given our application we can do major
 compactions at night, have not tried it on this big data set yet. We do
 still have minor compactions turned on.


 On Thu, Jan 27, 2011 at 12:56 PM, Narendra Sharma 
 narendra.sha...@gmail.com wrote:

 Thanks Anand. Few questions:
 - What is the size of nodes (in terms for data)?
 - How long have you been running?
 - Howz compaction treating you?

 Thanks,
 Naren


 On Thu, Jan 27, 2011 at 12:13 PM, Anand Somani meatfor...@gmail.comwrote:

 Using it for storing large immutable objects, like Aaron was suggesting
 we are splitting the blob across multiple columns. Also we are reading it a
 few columns at a time (for memory considerations). Currently we have only
 gone upto about 300-400KB size objects.

 We do have machines with 32Gb memory and with 8G for java. Row cache is
 disabled. There is some latency that needs to be sorted out, but overall I
 am positive. This is with 6.6, am in the process of moving it to 0.7.

 On Wed, Jan 26, 2011 at 11:37 PM, Narendra Sharma 
 narendra.sha...@gmail.com wrote:

 Anyone using Cassandra for storing large number (millions) of large
 (mostly immutable) objects (200KB-5MB size each)? I would like to 
 understand
 the experience in general considering that Cassandra is not considered a
 good fit for large objects.
 https://issues.apache.org/jira/browse/CASSANDRA-265


 Thanks,
 Naren







Using Cassandra for storing large objects

2011-01-26 Thread Narendra Sharma
Anyone using Cassandra for storing large number (millions) of large (mostly
immutable) objects (200KB-5MB size each)? I would like to understand the
experience in general considering that Cassandra is not considered a good
fit for large objects. https://issues.apache.org/jira/browse/CASSANDRA-265


Thanks,
Naren


Re: get_range_slices getting deleted rows

2011-01-25 Thread Narendra Sharma
Yes. See this http://wiki.apache.org/cassandra/FAQ#range_ghosts
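
A rough sketch of the usual workaround: skip the "ghost" rows, i.e. rows that
come back with an empty column list (the exact Thrift-generated types are
assumed here):

import java.util.ArrayList;
import java.util.List;
import org.apache.cassandra.thrift.KeySlice;

public class RangeGhostFilter {
    // Rows returned by get_range_slices with no columns are range ghosts
    // (deleted rows whose tombstones have not been compacted away yet) and
    // can simply be filtered out by the application.
    static List<KeySlice> dropGhosts(List<KeySlice> slices) {
        List<KeySlice> live = new ArrayList<KeySlice>();
        for (KeySlice ks : slices) {
            if (ks.getColumns() != null && !ks.getColumns().isEmpty()) {
                live.add(ks);
            }
        }
        return live;
    }
}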

-Naren

On Tue, Jan 25, 2011 at 2:59 PM, Nick Santini nick.sant...@kaseya.comwrote:

 Hi,
 I'm trying a test scenario where I create 100 rows in a CF, then
 use get_range_slices to get all the rows, and I get 100 rows; so far so good.
 Then, after the test, I delete the rows using remove, but without a column
 or super column. This deletes the row; I can confirm that because if I try to
 get it with get_slice using the key, I get nothing.

 But then if I do get_range_slices again, where the range goes between new
 byte[0] and new byte[0] (therefore returning everything), I still get the
 100 row keys.

 Is that expected?

 thanks

 Nicolas Santini



Re: cassandra 0.7.0 noob question

2011-01-06 Thread Narendra Sharma
The schema is not loaded from cassandra.yaml by default. You need to either
load it through JConsole or define it through the CLI. Please read the following
page for details:
http://wiki.apache.org/cassandra/LiveSchemaUpdates

Also look for "Where are my keyspaces?" on the following page:
http://wiki.apache.org/cassandra/StorageConfiguration
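
If you want to script the one-time import instead of clicking through
JConsole, something like the following should work. This is only a sketch:
the loadSchemaFromYAML operation name is taken from the LiveSchemaUpdates
page, and the host/JMX port (8080 was the 0.7 default) are assumptions.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class LoadSchemaFromYaml {
    public static void main(String[] args) throws Exception {
        // Connect to the node's JMX port and trigger the one-time import of
        // the keyspaces defined in cassandra.yaml.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName storageService =
                    new ObjectName("org.apache.cassandra.db:type=StorageService");
            mbs.invoke(storageService, "loadSchemaFromYAML", new Object[0], new String[0]);
        } finally {
            connector.close();
        }
    }
}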

Thanks,
Naren

On Thu, Jan 6, 2011 at 2:00 PM, felix gao gre1...@gmail.com wrote:

 Hi all,

 I started Cassandra with everything untouched in the conf folder. When I
 examine the cassandra.yaml file, there seems to be a default keyspace
 defined like below:
 keyspaces:
 - name: Keyspace1
   replica_placement_strategy: org.apache.cassandra.locator.SimpleStrategy
   replication_factor: 1
   column_families:
     - name: Standard1

 My question is: when I run cassandra-cli and 'show keyspaces;', only the system
 keyspace is there.  What is going on?

 Thanks,

 Felix




Re: quick question about super columns

2011-01-06 Thread Narendra Sharma
With raw thrift APIs:

1. Fetch a column from a supercolumn:

// ColumnFamily, SuperColumnName, ColumnName and RowKey below are placeholders
// for your own values; getBytes()/getByteBuffer() are application helpers that
// encode a String as a byte[]/ByteBuffer.
ColumnPath cp = new ColumnPath(ColumnFamily);
cp.setSuper_column(SuperColumnName);
cp.setColumn(ColumnName);
ColumnOrSuperColumn resp = client.get(getByteBuffer(RowKey), cp,
ConsistencyLevel.ONE);
Column c = resp.getColumn();

2. Add a new supercolumn:

SuperColumn superColumn = new SuperColumn();
superColumn.setName(getBytes(SuperColumnName));
List<Column> cols = new ArrayList<Column>();
Column c = new Column();
c.setName(name);
c.setValue(value);
c.setTimestamp(timeStamp);
cols.add(c);
// repeat the above 5 lines for as many columns as you want in the supercolumn
superColumn.setColumns(cols);

List<Mutation> mutations = new ArrayList<Mutation>();
ColumnOrSuperColumn csc = new ColumnOrSuperColumn();
csc.setSuper_column(superColumn);
csc.setSuper_columnIsSet(true);
Mutation m = new Mutation();
m.setColumn_or_supercolumn(csc);
m.setColumn_or_supercolumnIsSet(true);
mutations.add(m);

Map<String, List<Mutation>> allMutations = new HashMap<String, List<Mutation>>();
allMutations.put(ColumnFamilyName, mutations);
Map<ByteBuffer, Map<String, List<Mutation>>> mutationMap =
        new HashMap<ByteBuffer, Map<String, List<Mutation>>>();
// batch_mutate expects each row key mapped to its per-CF mutation map
mutationMap.put(getByteBuffer(RowKey), allMutations);
client.batch_mutate(mutationMap, ConsistencyLevel.ONE);

HTH!

Thanks,
Naren



On Thu, Jan 6, 2011 at 10:42 PM, Arijit Mukherjee ariji...@gmail.comwrote:

 Thank you. And is it similar if I want to search a subcolumn within a
 given supercolumn? I mean I have the supercolumn key and the subcolumn
 key - can I fetch the particular subcolumn?

 Can you share a small piece of example code for both?

 I'm still new to this and trying to figure out the Thrift APIs. I
 attempted to use Hector, but got myself into more confusion.

 Arijit

 On 7 January 2011 11:44, Roshan Dawrani roshandawr...@gmail.com wrote:
 
  On Fri, Jan 7, 2011 at 11:39 AM, Arijit Mukherjee ariji...@gmail.com
 wrote:
 
  Hi
 
  I've a quick question about supercolumns.
  EventRecord = {
 eventKey2: {
 e2-ts1: {set of columns},
 e2-ts2: {set of columns},
 ...
 e2-tsn: {set of columns}
 }
 
  }
 
  If I want to append another e2-tsp: {set of columns} to the event
  record keyed by eventKey2, do I need to retrieve the entire eventKey2
  map, and then append this new row and re-insert eventKey2?
 
  No, you can simply insert a new super column with its sub-columns with
 the rowKey that you want, and it will join the other super columns of that
 row.
 
  A row can have billions of super columns. Imagine fetching them all just to
 add one more super column to it.
 
 
 
 
 


 --
 And when the night is cloudy,
 There is still a light that shines on me,
 Shine on until tomorrow, let it be.



Re: Any GUI for Cassandra database on Windows?

2010-12-29 Thread Narendra Sharma
cassandra-gui doesn't work with Cassandra 0.7. It could be due to a Thrift
version difference, API differences, or the default framed transport mode. Better
to switch to something that works for sure.

Thanks,
-Naren

On Mon, Dec 27, 2010 at 9:15 PM, Roshan Dawrani roshandawr...@gmail.comwrote:

 Sorry. Will do that.

 I am using Cassandra 0.7.0-rc2.

 I will try this DB client. Thanks.


 On Tue, Dec 28, 2010 at 10:41 AM, Narendra Sharma 
 narendra.sha...@gmail.com wrote:

 Please do mention the Cassandra version you are using in all ur queries.
 It helps.

 Try https://github.com/driftx/chiton

 Thanks,
 Naren


 On Mon, Dec 27, 2010 at 7:37 PM, Roshan Dawrani 
 roshandawr...@gmail.comwrote:

 Hi,

 Is there a GUI client for a Cassandra database for a Windows based setup?

 I tried the one available at http://code.google.com/p/cassandra-gui/,
 but it always fails to connect with error: Cannot read. Remote site has
 closed. Tried to read 4 bytes, but only got 0 bytes.

 --
 Roshan
 Blog: http://roshandawrani.wordpress.com/
 Twitter: @roshandawrani http://twitter.com/roshandawrani
 Skype: roshandawrani






Re: I have TimeUUID sorted keys. Can I get the range query return rows in the same order as sorted keys?

2010-12-27 Thread Narendra Sharma
You will need to use OPP to perform range scans. Look for Range Queries on
http://wiki.apache.org/cassandra/DataModel

Look at this to understand why range queries are not supported for the
RandomPartitioner: https://issues.apache.org/jira/browse/CASSANDRA-1750
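
If switching to an order-preserving partitioner is not an option, the row
order can also be restored on the client after a multiget, since the
time-sorted key list is already known. A rough sketch:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ReorderByKeyList {
    // A multiget returns rows keyed by row key but in no particular order, so
    // walk the already-sorted key list and pull the rows out in that order.
    static <K, V> List<V> inKeyOrder(List<K> sortedKeys, Map<K, V> rowsByKey) {
        List<V> ordered = new ArrayList<V>();
        for (K key : sortedKeys) {
            V row = rowsByKey.get(key);
            if (row != null) { // a key may have no live row
                ordered.add(row);
            }
        }
        return ordered;
    }
}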

Thanks,
Naren

On Mon, Dec 27, 2010 at 8:35 AM, Roshan Dawrani roshandawr...@gmail.comwrote:

 I had seen RangeSlicesQuery, but I didn't notice that I could also give a
 key range there.

 How does a KeyRange work? Doesn't it need some kind of ordering from the
 partitioner - whether or not it is order preserving?

 I couldn't be sure of a query that was based on order of the rows in the
 column family, so I didn't explore that much.



 On Mon, Dec 27, 2010 at 9:55 PM, Narendra Sharma 
 narendra.sha...@gmail.com wrote:

 Did you look at get_range_slices? Once you get the columns from super
 column, pick the first and last to form the range and fire the
 get_range_slice.

 Thanks,
 -Naren


 On Mon, Dec 27, 2010 at 6:12 AM, Roshan Dawrani 
 roshandawr...@gmail.comwrote:

 This silly question is retracted, with apologies. There couldn't be
 anything easier to handle at the application level.

 rgds,
 Roshan


 On Mon, Dec 27, 2010 at 9:04 AM, Roshan Dawrani roshandawr...@gmail.com
  wrote:

 Hi,
 I have the following 2 column families - one being used to store full
 rows for an entity and other is an index table for having the TimeUUID
 sorted row keys.

 I am able to query the TimeUUID columns under the super column fine. But
 now I need to go to main CF and get the data and I want the rows in the 
 same
 time order as the keys.

 I am using MultiGetSliceQuery to query the main entity data for the
 sorted keys, but the rows don't come back in the same order, which defeats
 the purpose of storing the time sorted subcolumns. I suppose for each key, 
 I
 can fire an individual SliceQuery, but that does not look efficient to me. 
 I
 do want to fire a range query.

 MainEntityCF {
  TimeUUIDKeyA: [Col1 : Val1, Col2 : Val2, Col3 :
 Val3]
  TimeUUIDKeyX: [Col1 : Val1, Col2 : Val2, Col3 :
 Val3]
  TimeUUIDKeyB: [Col1 : Val1, Col2 : Val2, Col3 :
 Val3]
  TimeUUIDKeyY: [Col1 : Val1, Col2 : Val2, Col3 :
 Val3]
 }
 MainEntityCF_Index {
   SomeSuperColumn: [TimeUUIDKeyA:null, TimeUUIDKeyB:null,
 TimeUUIDKeyX:null, TimeUUIDKeyY:null]
 }

 --
 Roshan
 Blog: http://roshandawrani.wordpress.com/
 Twitter: @roshandawrani http://twitter.com/roshandawrani
 Skype: roshandawrani





 --
 Roshan
 Blog: http://roshandawrani.wordpress.com/
 Twitter: @roshandawrani http://twitter.com/roshandawrani
 Skype: roshandawrani




Re: Supercolumn Maximums

2010-12-27 Thread Narendra Sharma
#1 - No limit
#2 - If you are referring to secondary indexes then NO. Also see
https://issues.apache.org/jira/browse/CASSANDRA-598
#3 - No limit

Following are key limitations:
1. All data for a single row must fit (on disk) on a single machine in the
cluster
2. A single column value may not be larger than 2GB.

See more on:
http://wiki.apache.org/cassandra/CassandraLimitations


-Naren

On Mon, Dec 27, 2010 at 9:01 PM, David G. Boney 
dbon...@semanticartifacts.com wrote:

 1. What are the maximum number of supercolumns that a row can have?
 2. Are supercolumns indexed?
 3. What are the maximum number of subcolumns in a supercolumn?
 -
 Sincerely,
 David G. Boney
 dbon...@semanticartifacts.com
 http://www.semanticartifacts.com







Re: Cassandra 0.7 - Impact of row size and columns on compaction

2010-12-05 Thread Narendra Sharma
This is very useful. Thanks Aaron!

-Naren

On Sun, Dec 5, 2010 at 12:35 PM, Aaron Morton aa...@thelastpickle.comwrote:

 AFAIK if the entire row can be read into memory the compaction will be
 faster. The in_memory_compaction_limit_in_mb setting is used to decide how
 big the row can be before it has to use a slower two pass process.

 Also my understanding is that one of the main factors for compaction is the
 number of over-writes for rows / columns. e.g if the data for a row is
 spread over a lot of ss tables (for new columns and/or updates and/or
 deletes) it will take longer to compact that row.

 Hope that helps.
 Aaron


 On 04 Dec, 2010,at 09:23 AM, Narendra Sharma narendra.sha...@gmail.com
 wrote:

 What is the impact (performance and I/O) of row size (in bytes) on
 compaction?
 What is the impact (performance and I/O) of number of super columns and
 columns on compaction?

 Does anyone has any details and data to share?

 Thanks,
 Naren




Cassandra 0.7 - Impact of row size and columns on compaction

2010-12-03 Thread Narendra Sharma
What is the impact (performance and I/O) of row size (in bytes) on
compaction?
What is the impact (performance and I/O) of number of super columns and
columns on compaction?

Does anyone have any details or data to share?

Thanks,
Naren


Fetch a SuperColumn based on value of column

2010-12-02 Thread Narendra Sharma
Hi,

My schema has a row that has thousands of SuperColumns. The size of each
super column is around 500B (20 columns). I need to query one SuperColumn
based on the value of one of its columns. Something like

SELECT SuperColumn FROM Row WHERE SuperColumn.column = value

Questions:
1. Is this possible with the current Cassandra APIs? If yes, could you please
show a sample.
2. How would such a query perform if the number of SuperColumns is high
(> 10K)?

Cassandra version 0.7.

Thanks,
Naren


Re: Fetch a SuperColumn based on value of column

2010-12-02 Thread Narendra Sharma
Thanks Aaron!

The first request requires you to know the SuperColumn name. In my case I
don't know the SuperColumn name, because if I knew it I could simply read the
super column. I need to find the SuperColumn that has a column with a given
value.
The use case is that the application allows querying an object by two
attributes. I have made one of the attributes the SuperColumn name, and I keep
the second attribute as a subcolumn in the super column. Now I need to perform
a search by that subcolumn.
I think the only option is to maintain another CF with the column name being
the second attribute and the value being the name of the super column in the
current CF. Is there any better way to handle this?
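
For what it's worth, a rough sketch of that extra CF: an inverted index where
the row key is the value of the second attribute and each column name is the
SuperColumn name it points to. The CF name, the utf8() helper and the exact
0.7 Thrift setters are assumptions here.

import java.nio.ByteBuffer;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.Column;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;

public class AttributeIndexWriter {
    // Made-up helper: UTF-8 encode a String as a ByteBuffer.
    static ByteBuffer utf8(String s) throws Exception {
        return ByteBuffer.wrap(s.getBytes("UTF-8"));
    }

    // Index row layout:
    //   row key      = value of the second attribute
    //   column name  = SuperColumn name in the main CF
    //   column value = empty (the column name itself is the answer)
    // To find the SuperColumn for a given attribute value, get_slice the index row.
    static void index(Cassandra.Client client, String secondAttributeValue,
                      String superColumnName) throws Exception {
        Column col = new Column();
        col.setName(superColumnName.getBytes("UTF-8"));
        col.setValue(new byte[0]);
        col.setTimestamp(System.currentTimeMillis() * 1000);
        client.insert(utf8(secondAttributeValue),
                new ColumnParent("AttributeIndex"), // made-up index CF
                col, ConsistencyLevel.QUORUM);
    }
}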

Thanks,
Naren

On Thu, Dec 2, 2010 at 5:48 PM, Aaron Morton aa...@thelastpickle.comwrote:

 You can use column and super column names with the get_slice() function
 without 0.7 secondary indexes. I'm assuming that the original query was to
 test for the existence of a column by name.

 In the case below, to retrieve the full super column would require to
 request...

 First to test the condition. get_slice with a ColumnParent that specifies
 the CF and the Super Column and a slice predicate with the column_names[]
 containing the name of the col you want. This query would only return the
 one column.

 If you then wanted to get all columns in the super column you would make
 another request.

 If making two requests is a pain or too slow, consider changing the data
 model to better support the requests you need to make.

 AFAIK a lot of super columns will not impact performance any more than a
 lot of columns. There are however limitations to the number of columns in a
 super column: http://wiki.apache.org/cassandra/CassandraLimitations
 Hope that helps.
 Aaron


 On 03 Dec, 2010,at 01:10 PM, Nick Santini nick.sant...@kaseya.com wrote:

 actually, the solution would be something like my last mail, but pointing
 to the name of the super column and the row key


 Nicolas Santini
 Director of Cloud Computing
 Auckland - New Zealand
 (64) 09 914 9426 ext 2629
 (64) 021 201 3672



 On Fri, Dec 3, 2010 at 1:08 PM, Nick Santini nick.sant...@kaseya.comwrote:

 Hi,
 as I got answered on my mail, secondary indexes for super column families
 is not supported yet, so you have to implement your own

 easy way: keep another column family where the row key is the value of
 your field and the columns are the row keys of your super column family

 (inverted index)


 Nicolas Santini
 Director of Cloud Computing
 Auckland - New Zealand
 (64) 09 914 9426 ext 2629
 (64) 021 201 3672




 On Fri, Dec 3, 2010 at 1:00 PM, Narendra Sharma 
 narendra.sha...@gmail.com wrote:

 Hi,

 My schema has a row that has thousands of Super Columns. The size of each
 super column is around 500B (20 columns). I need to query 1 SuperColumn
 based on value of one of its column. Something like

 SELECT SuperColumn FROM Row WHERE SuperColumn.column=value

 Questions:
 1. Is this possible with current Cassandra APIs? If yes, could you please
 show with a sample.
 2. How would such a query perform if the number of SuperColumns is high
 (> 10K)?

 Cassandra version 0.7.

 Thanks,
 Naren






C++ client for Cassandra

2010-11-30 Thread Narendra Sharma
Are there any C++ clients out there similar to Hector (in terms of features)
for Cassandra? I am looking for C++ Client for Cassandra 0.7.

Thanks,
Naren


batch_mutate vs number of write operations on CF

2010-11-29 Thread Narendra Sharma
Hi,

I am using Cassandra 0.7 beta3 and Hector.

I create a mutation map. The mutation involves adding a few columns for a
given row. After that I use the batch_mutate API to send the changes to
Cassandra.

Question:
If there are multiple column writes for the same row in a mutation map, does
Cassandra show that (in the JMX write count stats for the CF) as 1 write
operation or as N write operations, where N is the number of entries in the
mutation map for that row?
Assume all the changes in the mutation map are for one row.

Thanks,
Naren


Cassandra 0.7 - documentation on Secondary Indexes

2010-11-29 Thread Narendra Sharma
Is there any documentation available on what is possible with secondary
indexes? For eg
- Is it possible to define secondary index on columns within a SuperColumn?
- If I define a secondary index at run time, does Cassandra index all the
existing data or only new data is indexed?

Some documentation along with examples will be highly useful.

Thanks,
Naren


Re: Cassandra 0.7 - documentation on Secondary Indexes

2010-11-29 Thread Narendra Sharma
Thanks Jonathan.

Couple of more questions:
1. Is there any technical limit on the number of secondary indexes that can
be created?

2. Is it possible to execute join queries spanning multiple secondary
indexes?

Thanks,
Naren

On Mon, Nov 29, 2010 at 6:02 PM, Jonathan Ellis jbel...@gmail.com wrote:

 On Mon, Nov 29, 2010 at 7:59 PM, Narendra Sharma
 narendra.sha...@gmail.com wrote:
  Is there any documentation available on what is possible with secondary
  indexes?

 Not yet.

  - Is it possible to define secondary index on columns within a
 SuperColumn?

 No.

  - If I define a secondary index at run time, does Cassandra index all the
  existing data or only new data is indexed?

 The former.

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of Riptano, the source for professional Cassandra support
 http://riptano.com



Re: Cassandra 0.7 - documentation on Secondary Indexes

2010-11-29 Thread Narendra Sharma
On Mon, Nov 29, 2010 at 9:32 PM, Jonathan Ellis jbel...@gmail.com wrote:

 On Mon, Nov 29, 2010 at 11:26 PM, Narendra Sharma
 narendra.sha...@gmail.com wrote:
  Thanks Jonathan.
 
  Couple of more questions:
  1. Is there any technical limit on the number of secondary indexes that
 can
  be created?

 Just as with traditional databases, the more indexes there are the
 slower writes to that CF will be.

  2. Is it possible to execute join queries spanning multiple secondary
  indexes?

 What do secondary indexes have to do with joins?


For example, if I want to get all employees that are male and have age >= 35
years, how can secondary indexes be useful in such a scenario?
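
For this kind of conjunction the 0.7 Thrift API lets you put several
IndexExpressions in one IndexClause (at least one of them must be an equality
on an indexed column), so no join is needed. A rough sketch; the column
names, the value encodings and the utf8() helper are assumptions:

import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.IndexClause;
import org.apache.cassandra.thrift.IndexExpression;
import org.apache.cassandra.thrift.IndexOperator;
import org.apache.cassandra.thrift.KeySlice;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;

public class IndexedQueryExample {
    static ByteBuffer utf8(String s) throws Exception { // made-up helper
        return ByteBuffer.wrap(s.getBytes("UTF-8"));
    }

    // "male AND age >= 35" as a single indexed query. Assumes an indexed
    // "gender" column and an "age" column validated as LongType so the GTE
    // comparison sorts correctly.
    static List<KeySlice> maleOver35(Cassandra.Client client) throws Exception {
        IndexExpression male =
                new IndexExpression(utf8("gender"), IndexOperator.EQ, utf8("male"));

        ByteBuffer thirtyFive = ByteBuffer.allocate(8);
        thirtyFive.putLong(35L);
        thirtyFive.flip(); // LongType encoding: 8-byte big-endian
        IndexExpression age =
                new IndexExpression(utf8("age"), IndexOperator.GTE, thirtyFive);

        IndexClause clause = new IndexClause(Arrays.asList(male, age),
                ByteBuffer.wrap(new byte[0]), 100); // start_key, max rows

        SlicePredicate allColumns = new SlicePredicate();
        allColumns.setSlice_range(new SliceRange(
                ByteBuffer.wrap(new byte[0]), ByteBuffer.wrap(new byte[0]), false, 100));

        return client.get_indexed_slices(new ColumnParent("Employees"),
                clause, allColumns, ConsistencyLevel.ONE);
    }
}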


 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of Riptano, the source for professional Cassandra support
 http://riptano.com



working of get_range_slices

2010-10-14 Thread Narendra Sharma
Hi,

I am using Cassandra 0.6.5. Our application uses get_range_slices to get
rows in a given range.

Could someone please explain how get_range_slices works internally, especially
when a count parameter (value = 1) is also specified in the SlicePredicate? Does
Cassandra first search everything in the given range and then return the top 1,
or does it somehow read only 1 and return it?
What is the performance & I/O impact if we pass start key = end key in
the SlicePredicate? Will it perform better than passing a range such as [Start
key, ""] with count = 1?
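
For concreteness, the two row ranges being compared look roughly like this
with the 0.6 Thrift classes (a sketch only; keyspace/CF names are made up,
keys are plain Strings in 0.6, the exact generated signatures may differ, and
for row ranges the count actually lives on the KeyRange rather than on the
SlicePredicate):

import java.util.List;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.KeyRange;
import org.apache.cassandra.thrift.KeySlice;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;

public class RangeSliceVariants {
    static List<KeySlice> oneRow(Cassandra.Client client, String rowKey) throws Exception {
        SlicePredicate predicate = new SlicePredicate();
        predicate.setSlice_range(new SliceRange(new byte[0], new byte[0], false, 100));

        // Variant 1: start key == end key
        KeyRange exact = new KeyRange();
        exact.setStart_key(rowKey);
        exact.setEnd_key(rowKey);
        exact.setCount(1);

        // Variant 2: [start key, "") with count = 1
        KeyRange openEnded = new KeyRange();
        openEnded.setStart_key(rowKey);
        openEnded.setEnd_key("");
        openEnded.setCount(1);

        // Either range can be passed to get_range_slices; with count = 1 both
        // return at most one row.
        return client.get_range_slices("Keyspace1", new ColumnParent("Standard1"),
                predicate, exact, ConsistencyLevel.QUORUM);
    }
}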

Thanks,
Naren


Re: working of get_range_slices

2010-10-14 Thread Narendra Sharma
Thanks Jonathan.

Another related question: if I need to fetch only 1 row, what will be
the difference in performance between get_slice and get_range_slices?
The reason for this question is that we are using some code that uses
get_range_slices. We have the option of forcing it to use count=1 with
get_range_slices, or of changing the code to use get_slice.

What would you recommend? What will be the net gain on the Cassandra side in
computing the result?

Thanks,
Naren

On Thu, Oct 14, 2010 at 11:12 AM, Jonathan Ellis jbel...@gmail.com wrote:

 get_range_slices never does searching.

 the performance of those two predicates is equivalent, assuming a row
 start key actually exists.

 On Thu, Oct 14, 2010 at 1:09 PM, Narendra Sharma
 narendra.sha...@gmail.com wrote:
  Hi,
 
  I am using Cassandra 0.6.5. Our application uses the get_range_slices to
 get
  rows in the given range.
 
  Could someone please explain how get_range_slices works internally esp
 when
  a count parameter (value = 1) is also specified in the SlicePredicate?
 Does
  Cassandra first search all in the given range and then return top 1 or it
  some how reads only 1 and return them?
  What is the performance  I/O impact if we pass start key = end key
 in
  the SlicePredicate? Will it perform better than passing a range as
 [Start
  key,] with count = 1?
 
  Thanks,
  Naren
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of Riptano, the source for professional Cassandra support
 http://riptano.com



Retaining commit logs

2010-10-06 Thread Narendra Sharma
Cassandra Version: 0.6.5

I am running a long-duration test and I need to keep the commit log to see
the sequence of operations, to debug a few application issues. Is it possible
to retain the commit logs? Apart from increasing the value of
CommitLogRotationThresholdInMB,
what other way is there to achieve this? The commit logs are deleted when the
Memtable is flushed.

Thanks,
Naren


Re: Query on sstable2json - possible bug

2010-10-06 Thread Narendra Sharma
Has anyone used sstable2json on 0.6.5 and noticed the issue I described in
my email below? This doesn't look like a data corruption issue, since
sstablekeys shows the keys.

Thanks,
Naren


On Tue, Oct 5, 2010 at 8:09 PM, Narendra Sharma
narendra.sha...@gmail.comwrote:

 0.6.5

 -Naren


 On Tue, Oct 5, 2010 at 6:56 PM, Jonathan Ellis jbel...@gmail.com wrote:

 Version?

 On Tue, Oct 5, 2010 at 7:28 PM, Narendra Sharma
 narendra.sha...@gmail.com wrote:
  Hi,
 
  I am using sstable2json to extract row data for debugging some
 application
  issue. I first ran sstablekeys to find the list of keys in the sstable.
 Then
  I use the key to fetch row from sstable. The sstable is from Lucandra
  deployment. I get following.
 
  -bash-3.2$ ./sstablekeys Documents-37-Data.db | more
  jhwKcHZx���0df5a54a-61d8-440e-94a9-b46061ba2fec
  jhwKcHZx���120fc562-cf9f-4204-963d-0ed0d8cd2d09
  jhwKcHZx���93d78bce-7713-4ff9-bc83-b02663a1a55c
  jhwKcHZx���e6f6f5ef-a09f-4e84-9727-56867e81be00
  jqCF6zxM���04f2f4da-724d-40f1-95bf-4799b97ade76
  jqCF6zxM���917f66a6-7a95-4789-82ca-aaa511f6b56e
 
  //This returns correct data
  -bash-3.2$ ./sstable2json Documents-38-Data.db -k
  jhwKcHZx���0df5a54a-61d8-440e-94a9-b46061ba2fec
  {
jhwKcHZx���0df5a54a-61d8-440e-94a9-b46061ba2fec: [[5f3a46514944,
 
 30646635613534612d363164382d343430652d393461392d62343630363162613266656380,
  1296272041356884, false], [5f3a504152454e54,
 
 65373466316138632d313934652d343939652d383835362d64316536343939613862636180,
  1296272041369884, false], [5f3a4944,
 
 30646635613534612d363164382d343430652d393461392d62343630363162613266656380,
  1296272041342884, false], [efbfbf4d455441efbfbf,
 
 aced0005737200136a6176612e7574696c2e41727261794c6973747881d21d99c7619d03000149000473697a6578767704000a74002d5f3a4944efbfbf30646635613534612d363164382d343430652d393461392d62343630363162613266656374002d5f3a46514944efbfbf30646635613534612d363164382d343430652d393461392d62343630363162613266656374002f5f3a504152454e54efbfbf65373466316138632d313934652d343939652d383835362d643165363439396138626361740032313a76706172656e746964efbfbf65373466316138632d313934652d343939652d383835362d64316536343939613862636174000e333a6e616d65efbfbf656d61696c740016333a7072696d61727954797065efbfbf31313a61707078,
  1296272041458884, false]]
  }
 
  //Look at the key in the json output. It doesn't match the key passed as
  argument
  -bash-3.2$ ./sstable2json Documents-38-Data.db -k
  jhwKcHZx���120fc562-cf9f-4204-963d-0ed0d8cd2d09
  {
jhwKcHZx���0df5a54a-61d8-440e-94a9-b46061ba2fec: [[5f3a46514944,
 
 30646635613534612d363164382d343430652d393461392d62343630363162613266656380,
  1296272041356884, false], [5f3a504152454e54,
 
 65373466316138632d313934652d343939652d383835362d64316536343939613862636180,
  1296272041369884, false], [5f3a4944,
 
 30646635613534612d363164382d343430652d393461392d62343630363162613266656380,
  1296272041342884, false], [efbfbf4d455441efbfbf,
 
 aced0005737200136a6176612e7574696c2e41727261794c6973747881d21d99c7619d03000149000473697a6578767704000a74002d5f3a4944efbfbf30646635613534612d363164382d343430652d393461392d62343630363162613266656374002d5f3a46514944efbfbf30646635613534612d363164382d343430652d393461392d62343630363162613266656374002f5f3a504152454e54efbfbf65373466316138632d313934652d343939652d383835362d643165363439396138626361740032313a76706172656e746964efbfbf65373466316138632d313934652d343939652d383835362d64316536343939613862636174000e333a6e616d65efbfbf656d61696c740016333a7072696d61727954797065efbfbf31313a61707078,
  1296272041458884, false]]
  }
  -bash-3.2$
 
  //This returns correct data
  -bash-3.2$ ./sstable2json Documents-38-Data.db -k
  jqCF6zxM���04f2f4da-724d-40f1-95bf-4799b97ade76
  {
jqCF6zxM���04f2f4da-724d-40f1-95bf-4799b97ade76:
  [[31313a6d73732e626c6f622e73697a65, 373780, 1296278215537884,
  false], [31313a6d73732e6d73672e3173742e7365656e2e73656373, 3080,
  1296278215526884, false], [31313a6d73732e6d73672e61727674696d65,
  3132383632363630373180, 1296278215627884, false],
  [31313a6d73732e6d73672e626f756e6365, 66616c736580, 1296278215653884,
  false], [31313a6d73732e6d73672e64656c2e6e6472, 66616c736580,
  1296278215543884, false], [31313a6d73732e6d73672e6578702e73656373,
 3080,
  1296278215549884, false], [31313a6d73732e6d73672e666c616773, 3080,
  1296278215679884, false], [31313a6d73732e6d73672e6964,
 
 30346632663464612d373234642d343066312d393562662d34373939623937616465373680,
  1296278215673884, false], [31313a6d73732e6d73672e6b6579776f726473,
 80,
  1296278215520884, false], [31313a6d73732e6d73672e6c6173745f616363,
 3080,
  1296278215569884, false],
  [31313a6d73732e6d73672e6d756c7469706c652e6d736773, 46c2900ec3a780,
  1296278215691884, false], [31313a6d73732e6d73672e7072696f72, 80,
  1296278215697884, false], [31313a6d73732e6d73672e70726976617465,
  66616c736580, 1296278215592884, false],
 [31313a6d73732e6d73672e73697a65,
  3636383180, 1296278215532884, false],
  [31313a6d73732e6d73672e74696d65317374616363, 3080

Re: Retaining commit logs

2010-10-06 Thread Narendra Sharma
Thanks Oleg!

Could you please share the patch? I have built Cassandra from source before, so
I can definitely give it a try.

-Naren

On Wed, Oct 6, 2010 at 3:55 AM, Oleg Anastasyev olega...@gmail.com wrote:

  Is it possible to retain the commit logs?

 In off-the-shelf Cassandra 0.6.5 this is not possible, AFAIK.
 I developed a patch, which we use internally in our company, for commit
 log archiving and replay.
 I can share the patch with you, if you dare to patch the Cassandra
 sources yourself ;-)

 PS. Are other ppl interested in this functionality ?
 I could file it to JIRA as well...






Query on sstable2json - possible bug

2010-10-05 Thread Narendra Sharma
Hi,

I am using sstable2json to extract row data while debugging an application
issue. I first ran sstablekeys to find the list of keys in the sstable. Then
I use a key to fetch the corresponding row from the sstable. The sstable is from
a Lucandra deployment. I get the following.

-bash-3.2$ ./sstablekeys Documents-37-Data.db | more
jhwKcHZx���0df5a54a-61d8-440e-94a9-b46061ba2fec
jhwKcHZx���120fc562-cf9f-4204-963d-0ed0d8cd2d09
jhwKcHZx���93d78bce-7713-4ff9-bc83-b02663a1a55c
jhwKcHZx���e6f6f5ef-a09f-4e84-9727-56867e81be00
jqCF6zxM���04f2f4da-724d-40f1-95bf-4799b97ade76
jqCF6zxM���917f66a6-7a95-4789-82ca-aaa511f6b56e

//This returns correct data
-bash-3.2$ ./sstable2json Documents-38-Data.db -k
jhwKcHZx���0df5a54a-61d8-440e-94a9-b46061ba2fec
{
  jhwKcHZx���0df5a54a-61d8-440e-94a9-b46061ba2fec: [[5f3a46514944,
30646635613534612d363164382d343430652d393461392d62343630363162613266656380,
1296272041356884, false], [5f3a504152454e54,
65373466316138632d313934652d343939652d383835362d64316536343939613862636180,
1296272041369884, false], [5f3a4944,
30646635613534612d363164382d343430652d393461392d62343630363162613266656380,
1296272041342884, false], [efbfbf4d455441efbfbf,
aced0005737200136a6176612e7574696c2e41727261794c6973747881d21d99c7619d03000149000473697a6578767704000a74002d5f3a4944efbfbf30646635613534612d363164382d343430652d393461392d62343630363162613266656374002d5f3a46514944efbfbf30646635613534612d363164382d343430652d393461392d62343630363162613266656374002f5f3a504152454e54efbfbf65373466316138632d313934652d343939652d383835362d643165363439396138626361740032313a76706172656e746964efbfbf65373466316138632d313934652d343939652d383835362d64316536343939613862636174000e333a6e616d65efbfbf656d61696c740016333a7072696d61727954797065efbfbf31313a61707078,
1296272041458884, false]]
}

//Look at the key in the json output. It doesn't match the key passed as
argument
-bash-3.2$ ./sstable2json Documents-38-Data.db -k
jhwKcHZx���120fc562-cf9f-4204-963d-0ed0d8cd2d09
{
  jhwKcHZx���0df5a54a-61d8-440e-94a9-b46061ba2fec: [[5f3a46514944,
30646635613534612d363164382d343430652d393461392d62343630363162613266656380,
1296272041356884, false], [5f3a504152454e54,
65373466316138632d313934652d343939652d383835362d64316536343939613862636180,
1296272041369884, false], [5f3a4944,
30646635613534612d363164382d343430652d393461392d62343630363162613266656380,
1296272041342884, false], [efbfbf4d455441efbfbf,
aced0005737200136a6176612e7574696c2e41727261794c6973747881d21d99c7619d03000149000473697a6578767704000a74002d5f3a4944efbfbf30646635613534612d363164382d343430652d393461392d62343630363162613266656374002d5f3a46514944efbfbf30646635613534612d363164382d343430652d393461392d62343630363162613266656374002f5f3a504152454e54efbfbf65373466316138632d313934652d343939652d383835362d643165363439396138626361740032313a76706172656e746964efbfbf65373466316138632d313934652d343939652d383835362d64316536343939613862636174000e333a6e616d65efbfbf656d61696c740016333a7072696d61727954797065efbfbf31313a61707078,
1296272041458884, false]]
}
-bash-3.2$

//This returns correct data
-bash-3.2$ ./sstable2json Documents-38-Data.db -k
jqCF6zxM���04f2f4da-724d-40f1-95bf-4799b97ade76
{
  jqCF6zxM���04f2f4da-724d-40f1-95bf-4799b97ade76:
[[31313a6d73732e626c6f622e73697a65, 373780, 1296278215537884,
false], [31313a6d73732e6d73672e3173742e7365656e2e73656373, 3080,
1296278215526884, false], [31313a6d73732e6d73672e61727674696d65,
3132383632363630373180, 1296278215627884, false],
[31313a6d73732e6d73672e626f756e6365, 66616c736580, 1296278215653884,
false], [31313a6d73732e6d73672e64656c2e6e6472, 66616c736580,
1296278215543884, false], [31313a6d73732e6d73672e6578702e73656373, 3080,
1296278215549884, false], [31313a6d73732e6d73672e666c616773, 3080,
1296278215679884, false], [31313a6d73732e6d73672e6964,
30346632663464612d373234642d343066312d393562662d34373939623937616465373680,
1296278215673884, false], [31313a6d73732e6d73672e6b6579776f726473, 80,
1296278215520884, false], [31313a6d73732e6d73672e6c6173745f616363, 3080,
1296278215569884, false],
[31313a6d73732e6d73672e6d756c7469706c652e6d736773, 46c2900ec3a780,
1296278215691884, false], [31313a6d73732e6d73672e7072696f72, 80,
1296278215697884, false], [31313a6d73732e6d73672e70726976617465,
66616c736580, 1296278215592884, false], [31313a6d73732e6d73672e73697a65,
3636383180, 1296278215532884, false],
[31313a6d73732e6d73672e74696d65317374616363, 3080, 1296278215647884,
false], [31313a6d73732e6d73672e74797065, 80, 1296278215685884, false],
[31313a6d73732e6d73672e756964, 3130303480, 1296278215563884, false],
[31313a6d73732e6d73672e756e72656164, 7472756580, 1296278215659884,
false], [31313a6d73732e766572, 3080, 1296278215633884, false],
[5f3a46514944,
30346632663464612d373234642d343066312d393562662d34373939623937616465373680,
1296278215500884, false], [5f3a504152454e54,
62646638666262622d323265392d343830302d623533612d35373032333838303436616680,
1296278215514884, false], [5f3a4944,

Re: Query on sstable2json - possible bug

2010-10-05 Thread Narendra Sharma
0.6.5

-Naren

On Tue, Oct 5, 2010 at 6:56 PM, Jonathan Ellis jbel...@gmail.com wrote:

 Version?

 On Tue, Oct 5, 2010 at 7:28 PM, Narendra Sharma
 narendra.sha...@gmail.com wrote:
  Hi,
 
  I am using sstable2json to extract row data for debugging some
 application
  issue. I first ran sstablekeys to find the list of keys in the sstable.
 Then
  I use the key to fetch row from sstable. The sstable is from Lucandra
  deployment. I get following.
 
  -bash-3.2$ ./sstablekeys Documents-37-Data.db | more
  jhwKcHZx���0df5a54a-61d8-440e-94a9-b46061ba2fec
  jhwKcHZx���120fc562-cf9f-4204-963d-0ed0d8cd2d09
  jhwKcHZx���93d78bce-7713-4ff9-bc83-b02663a1a55c
  jhwKcHZx���e6f6f5ef-a09f-4e84-9727-56867e81be00
  jqCF6zxM���04f2f4da-724d-40f1-95bf-4799b97ade76
  jqCF6zxM���917f66a6-7a95-4789-82ca-aaa511f6b56e
 
  //This returns correct data
  -bash-3.2$ ./sstable2json Documents-38-Data.db -k
  jhwKcHZx���0df5a54a-61d8-440e-94a9-b46061ba2fec
  {
jhwKcHZx���0df5a54a-61d8-440e-94a9-b46061ba2fec: [[5f3a46514944,
 
 30646635613534612d363164382d343430652d393461392d62343630363162613266656380,
  1296272041356884, false], [5f3a504152454e54,
 
 65373466316138632d313934652d343939652d383835362d64316536343939613862636180,
  1296272041369884, false], [5f3a4944,
 
 30646635613534612d363164382d343430652d393461392d62343630363162613266656380,
  1296272041342884, false], [efbfbf4d455441efbfbf,
 
 aced0005737200136a6176612e7574696c2e41727261794c6973747881d21d99c7619d03000149000473697a6578767704000a74002d5f3a4944efbfbf30646635613534612d363164382d343430652d393461392d62343630363162613266656374002d5f3a46514944efbfbf30646635613534612d363164382d343430652d393461392d62343630363162613266656374002f5f3a504152454e54efbfbf65373466316138632d313934652d343939652d383835362d643165363439396138626361740032313a76706172656e746964efbfbf65373466316138632d313934652d343939652d383835362d64316536343939613862636174000e333a6e616d65efbfbf656d61696c740016333a7072696d61727954797065efbfbf31313a61707078,
  1296272041458884, false]]
  }
 
  //Look at the key in the json output. It doesn't match the key passed as
  argument
  -bash-3.2$ ./sstable2json Documents-38-Data.db -k
  jhwKcHZx���120fc562-cf9f-4204-963d-0ed0d8cd2d09
  {
jhwKcHZx���0df5a54a-61d8-440e-94a9-b46061ba2fec: [[5f3a46514944,
 
 30646635613534612d363164382d343430652d393461392d62343630363162613266656380,
  1296272041356884, false], [5f3a504152454e54,
 
 65373466316138632d313934652d343939652d383835362d64316536343939613862636180,
  1296272041369884, false], [5f3a4944,
 
 30646635613534612d363164382d343430652d393461392d62343630363162613266656380,
  1296272041342884, false], [efbfbf4d455441efbfbf,
 
 aced0005737200136a6176612e7574696c2e41727261794c6973747881d21d99c7619d03000149000473697a6578767704000a74002d5f3a4944efbfbf30646635613534612d363164382d343430652d393461392d62343630363162613266656374002d5f3a46514944efbfbf30646635613534612d363164382d343430652d393461392d62343630363162613266656374002f5f3a504152454e54efbfbf65373466316138632d313934652d343939652d383835362d643165363439396138626361740032313a76706172656e746964efbfbf65373466316138632d313934652d343939652d383835362d64316536343939613862636174000e333a6e616d65efbfbf656d61696c740016333a7072696d61727954797065efbfbf31313a61707078,
  1296272041458884, false]]
  }
  -bash-3.2$
 
  //This returns correct data
  -bash-3.2$ ./sstable2json Documents-38-Data.db -k
  jqCF6zxM���04f2f4da-724d-40f1-95bf-4799b97ade76
  {
jqCF6zxM���04f2f4da-724d-40f1-95bf-4799b97ade76:
  [[31313a6d73732e626c6f622e73697a65, 373780, 1296278215537884,
  false], [31313a6d73732e6d73672e3173742e7365656e2e73656373, 3080,
  1296278215526884, false], [31313a6d73732e6d73672e61727674696d65,
  3132383632363630373180, 1296278215627884, false],
  [31313a6d73732e6d73672e626f756e6365, 66616c736580, 1296278215653884,
  false], [31313a6d73732e6d73672e64656c2e6e6472, 66616c736580,
  1296278215543884, false], [31313a6d73732e6d73672e6578702e73656373,
 3080,
  1296278215549884, false], [31313a6d73732e6d73672e666c616773, 3080,
  1296278215679884, false], [31313a6d73732e6d73672e6964,
 
 30346632663464612d373234642d343066312d393562662d34373939623937616465373680,
  1296278215673884, false], [31313a6d73732e6d73672e6b6579776f726473,
 80,
  1296278215520884, false], [31313a6d73732e6d73672e6c6173745f616363,
 3080,
  1296278215569884, false],
  [31313a6d73732e6d73672e6d756c7469706c652e6d736773, 46c2900ec3a780,
  1296278215691884, false], [31313a6d73732e6d73672e7072696f72, 80,
  1296278215697884, false], [31313a6d73732e6d73672e70726976617465,
  66616c736580, 1296278215592884, false],
 [31313a6d73732e6d73672e73697a65,
  3636383180, 1296278215532884, false],
  [31313a6d73732e6d73672e74696d65317374616363, 3080, 1296278215647884,
  false], [31313a6d73732e6d73672e74797065, 80, 1296278215685884,
 false],
  [31313a6d73732e6d73672e756964, 3130303480, 1296278215563884, false],
  [31313a6d73732e6d73672e756e72656164, 7472756580, 1296278215659884,
  false], [31313a6d73732e766572, 3080

Re: Preventing Swapping.

2010-09-29 Thread Narendra Sharma
Read "Use mlockall via JNA, if present, to prevent Linux from swapping out
parts of the JVM" (https://issues.apache.org/jira/browse/CASSANDRA-1214),
which is covered at the following link:
http://www.riptano.com/blog/whats-new-cassandra-065
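
For illustration, here is a minimal JNA direct-mapping sketch of what that
ticket does. This is not the actual Cassandra code; the class name and error
handling are made up, and the MCL_* constants are the Linux values.

import com.sun.jna.LastErrorException;
import com.sun.jna.Native;

public final class MlockallSketch {

    // Linux flag values for mlockall(2)
    private static final int MCL_CURRENT = 1; // lock pages currently mapped
    private static final int MCL_FUTURE  = 2; // lock pages mapped in the future

    static {
        // Bind the native method declared below to libc
        Native.register("c");
    }

    private static native int mlockall(int flags) throws LastErrorException;

    public static void main(String[] args) {
        try {
            mlockall(MCL_CURRENT | MCL_FUTURE);
            System.out.println("JVM memory locked; Linux will not swap it out");
        } catch (LastErrorException e) {
            // Usually EPERM/ENOMEM when the memlock ulimit is too low;
            // the process keeps running, just without the lock
            System.out.println("mlockall failed (errno " + e.getErrorCode()
                    + "); continuing without memory locking");
        }
    }
}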

-Naren

On Wed, Sep 29, 2010 at 5:21 PM, Jeremy Davis
jerdavis.cassan...@gmail.com wrote:


 Did anyone else see this article on preventing swapping? Seems like it
 would also apply to Cassandra.


 http://jcole.us/blog/archives/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/

 -JD




High number of DigestMismatchException

2010-09-26 Thread Narendra Sharma
We are seeing a high number of DigestMismatchExceptions on our Cassandra
deployment. We have a cluster of 4 nodes with RF=3 and we read/write at
QUORUM. I understand that some DigestMismatchExceptions are normal and are
the mechanism by which Cassandra ensures consistency through read repair.

In our case, even though we have 4 clients, only the client that writes the
data reads it back, because of request sharding at the client end. So I would
expect replication to happen quickly and the data to be consistent across the
3 copies before the read hits the cluster. The size of a column value is
approximately 128K. We verified multiple times that the timestamps of all the
clients are in sync.

Is this something to worry about? How do we troubleshoot it if this is an
issue?
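
For reference, a minimal sketch (not Cassandra source) of the quorum
arithmetic behind the expectation above: with RF=3 both reads and writes
block for 2 replicas, so a quorum read always overlaps at least one replica
that acknowledged the quorum write. A digest mismatch typically means one of
the replicas consulted by the read had not yet applied that write.

public final class QuorumMath {

    // Number of replicas a QUORUM request waits for
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    public static void main(String[] args) {
        int rf = 3;
        int r = quorum(rf); // replicas a QUORUM read waits for -> 2
        int w = quorum(rf); // replicas a QUORUM write waits for -> 2
        // r + w > rf, so every quorum read overlaps a quorum write
        System.out.printf("RF=%d, R=%d, W=%d, overlap=%d replica(s)%n",
                rf, r, w, r + w - rf);
    }
}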


Thanks,
Naren


Cassandra client - clock sync

2010-07-13 Thread Narendra Sharma
Hi,

We have an application that uses Cassandra to store data. The application is
deployed on multiple nodes that are part of an application cluster. We are
at present using a single Cassandra node. We have noticed a few errors in the
application, and our analysis revealed that the root cause was that the clocks
on the different application nodes were off by a few milliseconds (approx 3.5 ms).

AFAIK all the application nodes using Cassandra should have their clocks
synchronized. Is this understanding correct? If yes, what is the recommended
way to keep the clocks in sync? Even if we use NTP, the clocks drift out of
sync after a few hours. Should we write a cron job to sync the time every N
minutes or hours? What is the recommendation for production? How are other
Cassandra users handling clock sync in production environments?
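
The reason clock skew matters is that Cassandra resolves conflicting writes
to the same column by the client-supplied timestamp, so a client whose clock
runs behind can have a later write silently lose to an earlier one. Below is
a hypothetical helper (not part of any Cassandra client) that only keeps
timestamps strictly increasing within a single client process; it complements
NTP across the application nodes, it does not replace it.

import java.util.concurrent.atomic.AtomicLong;

public final class MonotonicTimestamps {

    private final AtomicLong last = new AtomicLong();

    /** Returns a strictly increasing microsecond timestamp for this process. */
    public long next() {
        while (true) {
            long now = System.currentTimeMillis() * 1000L; // micros from wall clock
            long prev = last.get();
            long candidate = Math.max(now, prev + 1);      // never go backwards
            if (last.compareAndSet(prev, candidate)) {
                return candidate;
            }
        }
    }
}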


Thanks,
Naren