Re: Decommission an entire DC
And if we want to add a new DC? I suppose we should add all the nodes and then alter the replication factor of the keyspace, but can anyone confirm that and maybe give me some tips? FYI, we have 2 DCs with between 10 and 20 nodes in each and a 2 TB database (local replication factor included). Thanks -- Cyril SCETBON On Jul 24, 2013, at 12:04 AM, Omar Shibli o...@eyeviewdigital.com wrote: All you need to do is decrease the replication factor of DC1 to 0, and then decommission the nodes one by one. I've tried this before and it worked with no issues. Thanks, On Tue, Jul 23, 2013 at 10:32 PM, Lanny Ripple la...@spotright.com wrote: Hi, We have a multi-DC setup using DC1:2, DC2:2. We want to get rid of DC1. We're in the position where we don't need to save any of the data on DC1. We know we'll lose a (tiny, already checked) bit of data but our processing is such that we'll recover over time. How do we drop DC1 and just move forward with DC2? Using nodetool decommission or removetoken looks like we'll eventually end up with a single DC1 node containing the entire DC's data, which would be slow and costly. We've speculated that setting DC1:0 or removing it from the schema would do the trick, but without finding any hits while searching on that idea I hesitate to just do it. We can drop DC1's data but have to keep a working ring in DC2.
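To make Omar's suggestion concrete, here is a rough sketch of the sequence in 1.2-era CQL3 (keyspace and DC names are placeholders; on Thrift-era schemas the same change is made with update keyspace in cassandra-cli):

    ALTER KEYSPACE my_keyspace
      WITH replication = {'class': 'NetworkTopologyStrategy', 'DC2': 2};
      -- DC1 is simply omitted, which drops its replication factor to 0

    # then, on each DC1 node in turn:
    nodetool decommission

For adding a DC, the reverse order applies: bring the new nodes up first, then raise the replication factor for the new DC and run nodetool rebuild (naming an existing DC as the source) on each new node.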
RE: disappointed
Hi Paul, Sorry to hear you're having a low point. We ended up not using the collection features of 1.2. Instead we store a compressed string containing the map and handle it client side. We only have fixed-schema short rows, so no experience with large row compaction. File descriptors have never got that high for us. But, if you only have a couple of physical nodes with loads of data and small SSTables, maybe they could get that high? The only time I've had file descriptors get out of hand was when compaction got slightly confused with a new schema when I dropped and recreated instead of truncating. https://issues.apache.org/jira/browse/CASSANDRA-4857 restarting the node fixed the issue. From my limited experience I think Cassandra is a dangerous choice for a young, limited funding/experience start-up expecting to scale fast. We are a fairly mature start-up with funding. We've just spent 3-5 months moving from Mongo to Cassandra. It's been expensive and painful getting Cassandra to read like Mongo, but we've made it :) From: Paul Ingalls [mailto:paulinga...@gmail.com] Sent: 24 July 2013 06:01 To: user@cassandra.apache.org Subject: disappointed I want to check in. I'm sad, mad and afraid. I've been trying to get a 1.2 cluster up and working with my data set for three weeks with no success. I've been running a 1.1 cluster for 8 months now with no hiccups, but for me at least 1.2 has been a disaster. I had high hopes for leveraging the new features of 1.2, specifically vnodes and collections. But at this point I can't release my system into production, and will probably need to find a new back end. As a small startup, this could be catastrophic. I'm mostly mad at myself. I took a risk moving to the new tech. I forgot sometimes when you gamble, you lose. First, the performance of 1.2.6 was horrible when using collections. I wasn't able to push through 500k rows before the cluster became unusable. With a lot of digging, and way too much time, I discovered I was hitting a bug that had just been fixed, but was unreleased. This scared me, because the release was already at 1.2.6 and I would have expected something like https://issues.apache.org/jira/browse/CASSANDRA-5677 to have been addressed long before. But gamely I grabbed the latest code from the 1.2 branch, built it, and I was finally able to get past half a million rows. But then I hit ~4 million rows, and a multitude of problems. Even with the fix above, I was still seeing a ton of compactions failing, specifically the ones for large rows. Not a single large row will compact; they all assert with the wrong size. Worse, and this is what kills the whole thing, I keep hitting a wall with open files, even after dumping the whole DB, dropping vnodes and trying again. Seriously, 650k open file descriptors? When it hits this limit, the whole DB craps out and is basically unusable. This isn't that many rows. I have close to half a billion in 1.1… I'm now at a standstill. I figure I have two options unless someone here can help me. Neither of them involve 1.2. I can either go back to 1.1 and remove the features that collections added to my service, or I find another data backend that has similar performance characteristics to Cassandra but allows collections-type behavior in a scalable manner. Cause as far as I can tell, 1.2 doesn't scale. Which makes me sad, I was proud of what I accomplished with 1.1… Does anyone know why there are so many open file descriptors? Any ideas on why a large row won't compact? Paul
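For what it's worth, when file descriptor counts blow up like this it is worth measuring before raising limits. A minimal sketch on Linux (the pgrep pattern is an assumption; match however your Cassandra process is actually named):

    # count open file descriptors held by the Cassandra JVM
    lsof -n -p $(pgrep -f CassandraDaemon) | wc -l

    # confirm the limit the running process actually got
    grep 'open files' /proc/$(pgrep -f CassandraDaemon)/limits

Most entries will be SSTable data and index files, so a huge count usually points at compaction falling behind or an SSTable leak rather than at the limit itself being too low.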
Re: disappointed
Hi Paul, Concerning large rows which are not compacting, I've probably managed to reproduce your problem. I suppose you're using collections, but also TTLs? Anyway, I opened an issue here: https://issues.apache.org/jira/browse/CASSANDRA-5799 Hope this helps 2013/7/24 Christopher Wirt chris.w...@struq.com: [...] -- Fabien Rousseau aur...@yakaz.com www.yakaz.com
Re: disappointed
From my limited experience I think Cassandra is a dangerous choice for a young limited funding/experience start-up expecting to scale fast. It's not dangerous, just don't try to be smart: follow what other big Cassandra users like Twitter, Netflix, Facebook, etc. are using. If they are still at 1.1, then do not rush to 1.2. You can get all the information you need from GitHub and their Maven repos. The same method can be used for any other non-mainstream software like Scala and Hadoop. Also, every new Cassandra branch comes with an extensive number of difficult-to-spot bugs, and it takes about half a year to stabilize. New features should usually be avoided; best is to stay one major version behind. This is true for almost any mission-critical software. You can help with testing the Cassandra 2.0 beta. Create your test suite and run it against your target Cassandra version. The test suite also needs to track performance. From my testing, performance of 2.0 is about the same as 1.2 for my workload. I had a lot of problems after I migrated from a really well-working 0.8.x to 1.0.5. Even though preproduction testing did not discover any problems, there were memory leaks in 1.0.5, hint delivery was broken, and there was a problem with repair making old tombstones reappear, causing a snowball effect. That last one was fixed about a year later in mainstream C*, after I fixed it myself because no dev believed me that such a thing could happen.
MapReduce response time and speed
Hi, I am Jan Algermissen (REST-head, freelance programmer/consultant) and Cassandra newbie. I am looking at Cassandra for an application I am working on. There will be a max. of 10 million items (texts and attributes of a retailer's products) in the database. There will be occasional writes (e.g. price updates). The use case for the application is to work on the whole data set, item by item, to produce 'exports'. It will be necessary to access the full set every time. There is no relationship between the items. Processing is done iteratively. My question: I am thinking that this is an ideal scenario for map-reduce, but I am unsure about two things: Can a user of the system define new jobs in an ad-hoc fashion (like a query), or do map-reduce jobs need to be prepared by a developer (e.g. in Riak you need a developer to compile in the job when you need the performance of Erlang-based jobs)? Suppose a user indeed can specify a job and send it off to Cassandra for processing; what is the expected response time? Is it possible to reduce the response time (by tuning, adding more nodes) to make a result available within a couple of minutes? Or will there most certainly be a gap of 10 minutes or more? I understand that map-reduce is not for ad-hoc 'querying', but my users expect the system to feel quasi-interactive, because they intend to refine the processing job based on the results they get. A short gap would be OK, but a definite gap on the order of 10+ minutes would not. (For example, as far as I learned, with Riak you would most certainly have such a gap. How about Cassandra? Throwing more nodes at the problem would be OK, I just need to understand whether there is a definite 'response time penalty' I have to expect no matter what.) Jan
Cassandra and RAIDs
Hi, second question: is it recommended to set up Cassandra using 'RAID-ed' disks for per-node reliability, or do people usually just rely on having multiple nodes anyway? Why bother with replicated disks? Jan
Re: Cassandra and RAIDs
From: http://www.datastax.com/docs/1.2/cluster_architecture/cluster_planning

* RAID on data disks: It is generally not necessary to use RAID for the following reasons:
  * Data is replicated across the cluster based on the replication factor you've chosen.
  * Starting in version 1.2, Cassandra takes care of disk management with the JBOD (just a bunch of disks) support feature. Because Cassandra properly reacts to a disk failure, based on your availability/consistency requirements, either by stopping the affected node or by blacklisting the failed drive, you can deploy Cassandra nodes with large disk arrays without the overhead of RAID 10.
* RAID on the commit log disk: Generally RAID is not needed for the commit log disk. Replication adequately prevents data loss. If you need the extra redundancy, use RAID 1.

Andy

On 24 Jul 2013, at 15:36, Jan Algermissen jan.algermis...@nordsc.com wrote: [...]
Re: Cassandra and RAIDs
On 24 July 2013 15:36, Jan Algermissen jan.algermis...@nordsc.com wrote: is it recommended to set up Cassandra using 'RAID-ed' disks for per-node reliability or do people usually just rely on having multiple nodes anyway - why bother with replicated disks? It's not necessary, due to replication as you say. You can give Cassandra your JBOD disks and it will split data between them and avoid a disk (or fail the node, you can choose) if one fails. There are some reasons to consider RAID though:

* It is probably quicker, and places no load on the rest of the cluster, to do a RAID rebuild rather than a nodetool rebuild/repair. The importance of this depends on how much data you have and the load on your cluster. If you don't have much data per node, or if there is spare capacity, then RAID will offer no benefit here.
* Using JBOD, the largest SSTable you can have is limited to the size of one disk. This is unlikely to cause problems in most scenarios, but an erroneous nodetool compact could cause problems if your data size is greater than can fit on any one disk.

Richard.
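For reference, the JBOD setup Richard describes is plain cassandra.yaml configuration in 1.2; a sketch with placeholder paths:

    # cassandra.yaml: list one data directory per physical disk
    data_file_directories:
        - /mnt/disk1/cassandra/data
        - /mnt/disk2/cassandra/data
        - /mnt/disk3/cassandra/data

    # behaviour on disk failure: 'stop' halts the node,
    # 'best_effort' blacklists the dead disk and keeps serving
    disk_failure_policy: best_effort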
Re: MapReduce response time and speed
You have a lot of questions there, so I can't answer them all, but for the following: *Can a user of the system define new jobs in an ad-hoc fashion (like a query), or do map-reduce jobs need to be prepared by a developer (e.g. in Riak you need a developer to compile in the job when you need the performance of Erlang-based jobs)? Suppose a user indeed can specify a job and send it off to Cassandra for processing; what is the expected response time?* You can use high-level tools like Pig, Hive and Oozie. But mind you, it will depend on your data size, the complexity of the job, the cluster, and tuning parameters. Regards, Shahab On Wed, Jul 24, 2013 at 10:33 AM, Jan Algermissen jan.algermis...@nordsc.com wrote: [...]
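As a rough illustration of the ad-hoc style Pig allows (keyspace and column family names here are invented; the cassandra:// URI form is the one used by the CassandraStorage loader shipped with 1.2-era Cassandra, but check the versions you deploy):

    -- count all items; this can be edited and resubmitted without recompiling anything
    rows = LOAD 'cassandra://retail/products'
           USING org.apache.cassandra.hadoop.pig.CassandraStorage();
    grouped = GROUP rows ALL;
    total = FOREACH grouped GENERATE COUNT(rows);
    DUMP total;

Whether this lands in Jan's couple-of-minutes budget depends entirely on cluster size and job complexity, as Shahab says; Pig removes the developer-compile step, not the scan cost.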
Re: disappointed
Same type of error, but I'm not currently using TTLs. I am, however, generating a lot of tombstones as I add elements to collections… On Jul 24, 2013, at 6:42 AM, Fabien Rousseau fab...@yakaz.com wrote: [...]
Re: disappointed
Hi Chris, Thanks for the response! What kind of challenges did you run into that kept you from using collections? I'm currently running 4 physical nodes, same as I was with Cassandra 1.1.6. I'm using size-tiered compaction. Would changing to leveled with a large minimum make a big difference, or would it just push the problem off till later? Yeah, I have run into problems dropping schemas before as well. I was careful this time to start with an empty db folder… Glad you were successful in your transition :) Paul On Jul 24, 2013, at 4:12 AM, Christopher Wirt chris.w...@struq.com wrote: [...]
Re: disappointed
Hey Radim, I knew that it would take a while to stabilize, which is why I waited half a year before giving it a go. I guess I was just surprised that 6 months wasn't long enough… I'll have to look at the differences between 1.2 and 2.0. Is there a good resource for checking that? Your experience is less than encouraging… :) I am worried that if I stick with it, I'll have to invest time into learning the code base as well, and as a small startup, time is our most valuable resource… Thanks for the thoughts! Paul On Jul 24, 2013, at 6:42 AM, Radim Kolar h...@filez.com wrote: [...]
RE: disappointed
We found the performance of collections to be not great and needed a quick solution. We've always used the levelled compaction strategy, where you declare sstable_size_in_mb, not min_compaction_threshold. Much better for our use case. http://www.datastax.com/dev/blog/when-to-use-leveled-compaction We are read-heavy, latency-sensitive people: lots of TTL'ing, few writes compared to reads. From: Paul Ingalls [mailto:paulinga...@gmail.com] Sent: 24 July 2013 17:43 To: user@cassandra.apache.org Subject: Re: disappointed [...]
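For anyone following along, the switch Chris describes is a single schema change in 1.2-era CQL3 (table name and size are placeholders; see the linked blog post for how to choose the size):

    ALTER TABLE my_keyspace.my_table
      WITH compaction = {'class': 'LeveledCompactionStrategy',
                         'sstable_size_in_mb': 160};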
Re: unable to compact large rows
Would it be possible to delete this row and reinsert it? By the way, how large is that one row? Jason On Wed, Jul 24, 2013 at 9:23 AM, Paul Ingalls paulinga...@gmail.com wrote: I'm getting constant exceptions during compaction of large rows. In fact, I have not seen one work, even starting from an empty DB. As soon as I start pushing in data, when a row hits the large threshold, it fails compaction with this type of stack trace:

INFO [CompactionExecutor:6] 2013-07-24 01:17:53,592 CompactionController.java (line 156) Compacting large row fanzo/tweets_by_id:352567939972603904 (153360688 bytes) incrementally
ERROR [CompactionExecutor:6] 2013-07-24 01:18:12,496 CassandraDaemon.java (line 192) Exception in thread Thread[CompactionExecutor:6,1,main]
java.lang.AssertionError: incorrect row data size 5722610 written to /mnt/datadrive/lib/cassandra/data/fanzo/tweets_by_id/fanzo-tweets_by_id-tmp-ic-1453-Data.db; correct is 5767384
    at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:162)
    at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
    at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
    at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
    at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:211)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)

I'm not sure what to do or where to look. Help… :) Thanks, Paul
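The "Compacting large row ... incrementally" INFO line is governed by a cassandra.yaml threshold; rows above it take the incremental two-pass path where these assertions are firing. The setting, shown at its 1.2 default:

    # cassandra.yaml: rows larger than this (in MB) are compacted
    # incrementally on disk instead of entirely in memory
    in_memory_compaction_limit_in_mb: 64

Raising it trades memory for avoiding the incremental path, which can be a stopgap while the underlying bug is chased, though a ~150 MB row like the one in the log would need a sizeable limit.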
Re: disappointed
Same thing here... Since #5677 seems to affect a lot of users, what do you think about releasing a version 1.2.6.1? I can patch it myself, yeah, but do I want to push this into production? Hmm... On 24.07.2013 18:58, Paul Ingalls wrote: [...] -- Steffen Rusitschka, CTO, MegaZebra GmbH
Re: disappointed
On Wed, Jul 24, 2013 at 11:37 AM, Steffen Rusitschka r...@megazebra.com wrote: Same thing here... Since #5677 seems to affect a lot of users, what do you think about releasing a version 1.2.6.1? I can patch it myself, yeah, but do I want to push this into production? Hmm... A better solution would likely involve not running cutting-edge code in production.. if you find yourself needing to upgrade production anything on the day of a release, you are probably ahead of the version it is reasonable to run in production. If you're already comfortable with this high level of risk in production, I don't really see small manual patches as significantly increasing your level of risk... =Rob
Re: Decommission an entire DC
That one is documented -- http://www.datastax.com/documentation/cassandra/1.2/index.html#cassandra/operations/ops_add_dc_to_cluster_t.html On Wed, Jul 24, 2013 at 3:33 AM, Cyril Scetbon cyril.scet...@free.fr wrote: [...]
Re: disappointed
Cassandra 2.0b2 changes: https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/2.0.0-beta2-tentative "and as a small startup time is our most valuable resource…" Use the technology you are most familiar with.
Data disappear immediately after reading?
Hi all, I know the subject is not saying much, but this is what I'm experiencing now with my cluster. After some years without any problem, I'm now seeing problems with counters and, most seriously, data loss immediately after a read. I have some web services (WS) that I use to query data on Cassandra, and in the last month the following problem has happened twice: I call my WS, it shows data. I refresh the page -- the data is no longer available! I can then call the WS 200 times, but I won't see the data anymore ... today my colleague experienced the same problem. The WS are ABSOLUTELY read-only on the DB and there are no writes to erase these data. Anyone understand wth is going on? I have no idea, and most of all I don't know how to fix it. Any help would really be appreciated. Kind Regards, Carlo
Re: Data disappear immediately after reading?
Sorry, I forgot to say: Apache Cassandra 1.0.7 on Ubuntu 10.04. The data that are disappearing are not counters but common rows. Original message From: cbert...@libero.it Date: 24/07/2013 22.34 To: user@cassandra.apache.org Subject: Data disappear immediately after reading? [...]
Re: Data disappear immediately after reading?
Carlo, Do you read/write with consistency levels according to your needs [1]? Have you tried to see if it happens when using the cassandra-cli to get that data? [1] http://wiki.apache.org/cassandra/ArchitectureOverview On Wed, Jul 24, 2013 at 5:34 PM, cbert...@libero.it wrote: [...]
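A quick way to take the web service out of the equation, per the suggestion above, is to read the row directly at QUORUM from cassandra-cli (keyspace, column family, and key are placeholders):

    $ cassandra-cli -h 127.0.0.1 -p 9160
    [default@unknown] use my_keyspace;
    [default@my_keyspace] consistencylevel as QUORUM;
    [default@my_keyspace] get my_cf['row-key'];

If the row is consistently readable this way but flickers through the WS, the client's consistency level or connection pooling is the first suspect.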
Re: Data disappear immediately after reading?
On Wed, Jul 24, 2013 at 1:34 PM, cbert...@libero.it cbert...@libero.itwrote: After some years without any problem now I'm experiencing problems with [not-actually-counters] but, the most serious problem, is data loss immediately after a read. Are secondary indexes involved? There are various bugs (including in 1.0.7) which have similar symptoms.. =Rob
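If a secondary index does turn out to be involved, rebuilding it is a common workaround for stale-index bugs; a hedged sketch with placeholder names (rebuild_index availability and its index-name argument form vary by version, so check nodetool help on your 1.0.x first):

    # rebuild the named secondary index(es) on one column family
    nodetool -h 127.0.0.1 rebuild_index my_keyspace my_cf my_cf.idx_name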
NPE during compaction in compare
Hey Chris, so I just tried dropping all my data and converting my column families to use leveled compaction. Now I'm getting exceptions like the following once I start inserting data. Have you seen these?

ERROR 13:13:25,616 Exception in thread Thread[CompactionExecutor:34,1,main]
java.lang.NullPointerException
    at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:69)
    at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:31)
    at org.apache.cassandra.db.RangeTombstoneList.insertFrom(RangeTombstoneList.java:396)
    at org.apache.cassandra.db.RangeTombstoneList.addAll(RangeTombstoneList.java:205)
    at org.apache.cassandra.db.DeletionInfo.add(DeletionInfo.java:180)
    at org.apache.cassandra.db.AbstractThreadUnsafeSortedColumns.delete(AbstractThreadUnsafeSortedColumns.java:40)
    at org.apache.cassandra.db.AbstractColumnContainer.delete(AbstractColumnContainer.java:51)
    at org.apache.cassandra.db.AbstractColumnContainer.delete(AbstractColumnContainer.java:46)
    at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:115)
    at org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:98)
    at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:160)
    at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:76)
    at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:57)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:114)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:97)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:145)
    at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
    at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
    at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:211)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:680)

and

ERROR 13:17:11,327 Exception in thread Thread[CompactionExecutor:45,1,main]
java.lang.ArrayIndexOutOfBoundsException: 2
    at org.apache.cassandra.db.RangeTombstoneList.insertFrom(RangeTombstoneList.java:396)
    at org.apache.cassandra.db.RangeTombstoneList.addAll(RangeTombstoneList.java:205)
    at org.apache.cassandra.db.DeletionInfo.add(DeletionInfo.java:180)
    at org.apache.cassandra.db.AbstractThreadUnsafeSortedColumns.delete(AbstractThreadUnsafeSortedColumns.java:40)
    at org.apache.cassandra.db.AbstractColumnContainer.delete(AbstractColumnContainer.java:51)
    at org.apache.cassandra.db.AbstractColumnContainer.delete(AbstractColumnContainer.java:46)
    at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:115)
    at org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:98)
    at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:160)
    at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:76)
    at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:57)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:114)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:97)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:145)
    at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
Re: unable to compact large rows
It is pretty much every row that hits the large threshold. I don't think I can delete every row that hits that… you can see the db size in the stack trace, do you want a different type of size? On Jul 24, 2013, at 11:07 AM, Jason Wee peich...@gmail.com wrote: [...]
Re: sstable size change
Hi all, This morning I increased the SSTable size for one of my LCS column families via an alter command and saw at least one compaction run (I did not trigger a compaction via nodetool, nor run upgradesstables, nor remove the .json file). But so far my data file sizes appear to stay at the default 5 MB (see below for output of ls -Sal as well as the relevant portion of cfstats). Is this expected? I was hoping to see at least one file at the new 256 MB size I set. Thanks

SSTable count: 4965
SSTables in each level: [0, 10, 112/100, 1027/1000, 3816, 0, 0, 0]
Space used (live): 29062393142
Space used (total): 29140547702
Number of Keys (estimate): 195103104
Memtable Columns Count: 441483
Memtable Data Size: 205486218
Memtable Switch Count: 243
Read Count: 154226729

-rw-rw-r-- 1 cassandra cassandra 5247564 Jul 18 01:33 users-shard_user_lookup-ib-97153-Data.db
-rw-rw-r-- 1 cassandra cassandra 5247454 Jul 23 02:59 users-shard_user_lookup-ib-109063-Data.db
-rw-rw-r-- 1 cassandra cassandra 5247421 Jul 20 14:58 users-shard_user_lookup-ib-103127-Data.db
-rw-rw-r-- 1 cassandra cassandra 5247415 Jul 17 13:56 users-shard_user_lookup-ib-95761-Data.db
-rw-rw-r-- 1 cassandra cassandra 5247379 Jul 21 02:44 users-shard_user_lookup-ib-104718-Data.db
-rw-rw-r-- 1 cassandra cassandra 5247346 Jul 21 21:54 users-shard_user_lookup-ib-106280-Data.db
-rw-rw-r-- 1 cassandra cassandra 5247242 Jul 3 19:41 users-shard_user_lookup-ib-66049-Data.db
-rw-rw-r-- 1 cassandra cassandra 5247235 Jul 21 02:44 users-shard_user_lookup-ib-104737-Data.db
-rw-rw-r-- 1 cassandra cassandra 5247233 Jul 20 14:58 users-shard_user_lookup-ib-103169-Data.db

From: sankalp kohli kohlisank...@gmail.com Reply-To: user@cassandra.apache.org Date: Tuesday, July 23, 2013 3:04 PM To: user@cassandra.apache.org Subject: Re: sstable size change Will Cassandra force any newly compacted files to my new setting as compactions are naturally triggered? Yes. Let it compact and increase in size. On Tue, Jul 23, 2013 at 9:38 AM, Robert Coli rc...@eventbrite.com wrote: On Tue, Jul 23, 2013 at 6:48 AM, Keith Wright kwri...@nanigans.com wrote: Can you elaborate on what you mean by let it take its own course organically? Will Cassandra force any newly compacted files to my new setting as compactions are naturally triggered? You see, when two (or more!) SSTables love each other very much, they sometimes decide they want to compact together.. But seriously, yes. If you force all existing SSTables to level 0, it is as if you just flushed them all. Level compaction then does a whole lot of compaction, using the active table size. =Rob
Re: sstable size change
What is the output of show keyspaces from cassandra-cli? Do you see the new value?

Compaction Strategy: org.apache.cassandra.db.compaction.LeveledCompactionStrategy
Compaction Strategy Options:
  sstable_size_in_mb: XXX

From: Keith Wright kwri...@nanigans.com To: user@cassandra.apache.org Sent: Wednesday, July 24, 2013 3:44 PM Subject: Re: sstable size change [...]
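Concretely, the verification looks something like this against Keith's schema (the size value is whatever was set in the alter):

    $ cassandra-cli -h 127.0.0.1
    [default@unknown] show keyspaces;
    ...
        ColumnFamily: shard_user_lookup
          Compaction Strategy: org.apache.cassandra.db.compaction.LeveledCompactionStrategy
          Compaction Strategy Options:
            sstable_size_in_mb: 256

If the new value shows here, the schema change took, and the 5 MB files are simply pre-existing SSTables waiting to be rewritten by future compactions.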
Re: How to avoid inter-dc read requests
That does not measure what the servers are doing though. Track the number of reads per CF; it's exposed with nodetool cfstats and is in ops centre as well. Cheers - Aaron Morton Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 24/07/2013, at 12:22 AM, Omar Shibli o...@eyeviewdigital.com wrote: I simply monitor the load avg of the nodes using opscenter. I started with idle nodes (by idle I mean a load avg below 1.0 on all nodes), then started to run a lot of key slice read requests on the analytic DC with CL local quorum (I also made sure that the client worked only with the analytic DC); after a few minutes I noticed that the load avg of all the nodes increased dramatically (above 10). Thanks in advance Aaron, On Tue, Jul 23, 2013 at 12:02 PM, aaron morton aa...@thelastpickle.com wrote: All the read/write requests are issued with CL local quorum, but still there are a lot of inter-DC read requests. How are you measuring this? Cheers - Aaron Morton Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 22/07/2013, at 8:41 AM, sankalp kohli kohlisank...@gmail.com wrote: Slice query does not trigger background read repair. Implement Read Repair on Range Queries On Sun, Jul 21, 2013 at 1:40 PM, sankalp kohli kohlisank...@gmail.com wrote: There can be multiple reasons for that: 1) Background read repairs. 2) Your data is not consistent, leading to read repairs. 3) For writes, irrespective of the consistency used, a single write request will go to the other DC. 4) You might be running other nodetool commands like repair. read_repair_chance (Default: 0.1 or 1) Specifies the probability with which read repairs should be invoked on non-quorum reads. The value must be between 0 and 1. For tables created in versions of Cassandra before 1.0, it defaults to 1. For tables created in versions of Cassandra 1.0 and higher, it defaults to 0.1. However, for Cassandra 1.0, the default is 1.0 if you use CLI or any Thrift client, such as Hector or pycassa, and is 0.1 if you use CQL. On Sun, Jul 21, 2013 at 10:26 AM, Omar Shibli o...@eyeviewdigital.com wrote: One more thing, I'm doing a lot of key slice read requests; is that supposed to change anything? On Sun, Jul 21, 2013 at 8:21 PM, Omar Shibli o...@eyeviewdigital.com wrote: I'm seeing a lot of inter-DC read requests, although I've followed the DataStax guidelines for multi-DC deployment http://www.datastax.com/dev/blog/deploying-cassandra-across-multiple-data-centers Here is my setup: 2 data centers within the same region (AWS); targeting DC, RF 3, 6 nodes; analytic DC, RF 3, 11 nodes. All the read/write requests are issued with CL local quorum, but still there are a lot of inter-DC read requests. Any suggestion, or am I missing something? Thanks in advance,
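To follow Aaron's suggestion, snapshot the per-CF read counters on a node in each DC before and after a test run; the delta shows where reads are actually landing (host and CF names are placeholders):

    # per-column-family counters, including Read Count
    nodetool -h analytic-node-1 cfstats | grep -A 20 'Column Family: my_cf'

A growing Read Count on targeting-DC nodes during an analytic-only workload would point at cross-DC reads, most likely from read repair; in 1.2-era CQL3 that can be biased toward the local DC with something like:

    ALTER TABLE my_keyspace.my_cf
      WITH read_repair_chance = 0.0 AND dclocal_read_repair_chance = 0.1;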
Re: NPE in CompactionExecutor
"There was no error stack, just that line in the log." It's odd that the stack is not there. This is an unhandled exception when running compaction. It may be related to the assertions. If you can reproduce it please raise a ticket at https://issues.apache.org/jira/browse/CASSANDRA

Cheers - Aaron Morton Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com

On 24/07/2013, at 3:50 AM, Paul Ingalls paulinga...@gmail.com wrote: I'm running the latest from the 1.2 branch as of a few days ago. I needed one of the patches that will be in 1.2.7. There was no error stack, just that line in the log. I wiped the database (deleted all the files in the lib dir) and restarted my data load, and am consistently running into the incorrect row data size error, almost immediately… It seems to be specific to compacting large rows. I have been unsuccessful in getting a large row to compact… Paul

On Jul 21, 2013, at 1:42 PM, aaron morton aa...@thelastpickle.com wrote: What version are you running?

ERROR [CompactionExecutor:38] 2013-07-19 17:01:34,494 CassandraDaemon.java (line 192) Exception in thread Thread[CompactionExecutor:38,1,main] java.lang.NullPointerException

What's the full error stack?

"Not sure if this is related or not, but I'm also getting a bunch of AssertionErrors as well, even after running a scrub…"

ERROR [CompactionExecutor:38] 2013-07-19 17:01:06,192 CassandraDaemon.java (line 192) Exception in thread Thread[CompactionExecutor:38,1,main] java.lang.AssertionError: incorrect row data size 29502477 written to /mnt/datadrive/lib/cassandra/data/fanzo/tweets_by_team/fanzo-tweets_by_team-tmp-ic-5262-Data.db; correct is 29725806

Double check that the scrub was successful. If it's not detecting/fixing the problem, look for previous log messages from that thread [CompactionExecutor:38] and see what sstables it was compacting. Try removing those. But I would give scrub another chance to get it sorted.

Cheers - Aaron Morton Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com

On 20/07/2013, at 5:04 AM, Paul Ingalls paulinga...@gmail.com wrote: I'm seeing a number of NullPointerExceptions in the log of my cluster. You can see the log line below. I'm thinking this is probably bad. Any ideas?
ERROR [CompactionExecutor:38] 2013-07-19 17:01:34,494 CassandraDaemon.java (line 192) Exception in thread Thread[CompactionExecutor:38,1,main]
java.lang.NullPointerException

Not sure if this is related or not, but I'm also getting a bunch of AssertionErrors as well, even after running a scrub…

ERROR [CompactionExecutor:38] 2013-07-19 17:01:06,192 CassandraDaemon.java (line 192) Exception in thread Thread[CompactionExecutor:38,1,main]
java.lang.AssertionError: incorrect row data size 29502477 written to /mnt/datadrive/lib/cassandra/data/fanzo/tweets_by_team/fanzo-tweets_by_team-tmp-ic-5262-Data.db; correct is 29725806
    at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:162)
    at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
    at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
    at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
    at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:211)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
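To act on Aaron's suggestion of finding which sstables the failing thread was compacting and giving scrub another chance, something along these lines works. A sketch only; the log path is the common default and the fanzo/tweets_by_team names are taken from the error above, so adjust for your layout:

# find the compaction messages logged by the failing thread
grep 'CompactionExecutor:38' /var/log/cassandra/system.log

# re-run scrub scoped to the affected keyspace and column family
nodetool scrub fanzo tweets_by_team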
Re: funnel analytics, how to query for reports etc.
"Too bad Rainbird isn't open sourced yet!" It's been 2 years; I would not hold your breath. I remembered there are two time series open source projects out there:
https://github.com/deanhiller/databus
https://github.com/Pardot/Rhombus

Cheers - Aaron Morton Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com

On 24/07/2013, at 4:00 AM, S Ahmed sahmed1...@gmail.com wrote: Thanks Aaron. Too bad Rainbird isn't open sourced yet!

On Tue, Jul 23, 2013 at 4:48 AM, aaron morton aa...@thelastpickle.com wrote: For background on rollup analytics: Twitter Rainbird http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011 Acunu http://www.acunu.com/ Cheers - Aaron Morton Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com

On 22/07/2013, at 1:03 AM, Vladimir Prudnikov v.prudni...@gmail.com wrote: This can be done easily. Use a normal column family to store the sequence of events, where the key is a session ID identifying one user interaction with the website, column names are TimeUUID values, and the column value is the id of the event (do not write something like "user added product to shopping cart"; use something shorter that identifies the event). Then you can use a counter column family to store counters; you can count anything: number of sessions, total number of events, number of particular events, etc. One row per day, for example. Then you can retrieve this row and calculate all the required percentages.

On Sun, Jul 21, 2013 at 1:05 AM, S Ahmed sahmed1...@gmail.com wrote: Would cassandra be a good choice for creating a funnel analytics type product similar to mixpanel? e.g. You create a set of events and store them in cassandra for things like:
event#1 user visited product page
event#2 user added product to shopping cart
event#3 user clicked on checkout page
event#4 user filled out cc information
event#5 user purchased product
Now in my web application I track each user and store the events somehow in cassandra (in some column family etc). Now how will I pull a report that produces results like:
70% of people added to shopping cart
20% checkout page
10% filled out cc information
4% purchased the product
And this is for a SaaS, so this report would be for thousands of customers in theory.

-- Vladimir Prudnikov
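Vladimir's layout translates to CQL3 fairly directly. A minimal sketch with hypothetical names (events, funnel_daily_counts) and an int for the event id; the uuid literal is just an example value:

CREATE TABLE events (
  session_id uuid,
  occurred timeuuid,
  event_id int,
  PRIMARY KEY (session_id, occurred)
);

-- one counter row per day; a multi-tenant SaaS would widen the key, e.g. (customer_id, day)
CREATE TABLE funnel_daily_counts (
  day text,
  event_id int,
  hits counter,
  PRIMARY KEY (day, event_id)
);

-- record event #2 (added to cart) for a session, and bump the daily counter
INSERT INTO events (session_id, occurred, event_id)
  VALUES (62c36092-82a1-3a00-93d1-46196ee77204, now(), 2);
UPDATE funnel_daily_counts SET hits = hits + 1 WHERE day = '2013-07-24' AND event_id = 2;

The report is then a single-row read (SELECT event_id, hits FROM funnel_daily_counts WHERE day = '2013-07-24';) with the percentages computed client side by dividing each count by the event #1 count.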
Re: high write load, with lots of updates, considerations? tombstoned data coming back to life
"I was watching some videos from the C* summit 2013 and I recall many people saying that if you can come up with a design where you don't perform updates on rows, that would make things easier (I believe it was because there would be less compaction)."

Not entirely true. There will always be compaction. But if you do updates there are overwrites, which means there is data on disk that is irrelevant and is not released until compaction gets to those files.

"Could old tombstoned data somehow come back to life? I forget what scenario brings about old data (kinda scary!)."

If you don't run repair on every node every gc_grace_seconds, there is a chance of it happening.

Cheers - Aaron Morton Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com

On 24/07/2013, at 4:22 AM, S Ahmed sahmed1...@gmail.com wrote: I was watching some videos from the C* summit 2013 and I recall many people saying that if you can come up with a design where you don't perform updates on rows, that would make things easier (I believe it was because there would be less compaction). When building an analytics (time series) app on top of C*, based on Twitter's Rainbird design (http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011), this means there will be lots and lots of counters. With lots of counters (updates), admin-wise, what are some things to consider? Could old tombstoned data somehow come back to life? I forget what scenario brings about old data (kinda scary!).
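The rule Aaron cites, repair every node within gc_grace_seconds, is normally enforced with a scheduled primary-range repair on each node. A sketch, assuming the default gc_grace_seconds of 864000 (10 days) and a crontab entry; the schedule is only an example:

# weekly primary-range repair, comfortably inside the 10-day gc_grace window
0 2 * * 0  nodetool repair -pr

Because -pr repairs only the token ranges this node is primarily responsible for, running it on every node covers the whole ring exactly once per cycle. If a node misses the window, tombstones it still holds may already have been compacted away on the other replicas, and the deleted data can be resurrected by repair or read repair; that is the scary scenario.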
Re: get all row keys of a table using CQL3
"I guess my question #1 is still there: does this query create a big load on the initial node that receives the request, because it still has to wait for all the results coming back from other nodes before returning to the client?"

Sort of. The coordinator always has to wait. Only one node will return the actual data; the others will return a digest of the data. So there is not huge memory pressure for this type of read. In general though you should page the results to reduce the size of the read.

Cheers - Aaron Morton Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com

On 24/07/2013, at 5:57 PM, Jimmy Lin y2klyf+w...@gmail.com wrote: hi Blake, ah okay, the token function is nice. But I am still a bit confused by the phrase "page through all rows":

select id from mytable where token(id) > token(12345)

Will it return all rows whose partition key's corresponding token is greater than token(12345)? I guess my question #1 is still there: does this query create a big load on the initial node that receives such a request, because it still has to wait for all the results coming back from other nodes before returning to the client? thanks

On Tue, Jul 23, 2013 at 10:34 PM, Blake Eggleston bl...@grapheffect.com wrote: Hi Jimmy, Check out the token function: http://www.datastax.com/docs/1.1/dml/using_cql#paging-through-non-ordered-partitioner-results You can use it to page through your rows. Blake

On Jul 23, 2013, at 10:18 PM, Jimmy Lin wrote: hi, I want to fetch all the row keys of a table using CQL3, e.g.:

select id from mytable limit 999

#1 For this query, does the node need to wait for all rows returned from all other nodes before returning the data to the client (I am using astyanax)? In other words, will this operation create a lot of load on the initial node receiving the request?
#2 If my table is big, I have to make sure the limit is set to a big enough number so that I can get all the results. It seems like I have to do a count(*) to be sure; is there any alternative (one that always returns all the rows)?
#3 If my id is a timeuuid, is it better to combine the results from a couple of the following CQL queries to obtain all keys? e.g.

select id from mytable where id > minTimeuuid('2013-02-02 10:00+') limit 2
+
select id from mytable where id < maxTimeuuid('2013-02-02 10:00+') limit 2

thanks
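Spelled out, the paging Blake and Aaron recommend is a loop of bounded queries rather than one giant LIMIT. A sketch, using the mytable name from the thread, an arbitrary page size of 1000, and a placeholder for the last id seen:

select id from mytable limit 1000;
-- remember the last id returned, then fetch the next page:
select id from mytable where token(id) > token(<last-id-of-previous-page>) limit 1000;
-- repeat until a page comes back with fewer than 1000 rows

Each page is a bounded read, so the coordinator only ever buffers one page of ids at a time, which addresses the load concern in question #1; it also removes the need to guess a large-enough LIMIT up front (question #2).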
Re: MapReduce response time and speed
"Is it possible to reduce the response time (by tuning, adding more nodes) to make a result available within a couple of minutes? Or will there most certainly be a gap of 10 minutes or so and more?"

Yes. More nodes will split the task up and it will run faster. How long it takes depends on the complexity of the hadoop tasks and the time they have to wait for slots.

Cheers - Aaron Morton Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com

On 25/07/2013, at 4:14 AM, Shahab Yunus shahab.yu...@gmail.com wrote: You have a lot of questions there so I can't answer all, but for the following:

"Can a user of the system define new jobs in an ad-hoc fashion (like a query), or do map-reduce jobs need to be prepared by a developer (e.g. in Riak you need a developer to compile in the job when you need the performance of Erlang-based jobs)? Suppose a user indeed can specify a job and send it off to Cassandra for processing, what is the expected response time?"

You can use high-level tools like Pig, Hive and Oozie. But mind you, it will depend on your data size, the complexity of the job, the cluster, and tuning parameters. Regards, Shahab

On Wed, Jul 24, 2013 at 10:33 AM, Jan Algermissen jan.algermis...@nordsc.com wrote: Hi, I am Jan Algermissen (REST-head, freelance programmer/consultant) and a Cassandra newbie. I am looking at Cassandra for an application I am working on. There will be a maximum of 10 million items (texts and attributes of a retailer's products) in the database. There will be occasional writes (e.g. price updates). The use case for the application is to work on the whole data set, item by item, to produce 'exports'. It will be necessary to access the full set every time. There is no relationship between the items. Processing is done iteratively.

My question: I am thinking that this is an ideal scenario for map-reduce, but I am unsure about two things: Can a user of the system define new jobs in an ad-hoc fashion (like a query), or do map-reduce jobs need to be prepared by a developer (e.g. in Riak you need a developer to compile in the job when you need the performance of Erlang-based jobs)? Suppose a user indeed can specify a job and send it off to Cassandra for processing, what is the expected response time? Is it possible to reduce the response time (by tuning, adding more nodes) to make a result available within a couple of minutes? Or will there most certainly be a gap of 10 minutes or so and more?

I understand that map-reduce is not for ad-hoc 'querying', but my users expect the system to feel quasi-interactive, because they intend to refine the processing job based on the results they get. A short gap would be ok, but a definite gap in the order of 10+ minutes would not. (For example, as far as I learned, with Riak you would most certainly have such a gap. How about Cassandra? Throwing more nodes at the problem would be ok; I just need to understand whether there is a definite 'response time penalty' I have to expect no matter what.) Jan
Re: Data disappear immediately after reading?
What sort of read are you making to get the data? There was a bug about secondary indexes being dropped if TTL was used: https://issues.apache.org/jira/browse/CASSANDRA-5079

Cheers - Aaron Morton Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com

On 25/07/2013, at 8:36 AM, cbert...@libero.it wrote: Sorry, I forgot to say: Apache Cassandra 1.0.7 on Ubuntu 10.04. The data that is disappearing is not counters but common rows.

Original message From: cbert...@libero.it Date: 24/07/2013 22.34 To: user@cassandra.apache.org Subject: Data disappear immediately after reading?

Hi all, I know the subject is not saying much, but this is what I'm experiencing now with my cluster. After some years without any problem, I'm now having problems with counters; but the most serious problem is data loss immediately after a read. I have some webservices that I use to query data on Cassandra, and in the last month the following problem has happened twice: I call my WS and it shows data. I refresh the page -- the data is no longer available! I can then call the WS 200 times but I won't see the data anymore ... today my colleague experienced the same problem. The WS are ABSOLUTELY read-only on the DB and there are no writes to erase these data. Does anyone understand what is going on? I have no idea, but most of all I don't know how to fix it. Any help would really be appreciated. Kind Regards, Carlo