Re: C* 2.1.2 invokes oom-killer

2015-02-19 Thread Michał Łowicki
In all tables the SSTable count is below 30.

On Thu, Feb 19, 2015 at 9:43 AM, Carlos Rolo r...@pythian.com wrote:

 Can you check how many SSTables you have? It is more or less a known fact
 that 2.1.2 has lots of problems with compaction, so an upgrade may solve it.
 A high number of SSTables would confirm that compaction is indeed your
 problem and not something else.

 Regards,

 Carlos Juzarte Rolo
 Cassandra Consultant

 Pythian - Love your data

 rolo@pythian | Twitter: cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
 Tel: 1649
 www.pythian.com

 On Thu, Feb 19, 2015 at 9:16 AM, Michał Łowicki mlowi...@gmail.com
 wrote:

 We don't have other things running on these boxes and C* is consuming all
 the memory.

 Will try to upgrade to 2.1.3 and, if that doesn't help, downgrade back to 2.1.2.

 —
 Michał


 On Thu, Feb 19, 2015 at 2:39 AM, Jacob Rhoden jacob.rho...@me.com
 wrote:

 Are you tweaking the nice priority on Cassandra? (Type: man nice if you
 don't know much about it.) Improving Cassandra's nice score certainly becomes
 important when you have other things running on the server, like scheduled
 jobs or people logging in to the server and doing things.
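 For example, a minimal sketch (the priority value and the process match
 pattern here are assumptions, not from this thread):

 # Lower Cassandra's niceness so it wins CPU over background jobs;
 # CassandraDaemon is the JVM main class, adjust the pattern if needed.
 sudo renice -n -2 -p $(pgrep -f CassandraDaemon)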

 Sent from iPhone

 On 19 Feb 2015, at 5:28 am, Michał Łowicki mlowi...@gmail.com wrote:

  Hi,

 A couple of times a day, 2 out of the 4 nodes in the cluster are killed:

 root@db4:~# dmesg | grep -i oom
 [4811135.792657] [ pid ]   uid  tgid total_vm  rss cpu oom_adj
 oom_score_adj name
 [6559049.307293] java invoked oom-killer: gfp_mask=0x201da, order=0,
 oom_adj=0, oom_score_adj=0

 Nodes are using an 8GB heap (confirmed with nodetool info) and aren't
 using row cache.

 Noticed that a couple of times a day used RSS grows really fast within a
 couple of minutes, and I see CPU spikes at the same time -
 https://www.dropbox.com/s/khco2kdp4qdzjit/Screenshot%202015-02-18%2015.10.54.png?dl=0
 .

 Could be related to compaction, but after compaction finishes used RSS
 doesn't shrink. Output from pmap when the C* process uses 50GB RAM (out of
 64GB) is available at http://paste.ofcode.org/ZjLUA2dYVuKvJHAk9T3Hjb.
 At the time the dump was made, heap usage was far below 8GB (~3GB) but total
 RSS was ~50GB.
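 (The numbers above come from commands along these lines - a sketch; the PID
 lookup pattern is an assumption, and jstat ships with the JDK:)

 pid=$(pgrep -f CassandraDaemon)
 ps -o rss= -p "$pid"        # process resident set size, in KB
 jstat -gc "$pid"            # on-heap usage as the JVM sees it
 pmap -x "$pid" | tail -n 1  # total mapped memory and RSS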

 Any help will be appreciated.

 --
 BR,
 Michał Łowicki





-- 
BR,
Michał Łowicki


Re: C* 2.1.2 invokes oom-killer

2015-02-19 Thread Michał Łowicki
On Thu, Feb 19, 2015 at 10:41 AM, Carlos Rolo r...@pythian.com wrote:

 So compaction doesn't seem to be your problem (You can check with nodetool
 compactionstats just to be sure).


pending tasks: 0



 What is your write latency on your column families? I had OOMs related
 to this before, and there was a tipping point around 70ms.


Write request latency is below 0.05 ms/op (avg). Checked with OpsCenter.




-- 
BR,
Michał Łowicki


Re: Many pending compactions

2015-02-19 Thread Roland Etzenhammer

Hi,

2.1.3 is now the official latest release - I checked this morning and 
got this nice surprise. Now it's update time - thanks to all the guys 
involved; if I meet any of you, there's a beer from me :-)


The changelist is rather long:
https://git1-us-west.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/cassandra-2.1.3

Hopefully it will solve many of those oddities and not introduce too many 
new ones :-)


Cheers,
Roland




Re: Cancel subscription

2015-02-19 Thread Mark Reddy
Please use user-unsubscr...@cassandra.apache.org to unsubscribe from this
mailing list.


Thanks

Regards,
Mark

On 19 February 2015 at 09:14, Hilary Albutt - CEO 
hil...@incrediblesoftwaresolutions.com wrote:

 Cancel subscription



Re: C* 2.1.2 invokes oom-killer

2015-02-19 Thread Carlos Rolo
So compaction doesn't seem to be your problem (You can check with nodetool
compactionstats just to be sure).

What is your write latency on your column families? I had OOMs related
to this before, and there was a tipping point around 70ms.


Re: C* 2.1.2 invokes oom-killer

2015-02-19 Thread Carlos Rolo
Do you have trickle_fsync enabled? Try enabling it and see if it solves
your problem, since you are running out of non-heap memory.
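For reference, the relevant knobs in cassandra.yaml look like this (the
interval value is only illustrative):

trickle_fsync: true                   # fsync dirty pages in small increments
trickle_fsync_interval_in_kb: 10240   # fsync every 10MB written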

Another question: is it always the same nodes that die, or is it any 2 out of
the 4 that die?

Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
Tel: 1649
www.pythian.com

On Thu, Feb 19, 2015 at 10:49 AM, Michał Łowicki mlowi...@gmail.com wrote:



 On Thu, Feb 19, 2015 at 10:41 AM, Carlos Rolo r...@pythian.com wrote:

 So compaction doesn't seem to be your problem (You can check with
 nodetool compactionstats just to be sure).


 pending tasks: 0



 What is your write latency on your column families? I had OOMs related
 to this before, and there was a tipping point around 70ms.


 Write request latency is below 0.05 ms/op (avg). Checked with OpsCenter.




 --
 BR,
 Michał Łowicki




Re: can't delete tmp file

2015-02-19 Thread 曹志富
Thank you, Roland

--
曹志富
Mobile: 18611121927
Email: caozf.zh...@gmail.com
Weibo: http://weibo.com/boliza/

2015-02-19 20:32 GMT+08:00 Roland Etzenhammer r.etzenham...@t-online.de:

 Hi,

 try 2.1.3 - with 2.1.2 this is normal. From the changelog:

 * Make sure we don't add tmplink files to the compaction strategy
 (CASSANDRA-8580)
 * Remove tmplink files for offline compactions (CASSANDRA-8321)

 In most cases they are safe to delete, I did this when the node was down.

 Cheers,
 Roland



Re: C* 2.1.2 invokes oom-killer

2015-02-19 Thread Carlos Rolo
Then you are probably hitting a bug... I'm trying to find it in Jira. The bad
news is that the fix will only be released in 2.1.4. Once I find it I will
post it here.

Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
Tel: 1649
www.pythian.com

On Thu, Feb 19, 2015 at 12:16 PM, Michał Łowicki mlowi...@gmail.com wrote:

 trickle_fsync has been enabled for a long time in our settings (just
 noticed):

 trickle_fsync: true

 trickle_fsync_interval_in_kb: 10240

 On Thu, Feb 19, 2015 at 12:12 PM, Michał Łowicki mlowi...@gmail.com
 wrote:



 On Thu, Feb 19, 2015 at 11:02 AM, Carlos Rolo r...@pythian.com wrote:

 Do you have trickle_fsync enabled? Try enabling it and see if it
 solves your problem, since you are running out of non-heap memory.

 Another question: is it always the same nodes that die, or is it any 2 out of
 the 4 that die?


 Always the same nodes. Upgraded to 2.1.3 two hours ago, so we'll monitor
 whether the issue has been fixed there. If not, we'll try to enable
 trickle_fsync.



 Regards,

 Carlos Juzarte Rolo
 Cassandra Consultant

 Pythian - Love your data

 rolo@pythian | Twitter: cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
 Tel: 1649
 www.pythian.com

 On Thu, Feb 19, 2015 at 10:49 AM, Michał Łowicki mlowi...@gmail.com
 wrote:



 On Thu, Feb 19, 2015 at 10:41 AM, Carlos Rolo r...@pythian.com wrote:

 So compaction doesn't seem to be your problem (You can check with
 nodetool compactionstats just to be sure).


 pending tasks: 0



 What is your write latency on your column families? I had OOMs
 related to this before, and there was a tipping point around 70ms.


 Write request latency is below 0.05 ms/op (avg). Checked with OpsCenter.




 --
 BR,
 Michał Łowicki




 --
 BR,
 Michał Łowicki




 --
 BR,
 Michał Łowicki




Node joining take a long time

2015-02-19 Thread 曹志富
Hi guys:
I have a 20-node C* cluster with vnodes, version 2.1.2. When I add a
node to the cluster it takes a long time, and on some existing nodes nodetool
netstats shows this:

Mode: NORMAL
Unbootstrap cfe03590-b02a-11e4-95c5-b5f6ad9c7711
/172.19.105.49
Receiving 68 files, 23309801005 bytes total
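
I am watching the streaming progress with a loop like this (a sketch; the
interval is arbitrary):

while true; do nodetool netstats | grep -E 'Mode|Receiving|Sending'; sleep 60; done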

I want to know: is there some problem with my cluster?
--
曹志富
Mobile: 18611121927
Email: caozf.zh...@gmail.com
Weibo: http://weibo.com/boliza/


Re: can't delete tmp file

2015-02-19 Thread Roland Etzenhammer

Hi,

try 2.1.3 - with 2.1.2 this is normal. From the changelog:

* Make sure we don't add tmplink files to the compaction strategy 
(CASSANDRA-8580)

* Remove tmplink files for offline compactions (CASSANDRA-8321)

In most cases they are safe to delete, I did this when the node was down.

Cheers,
Roland


Re: can't delete tmp file

2015-02-19 Thread 曹志富
Just upgrade my cluster to 2.1.3?

--
曹志富
Mobile: 18611121927
Email: caozf.zh...@gmail.com
Weibo: http://weibo.com/boliza/

2015-02-19 20:32 GMT+08:00 Roland Etzenhammer r.etzenham...@t-online.de:

 Hi,

 try 2.1.3 - with 2.1.2 this is normal. From the changelog:

 * Make sure we don't add tmplink files to the compaction strategy
 (CASSANDRA-8580)
 * Remove tmplink files for offline compactions (CASSANDRA-8321)

 In most cases they are safe to delete, I did this when the node was down.

 Cheers,
 Roland



Re: can't delete tmp file

2015-02-19 Thread Carlos Rolo
You should upgrade to 2.1.3 for sure.

Check the changelog here:
https://git1-us-west.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/cassandra-2.1.3


Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
Tel: 1649
www.pythian.com

On Thu, Feb 19, 2015 at 1:44 PM, 曹志富 cao.zh...@gmail.com wrote:

 Thank you, Roland

 --
 曹志富
 Mobile: 18611121927
 Email: caozf.zh...@gmail.com
 Weibo: http://weibo.com/boliza/

 2015-02-19 20:32 GMT+08:00 Roland Etzenhammer r.etzenham...@t-online.de:

 Hi,

 try 2.1.3 - with 2.1.2 this is normal. From the changelog:

 * Make sure we don't add tmplink files to the compaction strategy
 (CASSANDRA-8580)
 * Remove tmplink files for offline compactions (CASSANDRA-8321)

 In most cases they are safe to delete, I did this when the node was down.

 Cheers,
 Roland







Re: Data tiered compaction and data model question

2015-02-19 Thread Kai Wang
What's the typical size of the data field? Unless it's very large, I don't
think table 2 is a very wide row (10x20x60x24 = 288,000 events/partition at
worst). Plus you only need to store 30 days of data. The overall data size is
288,000 x 30 = 8,640,000 events. I am not even sure you need C*, depending on
event size.

On Thu, Feb 19, 2015 at 12:00 AM, cass savy casss...@gmail.com wrote:

 10-20 per minute is the average. Worst case can be 10x of avg.

 On Wed, Feb 18, 2015 at 4:49 PM, Mohammed Guller moham...@glassbeam.com
 wrote:

  What is the maximum number of events that you expect in a day? What is
 the worst-case scenario?



 Mohammed



 From: cass savy [mailto:casss...@gmail.com]
 Sent: Wednesday, February 18, 2015 4:21 PM
 To: user@cassandra.apache.org
 Subject: Data tiered compaction and data model question



 We want to track events in a log CF/table and should be able to query for
 events that occurred in a range of minutes or hours for a given day. Multiple
 events can occur in a given minute. Listed 2 table designs below and am
 leaning towards table 1 to avoid a large wide row. Please advise.



 Table 1: not a very wide row; still able to query for a range of minutes for
 a given day, and/or a given day and a range of hours:

 CREATE TABLE log_event (
   event_day text,
   event_hr int,
   event_time timeuuid,
   data text,
   PRIMARY KEY ((event_day, event_hr), event_time)
 );

 Table 2: this will be a very wide row:

 CREATE TABLE log_event (
   event_day text,
   event_time timeuuid,
   data text,
   PRIMARY KEY (event_day, event_time)
 );



 Date-tiered compaction: recommended for time series data as per the doc
 below. Our data will be kept only for 30 days, hence the thought of using
 this compaction strategy.

 http://www.datastax.com/dev/blog/datetieredcompactionstrategy

 Created table 1 listed above with this compaction strategy, added some
 rows, and did a manual flush. I do not see any SSTables created yet. Is that
 expected?

  compaction = {'max_sstable_age_days': '1', 'class':
 'DateTieredCompactionStrategy'}
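
 For concreteness, a sketch of table 1 with that strategy attached (CQL for
 2.1; the option values are the ones quoted above):

 CREATE TABLE log_event (
   event_day text,
   event_hr int,
   event_time timeuuid,
   data text,
   PRIMARY KEY ((event_day, event_hr), event_time)
 ) WITH compaction = {
     'class': 'DateTieredCompactionStrategy',
     'max_sstable_age_days': '1'
 };

 A memtable only becomes an SSTable on flush; after a successful nodetool
 flush there should be files under the table's data directory.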







Re: Node joining take a long time

2015-02-19 Thread Mark Reddy
What is a long time in your scenario? What is the data size in your cluster?


I'm sure Rob will be along shortly to say that 2.1.2 is, in his opinion,
broken for production use...an opinion I'd agree with. So bear that in mind
if you are running a production cluster.

Regards,
Mark

On 19 February 2015 at 12:19, 曹志富 cao.zh...@gmail.com wrote:

 Hi guys:
 I have a 20-node C* cluster with vnodes, version 2.1.2. When I add a
 node to the cluster it takes a long time, and on some existing nodes nodetool
 netstats shows this:

 Mode: NORMAL
 Unbootstrap cfe03590-b02a-11e4-95c5-b5f6ad9c7711
 /172.19.105.49
 Receiving 68 files, 23309801005 bytes total

 I want to know: is there some problem with my cluster?
 --
 曹志富
 Mobile: 18611121927
 Email: caozf.zh...@gmail.com
 Weibo: http://weibo.com/boliza/



Re: run cassandra on a small instance

2015-02-19 Thread Tim Dunphy

 What does your schema look like, your total data size and your read/write
 patterns? Maybe you are simply doing a heavier workload than a small
 instance can handle.


Hi Mark,

 OK well as mentioned this is all test data with almost literally no
workload. So I doubt it's the data and/or workload that's causing it to
crash on the 2GB instance after 5 hours.

But when I describe the schema with my test data this is what I see:


cqlsh> use joke_fire1
   ... ;
cqlsh:joke_fire1> describe schema;

CREATE KEYSPACE joke_fire1 WITH replication = {'class': 'SimpleStrategy',
'replication_factor': '3'}  AND durable_writes = true;

'module' object has no attribute 'UserTypesMeta'

If I take a look at the size of the total amount of data this is what I see:

[root@beta-new:/etc/alternatives/cassandrahome/data] #du -hs data
17M data


Which includes the system keyspace. But the test data that I created for my
use is only 15MB:

[root@beta-new:/etc/alternatives/cassandrahome/data/data] #du -hs
joke_fire1/
15M joke_fire1/

But just to see if it's my data that could be causing the problem, I tried
removing it all, and setting the IP of the 2GB instance itself as the seed
node. I'll try running that for a while and seeing if it crashes.


Also I tried just installing a plain Cassandra 2.1.3 onto a plain CentOS
6.6 instance on the AWS free tier. It's a t2.micro instance. So far it's
running. I'll keep an eye on both. At this point, I'm thinking that there
might be something about my data that could be causing it to fail after 5
or so hours.

However I might need some help diagnosing the data, as I'm not familiar
with how to do that with Cassandra.

Thanks!
Tim

On Thu, Feb 19, 2015 at 3:51 AM, Mark Reddy mark.l.re...@gmail.com wrote:

 What does your schema look like, your total data size and your read/write
 patterns? Maybe you are simply doing a heavier workload than a small
 instance can handle.


 Regards,
 Mark

 On 19 February 2015 at 08:40, Carlos Rolo r...@pythian.com wrote:

 I have Cassandra instances running on VMs with smaller RAM (1GB even) and
 I don't go OOM when testing them. Although I use them in AWS and other
 providers, never tried Digital Ocean.

 Does Cassandra just fail after some time running, or is it failing on
 some specific read/write?

 Regards,

 Carlos Juzarte Rolo
 Cassandra Consultant

 Pythian - Love your data

 rolo@pythian | Twitter: cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
 Tel: 1649
 www.pythian.com

 On Thu, Feb 19, 2015 at 7:16 AM, Tim Dunphy bluethu...@gmail.com wrote:

 Hey guys,

 After the upgrade to 2.1.3, and after almost exactly 5 hours running
 cassandra did indeed crash again on the 2GB ram VM.

 This is how the memory on the VM looked after the crash:

 [root@web2:~] #free -m
              total       used       free     shared    buffers     cached
 Mem:          2002       1227        774          8         45        386
 -/+ buffers/cache:        794       1207
 Swap:            0          0          0


 And that's with this set in the cassandra-env.sh file:

 MAX_HEAP_SIZE=800M
 HEAP_NEWSIZE=200M

 So I'm thinking now, do I just have to abandon this idea I have of
 running Cassandra on a 2GB instance? Or is this something we can all agree
 can be done? And if so, how can we do that? :)

 Thanks
 Tim

 On Wed, Feb 18, 2015 at 8:39 PM, Jason Kushmaul | WDA 
 jason.kushm...@wda.com wrote:

 I asked this previously when a similar message came through, with a
 similar response.



 planetcassandra seems to have it “right”, in that stable=2.0,
 development=2.1, whereas the apache site says stable is 2.1.

 “Right” in that they assume the latest minor version is development.  Why not
 have the apache site do the same?  That’s just my lowly non-contributing
 opinion though.



 Jason



 From: Andrew [mailto:redmu...@gmail.com]
 Sent: Wednesday, February 18, 2015 8:26 PM
 To: Robert Coli; user@cassandra.apache.org
 Subject: Re: run cassandra on a small instance



 Robert,



 Let me know if I’m off base about this—but I feel like I see a lot of
 posts that are like this (i.e., use this arbitrary version, not this other
 arbitrary version).  Why are releases going out if they’re “broken”?  This
 seems like a very confusing way for new (and existing) users to approach
 versions...



 Andrew



 On February 18, 2015 at 5:16:27 PM, Robert Coli (rc...@eventbrite.com)
 wrote:

 On Wed, Feb 18, 2015 at 5:09 PM, Tim Dunphy bluethu...@gmail.com
 wrote:

 I'm attempting to run Cassandra 2.1.2 on a smallish 2.GB ram instance
 over at Digital Ocean. It's a CentOS 7 host.



 2.1.2 is IMO broken and should not be used for any purpose.



 Use 2.1.1 or 2.1.3.




 https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/



 =Rob






 --
 GPG me!!

 gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B





-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B


Re: run cassandra on a small instance

2015-02-19 Thread Robert Coli
On Wed, Feb 18, 2015 at 5:26 PM, Andrew redmu...@gmail.com wrote:

 Let me know if I’m off base about this—but I feel like I see a lot of
 posts that are like this (i.e., use this arbitrary version, not this other
 arbitrary version).  Why are releases going out if they’re “broken”?  This
 seems like a very confusing way for new (and existing) users to approach
 versions...


In my opinion and in no way speaking for or representing Apache Cassandra,
Datastax, or anyone else :

I think it's a problem of messaging, and a mismatch of expectations between
the development team and operators.

I think the stable versions are stable by the dev team's standards, and
not by operators' standards. While testing has historically been, IMO,
insufficient for a datastore (where correctness really matters), there are
also various issues which probably cannot realistically be detected in
testing. Of course, operators need to be willing to operate (ideally in
non-production) near the cutting edge in order to assist in the detection
and resolution of these bugs, but I think the project does itself a
disservice by encouraging noobs to run these versions. You only get one
chance to make a first impression, as the saying goes.

My ideal messaging would probably say something like: versions near the
cutting edge should be treated cautiously; conservative operators should
run mature point releases in production and only upgrade to near the
cutting edge after extended burn-in in dev/QA/stage environments.

A fair response to this critique is that operators should know better than
to trust that x.y.0-5 release versions of any open source software are
likely to be production-ready, even if the website says "stable" next to
the download. Trust, but verify?

=Rob


RE: Data tiered compaction and data model question

2015-02-19 Thread Mohammed Guller
Reading 288,000 rows from a partition may cause problems. It is recommended not
to read more than 100k rows in a partition (although paging may help). So
Table 2 may cause issues.

I agree with Kai that you may not even need C* for this use-case. C* is
ideal for data with the 3 Vs: volume, velocity and variety. It doesn't look like
your data has the volume or velocity that a standard RDBMS cannot handle.
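
For illustration, with table 1 below a read can be kept to a bounded slice
rather than a whole partition (a sketch; the timestamps are made up):

SELECT event_time, data
FROM log_event
WHERE event_day = '2015-02-19'
  AND event_hr = 10
  AND event_time >= minTimeuuid('2015-02-19 10:00+0000')
  AND event_time < maxTimeuuid('2015-02-19 10:15+0000');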

Mohammed

From: Kai Wang [mailto:dep...@gmail.com]
Sent: Thursday, February 19, 2015 6:06 AM
To: user@cassandra.apache.org
Subject: Re: Data tiered compaction and data model question

What's the typical size of the data field? Unless it's very large, I don't
think table 2 is a very wide row (10x20x60x24 = 288,000 events/partition at
worst). Plus you only need to store 30 days of data. The overall data size is
288,000 x 30 = 8,640,000 events. I am not even sure you need C*, depending on
event size.

On Thu, Feb 19, 2015 at 12:00 AM, cass savy 
casss...@gmail.com wrote:
10-20 per minute is the average. Worst case can be 10x of avg.

On Wed, Feb 18, 2015 at 4:49 PM, Mohammed Guller 
moham...@glassbeam.com wrote:
What is the maximum number of events that you expect in a day? What is the 
worst-case scenario?

Mohammed

From: cass savy [mailto:casss...@gmail.com]
Sent: Wednesday, February 18, 2015 4:21 PM
To: user@cassandra.apache.org
Subject: Data tiered compaction and data model question

We want to track events in a log CF/table and should be able to query for events
that occurred in a range of minutes or hours for a given day. Multiple events can
occur in a given minute. Listed 2 table designs below and am leaning towards
table 1 to avoid a large wide row. Please advise.

Table 1: not a very wide row; still able to query for a range of minutes for
a given day, and/or a given day and a range of hours:

CREATE TABLE log_event (
  event_day text,
  event_hr int,
  event_time timeuuid,
  data text,
  PRIMARY KEY ((event_day, event_hr), event_time)
);

Table 2: this will be a very wide row:

CREATE TABLE log_event (
  event_day text,
  event_time timeuuid,
  data text,
  PRIMARY KEY (event_day, event_time)
);

Date-tiered compaction: recommended for time series data as per the doc below.
Our data will be kept only for 30 days, hence the thought of using this
compaction strategy.
http://www.datastax.com/dev/blog/datetieredcompactionstrategy
Created table 1 listed above with this compaction strategy. Added some rows and
did a manual flush. I do not see any SSTables created yet. Is that expected?
 compaction = {'max_sstable_age_days': '1', 'class':
'DateTieredCompactionStrategy'}





designing table

2015-02-19 Thread Check Peck
I am trying to design a table in Cassandra in which I will have multiple
JSON strings for a particular client id:

abc123 -   jsonA
abc123 -   jsonB
abcd12345   -   jsonC
My query patterns are going to be:

Give me all JSON strings for a particular client id.
Give me all the client ids and JSON strings for a particular date.

What is the best way to design a table for this?
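
One possible sketch (not from the thread; table and column names are made up,
and the second query pattern gets its own denormalized table, which is the
usual C* approach):

CREATE TABLE json_by_client (
  client_id text,
  added_on timeuuid,
  json text,
  PRIMARY KEY (client_id, added_on)
);

CREATE TABLE json_by_date (
  event_day text,
  client_id text,
  added_on timeuuid,
  json text,
  PRIMARY KEY (event_day, client_id, added_on)
);

-- all JSON strings for a client id:
--   SELECT json FROM json_by_client WHERE client_id = 'abc123';
-- all client ids and JSON strings for a date:
--   SELECT client_id, json FROM json_by_date WHERE event_day = '2015-02-19';

Writes then go to both tables.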


Re: [ANNOUNCE] Apache Gora 0.6 Released

2015-02-19 Thread Talat Uyarer
Congrats!
On Feb 20, 2015 2:59 AM, Lewis John Mcgibbney lewis.mcgibb...@gmail.com
wrote:

 Hi Folks,

 The Apache Gora team are pleased to announce the immediate availability of
 Apache Gora 0.6.

 This release addresses a modest 47 issues http://s.apache.org/gora-0.6
 with some being major improvements, new functionality and dependency
 upgrades. Most notably the release involves key upgrades to Hadoop, HBase
 and Solr dependencies as well as some extremely important bug fixes for the
 MongoDB module.

 Suggested Gora database support is as follows

- Apache Avro 1.7.6
- Apache Hadoop 1.2.1 and 2.5.2
- Apache HBase 0.98.8-hadoop2
- Apache Cassandra 2.0.2
- Apache Solr 4.10.3
- MongoDB 2.6.X
- Apache Accumulo 1.5.1

 Gora is released as both source code, downloads for which can be found at
 our downloads page http://gora.apache.org/downloads.html as well as
 Maven artifacts which can be found on Maven central
 http://search.maven.org/#search%7Cga%7C1%7Cgora.
 Thank you
 Lewis
 (on behalf of Gora PMC)


 --
 Lewis



Why no virtual nodes for Cassandra on EC2?

2015-02-19 Thread Clint Kelly
Hi all,

The guide for installing Cassandra on EC2 says that

Note: The DataStax AMI does not install DataStax Enterprise nodes
with virtual nodes enabled.

http://www.datastax.com/documentation/datastax_enterprise/4.6/datastax_enterprise/install/installAMI.html

Just curious why this is the case.  It was my understanding that
virtual nodes make taking Cassandra nodes on and offline an easier
process, and that seems like something that an EC2 user would want to
do quite frequently.

-Clint


Re: Data tiered compaction and data model question

2015-02-19 Thread Roland Etzenhammer

Hi Cass,

just a hint from the sidelines - if I got it right you have:

Table 1: PRIMARY KEY ( (event_day,event_hr),event_time)
Table 2: PRIMARY KEY (event_day,event_time)

Assuming your events come in by wall clock time, the first table design 
will have a hotspot on a specific node getting all writes for a single hour, 
as (event_day,event_hr) is the partitioning key. The second table design 
will put this hotspot on a specific node per day, as event_day is the 
partitioning key. So please be careful if you have a write-intensive 
workload.


I have designed my logging tables with a non-datetime component in the 
partitioning key to distribute writes to all nodes at a specific point in 
time. I have for example


PRIMARY KEY ((sensor_id,measure_date))

and the timestamp-value pairs in the rows. They are quite wide as I have 
about 1 measurements per sensor and id, but analytics and cleanup 
jobs run daily.
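
In CQL terms that is something like the following (the column names and the 
value type here are illustrative):

CREATE TABLE sensor_log (
  sensor_id text,
  measure_date text,
  measured_at timestamp,
  value double,
  PRIMARY KEY ((sensor_id, measure_date), measured_at)
);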


Of course, as a not-so-long-time Cassandra user I may be wrong - please 
feel free to correct me.


Cheers,
Roland




Re: Node joining take a long time

2015-02-19 Thread 曹志富
So, what can I do? Wait for 2.1.4 or upgrade to 2.1.3?

--
曹志富
Mobile: 18611121927
Email: caozf.zh...@gmail.com
Weibo: http://weibo.com/boliza/

2015-02-20 3:16 GMT+08:00 Robert Coli rc...@eventbrite.com:

 On Thu, Feb 19, 2015 at 7:34 AM, Mark Reddy mark.l.re...@gmail.com
 wrote:

 I'm sure Rob will be along shortly to say that 2.1.2 is, in his opinion,
 broken for production use...an opinion I'd agree with. So bear that in mind
 if you are running a production cluster.


 If you speak of the devil, he will appear.

 But yes, really, run 2.1.1 or 2.1.3, 2.1.2 is a bummer. Don't take the
 brown 2.1.2.

 This commentary is likely unrelated to the problem the OP is having, which
 I would need the information Mark asked for to comment on. :)

 =Rob




Re: run cassandra on a small instance

2015-02-19 Thread Tim Dunphy

 I have Cassandra instances running on VMs with smaller RAM (1GB even) and
 I don't go OOM when testing them. Although I use them in AWS and other
 providers, never tried Digital Ocean.
 Does Cassandra just fail after some time running, or is it failing on some
 specific read/write?


Hi  Carlos,

Ok, that's really interesting. So I have to ask, did you have to do
anything special to get Cassandra to run on those 1GB AWS instances? I'd
love to do the same. I even tried there as well and failed due to lack of
memory to run it.

And there is no specific reason other than lack of memory that I can tell
for it to fail. And it doesn't seem to matter what data I use either.
Because even if I remove the data directory with rm -rf, the phenomenon is
the same. It'll run for a while, usually about 5 hours and then just crash
with the word 'killed' as the last line of output.

Thanks
Tim


On Thu, Feb 19, 2015 at 3:40 AM, Carlos Rolo r...@pythian.com wrote:

 I have Cassandra instances running on VMs with smaller RAM (1GB even) and
 I don't go OOM when testing them. Although I use them in AWS and other
 providers, never tried Digital Ocean.

 Does Cassandra just fail after some time running, or is it failing on some
 specific read/write?

 Regards,

 Carlos Juzarte Rolo
 Cassandra Consultant

 Pythian - Love your data

 rolo@pythian | Twitter: cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
 Tel: 1649
 www.pythian.com

 On Thu, Feb 19, 2015 at 7:16 AM, Tim Dunphy bluethu...@gmail.com wrote:

 Hey guys,

 After the upgrade to 2.1.3, and after almost exactly 5 hours running
 cassandra did indeed crash again on the 2GB ram VM.

 This is how the memory on the VM looked after the crash:

 [root@web2:~] #free -m
              total       used       free     shared    buffers     cached
 Mem:          2002       1227        774          8         45        386
 -/+ buffers/cache:        794       1207
 Swap:            0          0          0


 And that's with this set in the cassandra-env.sh file:

 MAX_HEAP_SIZE=800M
 HEAP_NEWSIZE=200M

 So I'm thinking now, do I just have to abandon this idea I have of
 running Cassandra on a 2GB instance? Or is this something we can all agree
 can be done? And if so, how can we do that? :)

 Thanks
 Tim

 On Wed, Feb 18, 2015 at 8:39 PM, Jason Kushmaul | WDA 
 jason.kushm...@wda.com wrote:

 I asked this previously when a similar message came through, with a
 similar response.



 planetcassandra seems to have it “right”, in that stable=2.0,
 development=2.1, whereas the apache site says stable is 2.1.

 “Right” in that they assume the latest minor version is development.  Why not
 have the apache site do the same?  That’s just my lowly non-contributing
 opinion though.



 Jason



 From: Andrew [mailto:redmu...@gmail.com]
 Sent: Wednesday, February 18, 2015 8:26 PM
 To: Robert Coli; user@cassandra.apache.org
 Subject: Re: run cassandra on a small instance



 Robert,



 Let me know if I’m off base about this—but I feel like I see a lot of
 posts that are like this (i.e., use this arbitrary version, not this other
 arbitrary version).  Why are releases going out if they’re “broken”?  This
 seems like a very confusing way for new (and existing) users to approach
 versions...



 Andrew



 On February 18, 2015 at 5:16:27 PM, Robert Coli (rc...@eventbrite.com)
 wrote:

 On Wed, Feb 18, 2015 at 5:09 PM, Tim Dunphy bluethu...@gmail.com
 wrote:

 I'm attempting to run Cassandra 2.1.2 on a smallish 2.GB ram instance
 over at Digital Ocean. It's a CentOS 7 host.



 2.1.2 is IMO broken and should not be used for any purpose.



 Use 2.1.1 or 2.1.3.




 https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/



 =Rob






 --
 GPG me!!

 gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B




-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B


Re: run cassandra on a small instance

2015-02-19 Thread Carlos Rolo
What I normally do is install plain CentOS (not any AMI built for
Cassandra), and I don't use them for production! I run them for testing,
fire drills and some cassandra-stress benchmarks. I will check whether I have
had more than 5h of Cassandra uptime. I can even put one up now, do the test,
and get the results back to you.

Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
Tel: 1649
www.pythian.com

On Thu, Feb 19, 2015 at 6:41 PM, Tim Dunphy bluethu...@gmail.com wrote:

 I have Cassandra instances running on VMs with smaller RAM (1GB even) and
 I don't go OOM when testing them. Although I use them in AWS and other
 providers, never tried Digital Ocean.
 Does Cassandra just fail after some time running, or is it failing on
 some specific read/write?


 Hi  Carlos,

 Ok, that's really interesting. So I have to ask, did you have to do
 anything special to get Cassandra to run on those 1GB AWS instances? I'd
 love to do the same. I even tried there as well and failed due to lack of
 memory to run it.

 And there is no specific reason other than lack of memory that I can tell
 for it to fail. And it doesn't seem to matter what data I use either.
 Because even if I remove the data directory with rm -rf, the phenomenon is
 the same. It'll run for a while, usually about 5 hours and then just crash
 with the word 'killed' as the last line of output.

 Thanks
 Tim


 On Thu, Feb 19, 2015 at 3:40 AM, Carlos Rolo r...@pythian.com wrote:

 I have Cassandra instances running on VMs with smaller RAM (1GB even) and
 I don't go OOM when testing them. Although I use them in AWS and other
 providers, never tried Digital Ocean.

 Does Cassandra just fail after some time running, or is it failing on
 some specific read/write?

 Regards,

 Carlos Juzarte Rolo
 Cassandra Consultant

 Pythian - Love your data

 rolo@pythian | Twitter: cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
 Tel: 1649
 www.pythian.com

 On Thu, Feb 19, 2015 at 7:16 AM, Tim Dunphy bluethu...@gmail.com wrote:

 Hey guys,

 After the upgrade to 2.1.3, and after almost exactly 5 hours running
 cassandra did indeed crash again on the 2GB ram VM.

 This is how the memory on the VM looked after the crash:

 [root@web2:~] #free -m
              total       used       free     shared    buffers     cached
 Mem:          2002       1227        774          8         45        386
 -/+ buffers/cache:        794       1207
 Swap:            0          0          0


 And that's with this set in the cassandra-env.sh file:

 MAX_HEAP_SIZE=800M
 HEAP_NEWSIZE=200M

 So I'm thinking now, do I just have to abandon this idea I have of
 running Cassandra on a 2GB instance? Or is this something we can all agree
 can be done? And if so, how can we do that? :)

 Thanks
 Tim

 On Wed, Feb 18, 2015 at 8:39 PM, Jason Kushmaul | WDA 
 jason.kushm...@wda.com wrote:

 I asked this previously when a similar message came through, with a
 similar response.



 planetcassandra seems to have it “right”, in that stable=2.0,
 development=2.1, whereas the apache site says stable is 2.1.

 “Right” in that they assume the latest minor version is development.  Why not
 have the apache site do the same?  That’s just my lowly non-contributing
 opinion though.



 Jason



 From: Andrew [mailto:redmu...@gmail.com]
 Sent: Wednesday, February 18, 2015 8:26 PM
 To: Robert Coli; user@cassandra.apache.org
 Subject: Re: run cassandra on a small instance



 Robert,



 Let me know if I’m off base about this—but I feel like I see a lot of
 posts that are like this (i.e., use this arbitrary version, not this other
 arbitrary version).  Why are releases going out if they’re “broken”?  This
 seems like a very confusing way for new (and existing) users to approach
 versions...



 Andrew



 On February 18, 2015 at 5:16:27 PM, Robert Coli (rc...@eventbrite.com)
 wrote:

 On Wed, Feb 18, 2015 at 5:09 PM, Tim Dunphy bluethu...@gmail.com
 wrote:

 I'm attempting to run Cassandra 2.1.2 on a smallish 2.GB ram instance
 over at Digital Ocean. It's a CentOS 7 host.



 2.1.2 is IMO broken and should not be used for any purpose.



 Use 2.1.1 or 2.1.3.




 https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/



 =Rob






 --
 GPG me!!

 gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B




 --
 GPG me!!

 gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B






Re: run cassandra on a small instance

2015-02-19 Thread Tim Dunphy

 What I normally do is install plain CentOS (not any AMI built for
 Cassandra), and I don't use them for production! I run them for testing,
 fire drills and some cassandra-stress benchmarks. I will check whether I have
 had more than 5h of Cassandra uptime. I can even put one up now, do the test,
 and get the results back to you.


Hey thanks for letting me know that. And yep! Same here. It's just a plain
CentOS 7 VM I've been using. None of this is for production. I also have an
AWS account that I use only for testing. I can try setting it up there too
and get back to you with my results.

Thank you!
Tim

On Thu, Feb 19, 2015 at 12:52 PM, Carlos Rolo r...@pythian.com wrote:

 What I normally do is install plain CentOS (not any AMI built for
 Cassandra), and I don't use them for production! I run them for testing,
 fire drills and some cassandra-stress benchmarks. I will check whether I have
 had more than 5h of Cassandra uptime. I can even put one up now, do the test,
 and get the results back to you.

 Regards,

 Carlos Juzarte Rolo
 Cassandra Consultant

 Pythian - Love your data

 rolo@pythian | Twitter: cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
 Tel: 1649
 www.pythian.com

 On Thu, Feb 19, 2015 at 6:41 PM, Tim Dunphy bluethu...@gmail.com wrote:

 I have Cassandra instances running on VMs with smaller RAM (1GB even) and
 I don't go OOM when testing them. Although I use them in AWS and other
 providers, never tried Digital Ocean.
 Does Cassandra just fail after some time running, or is it failing on
 some specific read/write?


 Hi  Carlos,

 Ok, that's really interesting. So I have to ask, did you have to do
 anything special to get Cassandra to run on those 1GB AWS instances? I'd
 love to do the same. I even tried there as well and failed due to lack of
 memory to run it.

 And there is no specific reason other than lack of memory that I can tell
 for it to fail. And it doesn't seem to matter what data I use either.
 Because even if I remove the data directory with rm -rf, the phenomenon is
 the same. It'll run for a while, usually about 5 hours and then just crash
 with the word 'killed' as the last line of output.

 Thanks
 Tim


 On Thu, Feb 19, 2015 at 3:40 AM, Carlos Rolo r...@pythian.com wrote:

 I have Cassandra instances running on VMs with smaller RAM (1GB even)
 and I don't go OOM when testing them. Although I use them in AWS and other
 providers, never tried Digital Ocean.

 Does Cassandra just fail after some time running, or is it failing on
 some specific read/write?

 Regards,

 Carlos Juzarte Rolo
 Cassandra Consultant

 Pythian - Love your data

 rolo@pythian | Twitter: cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
 Tel: 1649
 www.pythian.com

 On Thu, Feb 19, 2015 at 7:16 AM, Tim Dunphy bluethu...@gmail.com
 wrote:

 Hey guys,

 After the upgrade to 2.1.3, and after almost exactly 5 hours running
 cassandra did indeed crash again on the 2GB ram VM.

 This is how the memory on the VM looked after the crash:

 [root@web2:~] #free -m
              total       used       free     shared    buffers     cached
 Mem:          2002       1227        774          8         45        386
 -/+ buffers/cache:        794       1207
 Swap:            0          0          0


 And that's with this set in the cassandra-env.sh file:

 MAX_HEAP_SIZE=800M
 HEAP_NEWSIZE=200M

 So I'm thinking now, do I just have to abandon this idea I have of
 running Cassandra on a 2GB instance? Or is this something we can all agree
 can be done? And if so, how can we do that? :)

 Thanks
 Tim

 On Wed, Feb 18, 2015 at 8:39 PM, Jason Kushmaul | WDA 
 jason.kushm...@wda.com wrote:

 I asked this previously when a similar message came through, with a
 similar response.



 planetcassandra seems to have it “right”, in that stable=2.0,
 development=2.1, whereas the apache site says stable is 2.1.

 “Right” in that they assume the latest minor version is development.  Why not
 have the apache site do the same?  That’s just my lowly non-contributing
 opinion though.



 Jason



 From: Andrew [mailto:redmu...@gmail.com]
 Sent: Wednesday, February 18, 2015 8:26 PM
 To: Robert Coli; user@cassandra.apache.org
 Subject: Re: run cassandra on a small instance



 Robert,



 Let me know if I’m off base about this—but I feel like I see a lot of
 posts that are like this (i.e., use this arbitrary version, not this other
 arbitrary version).  Why are releases going out if they’re “broken”?  This
 seems like a very confusing way for new (and existing) users to approach
 versions...



 Andrew



 On February 18, 2015 at 5:16:27 PM, Robert Coli (rc...@eventbrite.com)
 wrote:

 On Wed, Feb 18, 2015 at 5:09 PM, Tim Dunphy bluethu...@gmail.com
 wrote:

 I'm attempting to run Cassandra 2.1.2 on a smallish 2.GB ram instance
 over at Digital Ocean. It's a CentOS 7 host.



 2.1.2 is IMO broken and should not be used for any purpose.



 Use 2.1.1 or 2.1.3.




 

[ANNOUNCE] Apache Gora 0.6 Released

2015-02-19 Thread Lewis John Mcgibbney
Hi Folks,

The Apache Gora team are pleased to announce the immediate availability of
Apache Gora 0.6.

This release addresses a modest 47 issues http://s.apache.org/gora-0.6
with some being major improvements, new functionality and dependency
upgrades. Most notably the release involves key upgrades to Hadoop, HBase
and Solr dependencies as well as some extremely important bug fixes for the
MongoDB module.

Suggested Gora database support is as follows

   - Apache Avro 1.7.6
   - Apache Hadoop 1.2.1 and 2.5.2
   - Apache HBase 0.98.8-hadoop2
   - Apache Cassandra 2.0.2
   - Apache Solr 4.10.3
   - MongoDB 2.6.X
   - Apache Accumulo 1.5.1

Gora is released as both source code, downloads for which can be found at
our downloads page http://gora.apache.org/downloads.html as well as Maven
artifacts which can be found on Maven central
http://search.maven.org/#search%7Cga%7C1%7Cgora.
Thank you
Lewis
(on behalf of Gora PMC)


-- 
Lewis


Re: Node joining take a long time

2015-02-19 Thread 曹志富
First, thank you all.

Almost three days, and right now the status is still Joining. My cluster
has about 650G per node.

--
曹志富
Mobile: 18611121927
Email: caozf.zh...@gmail.com
Weibo: http://weibo.com/boliza/

2015-02-20 3:16 GMT+08:00 Robert Coli rc...@eventbrite.com:

 On Thu, Feb 19, 2015 at 7:34 AM, Mark Reddy mark.l.re...@gmail.com
 wrote:

 I'm sure Rob will be along shortly to say that 2.1.2 is, in his opinion,
 broken for production use...an opinion I'd agree with. So bear that in mind
 if you are running a production cluster.


 If you speak of the devil, he will appear.

 But yes, really, run 2.1.1 or 2.1.3, 2.1.2 is a bummer. Don't take the
 brown 2.1.2.

 This commentary is likely unrelated to the problem the OP is having, which
 I would need the information Mark asked for to comment on. :)

 =Rob