Re: Cassandra, vnodes, and spark

2014-09-16 Thread DuyHai Doan
Look into the source code of the Spark connector. CassandraRDD tries to find
all token ranges (even when using vnodes) for each node (endpoint) and
creates RDD partitions to match this distribution of token ranges. Thus data
locality is guaranteed.
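
You can see the raw input the connector works from with nodetool; a minimal
sketch (the keyspace name my_ks is hypothetical):

nodetool describering my_ks   # prints each token range and the endpoints owning it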

On Tue, Sep 16, 2014 at 4:39 AM, Eric Plowe eric.pl...@gmail.com wrote:

 Interesting. The way I understand the spark connector is that it's
 basically a client executing a CQL query and filling a Spark RDD. Spark
 will then handle the partitioning of data. Again, this is my understanding,
 and it may be incorrect.


 On Monday, September 15, 2014, Robert Coli rc...@eventbrite.com wrote:

 On Mon, Sep 15, 2014 at 4:57 PM, Eric Plowe eric.pl...@gmail.com wrote:

 Based on this stackoverflow question, vnodes affect the number of
 mappers Hadoop needs to spawn, which in turn affects performance.

 With the spark connector for cassandra would the same situation happen?
 Would vnodes affect performance in a similar situation to Hadoop?


 I don't know what specifically Spark does here, but if it has the same
 locality expectations as Hadoop generally, my belief would be: yes.

 =Rob




Direct IO with Spark and Hadoop over Cassandra

2014-09-16 Thread platon.tema

Hi.

As I see it, massive data processing tools (map/reduce) for C* data include:

connectors
- Calliope http://tuplejump.github.io/calliope/
- Datastax spark cassandra connector 
https://github.com/datastax/spark-cassandra-connector

- Stratio Deep https://github.com/Stratio/stratio-deep
- other free/commercial

runtime (job management and infrastructure)
- Spark
- Hadoop

But if I'm not mistaken, all these solutions use the network for data 
loading. In the best case a logic instance (some job) runs on the same node 
(where the corresponding token range was found).


Why can't this logic use direct C* IO (reading SSTables from disk)? Any 
cons?
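
For reference, this kind of direct reading is what the bundled sstable2json
tool does; a minimal sketch (the data file path is hypothetical):

sstable2json /var/lib/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-1-Data.db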


Some time ago I read an article (still can't find it) about academic 
research in which Hadoop was modified to support this direct IO mode. 
According to those benchmarks, direct IO gave a significant performance 
increase.


Re: Direct IO with Spark and Hadoop over Cassandra

2014-09-16 Thread DuyHai Doan
If you access the C* SSTables directly from those frameworks, you will:

1) miss live data which is still in memory (memtables) and not yet flushed to disk

2) skip the Dynamo layer of C*, which is responsible for data consistency
On 16 Sep 2014 10:58, platon.tema platon.t...@yandex.ru wrote:

 Hi.

 As I see it, massive data processing tools (map/reduce) for C* data include:

 connectors
 - Calliope http://tuplejump.github.io/calliope/
 - Datastax spark cassandra connector
 https://github.com/datastax/spark-cassandra-connector
 - Stratio Deep https://github.com/Stratio/stratio-deep
 - other free/commercial

 runtime (job management and infrastructure)
 - Spark
 - Hadoop

 But if I'm not mistaken, all these solutions use the network for data loading.
 In the best case a logic instance (some job) runs on the same node (where the
 corresponding token range was found).

 Why can't this logic use direct C* IO (reading SSTables from disk)? Any
 cons?

 Some time ago I read an article (still can't find it) about academic
 research in which Hadoop was modified to support this direct IO mode.
 According to those benchmarks, direct IO gave a significant performance
 increase.



Document of WRITETIME function needs update

2014-09-16 Thread ziju feng
Hi,

I found that the WRITETIME function on a counter column returns the date/time in
milliseconds instead of microseconds, which is not mentioned in the document
http://www.datastax.com/documentation/cql/3.1/cql/cql_using/use_writetime.html.
It would be helpful to clarify the difference in the document.
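
For example, the behavior can be seen with something like this (a minimal
cqlsh sketch; keyspace and table are hypothetical):

cqlsh -e "
CREATE TABLE IF NOT EXISTS my_ks.page_views (page text PRIMARY KEY, views counter);
UPDATE my_ks.page_views SET views = views + 1 WHERE page = 'home';
SELECT views, WRITETIME(views) FROM my_ks.page_views WHERE page = 'home';"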

One side question: I denormalize the counter column's value to regular tables
using read-after-write at QUORUM consistency from the counter table, and update
the regular tables using the counter column's write time to resolve write
conflicts. Is this a valid use case?

Thanks,

Ziju.


Re: Cassandra, vnodes, and spark

2014-09-16 Thread George Stergiou
I ran into this performance report:

https://github.com/datastax/spark-cassandra-connector/issues/200

Does the Spark connector in its current state issue one CQL query per vnode,
or one task per vnode?

Regards.

On Tue, Sep 16, 2014 at 2:05 AM, DuyHai Doan doanduy...@gmail.com wrote:

 Look into the source code of the Spark connector. CassandraRDD tries to find
 all token ranges (even when using vnodes) for each node (endpoint) and
 creates RDD partitions to match this distribution of token ranges. Thus data
 locality is guaranteed.

 On Tue, Sep 16, 2014 at 4:39 AM, Eric Plowe eric.pl...@gmail.com wrote:

 Interesting. The way I understand the spark connector is that it's
 basically a client executing a CQL query and filling a Spark RDD. Spark
 will then handle the partitioning of data. Again, this is my understanding,
 and it may be incorrect.


 On Monday, September 15, 2014, Robert Coli rc...@eventbrite.com wrote:

 On Mon, Sep 15, 2014 at 4:57 PM, Eric Plowe eric.pl...@gmail.com
 wrote:

 Based on this stackoverflow question, vnodes affect the number of
 mappers Hadoop needs to spawn, which in turn affects performance.

 With the spark connector for cassandra would the same situation happen?
 Would vnodes affect performance in a similar situation to Hadoop?


 I don't know what specifically Spark does here, but if it has the same
 locality expectations as Hadoop generally, my belief would be: yes.

 =Rob





Re: Direct IO with Spark and Hadoop over Cassandra

2014-09-16 Thread platon.tema

Thanks.

But 1) can be overcome with the C* API for the commitlog and memtables, or 
with mixed access (direct IO + traditional connectors, or pure CQL if the 
data model allows; we experimented with it).


2) is more complex for a universal solution. In our case C* is used without 
replication (RF=1) because of the huge data size (replication is too expensive).


On 09/16/2014 03:40 PM, DuyHai Doan wrote:


If you access the C* SSTables directly from those frameworks, you will:

1) miss live data which is still in memory (memtables) and not yet flushed to disk

2) skip the Dynamo layer of C*, which is responsible for data consistency

On 16 Sep 2014 10:58, platon.tema platon.t...@yandex.ru wrote:


Hi.

As I see it, massive data processing tools (map/reduce) for C* data
include:

connectors
- Calliope http://tuplejump.github.io/calliope/
- Datastax spark cassandra connector
https://github.com/datastax/spark-cassandra-connector
- Stratio Deep https://github.com/Stratio/stratio-deep
- other free/commercial

runtime (job management and infrastructure)
- Spark
- Hadoop

But if I'm not mistaken, all these solutions use the network for data
loading. In the best case a logic instance (some job) runs on the same
node (where the corresponding token range was found).

Why can't this logic use direct C* IO (reading SSTables from disk)?
Any cons?

Some time ago I read an article (still can't find it) about
academic research in which Hadoop was modified to support this
direct IO mode. According to those benchmarks, direct IO gave a
significant performance increase.





RE: Direct IO with Spark and Hadoop over Cassandra

2014-09-16 Thread moshe.kranc
You will also have to read/resolve multiple row instances (if you update 
records) and tombstones (if you delete records) yourself.

From: platon.tema [mailto:platon.t...@yandex.ru]
Sent: Tuesday, September 16, 2014 1:51 PM
To: user@cassandra.apache.org
Subject: Re: Direct IO with Spark and Hadoop over Cassandra

Thanks.

But 1) can be overcome with the C* API for the commitlog and memtables, or with 
mixed access (direct IO + traditional connectors, or pure CQL if the data model 
allows; we experimented with it).

2) is more complex for a universal solution. In our case C* is used without 
replication (RF=1) because of the huge data size (replication is too expensive).
On 09/16/2014 03:40 PM, DuyHai Doan wrote:

If you access the C* SSTables directly from those frameworks, you will:

1) miss live data which is still in memory (memtables) and not yet flushed to disk

2) skip the Dynamo layer of C*, which is responsible for data consistency
On 16 Sep 2014 10:58, platon.tema platon.t...@yandex.ru wrote:
Hi.

As I see it, massive data processing tools (map/reduce) for C* data include:

connectors
- Calliope http://tuplejump.github.io/calliope/
- Datastax spark cassandra connector
https://github.com/datastax/spark-cassandra-connector
- Stratio Deep https://github.com/Stratio/stratio-deep
- other free/commercial

runtime (job management and infrastructure)
- Spark
- Hadoop

But if I'm not mistaken, all these solutions use the network for data loading. 
In the best case a logic instance (some job) runs on the same node (where the 
corresponding token range was found).

Why can't this logic use direct C* IO (reading SSTables from disk)? Any cons?

Some time ago I read an article (still can't find it) about academic research 
in which Hadoop was modified to support this direct IO mode. According to 
those benchmarks, direct IO gave a significant performance increase.




Re: Direct IO with Spark and Hadoop over Cassandra

2014-09-16 Thread platon.tema
Yes, updates and deletes are trouble. At the moment, for updates, we refresh 
the result data with a query to C* (Java driver) before reporting to the 
user. For deletes we could skip them during scanning, by TTL for example (not 
tested yet).


On 09/16/2014 04:53 PM, moshe.kr...@barclays.com wrote:


You will also have to read/resolve multiple row instances (if you 
update records) and tombstones (if you delete records) yourself.


From: platon.tema [mailto:platon.t...@yandex.ru]
Sent: Tuesday, September 16, 2014 1:51 PM
To: user@cassandra.apache.org
Subject: Re: Direct IO with Spark and Hadoop over Cassandra

Thanks.

But 1) can be overcome with the C* API for the commitlog and memtables, or 
with mixed access (direct IO + traditional connectors, or pure CQL if the 
data model allows; we experimented with it).


2) is more complex for a universal solution. In our case C* is used without 
replication (RF=1) because of the huge data size (replication is too expensive).


On 09/16/2014 03:40 PM, DuyHai Doan wrote:

If you access the C* SSTables directly from those frameworks, you
will:

1) miss live data which is still in memory (memtables) and not yet
flushed to disk

2) skip the Dynamo layer of C*, which is responsible for data consistency

On 16 Sep 2014 10:58, platon.tema platon.t...@yandex.ru wrote:

Hi.

As I see it, massive data processing tools (map/reduce) for C* data
include:

connectors
- Calliope http://tuplejump.github.io/calliope/
- Datastax spark cassandra connector
https://github.com/datastax/spark-cassandra-connector
- Stratio Deep https://github.com/Stratio/stratio-deep
- other free/commercial

runtime (job management and infrastructure)
- Spark
- Hadoop

But if I'm not mistaken, all these solutions use the network for data
loading. In the best case a logic instance (some job) runs on the same
node (where the corresponding token range was found).

Why can't this logic use direct C* IO (reading SSTables from disk)?
Any cons?

Some time ago I read an article (still can't find it) about
academic research in which Hadoop was modified to support this
direct IO mode. According to those benchmarks, direct IO gave a
significant performance increase.






Re: hs_err_pid3013.log, out of memory?

2014-09-16 Thread Chris Lohfink
How much memory does your system have? How much memory is the system utilizing 
before starting Cassandra (use the command free)? What are the heap settings it 
tries to use?
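
For example (a minimal sketch; the cassandra-env.sh path varies by install):

free -m                       # system memory, in MB, before starting Cassandra
grep -E 'MAX_HEAP_SIZE|HEAP_NEWSIZE' /etc/cassandra/cassandra-env.sh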

Chris

On Sep 15, 2014, at 8:16 PM, Yatong Zhang bluefl...@gmail.com wrote:

 It's during the startup. I tried to upgrade Cassandra from 2.0.7 to 2.0.10, 
 but it looks like Cassandra could not start again. I also found the following 
 log in '/var/log/messages':
 
 Sep 16 09:06:59 storage6 kernel: INFO: task java:4971 blocked for more than 
 120 seconds.
 Sep 16 09:06:59 storage6 kernel:  Tainted: G   --- H  
 2.6.32-431.el6.x86_64 #1
 Sep 16 09:06:59 storage6 kernel: echo 0 > 
 /proc/sys/kernel/hung_task_timeout_secs disables this message.
 Sep 16 09:06:59 storage6 kernel: java  D 0003 0  4971 
  1 0x0080
 Sep 16 09:06:59 storage6 kernel: 88042b591c98 0082 
 81ed4ff0 8803b4f01540
 Sep 16 09:06:59 storage6 kernel: 88042b591c68 810af370 
 88042b591ca0 8803b4f01540
 Sep 16 09:06:59 storage6 kernel: 8803b4f01af8 88042b591fd8 
 fbc8 8803b4f01af8
 Sep 16 09:06:59 storage6 kernel: Call Trace:
 Sep 16 09:06:59 storage6 kernel: [810af370] ? 
 exit_robust_list+0x90/0x160
 Sep 16 09:06:59 storage6 kernel: [81076ad5] exit_mm+0x95/0x180
 Sep 16 09:06:59 storage6 kernel: [81076f1f] do_exit+0x15f/0x870
 Sep 16 09:06:59 storage6 kernel: [81077688] do_group_exit+0x58/0xd0
 Sep 16 09:06:59 storage6 kernel: [8108d046] 
 get_signal_to_deliver+0x1f6/0x460
 Sep 16 09:06:59 storage6 kernel: [8100a265] do_signal+0x75/0x800
 Sep 16 09:06:59 storage6 kernel: [81066629] ? 
 wake_up_new_task+0xd9/0x130
 Sep 16 09:06:59 storage6 kernel: [81070ead] ? do_fork+0x13d/0x480
 Sep 16 09:06:59 storage6 kernel: [810b1c0b] ? sys_futex+0x7b/0x170
 Sep 16 09:06:59 storage6 kernel: [8100aa80] 
 do_notify_resume+0x90/0xc0
 Sep 16 09:06:59 storage6 kernel: [8100b341] int_signal+0x12/0x17
 Sep 16 09:06:59 storage6 kernel: INFO: task java:4972 blocked for more than 
 120 seconds.
 Sep 16 09:06:59 storage6 kernel:  Tainted: G   --- H  
 2.6.32-431.el6.x86_64 #1
 Sep 16 09:06:59 storage6 kernel: echo 0 > 
 /proc/sys/kernel/hung_task_timeout_secs disables this message.
 Sep 16 09:06:59 storage6 kernel: java  D  0  4972 
  1 0x0080
 Sep 16 09:06:59 storage6 kernel: 8803b4d7fc98 0082 
 81ed6d78 8803b4cf1500
 Sep 16 09:06:59 storage6 kernel: 8803b4d7fc68 810af370 
 8803b4d7fca0 8803b4cf1500
 Sep 16 09:06:59 storage6 kernel: 8803b4cf1ab8 8803b4d7ffd8 
 fbc8 8803b4cf1ab8
 Sep 16 09:06:59 storage6 kernel: Call Trace:
 Sep 16 09:06:59 storage6 kernel: [810af370] ? 
 exit_robust_list+0x90/0x160
 Sep 16 09:06:59 storage6 kernel: [81076ad5] exit_mm+0x95/0x180
 Sep 16 09:06:59 storage6 kernel: [81076f1f] do_exit+0x15f/0x870
 Sep 16 09:06:59 storage6 kernel: [81065e20] ? 
 wake_up_state+0x10/0x20
 Sep 16 09:06:59 storage6 kernel: [81077688] do_group_exit+0x58/0xd0
 Sep 16 09:06:59 storage6 kernel: [8108d046] 
 get_signal_to_deliver+0x1f6/0x460
 Sep 16 09:06:59 storage6 kernel: [8100a265] do_signal+0x75/0x800
 Sep 16 09:06:59 storage6 kernel: [810097cc] ? 
 __switch_to+0x1ac/0x320
 Sep 16 09:06:59 storage6 kernel: [81527910] ? 
 thread_return+0x4e/0x76e
 Sep 16 09:06:59 storage6 kernel: [810b1c0b] ? sys_futex+0x7b/0x170
 Sep 16 09:06:59 storage6 kernel: [8100aa80] 
 do_notify_resume+0x90/0xc0
 Sep 16 09:06:59 storage6 kernel: [8100b341] int_signal+0x12/0x17
 Sep 16 09:06:59 storage6 kernel: INFO: task java:4973 blocked for more than 
 120 seconds.
 
 
 On Tue, Sep 16, 2014 at 9:00 AM, Robert Coli rc...@eventbrite.com wrote:
 On Mon, Sep 15, 2014 at 5:55 PM, Yatong Zhang bluefl...@gmail.com wrote:
 I just encountered an error which left a log '/hs_err_pid3013.log'. So is 
 there a way to solve this?
 
 # There is insufficient memory for the Java Runtime Environment to continue.
 # Native memory allocation (malloc) failed to allocate 12288 bytes for 
 committing reserved memory.
 
 Use less heap memory?
 
 You haven't specified under which circumstances this occurred, so I can only 
 conjecture that it is likely being caused by writing too fast.
 
 Write more slowly.
 
 =Rob
 
 



Re: Trying to understand cassandra gc logs

2014-09-16 Thread Chris Lohfink
Check out:

https://blogs.oracle.com/poonam/entry/understanding_cms_gc_logs

The young gen collection is a stop-the-world event that pauses application 
threads, and a couple of phases of CMS are as well.  I would recommend disabling the 

#JVM_OPTS=$JVM_OPTS -XX:PrintFLSStatistics=1

line in your cassandra-env.sh as well, to simplify things a little and make 
the log parsable by GC log visualization tools.
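
If it helps, the pauses can also be pulled out of the log with something like
this (a minimal sketch; the gc.log path is an assumption):

grep 'Total time for which application threads were stopped' /var/log/cassandra/gc.log \
  | awk '{print $(NF-1)}' | sort -n | tail -5   # the five longest pauses, in seconds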

---
Chris Lohfink

On Sep 15, 2014, at 9:40 PM, Donald Smith donald.sm...@audiencescience.com 
wrote:

 I understand that cassandra uses ParNew GC for New Gen and CMS for Old Gen 
 (tenured).   I’m trying to interpret from the logs when a Full GC happens and 
 what kind of Full GC is used.  It never says “Full GC” or anything like that. 
 But I see that whenever there’s a line like
  
 2014-09-15T18:04:17.197-0700: 117485.192: [CMS-concurrent-mark-start]
  
 the count of full GCs increases from
  
 {Heap after GC invocations=158459 (full 931):
  
 to a line like:
  
 {Heap before GC invocations=158459 (full 932):
  
 See the highlighted lines in the gclog output below.  So, apparently there 
 was a full GC between those two lines. Between those lines it also has two 
 lines, such as:
  
2014-09-15T18:04:17.197-0700: 117485.192: Total time for which application 
 threads were stopped: 0.0362080 seconds
2014-09-15T18:04:17.882-0700: 117485.877: Total time for which application 
 threads were stopped: 0.0129660 seconds
  
 Also, the full count (932 above) is always exactly half the FGC number (1864) 
 returned by jstat, as in
  
 dc1-cassandra01.dc01 /var/log/cassandra sudo jstat -gcutil 28511
   S0     S1     E      O      P      YGC     YGCT      FGC    FGCT     GCT
  55.82   0.00  82.45  45.02  59.76  165772  5129.728  1864  320.247  5449.975
  
 So, I am apparently correct that “(full 932)” is the count of Full GCs. I’m 
 perplexed by the log output, though.
  
 I also see lines mentioning “concurrent mark-sweep” that do not appear to 
 correspond to full GCs. So, my questions are:  Is CMS also used for full GCs? 
 If not, what kind of GC is done? The logs don’t say. Lines saying “Total 
 time for which application threads were stopped” appear twice per full GC; 
 why?  Apparently, even our Full GCs are fast. 99% of them finish within 0.18  
 seconds; 99.9% finish within 0.5 seconds (which may be too slow for some of 
 our clients).
  
 Here below is some log output, with interesting parts highlighted in grey or 
 yellow.  Thanks, Don
  
 {Heap before GC invocations=158458 (full 931):
 par new generation   total 1290240K, used 1213281K [0x0005bae0, 
 0x00061260, 0x00061260)
   eden space 1146880K, 100% used [0x0005bae0, 0x000600e0, 
 0x000600e0)
   from space 143360K,  46% used [0x000600e0, 0x000604ed87c0, 
 0x000609a0)
   to   space 143360K,   0% used [0x000609a0, 0x000609a0, 
 0x00061260)
 concurrent mark-sweep generation total 8003584K, used 5983572K 
 [0x00061260, 0x0007fae0, 0x0007fae0)
 concurrent-mark-sweep perm gen total 44820K, used 26890K [0x0007fae0, 
 0x0007fd9c5000, 0x0008)
 2014-09-15T18:04:17.131-0700: 117485.127: [GC Before GC:
 Statistics for BinaryTreeDictionary:
 
 Total Free Space: 197474318
 Max   Chunk Size: 160662270
 Number of Blocks: 3095
 Av.  Block  Size: 63804
 Tree  Height: 32
 Before GC:
 Statistics for BinaryTreeDictionary:
 
 Total Free Space: 2285026
 Max   Chunk Size: 2279936
 Number of Blocks: 8
 Av.  Block  Size: 285628
 Tree  Height: 5
 2014-09-15T18:04:17.133-0700: 117485.128: [ParNew
 Desired survivor size 73400320 bytes, new threshold 1 (max 1)
 - age   1:   44548776 bytes,   44548776 total
 : 1213281K->49867K(1290240K), 0.0264540 secs] 
 7196854K->6059170K(9293824K) After GC:
 Statistics for BinaryTreeDictionary:
 
 Total Free Space: 195160244
 Max   Chunk Size: 160662270
 Number of Blocks: 3093
 Av.  Block  Size: 63097
 Tree  Height: 32
 After GC:
 Statistics for BinaryTreeDictionary:
 
 Total Free Space: 2285026
 Max   Chunk Size: 2279936
 Number of Blocks: 8
 Av.  Block  Size: 285628
 Tree  Height: 5
 , 0.0286700 secs] [Times: user=0.37 sys=0.01, real=0.03 secs]
 Heap after GC invocations=158459 (full 931):
 par new generation   total 1290240K, used 49867K [0x0005bae0, 
 0x00061260, 0x00061260)
   eden space 1146880K,   0% used [0x0005bae0, 0x0005bae0, 
 0x000600e0)
   from space 143360K,  34% used [0x000609a0, 0x00060cab2e18, 
 0x00061260)
   to   space 143360K,   0% used [0x000600e0, 0x000600e0, 
 0x000609a0)
 concurrent mark-sweep generation total 8003584K, used 6009302K 
 [0x00061260, 0x0007fae0, 0x0007fae0)
 concurrent-mark-sweep perm gen total 44820K, used 

Consistency Level for Atomic Batches

2014-09-16 Thread Viswanathan Ramachandran
Is consistency level honored for batch statements?

If I have 100 insert/update statements in my batch and use LOCAL_QUORUM
consistency, will control return from the coordinator only after a local
quorum update has been done for all 100 statements?

Or is it different?
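
For concreteness, the scenario is something like this (a minimal cqlsh sketch;
keyspace and table are hypothetical):

cqlsh -e "
CONSISTENCY LOCAL_QUORUM;
BEGIN BATCH
  INSERT INTO my_ks.events (id, payload) VALUES (1, 'a');
  UPDATE my_ks.events SET payload = 'b' WHERE id = 2;
  -- ... up to 100 statements ...
APPLY BATCH;"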

Thanks
Vish


Re: Consistency Level for Atomic Batches

2014-09-16 Thread Viswanathan Ramachandran
A follow up on the earlier question.

I meant to ask earlier whether control returns to the client after the batch
log is written on the coordinator, irrespective of the consistency level
specified.

Also: will the coordinator attempt all statements one after the other, or
in parallel?

Thanks


On Tue, Sep 16, 2014 at 8:00 AM, Viswanathan Ramachandran 
vish.ramachand...@gmail.com wrote:

 Is consistency level honored for batch statements?

 If I have 100 insert/update statements in my batch and use LOCAL_QUORUM
 consistency, will the control from coordinator return only after a local
 quorum update has been done for all the 100 statements?

 Or is it different?

 Thanks
 Vish



Blocking while a node finishes joining the cluster after restart.

2014-09-16 Thread Kevin Burton
Say I want to do a rolling restart of Cassandra…

I can’t just restart all of them because they need some time to gossip and
for that gossip to get to all nodes.

What is the best strategy for this?

It would be something like:

/etc/init.d/cassandra restart && wait-for-cassandra.sh

… or something along those lines.
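
A minimal sketch of what wait-for-cassandra.sh could look like (the script
name and the grep on hostname -i are assumptions; it just blocks until the
local node shows up as UN, Up/Normal, in nodetool status):

#!/usr/bin/env bash
ADDR=$(hostname -i)
until nodetool status 2>/dev/null | grep "$ADDR" | grep -q '^UN'; do
  echo "waiting for Cassandra on $ADDR..."
  sleep 5
done
echo "Cassandra is up."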

-- 

Founder/CEO Spinn3r.com
Location: San Francisco, CA
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Re: Blocking while a node finishes joining the cluster after restart.

2014-09-16 Thread Duncan Sands
Hi Kevin, if you are using the latest version of opscenter, then even the 
community (= free) edition can do a rolling restart of your cluster.  It's 
pretty convenient.


Ciao, Duncan.

On 16/09/14 19:44, Kevin Burton wrote:

Say I want to do a rolling restart of Cassandra…

I can’t just restart all of them because they need some time to gossip and for
that gossip to get to all nodes.

What is the best strategy for this?

It would be something like:

/etc/init.d/cassandra restart && wait-for-cassandra.sh

… or something along those lines.

--

Founder/CEO Spinn3r.com http://Spinn3r.com
Location: San Francisco, CA
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com





Re: Blocking while a node finishes joining the cluster after restart.

2014-09-16 Thread James Briggs
FYI: OpsCenter has a default of sleeping 60 seconds after each node restart,
and an option to drain before stopping.


I haven't noticed if they do anything special with seeds.
(At least one seed needs to be running before you restart other nodes.)


I wondered the same thing as Kevin and came to these conclusions.

Fixing the startup script is non-trivial as far as startup scripts go.

To start, it would have to:

- parse cassandra.yaml for seeds
- if it is not itself a seed, wait for a seed to start first (this could take 
minutes, or never happen)

- continue starting.


For a no-downtime cluster restart script, it would have to do the following 
(a sketch of the per-node part follows below):

- verify cluster health (i.e. quorum/CL is met, or you lose writes)

- parse cassandra.yaml for seeds and see if a seed is up
- stop gossip and thrift
- maybe do a compaction before the drain

- drain the node
- stop/start or restart the cassandra process.

http://comments.gmane.org/gmane.comp.db.cassandra.user/20144

Both of those scripts would be nice to have. :)
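
A minimal sketch of the per-node portion of the second script (assumes
nodetool on PATH and a sysvinit service script; the health and seed checks
are omitted):

#!/usr/bin/env bash
set -e
nodetool disablethrift    # stop accepting client (Thrift) connections
nodetool disablegossip    # announce departure to the ring
nodetool drain            # flush memtables; node stops accepting writes
sudo /etc/init.d/cassandra restart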

OpsCenter is flaky at doing rolling restart in my test cluster,
so an alternative is needed.

Also, the free OpsCenter doesn't have rolling repair option enabled.

ccm has the options to do drain, stop and start, but a bash
script would be needed to make it rolling.

https://github.com/pcmanus/ccm


Thanks, James. 
-- 
Cassandra/MySQL DBA. Available in San Jose area or remote.




 From: Duncan Sands duncan.sa...@gmail.com
To: user@cassandra.apache.org 
Sent: Tuesday, September 16, 2014 11:09 AM
Subject: Re: Blocking while a node finishes joining the cluster after restart.
 

Hi Kevin, if you are using the latest version of opscenter, then even the 
community (= free) edition can do a rolling restart of your cluster.  It's 
pretty convenient.

Ciao, Duncan.

On 16/09/14 19:44, Kevin Burton wrote:
 Say I want to do a rolling restart of Cassandra…

 I can’t just restart all of them because they need some time to gossip and for
 that gossip to get to all nodes.

 What is the best strategy for this?

 It would be something like:

 /etc/init.d/cassandra restart && wait-for-cassandra.sh

 … or something along those lines.

 --

 Founder/CEO Spinn3r.com http://Spinn3r.com

 Location: San Francisco, CA
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com



Re: Blocking while a node finishes joining the cluster after restart.

2014-09-16 Thread Robert Coli
On Tue, Sep 16, 2014 at 12:21 PM, James Briggs james.bri...@yahoo.com
wrote:

 I haven't noticed if they do anything special with seeds.
 (At least one seed needs to be running before you restart other nodes.)


If the nodes have all seen each other before (the cluster has coalesced
once) then AFAIK this statement is not true. The ring state is persisted,
nodes don't need to talk to a seed to start.


 I wondered the same thing as Kevin and came to these conclusions.


As I don't think the seed node wrinkle exists, I'm pretty sure all you
really have to do is make sure the node is answering on the Thrift and
Gossip ports and that other nodes all see it as UP.

=Rob


Re: Blocking while a node finishes joining the cluster after restart.

2014-09-16 Thread James Briggs
Hi Robert.

I just did a test (shut down all nodes, start one non-seed node).


You're correct that an old non-seed node can start by itself.

So startup scripts don't have to be intelligent, but apps need to wait

until there are enough nodes up to serve the whole keyspace:

cqlsh:my_keyspace> consistency
Current consistency level is ONE.

cqlsh:my_keyspace> select * from numbers where v=1;

 v
---
 1

(1 rows)

cqlsh:my_keyspace> select * from numbers where v=2;
Unable to complete request: one or more nodes were unavailable.


Thanks, James. 
-- 
Cassandra/MySQL DBA. Available in San Jose area or remote. 


backport of CASSANDRA-6916

2014-09-16 Thread Paulo Ricardo Motta Gomes
Hello,

Has anyone backported incremental replacement of compacted SSTables
(CASSANDRA-6916) to 2.0? Is it doable, or are there many dependencies
introduced in 2.1?

Haven't checked the ticket details yet, but just in case anyone has
interesting info to share.

Cheers,

-- 
Paulo Motta

Chaordic | Platform
www.chaordic.com.br
+55 48 3232.3200


Re: backport of CASSANDRA-6916

2014-09-16 Thread Robert Coli
On Tue, Sep 16, 2014 at 2:56 PM, Paulo Ricardo Motta Gomes 
paulo.mo...@chaordicsystems.com wrote:

 Has anyone backported incremental replacement of compacted SSTables
 (CASSANDRA-6916) to 2.0? Is it doable, or are there many dependencies
 introduced in 2.1?

 Haven't checked the ticket details yet, but just in case anyone has
 interesting info to share.


Are you looking to patch for public consumption, or for your own purposes?

I just took the temperature of #cassandra-dev and they were cold on the
idea as a public patch, because of potential impact on stability.

=Rob


Re: backport of CASSANDRA-6916

2014-09-16 Thread Paulo Ricardo Motta Gomes
For my own purposes, but I wouldn't mind making it public so people could
patch it themselves if they want to... (if nobody has already done so) :)

On Tue, Sep 16, 2014 at 8:13 PM, Robert Coli rc...@eventbrite.com wrote:

 On Tue, Sep 16, 2014 at 2:56 PM, Paulo Ricardo Motta Gomes 
 paulo.mo...@chaordicsystems.com wrote:

 Has anyone backported incremental replacement of compacted SSTables
 (CASSANDRA-6916) to 2.0? Is it doable, or are there many dependencies
 introduced in 2.1?

 Haven't checked the ticket details yet, but just in case anyone has
 interesting info to share.


 Are you looking to patch for public consumption, or for your own purposes?

 I just took the temperature of #cassandra-dev and they were cold on the
 idea as a public patch, because of potential impact on stability.

 =Rob





-- 
Paulo Motta

Chaordic | Platform
www.chaordic.com.br
+55 48 3232.3200


Re: backport of CASSANDRA-6916

2014-09-16 Thread James Briggs
Paulo:

Out of curiosity, why not just upgrade to 2.1 if you want the new features?

You know you want to! :)

 
Thanks, James Briggs
-- 
Cassandra/MySQL DBA. Available in San Jose area or remote.



 From: Robert Coli rc...@eventbrite.com
To: user@cassandra.apache.org user@cassandra.apache.org 
Sent: Tuesday, September 16, 2014 4:13 PM
Subject: Re: backport of CASSANDRA-6916
 


On Tue, Sep 16, 2014 at 2:56 PM, Paulo Ricardo Motta Gomes 
paulo.mo...@chaordicsystems.com wrote:

Has anyone backported incremental replacement of compacted SSTables 
(CASSANDRA-6916) to 2.0? Is it doable, or are there many dependencies introduced 
in 2.1?


Haven't checked the ticket details yet, but just in case anyone has interesting 
info to share.

Are you looking to patch for public consumption, or for your own purposes?

I just took the temperature of #cassandra-dev and they were cold on the idea as 
a public patch, because of potential impact on stability.

=Rob

Re: backport of CASSANDRA-6916

2014-09-16 Thread Paulo Ricardo Motta Gomes
Because I want this specific feature, and not all 2.1 features, even though
this is probably one of the most significant changes in 2.1. Upgrading
would be nice, but I want to wait a little longer before fully jumping into
2.1 :)

We're having sudden peaks in read latency some time after a massive batch
write, which is most likely caused by the cold page cache of newly compacted
SSTables, and which will hopefully be solved by this.

On Tue, Sep 16, 2014 at 8:25 PM, James Briggs james.bri...@yahoo.com
wrote:

 Paulo:

 Out of curiosity, why not just upgrade to 2.1 if you want the new features?

 You know you want to! :)


 Thanks, James Briggs
 --
 Cassandra/MySQL DBA. Available in San Jose area or remote.


   --
 From: Robert Coli rc...@eventbrite.com
 To: user@cassandra.apache.org user@cassandra.apache.org
 Sent: Tuesday, September 16, 2014 4:13 PM
 Subject: Re: backport of CASSANDRA-6916

 On Tue, Sep 16, 2014 at 2:56 PM, Paulo Ricardo Motta Gomes 
 paulo.mo...@chaordicsystems.com wrote:

 Has anyone backported incremental replacement of compacted SSTables
 (CASSANDRA-6916) to 2.0? Is it doable, or are there many dependencies
 introduced in 2.1?

 Haven't checked the ticket details yet, but just in case anyone has
 interesting info to share.


 Are you looking to patch for public consumption, or for your own purposes?

 I just took the temperature of #cassandra-dev and they were cold on the
 idea as a public patch, because of potential impact on stability.

 =Rob






-- 
Paulo Motta

Chaordic | Platform
www.chaordic.com.br
+55 48 3232.3200


Re: backport of CASSANDRA-6916

2014-09-16 Thread Robert Coli
On Tue, Sep 16, 2014 at 4:38 PM, Paulo Ricardo Motta Gomes 
paulo.mo...@chaordicsystems.com wrote:

 We're having sudden peaks in read latency some time after a massive batch
 write, which is most likely caused by the cold page cache of newly compacted
 SSTables, and which will hopefully be solved by this.


populate_io_cache_on_flush?

Note that this feature is somewhat badly named: it also applies to SSTables
written by compaction, not just memtable flushes.

https://issues.apache.org/jira/browse/CASSANDRA-4694?focusedCommentId=13723129&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13723129
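
In the 2.0 line this is a per-table CQL property, so enabling it should look
something like this (a minimal sketch; keyspace and table are hypothetical):

cqlsh -e "ALTER TABLE my_ks.my_table WITH populate_io_cache_on_flush = true;"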

=Rob


Re: backport of CASSANDRA-6916

2014-09-16 Thread Robert Coli
On Tue, Sep 16, 2014 at 4:50 PM, Robert Coli rc...@eventbrite.com wrote:

 On Tue, Sep 16, 2014 at 4:38 PM, Paulo Ricardo Motta Gomes 
 paulo.mo...@chaordicsystems.com wrote:

 We're having sudden peaks in read latency some time after a massive batch
 write, which is most likely caused by the cold page cache of newly compacted
 SSTables, and which will hopefully be solved by this.

 Note that this feature is somewhat badly named: it also applies to SSTables
 written by compaction, not just memtable flushes.


Also note that it is removed as of 2.1.

https://issues.apache.org/jira/browse/CASSANDRA-7495

=Rob


no change observed in read latency after switching from EBS to SSD storage

2014-09-16 Thread Mohammed Guller
Hi -

We are running Cassandra 2.0.5 on AWS on m3.large instances. These instances 
were using EBS for storage (I know it is not recommended). We replaced the EBS 
storage with SSDs. However, we didn't see any change in read latency. A query 
that took 10 seconds when data was stored on EBS still takes 10 seconds even 
after we moved the data directory to SSD. It is a large query returning 200,000 
CQL rows from a single partition. We are reading 3 columns from each row and 
the combined data in these three columns for each row is around 100 bytes. In 
other words, the raw data returned by the query is approximately 20MB.

I was expecting at least 5-10 times reduction in read latency going from EBS to 
SSD, so I am puzzled why we are not seeing any change in performance.

Does anyone have insight as to why we don't see any performance impact on the 
reads going from EBS to SSD?

Thanks,
Mohammed



Re: no change observed in read latency after switching from EBS to SSD storage

2014-09-16 Thread Robert Coli
On Tue, Sep 16, 2014 at 5:35 PM, Mohammed Guller moham...@glassbeam.com
wrote:

 Does anyone have insight as to why we don't see any performance impact on
 the reads going from EBS to SSD?


What does it say when you enable tracing on this CQL query?
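
For example (a minimal cqlsh sketch; the query shown is hypothetical):

cqlsh -e "
TRACING ON;
SELECT col1, col2, col3 FROM my_ks.my_table WHERE pk = 'x';"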

10 seconds is a really long time to access anything in Cassandra. There is,
generally speaking, a reason why the default timeouts are lower than this.

My conjecture is that the data in question was previously being served from
the page cache and is now being served from SSD. You have, in switching
from EBS-plus-page-cache to SSD, successfully proved that SSD and RAM are
both very fast. There is also a strong suggestion that whatever access
pattern you are using is not bounded by disk performance.

=Rob


Announce: top for Cassandra - cass_top

2014-09-16 Thread James Briggs
I wrote cass_top, a poor man's version of OpsCenter, in bash (no dependencies.)


http://www.jebriggs.com/blog/2014/09/top-utility-for-cassandra-clusters-cass_top/

 
Actually, if it had node or cluster restart, it would do most of what the 
OpsCenter free version does. :)

The features of cass_top are:

- colorizes nodetool status output: UN nodes green, DN nodes red, other 
statuses blue
- no extra firewall holes needed (agent-less and server-less), unlike OpsCenter
- fast initial startup time (under 2 seconds), unlike OpsCenter
- uses bash, so no programming environment needed - run it anywhere nodetool 
works
- uses minimal screen real estate, so several rings can fit on one monitor
- free (Apache 2).

Please send me your comments and suggestions. The top-like infinite loop is
actually a read loop, so adding a few more features like cfstats or flush would 
be easy.
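
The core colorizing idea is roughly this (a minimal sketch, not the actual
cass_top source):

while true; do
  clear
  # green for UN (Up/Normal) rows, red for DN (Down/Normal) rows
  nodetool status | sed -e $'s/^UN/\e[32mUN\e[0m/' -e $'s/^DN/\e[31mDN\e[0m/'
  # the top-like loop is really a timed read: any key refreshes, q quits
  read -t 5 -n 1 key && [ "$key" = q ] && break
done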

Enjoy, James Briggs. 
-- 
Cassandra/MySQL DBA. Available in San Jose area or remote. 

Re: no change observed in read latency after switching from EBS to SSD storage

2014-09-16 Thread James Briggs
To expand on what Robert said, Cassandra is a log-structured database:

- writes are append operations, so both correctly configured disk volumes and 
SSDs are fast at that

- reads can be helped by SSD if the data is not in cache (i.e. it is on disk)

- but compaction is definitely helped by SSD with large data loads (compaction 
is the trade-off for fast writes)

 
Thanks, James Briggs. 
-- 
Cassandra/MySQL DBA. Available in San Jose area or remote. 
Mailbox dimensions: 10x12x14



 From: Robert Coli rc...@eventbrite.com
To: user@cassandra.apache.org user@cassandra.apache.org 
Sent: Tuesday, September 16, 2014 5:42 PM
Subject: Re: no change observed in read latency after switching from EBS to SSD 
storage
 





On Tue, Sep 16, 2014 at 5:35 PM, Mohammed Guller moham...@glassbeam.com wrote:

Does anyone have insight as to why we don't see any performance impact on the 
reads going from EBS to SSD?


What does it say when you enable tracing on this CQL query?

10 seconds is a really long time to access anything in Cassandra. There is, 
generally speaking, a reason why the default timeouts are lower than this.

My conjecture is that the data in question was previously being served from the 
page cache and is now being served from SSD. You have, in switching from 
EBS-plus-page-cache to SSD, successfully proved that SSD and RAM are both very 
fast. There is also a strong suggestion that whatever access pattern you are 
using is not bounded by disk performance.

=Rob

Re: hs_err_pid3013.log, out of memory?

2014-09-16 Thread J. Ryan Earl
Are you using JNA?  Did you adjust your memlock limit?
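
A quick way to check both (a minimal sketch; log and limits paths vary by
install):

ulimit -l                                        # memlock limit for this shell
grep -i memlock /etc/security/limits.conf        # persistent per-user limits
grep -i mlockall /var/log/cassandra/system.log   # did JNA lock memory at startup?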

On Tue, Sep 16, 2014 at 9:46 AM, Chris Lohfink clohf...@blackbirdit.com
wrote:

 How much memory does your system have? How much memory is system utilizing
 before starting Cassandra (use command free)? What are the heap setting it
 tries to use?

 Chris

 On Sep 15, 2014, at 8:16 PM, Yatong Zhang bluefl...@gmail.com wrote:

 It's during the startup. I tried to upgrade Cassandra from 2.0.7 to
 2.0.10, but it looks like Cassandra could not start again. I also found the
 following log in '/var/log/messages':

 Sep 16 09:06:59 storage6 kernel: INFO: task java:4971 blocked for more
 than 120 seconds.
 Sep 16 09:06:59 storage6 kernel:  Tainted: G
 --- H  2.6.32-431.el6.x86_64 #1
 Sep 16 09:06:59 storage6 kernel: echo 0 >
 /proc/sys/kernel/hung_task_timeout_secs disables this message.
 Sep 16 09:06:59 storage6 kernel: java  D 0003 0
 4971  1 0x0080
 Sep 16 09:06:59 storage6 kernel: 88042b591c98 0082
 81ed4ff0 8803b4f01540
 Sep 16 09:06:59 storage6 kernel: 88042b591c68 810af370
 88042b591ca0 8803b4f01540
 Sep 16 09:06:59 storage6 kernel: 8803b4f01af8 88042b591fd8
 fbc8 8803b4f01af8
 Sep 16 09:06:59 storage6 kernel: Call Trace:
 Sep 16 09:06:59 storage6 kernel: [810af370] ?
 exit_robust_list+0x90/0x160
 Sep 16 09:06:59 storage6 kernel: [81076ad5] exit_mm+0x95/0x180
 Sep 16 09:06:59 storage6 kernel: [81076f1f] do_exit+0x15f/0x870
 Sep 16 09:06:59 storage6 kernel: [81077688]
 do_group_exit+0x58/0xd0
 Sep 16 09:06:59 storage6 kernel: [8108d046]
 get_signal_to_deliver+0x1f6/0x460
 Sep 16 09:06:59 storage6 kernel: [8100a265] do_signal+0x75/0x800
 Sep 16 09:06:59 storage6 kernel: [81066629] ?
 wake_up_new_task+0xd9/0x130
 Sep 16 09:06:59 storage6 kernel: [81070ead] ?
 do_fork+0x13d/0x480
 Sep 16 09:06:59 storage6 kernel: [810b1c0b] ?
 sys_futex+0x7b/0x170
 Sep 16 09:06:59 storage6 kernel: [8100aa80]
 do_notify_resume+0x90/0xc0
 Sep 16 09:06:59 storage6 kernel: [8100b341] int_signal+0x12/0x17
 Sep 16 09:06:59 storage6 kernel: INFO: task java:4972 blocked for more
 than 120 seconds.
 Sep 16 09:06:59 storage6 kernel:  Tainted: G
 --- H  2.6.32-431.el6.x86_64 #1
 Sep 16 09:06:59 storage6 kernel: echo 0 >
 /proc/sys/kernel/hung_task_timeout_secs disables this message.
 Sep 16 09:06:59 storage6 kernel: java  D  0
 4972  1 0x0080
 Sep 16 09:06:59 storage6 kernel: 8803b4d7fc98 0082
 81ed6d78 8803b4cf1500
 Sep 16 09:06:59 storage6 kernel: 8803b4d7fc68 810af370
 8803b4d7fca0 8803b4cf1500
 Sep 16 09:06:59 storage6 kernel: 8803b4cf1ab8 8803b4d7ffd8
 fbc8 8803b4cf1ab8
 Sep 16 09:06:59 storage6 kernel: Call Trace:
 Sep 16 09:06:59 storage6 kernel: [810af370] ?
 exit_robust_list+0x90/0x160
 Sep 16 09:06:59 storage6 kernel: [81076ad5] exit_mm+0x95/0x180
 Sep 16 09:06:59 storage6 kernel: [81076f1f] do_exit+0x15f/0x870
 Sep 16 09:06:59 storage6 kernel: [81065e20] ?
 wake_up_state+0x10/0x20
 Sep 16 09:06:59 storage6 kernel: [81077688]
 do_group_exit+0x58/0xd0
 Sep 16 09:06:59 storage6 kernel: [8108d046]
 get_signal_to_deliver+0x1f6/0x460
 Sep 16 09:06:59 storage6 kernel: [8100a265] do_signal+0x75/0x800
 Sep 16 09:06:59 storage6 kernel: [810097cc] ?
 __switch_to+0x1ac/0x320
 Sep 16 09:06:59 storage6 kernel: [81527910] ?
 thread_return+0x4e/0x76e
 Sep 16 09:06:59 storage6 kernel: [810b1c0b] ?
 sys_futex+0x7b/0x170
 Sep 16 09:06:59 storage6 kernel: [8100aa80]
 do_notify_resume+0x90/0xc0
 Sep 16 09:06:59 storage6 kernel: [8100b341] int_signal+0x12/0x17
 Sep 16 09:06:59 storage6 kernel: INFO: task java:4973 blocked for more
 than 120 seconds.



 On Tue, Sep 16, 2014 at 9:00 AM, Robert Coli rc...@eventbrite.com wrote:

 On Mon, Sep 15, 2014 at 5:55 PM, Yatong Zhang bluefl...@gmail.com
 wrote:

 I just encountered an error which left a log '/hs_err_pid3013.log'. So
 is there a way to solve this?

 # There is insufficient memory for the Java Runtime Environment to
 continue.
 # Native memory allocation (malloc) failed to allocate 12288 bytes for
 committing reserved memory.


 Use less heap memory?

 You haven't specified under which circumstances this occurred, so I can
 only conjecture that it is likely being caused by writing too fast.

 Write more slowly.

 =Rob






Re: no change observed in read latency after switching from EBS to SSD storage

2014-09-16 Thread Alex Kamil
Mohammed, to add to the previous answers: EBS is network-attached. With SSD or
without it, you access your disk via the network, constrained by network
bandwidth and latency. If you really need to improve IO performance, try
switching to ephemeral storage (also called instance storage), which is
physically attached to the EC2 instance and is as good as native disk IO gets.

On Tue, Sep 16, 2014 at 11:39 PM, James Briggs james.bri...@yahoo.com
wrote:

 To expand on what Robert said, Cassandra is a log-structured database:

 - writes are append operations, so both correctly configured disk volumes
 and SSDs are fast at that
 - reads can be helped by SSD if the data is not in cache (i.e. it is on disk)
 - but compaction is definitely helped by SSD with large data loads
 (compaction is the trade-off for fast writes)

 Thanks, James Briggs.
 --
 Cassandra/MySQL DBA. Available in San Jose area or remote.
 Mailbox dimensions: 10x12x14

   --
 From: Robert Coli rc...@eventbrite.com
 To: user@cassandra.apache.org user@cassandra.apache.org
 Sent: Tuesday, September 16, 2014 5:42 PM
 Subject: Re: no change observed in read latency after switching from
 EBS to SSD storage



 On Tue, Sep 16, 2014 at 5:35 PM, Mohammed Guller moham...@glassbeam.com
 wrote:

 Does anyone have insight as to why we don't see any performance impact on
 the reads going from EBS to SSD?


 What does it say when you enable tracing on this CQL query?

 10 seconds is a really long time to access anything in Cassandra. There
 is, generally speaking, a reason why the default timeouts are lower than
 this.

 My conjecture is that the data in question was previously being served
 from the page cache and is now being served from SSD. You have, in
 switching from EBS-plus-page-cache to SSD, successfully proved that SSD and
 RAM are both very fast. There is also a strong suggestion that whatever
 access pattern you are using is not bounded by disk performance.

 =Rob






Re: no change observed in read latency after switching from EBS to SSD storage

2014-09-16 Thread Tony Anecito
If you cached your tables or the database you may not see any difference at all.
 
Regards,
-Tony  


On Tuesday, September 16, 2014 6:36 PM, Mohammed Guller 
moham...@glassbeam.com wrote:
  


Hi -

We are running Cassandra 2.0.5 on AWS on m3.large instances. These instances 
were using EBS for storage (I know it is not recommended). We replaced the EBS 
storage with SSDs. However, we didn't see any change in read latency. A query 
that took 10 seconds when data was stored on EBS still takes 10 seconds even 
after we moved the data directory to SSD. It is a large query returning 200,000 
CQL rows from a single partition. We are reading 3 columns from each row and 
the combined data in these three columns for each row is around 100 bytes. In 
other words, the raw data returned by the query is approximately 20MB.

I was expecting at least 5-10 times reduction in read latency going from EBS to 
SSD, so I am puzzled why we are not seeing any change in performance.

Does anyone have insight as to why we don't see any performance impact on the 
reads going from EBS to SSD?

Thanks,
Mohammed

Re: no change observed in read latency after switching from EBS to SSD storage

2014-09-16 Thread Ben Bromhead
Comparing EBS vs local SSD in terms of latency, you are using ms as your unit
of measurement. If your query runs for 10s you will not notice anything: what
is a few ms less over the life of a 10-second query?

To reiterate what Rob said: the query is probably slow because of your use
case / data model, not the underlying disk.



On 17 September 2014 14:21, Tony Anecito adanec...@yahoo.com wrote:

 If you cached your tables or the database you may not see any difference
 at all.

 Regards,
 -Tony


   On Tuesday, September 16, 2014 6:36 PM, Mohammed Guller 
 moham...@glassbeam.com wrote:


 Hi -

 We are running Cassandra 2.0.5 on AWS on m3.large instances. These
 instances were using EBS for storage (I know it is not recommended). We
 replaced the EBS storage with SSDs. However, we didn't see any change in
 read latency. A query that took 10 seconds when data was stored on EBS
 still takes 10 seconds even after we moved the data directory to SSD. It is
 a large query returning 200,000 CQL rows from a single partition. We are
 reading 3 columns from each row and the combined data in these three
 columns for each row is around 100 bytes. In other words, the raw data
 returned by the query is approximately 20MB.

 I was expecting at least 5-10 times reduction in read latency going from
 EBS to SSD, so I am puzzled why we are not seeing any change in performance.

 Does anyone have insight as to why we don't see any performance impact on
 the reads going from EBS to SSD?

 Thanks,
 Mohammed





-- 

Ben Bromhead

Instaclustr | www.instaclustr.com | @instaclustr
http://twitter.com/instaclustr | +61 415 936 359


RE: no change observed in read latency after switching from EBS to SSD storage

2014-09-16 Thread Mohammed Guller
Rob,
The 10 seconds latency that I gave earlier is from CQL tracing. Almost 5 
seconds out of that was taken up by the “merge memtable and sstables” step. The 
remaining 5 seconds are from “read live and tombstoned cells.”

I too first thought that maybe the disk is not the bottleneck and Cassandra is 
serving everything from cache, but in that case it should not take 10 seconds 
to read just 20MB of data.

Also, I narrowed the query down to a single-partition read and ran it in cqlsh 
on the same node. I turned on tracing, which shows that all the steps were 
executed on the same node. htop shows that CPU and memory are not the 
bottlenecks. Network should not come into play since cqlsh is running on the 
same node.

Is there any performance tuning parameter in the cassandra.yaml file for large 
reads?

Mohammed

From: Robert Coli [mailto:rc...@eventbrite.com]
Sent: Tuesday, September 16, 2014 5:42 PM
To: user@cassandra.apache.org
Subject: Re: no change observed in read latency after switching from EBS to SSD 
storage

On Tue, Sep 16, 2014 at 5:35 PM, Mohammed Guller 
moham...@glassbeam.commailto:moham...@glassbeam.com wrote:
Does anyone have insight as to why we don't see any performance impact on the 
reads going from EBS to SSD?

What does it say when you enable tracing on this CQL query?

10 seconds is a really long time to access anything in Cassandra. There is, 
generally speaking, a reason why the default timeouts are lower than this.

My conjecture is that the data in question was previously being served from the 
page cache and is now being served from SSD. You have, in switching from 
EBS-plus-page-cache to SSD, successfully proved that SSD and RAM are both very 
fast. There is also a strong suggestion that whatever access pattern you are 
using is not bounded by disk performance.

=Rob



Re: C* 2.1

2014-09-16 Thread Jack Krupansky
DSE/Solr is tightly integrated, so there is no “external” system to manage – 
insert data in CQL and within a few seconds it is available for query from Solr 
running in the same JVM as Cassandra. DSE/Solr indexes the data on each 
Cassandra node, and uses Cassandra’s cluster management for distributing 
queries across the cluster. And... Lucene (underneath Solr) is optimal for 
queries that span multiple fields. DSE/Solr supports CQL3 wide rows (clustering 
columns.)

-- Jack Krupansky

From: Ram N 
Sent: Monday, September 15, 2014 4:34 PM
To: user 
Subject: Re: C* 2.1


Jack, 

Using Solr or an external search/indexing service is an option, but it 
increases the complexity of managing different systems. I am curious to 
understand the impact of having wide rows in a separate CF for inverted-index 
purposes, which, if I understand correctly, is what Rob's response suggests: 
having a separate CF for the index is better than using the default secondary 
index option.

It would be great to understand the design decision to go with the present 
secondary index implementation when the alternative is better. Looking at the 
JIRAs, it is still confusing to work out the why :)

--R 





On Mon, Sep 15, 2014 at 11:17 AM, Jack Krupansky j...@basetechnology.com 
wrote:

  If you’re indexing and querying on that many columns (dozens, or more than a 
handful), consider DSE/Solr, especially if you need to query on multiple 
columns in the same query.

  -- Jack Krupansky

  From: Robert Coli 
  Sent: Monday, September 15, 2014 11:07 AM
  To: user@cassandra.apache.org 
  Subject: Re: C* 2.1

  On Sat, Sep 13, 2014 at 3:49 PM, Ram N yrami...@gmail.com wrote:

Is 2.1 a production ready release? 

  https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/


 Datastax Java driver - I get too confused with CQL and the underlying 
storage model. I am also not clear on the indexing structure of columns. Do 
CQL indexes create a separate CF for the index table? How is that different 
from maintaining an inverted index? Internally, are both the same? Does the 
CQL statement to create an index create a separate CF, with an atomic way of 
updating/managing it? Which one scales better? (Something like stargate-core, 
or the ones done by usergrid, or the CQL approach?)

  New projects should use CQL. Access to underlying storage via Thrift is 
likely to eventually be removed from Cassandra.

 On a separate note, just curious: if I have 1000s of columns in a given row 
and a fixed set of indexed columns (say 30-50 columns), which approach should 
I take? Will Cassandra scale with this many indexed columns? Are there any 
limits? How much of an impact do CQL indexes have on the system? I am also not 
sure if these use cases are the right fit for Cassandra, but I would really 
appreciate any response. Thanks.

  Use of the Secondary Indexes feature is generally an anti-pattern in 
Cassandra. 30-50 indexed columns in a row sounds insane to me. However, 30-50 
column families into which one manually denormalizes does not sound too insane 
to me...
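
A minimal sketch of that manual denormalization: one lookup table per queried
column instead of CREATE INDEX (keyspace and tables are hypothetical):

cqlsh -e "
CREATE TABLE my_ks.users (id uuid PRIMARY KEY, email text, country text);
-- instead of: CREATE INDEX ON my_ks.users (country);
CREATE TABLE my_ks.users_by_country (country text, id uuid,
                                     PRIMARY KEY (country, id));"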

  =Rob
  http://twitter.com/rcolidba