query tracing

2014-11-07 Thread Jimmy Lin
Is there any significant performance penalty if one turns on Cassandra
query tracing through the DataStax Java driver (say, for every request
of some troublesome query)?

More sampling seems better, but might doing so also slow down the system
in other ways?

thanks


Re: query tracing

2014-11-07 Thread Robert Coli
On Fri, Nov 7, 2014 at 9:35 AM, Jimmy Lin y2klyf+w...@gmail.com wrote:

 Is there any significant performance penalty if one turns on Cassandra
 query tracing through the DataStax Java driver (say, for every request
 of some troublesome query)?


What does 'significant' mean in your sentence? I'm pretty sure the answer
for most meanings of it is no.

=Rob


Re: query tracing

2014-11-07 Thread Chris Lohfink
It saves a lot of information for each request that's traced, so there is
significant overhead. If you start at a low probability and raise it
based on the load impact, it will provide a lot of insight and you can
control the cost.

---
Chris Lohfink
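
Editor's sketch (not part of the original message): one way to apply the
"start low and raise it" advice on the client side is to request a trace for
only a sampled fraction of calls. The names should_trace and execute_sampled,
and the execute(statement, trace=...) callable, are hypothetical and stand in
for whatever driver call is in use. On the server side, nodetool
settraceprobability serves a similar purpose.

```python
import random


def should_trace(probability, rng=random.random):
    """Return True for roughly `probability` of calls (0.0 to 1.0)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0.0 and 1.0")
    # rng() yields a float in [0.0, 1.0), so 0.0 never traces and 1.0 always does.
    return rng() < probability


def execute_sampled(execute, statement, probability, rng=random.random):
    """Run `statement`, asking for a trace on a sampled fraction of calls.

    `execute` is a hypothetical callable taking (statement, trace=bool);
    it is an assumption for illustration, not a specific driver API.
    """
    return execute(statement, trace=should_trace(probability, rng))
```

Starting with a small value such as 0.001 and increasing it while watching
load keeps the tracing cost bounded.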






Re: query tracing

2014-11-07 Thread Jonathan Haddad
Personally I've found that using query timing + log aggregation on the
client side is more effective than trying to mess with tracing probability
in order to find a single query which has recently become a problem.  I
recommend wrapping your session with something that can automatically log
the statement on a slow query, then use tracing to identify exactly what
happened.  This way finding your problem is not a matter of chance.
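
Editor's sketch (not part of the original message): the session wrapper
described above could look roughly like this. TimedSession, the threshold
value, and the logger name are hypothetical; the wrapped object only needs an
execute(statement, ...) method, so no particular driver is assumed.

```python
import logging
import time

logger = logging.getLogger("slow_queries")


class TimedSession:
    """Wraps any object with an execute() method and logs slow statements."""

    def __init__(self, session, threshold_seconds=0.5, clock=time.monotonic):
        self._session = session
        self._threshold = threshold_seconds
        self._clock = clock  # injectable so the timing can be tested

    def execute(self, statement, *args, **kwargs):
        start = self._clock()
        try:
            return self._session.execute(statement, *args, **kwargs)
        finally:
            # Runs on success and on error, so failed slow queries are logged too.
            elapsed = self._clock() - start
            if elapsed >= self._threshold:
                logger.warning("slow query (%.3fs): %s", elapsed, statement)
```

Statements that show up in the aggregated log can then be re-run with tracing
enabled to see exactly what happened.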









Redundancy inside a cassandra node

2014-11-07 Thread Jabbar Azam
Hello all,

My work will be deploying a cassandra cluster next year. Due to internal
wrangling we can't seem to agree on the hardware. The software hasn't been
finished, but management are asking for a ballpark figure for the hardware
costs.

The problem is that the IT team is saying the nodes need to have multiple
points of redundancy,

e.g. dual power supplies, dual NICs, SSDs configured in RAID 10.


The software team is saying that, because of Cassandra's resilient nature,
the way data is distributed, and its scalability, lots of cheap boxes should
be used. So they have been talking about self-built consumer-grade boxes
with single NICs, single PSUs, single SSDs, etc.

Obviously the self-built boxes will cost a fraction of the price, but each
box is not as resilient as the first option.

We don't use any cloud technologies, so that's out of the question.

My question is: what do people use in the real world in terms of node
resiliency when running a Cassandra cluster?

Right now the team is only thinking of hosting Cassandra on the nodes. I'll
see if I can twist their arms and get them to see the light with Apache Spark.

Obviously there are other tiers of servers, but they won't be running
cassandra.





Thanks

Jabbar Azam


Re: Redundancy inside a cassandra node

2014-11-07 Thread Plotnik, Alexey
Cassandra is itself a cluster, so it's not necessary to make each node
redundant; Cassandra has replication for that. Cassandra is also designed to
run across multiple data centers, and I think that kind of redundancy policy
is applicable to you. Of the things you mention, the only one worth deploying
is RAID 10; the others don't make sense. Since you are at the stage of
designing your cluster, please provide some numbers: how much data will be
stored on each node, and how many nodes will you have? What type of data will
be stored in the cluster: binary objects, or something like time series?

Cassandra is designed to run on commodity hardware.

Sent from iPad
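
Editor's sketch (not part of the original message): the point about
replication can be made concrete with quorum arithmetic. A QUORUM operation
needs floor(RF / 2) + 1 replicas, so the number of replicas that can be down
while quorum reads and writes still succeed follows directly. The replication
factors used below are assumptions for illustration; the thread does not state
one.

```python
def quorum(replication_factor):
    """Replicas that must respond for a QUORUM read or write."""
    return replication_factor // 2 + 1


def tolerable_replica_failures(replication_factor):
    """Replicas of a given row that can be down while QUORUM still succeeds."""
    return replication_factor - quorum(replication_factor)


for rf in (1, 3, 5):  # illustrative replication factors only
    print(rf, quorum(rf), tolerable_replica_failures(rf))
```

With a replication factor of 3, one replica of any row can be lost without
failing quorum operations, which is the usual argument for relying on
cluster-level redundancy over per-node redundancy.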
