query tracing
Is there any significant performance penalty if one turns on Cassandra query tracing through the DataStax Java driver (say, for every request of some troublesome query)? More sampling seems better, but doing so may also slow down the system in other ways. Thanks.
Re: query tracing
On Fri, Nov 7, 2014 at 9:35 AM, Jimmy Lin y2klyf+w...@gmail.com wrote:

> Is there any significant performance penalty if one turns on Cassandra query tracing through the DataStax Java driver (say, for every request of some troublesome query)?

What does 'significant' mean in your sentence? I'm pretty sure the answer for most meanings of it is no.

=Rob
Re: query tracing
It saves a lot of information for each request that's traced, so there is significant overhead. If you start at a low probability and move it up based on the load impact, it will provide a lot of insight and you can control the cost.

---
Chris Lohfink
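The "start low and move it up" approach can be sketched as a small client-side sampler. This is only an illustration, not driver code: `TraceSampler`, `should_trace`, and `raise_probability` are hypothetical names, and the caller is assumed to turn tracing on for a request whenever `should_trace()` returns true. On the server side, `nodetool settraceprobability` sets a comparable sampling probability per node.

```python
import random


class TraceSampler:
    """Decides, per request, whether to enable tracing (hypothetical helper)."""

    def __init__(self, probability, rng=None):
        if not 0.0 <= probability <= 1.0:
            raise ValueError("probability must be between 0.0 and 1.0")
        self.probability = probability
        # An injectable random source keeps the sampler testable.
        self._rng = rng if rng is not None else random.Random()

    def should_trace(self):
        # random() returns a float in [0.0, 1.0), so a probability of 0.0
        # never traces and a probability of 1.0 always traces.
        return self._rng.random() < self.probability

    def raise_probability(self, step, ceiling):
        # Move the sampling rate up in small steps, never past the ceiling.
        self.probability = min(ceiling, self.probability + step)
        return self.probability
```

A caller might begin with something like `TraceSampler(0.001)` and call `raise_probability` only while the observed load impact stays acceptable; the starting value and step size here are assumptions to tune, not recommendations from the thread.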
Re: query tracing
Personally, I've found that query timing plus log aggregation on the client side is more effective than adjusting the tracing probability when you are trying to find a single query that has recently become a problem. I recommend wrapping your session with something that automatically logs the statement on a slow query, then using tracing to identify exactly what happened. This way, finding your problem is not a matter of chance.
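A minimal sketch of that wrapper idea, written against a generic `execute` callable rather than any real driver API. `SlowQueryLogger`, the threshold value, and the logger name are all assumptions made for illustration; in practice the callable would be whatever method your session uses to run a statement.

```python
import logging
import time

logger = logging.getLogger("slow_queries")


class SlowQueryLogger:
    """Wraps an execute callable and records statements that run slowly."""

    def __init__(self, execute, threshold_seconds, clock=time.perf_counter):
        self._execute = execute
        self._threshold = threshold_seconds
        self._clock = clock  # injectable so the timing can be tested
        self.slow_statements = []

    def execute(self, statement, *args, **kwargs):
        start = self._clock()
        try:
            return self._execute(statement, *args, **kwargs)
        finally:
            # Runs on success and on failure, so slow failing queries
            # (for example timeouts) are recorded too.
            elapsed = self._clock() - start
            if elapsed >= self._threshold:
                self.slow_statements.append((statement, elapsed))
                logger.warning("slow query (%.3f s): %s", elapsed, statement)
```

Statements that land in `slow_statements` (or in the aggregated logs) are the ones worth re-running with tracing enabled, which keeps the tracing overhead confined to known offenders.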
Redundancy inside a cassandra node
Hello all,

My work will be deploying a Cassandra cluster next year. Due to internal wrangling we can't seem to agree on the hardware. The software hasn't been finished, but management are asking for a ballpark figure for the hardware costs.

The problem is that the IT team are saying the nodes need multiple points of redundancy, e.g. dual power supplies, dual NICs, and SSDs configured in RAID 10. The software team is saying that, because of Cassandra's resilient nature, the way data is distributed, and its scalability, lots of cheap boxes should be used. So they have been talking about self-built consumer-grade boxes with single NICs, single PSUs, single SSDs, etc.

Obviously the self-built boxes will cost a fraction of the price, but each box is not as resilient as the first option. We don't use any cloud technologies, so that's out of the question.

My question is: what do people use in the real world in terms of node resiliency when running a Cassandra cluster?

Right now the team is only thinking of hosting Cassandra on the nodes. I'll see if I can twist their arms and get them to see the light with Apache Spark. Obviously there are other tiers of servers, but they won't be running Cassandra.

Thanks

Jabbar Azam
Re: Redundancy inside a cassandra node
Cassandra is itself a cluster, so it's not necessary to make each node redundant. Cassandra has replication for that. Cassandra is also designed to run across multiple data centers, and I think that kind of redundancy policy is applicable for you. Of the things you mention, the only one worth deploying is RAID 10; the others don't make sense.

As you are at the stage of designing your cluster, please provide some numbers: how much data will be stored on each node, and how many nodes will you have? What type of data will be stored in the cluster: binary objects, or something like time series?

Cassandra is designed to run on commodity hardware.
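To make the replication argument concrete, here is a small worked calculation. It assumes reads and writes at QUORUM consistency, which the thread does not specify; Cassandra computes a quorum as floor(RF / 2) + 1 replicas, so the number of replicas that can be down while requests still succeed is RF minus the number of required acknowledgements. The function names are hypothetical.

```python
def quorum(replication_factor):
    # Cassandra's quorum: a strict majority of the replicas.
    return replication_factor // 2 + 1


def tolerated_replica_failures(replication_factor, required_acks):
    # Replicas that may be unavailable while requests can still be served.
    if not 1 <= required_acks <= replication_factor:
        raise ValueError("required_acks must be between 1 and replication_factor")
    return replication_factor - required_acks
```

With a replication factor of 3, `quorum(3)` is 2, so one replica of any given row can be lost (a dead PSU, NIC, or disk on one box) without failing QUORUM requests. That is the sense in which replication substitutes for per-node redundancy.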