Those percentages seem high for a clean network and a reasonably fast client; the 5% is really not reasonable. No jumbo frames? Any network retries (check netstat)?
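Both are quick to check (Linux assumed; the interface name eth0 is illustrative):

    ip link show eth0 | grep -o 'mtu [0-9]*'   # 9000 = jumbo frames, 1500 = not
    netstat -s | grep -i retrans               # TCP retransmission counters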
Maybe SSDs? Take a look at the IO read/write wait times.
FYI, your config changes simply push more activity into memory, trading IO for memory footprint ;{)
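A sketch of how to eyeball those wait times (sysstat's iostat, assuming it is installed):

    # await = avg ms per I/O; %util near 100 means the device is saturated
    iostat -x 5 12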
You may want to think about the latency impacts of a cluster that has one node "far away". This is such a basic design flaw that you need to do some basic learning about networking and latency.
… that you have 3 VMs on THREE SEPARATE physical systems and WITHOUT network-attached storage ...
If you can handle the slower IO of S3 this can work, but you will have a window of out-of-date images. You don't have a concept of persistent snapshots.
Sounds VERY interesting! If the resume passes the BS sniff test (I do big data, which has included C* for a NUMBER of years), I would love to chat. FYI, I do a fair amount of readiness assessments: before, during (with laughable results), and now after my tenure at Accenture/Avanade.
Cheers, D.
> …aa-d585-38e0-a72b-b36ce82da9cb, remote=cdbb639b-1675-31b3-8a0d-84aca18e86bf
> I tried running some tcpdump during that time; I don't see any packet loss.
> Still unsure why the east instance, which was stopped and started, was
> unreachable from the west node for almost 15 minutes.
10 minutes is 600 seconds, and there are several timeouts that are set to that, including the datacenter timeout as I recall.
You may be forced to tcpdump the interface(s) to see where the chatter is.
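Something like this as a starting point (interface is illustrative; 7000 is the default C* internode port):

    # capture internode chatter for offline analysis
    sudo tcpdump -i eth0 -w internode.pcap 'port 7000'
    # then scan the capture for resets/retransmits, e.g.:
    # tcpdump -r internode.pcap | grep -c 'Flags \[R'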
Out of curiosity, when you restart the node, have you snapped the JVM's memory to see if there is, e.g., pretty clear evidence of a memory leak, a tombstone problem (still memory), etc.?
If this is Apache, then you may need to do some heap dumps and see what is going on (if it is the Java heap that is OOM'ing, which I suspect). Might want to do some periodic vmstat or equivalent (brute force might be screen running it in a loop).
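The brute-force version might look like this (the process lookup is an assumption; adjust for your packaging):

    PID=$(pgrep -f CassandraDaemon)
    jmap -histo:live "$PID" | head -30      # heap histogram; repeat across a few GCs
    jmap -dump:live,format=b,file=/tmp/cassandra.hprof "$PID"   # for MAT/VisualVM
    vmstat 10                               # coarse memory/run-queue trend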
Welcome to the world of testing predictive analytics. I will pass this on to my folks at Accenture; I know of a couple of C* clients we run. Wondering what you had in mind?
Messenger can allow for some losses in degenerate infra cases, given a particular infra footprint, plus some ability to handle scale-up faster as demand increases, peak loads, etc. It therefore becomes a use-case-specific optimization. Also, HBase can run in Hadoop more easily, leveraging blobs (HDFS) …
You have to explain what you mean by "JBOD". All in one large vdisk? Separate drives?
At the end of the day, if a device fails in a way that the data housed on that device (or array) is no longer available, that HDFS storage is marked down. HDFS now needs to create a third replica. Various timers govern when that kicks in.
I'd like to split your question into two parts.
Part one is around recovery. If you lose a copy of the underlying data because a node fails, and let's assume you have three copies, how long can you tolerate the time to restore the third copy?
The second question is about the absolute length of a …
If you are starting with a modest amount of data (e.g. under 0.25 PB) and do not have extremely high availability requirements, then it is easier to start with MongoDB, avoiding HA clusters. I would suggest you start with MongoDB. Both are great, but C* scales far beyond MongoDB FOR A GIVEN LEVEL …
If ACID is needed, then C* is the wrong architecture. Your architecture needs to match your business processes, as Ben pointed out: "Ask if it's really needed."
There is a concept of a velocity file (modern tech is memSQL'ish) that delivers the high-performance, ACID transactions of a lambda architecture.
Look for errors on your network interface. I think you have periodic errors in your network connectivity.
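Quick check (Linux; the device name is illustrative):

    # RX/TX errors and drops should be zero, and stay zero between runs
    ip -s link show dev eth0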
Docker will provide less per-node overhead. And yes, virtualizing smaller nodes out of a bigger physical machine makes sense. Of course you lose the per-node failure protection, but I guess this is not production?
Terraform plus Ansible. It works, but it's messy. 5-30,000 nodes and infra …
On Thu, Feb 8, 2018, 15:57 Ben Wood wrote:
> Shameless plug of our (DC/OS) Apache Cassandra service: …
Good luck with that. PCID has been out since mid-2017, as I recall?
On Jan 9, 2018 10:31 AM, "Dor Laor" wrote:
Make sure you pick instances with the PCID CPU capability; their TLB flush overhead is much smaller.
What specifically are you looking to monitor? As per above, Datadog has superb components for monitoring, and no need to develop and support anything, for a price of course. I have found management sometimes sees devops resources as pretty low cost (pay for 40, get 70 hours of work per week). Depends …
Recall that a delete is actually a corner case of an update, as is an insert.
As I read the snippet, you are updating multiple tables. The partition key is table-specific, so two sets of update batches are handled here (sketched below).
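To make that concrete, a sketch of the two-table update as I read it, via cqlsh (keyspace, table, and column names are hypothetical):

    cqlsh -e "
    BEGIN BATCH
      UPDATE ks.users_by_id    SET email = 'a@b.co' WHERE user_id = 42;   -- partition key: user_id
      UPDATE ks.users_by_email SET user_id = 42 WHERE email = 'a@b.co';   -- partition key: email
    APPLY BATCH;"

Because the partition keys differ, this logged batch spans two partitions: you get the batch log's atomicity guarantee, but not single-partition isolation.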
Note to the AWS poster: you have some limited understanding of how disks are presented to AWS compute nodes. As a result your post is not relevant, and misleading.
When considering throughput, recall that disk IO is ideally parallel. While C* handles IO across multiple devices nicely, the unit of …
Ambari.
On Fri, Jun 16, 2017 at 6:01 AM, Ram Bhatia wrote:
> Hi,
> May I know if there is a tool similar to Oracle …
Some random thoughts; I would like to thank you for giving us an interesting problem. Cassandra can get boring sometimes; it is too stable.
- Do you have a way to monitor the network traffic, to see if it is increasing between restarts or seems relatively flat?
- What activities are …
On Tue, May 30, 2017 at 2:18 PM, Jonathan Haddad <j...@jonhaddad.com> wrote:
> This isn't an HDFS mailing list.
On May 30 2017, at 1:36 PM, Daniel Steuernol <dan...@sendwithus.com> wrote:
> I don't believe incremental repair is enabled, I have never enabled it on
> the cluster, and unless it's the default then it is off. Also I don't see a
> setting in cassandra.yaml for it.
When you say "the load rises ... ", could you clarify what you mean by
"load"? That has a specific Linux term, and in e.g. Cloudera Manager. But
in neither case would that be relevant to transient or persisted disk. Am I
missing something?
On Tue, May 30, 2017 at 10:18 AM, tommaso barbugli
What is restacking?
On May 16, 2017 2:42 PM, "suraj pasuparthy" <suraj.pasupar...@gmail.com> wrote:
> So I thought the same.
> I see the data via CQLSH in both datacenters. Consistency is set …
May I inquire whether your configuration is actually datacenter-aware? Do you understand the difference between LQ (LOCAL_QUORUM) and replication?
These numbers do not match e.g. AWS, so guessing you are using local
storage?
My compliments to all of you for being adults, excessively kind, and
definitely excessively nice.
Yes, you can use host names; that merely adds another level of configuration. When using Terraform, I often use node names like … and just use those. They are only routable within the region/VPC but are in fact already in DNS. You do have to watch out: if you change the seeds (in tf) or the …
Caps below for emphasis, not shouting ;{)
Seed nodes are IDENTICAL to all other nodes or you will wish otherwise. Folks get confused because of terminology. I refer to this stuff as "the seed node service of a normal node". ANY NODE IS ABLE TO ACT AS A SEED NODE BY DEFINITION.
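Concretely, "acting as a seed" just means appearing in the seed list in cassandra.yaml; the file is otherwise identical on every node (typical package path shown, adjust for your install):

    # any node listed under seed_provider acts as a seed, nothing more
    grep -A4 'seed_provider' /etc/cassandra/cassandra.yaml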
Having done variants of this, I would suggest you bring up new nodes at approximately the same Apache version, as a separate datacenter in your same cluster. Replication strategy may need to be tweaked.
This would be normal if the switches are user-to-kernel mode transitions (disk & network IO are kernel-mode activities). If your run queue (jobs waiting to run) is much larger than the number of cores (just a SWAG, but anything beyond 2-3x the core count), you might have other issues.
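To put numbers on that (procps vmstat; the threshold is the SWAG above):

    # the first column (r) is the run queue; compare against the core count
    vmstat 5 6
    nproc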
What you are doing is, correctly, going to result in this IF there is substantial backlog, network, disk, or whatever pressure.
What do you think will happen when you write with a replication factor greater than the consistency level of the write? Perhaps your mental model of how C* works needs work?
Possible areas to check:
- too few nodes (node overload): you did not indicate either replication factor or number of nodes; assume the nodes are *rather* full.
- network overload (check your TORs' error counters, also the tcp stats on the relevant nodes)
- stop-the-world garbage collection on the relevant nodes (see the sketch below)
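For the GC check, the system log is usually enough (typical package log path; message wording varies across C* versions):

    # GCInspector logs any notable stop-the-world pauses
    grep -i 'GCInspector' /var/log/cassandra/system.log | tail -20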
I would zero in on network throughput, especially inter-rack trunks.
On Mar 17, 2017 2:07 PM, "Roland Otta" <roland.o...@willhaben.at> wrote:
> hello,
> we are quite inexperienced with …
Check for level 2 (stop-the-world) garbage collections.
On Fri, Mar 17, 2017 at 11:51 AM, Chuck Reynolds wrote:
> I have a large Cassandra 2.1.13 ring (60 nodes) in AWS that has …
… queries are hitting the cluster at peak?
If many clients, how do you balance the connection load, or do you always hit the same node?
The discard due to OOM is causing the zero returned. I would guess a cache-miss problem of some sort, but not sure. Are you using row, index, etc. caches? Are you seeing the failed prepared statement on random nodes (duh, nodes that have the relevant data ranges)?
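If caches are in play, the hit rates are directly visible per node:

    # the Key Cache / Row Cache lines show entries, capacity, and hit rate
    nodetool info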
Am I unreasonable in expecting a poster to have looked at the documentation before posting? And is reposting the same query WITHOUT reading the documents, when they are pointed out, appropriate? Do we have a way to blackball such?
I find it helpful to read the manual first. After review, I would be happy to answer specific questions.
https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsRepair.html
Recall that garbage collection on a busy node can occur minutes or seconds apart. Note that stop-the-world GC also happens as frequently as every couple of minutes on every node. Remove that and do the simple arithmetic.
I guess it depends on the experience one has. This is a common process to
bring up, move, build full prod copies, etc.
What is outlined is pretty much exactly what I have done 20-50 times (too
many to remember).
FYI, some of this should be done with nodes DOWN.
We did. We found, even with CentOS and Ubuntu (both for application compatibility reasons), that there is somewhat less IO and better CPU throughput at that price point. At the time my optimization work for that client ended, Amazon was looking at the IO issue, as perhaps the frame configurations …
… your mileage may vary. Think of that storage limit as fairly reasonable for active data likely to tombstone. Add more for older/historic data. Then think about time to recover a node.
A bunch more welcome than here in the US, to our deep shame and foolishness. Sadly, while I am actually involved in this area, I am happy in San Francisco. I would be interested in being part of a pro bono team should that transpire.
Thanks, D.
This is not a bug, and in fact changing it would be a serious bug. What it is is a wonderful case of bad coding: would one expect a java/py/bash script that loops on a bunch of read/execute/update calls, where each iteration calls time, to return the same exact time for the duration of the execution?
Timeouts indicate network or equivalent throughput delays, from the physical box's network card out to the other DC's card. If you are using VMs, add that layer. Your network team needs to be looking for ANY timeouts, retries, packets delivered in retry window > 0, etc. ANY value other than zero is a problem.
I don't know if my perspective on this will assist, so YMMV.
Summary:
1. Nodetool repairs are required when a node has issues and can't get its (e.g. hinted handoff) resync done; the culprit is usually network, sometimes container/VM, rarely disk.
2. Scripts to do partition-range repairs are a pain (a minimal sketch follows).
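For item 2, a minimal sketch, assuming a pre-computed token_ranges.txt of start:end pairs and a keyspace named ks1 (both hypothetical):

    while IFS=: read -r start end; do
        # repair one subrange at a time so failures can be retried cheaply
        nodetool repair -st "$start" -et "$end" ks1 \
            || echo "range $start:$end failed" >&2
    done < token_ranges.txt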
Well, I seem to recall that the private IPs are valid for communications WITHIN one VPC. I assume you can log into one machine and ping (or ssh) the others. If so, check that cassandra.yaml is not set to listen on 127.0.0.1 (localhost).
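A quick check (typical package path; adjust for your install):

    # should be the node's VPC-private IP, not 127.0.0.1/localhost
    grep -E '^(listen_address|rpc_address):' /etc/cassandra/cassandra.yaml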
Localhost is a special network address that never leaves the operating system; it only goes "halfway" down the IP stack. Thanks for your efforts!
Assuming you meant 100k, that is likely for something with 16 MB of storage (probably way small) where the data is more than 64k, hence it will not fit into the row cache.
Hmm. Would you mind looking at your network interface (appropriate netstat commands)? If I am right, you will be seeing packet errors, drops, retries, out-of-window receives, etc.
What you may be missing is that you reported zero DROPPED latency, not mean LATENCY. Check your netstat counters; ANY nonzero value there is a problem.
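For example (counter names vary by kernel, so treat the pattern as a starting point):

    # run it twice a few minutes apart; the deltas matter more than the totals
    netstat -s | egrep -i 'retrans|lost|dropped|failed'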
Network issues. Could be inconsistent jumbo frames or something else.
On Apr 4, 2016 5:34 AM, "Paco Trujillo" wrote:
> Hi everyone,
> We are having problems with …
> So unfortunately I still don't have any ideas about what's going on and
> why I'm seeing 17 GB of internode traffic instead of ~5-6.
If you read & write at quorum, then you write 3 copies of the data and then return to the caller; when reading, you read one copy (assume it is not on the coordinator) and 1 digest (because read at quorum is 2, not 3).
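To make the arithmetic explicit (assuming RF = 3, as above):

    quorum = floor(RF / 2) + 1 = floor(3 / 2) + 1 = 2

So a QUORUM write dispatches writes to all 3 replicas but acknowledges to the caller after 2, and a QUORUM read needs 2 responses, hence 1 full copy plus 1 digest.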
When you insert, how many keyspaces get written to? (Are you using e.g. inverted …
Hmm. What are your processes when a node comes back after "a long offline"?
Long enough to take the node offline and do a repair? Run the risk of
serving stale data? Parallel repairs? ???
So, what sort of time frames are "a long time"?
If you can, do a few short runs (maybe 10M records; delete the default schema between executions) of cassandra-stress against your production cluster (replication=3, force quorum to 3). Look for latency max in the 10s of SECONDS. If your devops team is running a monitoring tool that looks at …
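A sketch of such a run (node address is illustrative; cl=all approximates "force quorum to 3" with RF=3):

    cassandra-stress write n=10000000 cl=all -schema "replication(factor=3)" -node 10.0.0.10
    cassandra-stress read  n=10000000 cl=all -node 10.0.0.10
    # "latency max" in the summary output is the number to watch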
Cassandra nodes do not go down "for no reason"; they are not stateless. I would like to thank you for this marvelous example of a wonderful antipattern. Absolutely fantastic.
Thank you! I am not being a satirical smartass; I sometimes am challenged by clients in my presentations about SRE best …
FYI, my observations were with the native protocol, not Thrift.
On Fri, Feb 19, 2016 at 10:12 AM, Sotirios Delimanolis wrote:
> Does your cluster contain 24+ nodes or fewer?
> We did the same …
May be unrelated, but I found highly variable latency (latency max) on the 2.1 code tree when loading new data (and reading). Others found that G1 vs. CMS does not make a difference. There is some evidence that 8/12/16 GB heaps make no difference. These were latencies in the 10-30 SECOND range. It did cause …
Given you only have 16 columns vs. over 200 ... I would expect a
substantial improvement in writes, but not 5x.
Ditto reads. I would be interested to understand where that 5x comes from.
… and worst-case latencies (allowing for GC times)?
On Feb 18, 2016 8:57 AM, "Tyler Hobbs" <ty...@datastax.com> wrote:
> You can try slightly lowering the bloom_filter_fp_chance on your table.
> Otherwise, it's possible that you're repeatedly querying …
I think the key to your problem might be around "we overwrite every value". You are creating a large number of tombstones, forcing many reads to pull current results. You would do well to rethink why you are having to overwrite values all the time under the same key. You would be better to …
Might I suggest you START by using the default schema provided by cassandra-stress. Using someone else's schema is great AFTER you have used a standard and generally well understood baseline.
From that you can decide whether a 4-node x 2 cluster is right for you. FYI, given your 6-way …
What do the logs say on the seed node (and on the UJ node)?
Look for timeout messages.
This problem has occurred for me when there was high network utilization between the seed and the joining node, and also with routing issues.
>> … persisted Gossip state, the seed nodes will again be needed to find the rest of the cluster.
>> I'm not sure whether a power outage is the same as stopping and restarting an instance (AWS) in terms of whether the restarted instance retains …
The keys don't have to be on the box. You do need a login/password for C*.
On Jan 14, 2016 5:16 PM, "oleg yusim" wrote:
> Greetings,
> Guys, can you please help me to understand …
This happens when there is insufficient time for nodes coming up to join a network; it takes a few seconds for a node to come up, e.g. your seed node. If you tell a node to join a cluster, you can get this scenario because of high network utilization as well. I wait 90 seconds after the first (i.e. …
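A sketch of that staggered start (host names and init mechanism are assumptions):

    ssh seed1 'sudo service cassandra start'
    sleep 90   # let the seed come up and start gossiping
    for n in node2 node3 node4; do
        ssh "$n" 'sudo service cassandra start'
        sleep 90   # stagger joins to avoid gossip/stream contention
    done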
There is a window after a node goes down during which changes that node should have gotten will be kept (hinted handoff). If the node is down LONGER than that, it will serve stale data. If the consistency is greater than two, its data will be ignored (if consistency is one, its data could be the first returned; if …
Have you checked the network statistics on that machine (netstat -tas) while attempting to repair? If netstat shows ANY issues, you have a problem. Can you put the command in a loop running every 60 seconds for maybe 15 minutes and post back?
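For example:

    # TCP stats every 60 seconds for roughly 15 minutes
    for i in $(seq 1 15); do
        date
        netstat -tas | egrep -i 'retrans|fail|error'
        sleep 60
    done | tee netstat_watch.log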
Out of curiosity, how many remote DC nodes are …
If one rethinks "consistency" to mean "copies returned" and "copies written", then one can have different values for the former (DataStax) and the latter (within Cassandra). The latter changes eventual consistency (e.g. two copies must be written); the former can speed up a result at the (slight) …
You appear to have multiple Java binaries in your PATH. That needs to be resolved.
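To see what is shadowing what:

    which -a java   # every java on the PATH, in resolution order
    java -version   # the one that currently wins
    # on Debian/Ubuntu-style systems (assumption), repoint the default:
    sudo update-alternatives --config java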
On Apr 5, 2015 1:40 AM, Jean Tremblay jean.tremb...@zen-innovations.com wrote:
Hi,
I have a cluster of 5 nodes. We use …
Jack did a superb job of explaining all of your issues, and his last sentence seems to fit your needs (and my experience) very well. The only other point I would add is to ascertain whether the use patterns commend microservices to abstract from data locality, even if the initial deployment is a no-op.
Do you happen to be using a tool like Nagios or Ganglia that is able to report utilization (CPU, load, disk IO, network)? There are plugins for both that will also notify you (depending on whether you enabled the intermediate GC logging) about what is happening.
May not be relevant, but what is the default heap size you have deployed? It should be no more than 16 GB (and be aware of the impacts of GC at that large size); I suggest not smaller than 8-12 GB.
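To confirm what a running node actually got (CassandraDaemon is the usual process name; adjust for your packaging):

    # pull the -Xms/-Xmx flags off the live process
    pgrep -af CassandraDaemon | grep -o '\-Xm[sx][^ ]*'
    # or ask the JVM directly:
    jcmd "$(pgrep -f CassandraDaemon)" VM.flags | grep -o 'MaxHeapSize=[0-9]*'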
On Wed, Apr 1, 2015 at 11:28 AM, Anuj Wadehra anujw_2...@yahoo.co.in wrote:
Are you writing multiple …
Interesting that you are finding excessive drift from public time servers. I only once saw that problem, with AWS' time servers. To be conservative I sometimes recommend that clients spin up their own time server, but realize it will also drift if the public time servers do! Somewhat different if …
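A quick way to quantify the drift (classic ntpd shown; chrony equivalent noted):

    ntpq -p            # offset/jitter per peer, in milliseconds
    chronyc tracking   # if the host runs chrony instead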