Re: Consistency Level One Question
Thanks, this clears things up.

> On Feb 21, 2014, at 6:47 AM, Edward Capriolo wrote:
>
> When you write at ONE, the ack is returned to the client as soon as one node
> acknowledges the write. This means that if you quickly read from some other
> node:
> 1) you may get the result, because by the time the read is processed the
>    data may already be on that node
> 2) the node you read from may or may not proxy the request to the node with
>    the data
> 3) you may get a "column not found", because the read might hit a node where
>    the data does not exist yet.
>
> Generally, even at level ONE, replication is fast. I have done an experiment
> on what you are asking: write at ONE, then read from another node as soon as
> the client gets an ack. Most of the time the data is replicated by the time
> the second request is received. However, "most of the time" is not a
> guarantee. If the nodes are geographically separate, who is to say that the
> first request and the second will not route around the internet in different
> ways, so that the second arrives on a node before the first? That is
> eventual consistency for you.
>
> On Friday, February 21, 2014, graham sanderson wrote:
> > My bad; should have checked the code:
> >
> > /**
> >  * This function executes local and remote reads, and blocks for the
> >  * results:
> >  *
> >  * 1. Get the replica locations, sorted by response time according to
> >  *    the snitch
> >  * 2. Send a data request to the closest replica, and digest requests
> >  *    to either
> >  *    a) all the replicas, if read repair is enabled
> >  *    b) the closest R-1 replicas, where R is the number required to
> >  *       satisfy the ConsistencyLevel
> >  * 3. Wait for a response from R replicas
> >  * 4. If the digests (if any) match the data return the data
> >  * 5. else carry out read repair by getting data from all the nodes.
> >  */
> >
> > On Feb 21, 2014, at 3:10 AM, Duncan Sands wrote:
> >
> >> Hi Graham,
> >>
> >> On 21/02/14 07:54, graham sanderson wrote:
> >>> Note also that when reading at ONE there will be no read repair, since
> >>> the coordinator does not know that another replica has stale data
> >>> (remember, at ONE basically only one node is asked for the answer).
> >>
> >> I don't think this is right. My understanding is that while only one node
> >> will be sent a direct read request, all other replicas will (not on every
> >> query - it depends on the value of read_repair_chance) get a background
> >> read repair request. You can test this experimentally using cqlsh and
> >> turning tracing on: issue a read request many times. Most of the time you
> >> will see that the coordinator sends a message to one node, but from time
> >> to time (depending on read_repair_chance) you will see it sending messages
> >> to many nodes.
> >>
> >> Best wishes, Duncan.
> >>
> >>> In practice, for our use cases we always write at LOCAL_QUORUM (failing
> >>> the whole update if that doesn’t work - stale data is OK if >1 node is
> >>> down), and we read at LOCAL_QUORUM but (because stale data is better
> >>> than no data) fall back per read request to LOCAL_ONE if we detect that
> >>> there were insufficient nodes. This lets us cope with 2 down nodes in a
> >>> 3 replica environment (or more if the nodes are not consecutive in the
> >>> ring).
> >>>
> >>> On Feb 20, 2014, at 11:21 PM, Drew Kutcharian wrote:
> >>>
> >>> > Hi Guys,
> >>> >
> >>> > I wanted to get some clarification on what happens when you write and
> >>> > read at consistency level 1. Say I have a keyspace with replication
> >>> > factor of 3 and a table which will contain write-once/read-only wide
> >>> > rows. If I write at consistency level 1 and the write happens on node A
> >>> > and I read back at consistency level 1 from another node other than A,
> >>> > say B, will C* return “not found” or will it trigger a read-repair
> >>> > before responding? In addition, what’s the best consistency level for
> >>> > reading/writing write-once/read-only wide rows?
> >>> >
> >>> > Thanks,
> >>> >
> >>> > Drew
Re: Consistency Level One Question
When you write at ONE, the ack is returned to the client as soon as one node acknowledges the write. This means that if you quickly read from some other node:

1) you may get the result, because by the time the read is processed the data may already be on that node
2) the node you read from may or may not proxy the request to the node with the data
3) you may get a "column not found", because the read might hit a node where the data does not exist yet.

Generally, even at level ONE, replication is fast. I have done an experiment on what you are asking: write at ONE, then read from another node as soon as the client gets an ack. Most of the time the data is replicated by the time the second request is received. However, "most of the time" is not a guarantee. If the nodes are geographically separate, who is to say that the first request and the second will not route around the internet in different ways, so that the second arrives on a node before the first? That is eventual consistency for you.
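The three outcomes listed in the message above come down to a race between replication and the follow-up read. A toy Python model of that race is sketched below; it is not Cassandra code, the arrival times are invented for illustration, and it ignores the case where the contacted node proxies the read to another replica.

```python
def read_at_one(arrival_ms, replica, read_time_ms, value):
    """Return value if the write reached `replica` by read_time_ms, else None ("not found")."""
    return value if arrival_ms[replica] <= read_time_ms else None


# Hypothetical arrival times, in ms after the client sent the write, on each replica.
arrival_ms = {"A": 1, "B": 3, "C": 40}

# At ONE the client gets its ack as soon as the fastest replica has the write.
ack_ms = min(arrival_ms.values())

# Read right after the ack, then again a few ms later, from each replica in turn.
for replica in ("A", "B", "C"):
    early = read_at_one(arrival_ms, replica, ack_ms + 1, "row")
    later = read_at_one(arrival_ms, replica, ack_ms + 5, "row")
    print(replica, early, later)
```

With these made-up numbers, replica A always returns the row, B misses it on the immediate read but has it a few milliseconds later, and the distant replica C misses it both times.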
Re: Consistency Level One Question
My bad; should have checked the code:

/**
 * This function executes local and remote reads, and blocks for the results:
 *
 * 1. Get the replica locations, sorted by response time according to the snitch
 * 2. Send a data request to the closest replica, and digest requests to either
 *    a) all the replicas, if read repair is enabled
 *    b) the closest R-1 replicas, where R is the number required to satisfy the ConsistencyLevel
 * 3. Wait for a response from R replicas
 * 4. If the digests (if any) match the data return the data
 * 5. else carry out read repair by getting data from all the nodes.
 */
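The five steps in the code comment quoted above can be mimicked in a few lines of Python. This is only a toy model under stated assumptions, not the Cassandra implementation: replicas are plain dicts, latencies are supplied by the caller instead of a snitch, everything runs synchronously, and the names `digest` and `coordinator_read` are invented for the sketch.

```python
import hashlib


def digest(value):
    """Digest of a (timestamp, data) pair, standing in for a replica's digest response."""
    return hashlib.md5(repr(value).encode()).hexdigest()


def coordinator_read(replicas, key, required, read_repair=False):
    """replicas: list of (latency_ms, store); each store maps key -> (timestamp, data)."""
    # 1. Sort the replicas by response time (the snitch's job in Cassandra).
    ordered = [store for _, store in sorted(replicas, key=lambda r: r[0])]
    # 2. Data request to the closest replica; digest requests to all the others
    #    if read repair is on, otherwise only to the closest required-1 replicas.
    closest = ordered[0]
    digest_targets = ordered[1:] if read_repair else ordered[1:required]
    data = closest.get(key)
    digests = [digest(store.get(key)) for store in digest_targets]
    # 3./4. If every digest matches the data, return the data.
    #       (Simplification: this model waits for every contacted replica.)
    if all(d == digest(data) for d in digests):
        return data
    # 5. Otherwise fetch the data from all replicas, keep the newest, repair the rest.
    versions = [store[key] for store in ordered if key in store]
    newest = max(versions, key=lambda v: v[0])
    for store in ordered:
        store[key] = newest
    return newest
```

For example, with three replicas where the closest one has not yet received the write, `required=1` without read repair returns nothing, while `required=2` detects the digest mismatch, returns the newest version and repairs the stale replicas.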
Re: Consistency Level One Question
Hi Graham,

On 21/02/14 07:54, graham sanderson wrote:
> Note also that when reading at ONE there will be no read repair, since the
> coordinator does not know that another replica has stale data (remember, at
> ONE basically only one node is asked for the answer).

I don't think this is right. My understanding is that while only one node will be sent a direct read request, all other replicas will (not on every query - it depends on the value of read_repair_chance) get a background read repair request. You can test this experimentally using cqlsh and turning tracing on: issue a read request many times. Most of the time you will see that the coordinator sends a message to one node, but from time to time (depending on read_repair_chance) you will see it sending messages to many nodes.

Best wishes, Duncan.
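The behaviour Duncan describes, where most reads at ONE touch a single replica and an occasional read fans out to all of them, can be pictured with a small Python simulation. This is a sketch of the probabilistic idea only; the function name and the 0.1 value are assumptions chosen for illustration, not taken from the thread or from Cassandra's code.

```python
import random


def replicas_contacted(replication_factor, required, read_repair_chance, rng):
    """Number of replicas the coordinator messages for one read in this toy model."""
    if rng.random() < read_repair_chance:
        return replication_factor  # background read repair: every replica is contacted
    return required  # normal case: only as many as the consistency level needs


rng = random.Random(42)  # fixed seed so the run is repeatable
counts = [replicas_contacted(3, 1, 0.1, rng) for _ in range(10_000)]
fraction_all = counts.count(3) / len(counts)
print(f"reads that contacted all replicas: {fraction_all:.1%}")
```

In a trace this would show up as most requests messaging one node and roughly one in ten messaging all three.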
Re: Consistency Level One Question
Note also that when reading at ONE there will be no read repair, since the coordinator does not know that another replica has stale data (remember, at ONE basically only one node is asked for the answer).

In practice, for our use cases we always write at LOCAL_QUORUM (failing the whole update if that doesn’t work - stale data is OK if >1 node is down), and we read at LOCAL_QUORUM but (because stale data is better than no data) fall back per read request to LOCAL_ONE if we detect that there were insufficient nodes. This lets us cope with 2 down nodes in a 3 replica environment (or more if the nodes are not consecutive in the ring).
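The per-request fallback described in the message above could look roughly like the following Python sketch. It is driver-agnostic: `execute` is any callable the application supplies, and `InsufficientReplicas` and `make_fake_execute` are hypothetical names standing in for a real driver's error type and a real cluster.

```python
class InsufficientReplicas(Exception):
    """Hypothetical stand-in for a driver's 'not enough live replicas' error."""


def read_with_fallback(execute, query, levels=("LOCAL_QUORUM", "LOCAL_ONE")):
    """Try each consistency level in order; return (rows, level_used)."""
    last_error = None
    for level in levels:
        try:
            return execute(query, level), level
        except InsufficientReplicas as err:
            last_error = err  # too few replicas at this level, try the next one
    raise last_error


def make_fake_execute(live_replicas):
    """Fake cluster with 3 replicas: LOCAL_QUORUM needs 2 live nodes, LOCAL_ONE needs 1."""
    needed = {"LOCAL_QUORUM": 2, "LOCAL_ONE": 1}

    def execute(query, level):
        if live_replicas < needed[level]:
            raise InsufficientReplicas(level)
        return ["row"]

    return execute


rows, level = read_with_fallback(make_fake_execute(live_replicas=1), "SELECT ...")
print(rows, level)
```

With two of three replicas down the read still succeeds, at the cost of possibly returning stale data, which is the trade-off the message accepts.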
Re: Consistency Level One Question
Writing at a consistency level of ONE means that your write will be acknowledged as soon as one replica confirms that it has written it to the memtable and the commit log (it might not be quite synced to disk, but that’s a separate issue). All the writes are submitted in parallel, so it is very possible that the data will be on the other nodes very quickly.

Reading at ONE means that only one node will be asked for the data (unless you have rapid read protection AND the node you asked is very slow to respond).

So writing/reading at ONE means that it is possible (depending on how long you wait and a bunch of other factors) that the read - if it goes to another replica - *may* not return the data.

The safest thing to do is QUORUM writes and reads. This way, with a replication factor of 3, the write is only acknowledged when 2 of the 3 replicas have confirmed the data is written; subsequently your read will go to at least 2 nodes, at least one of which must therefore have the latest data, and the read command returns the most up-to-date data amongst the responding nodes.

On Feb 20, 2014, at 11:21 PM, Drew Kutcharian wrote:

> Hi Guys,
>
> I wanted to get some clarification on what happens when you write and read at
> consistency level 1. Say I have a keyspace with replication factor of 3 and a
> table which will contain write-once/read-only wide rows. If I write at
> consistency level 1 and the write happens on node A and I read back at
> consistency level 1 from another node other than A, say B, will C* return
> “not found” or will it trigger a read-repair before responding? In addition,
> what’s the best consistency level for reading/writing write-once/read-only
> wide rows?
>
> Thanks,
>
> Drew
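The QUORUM argument in the reply above is the usual overlap rule: if the number of replicas that acknowledged the write plus the number consulted on read exceeds the replication factor, the two sets must share at least one replica. A minimal Python sketch of that arithmetic follows; the function names are invented for illustration.

```python
def quorum(replication_factor: int) -> int:
    """Quorum size: a strict majority of the replicas."""
    return replication_factor // 2 + 1


def read_overlaps_write(write_acks: int, read_replicas: int, replication_factor: int) -> bool:
    """True if any set of read replicas must share a node with any set of acked writers."""
    return write_acks + read_replicas > replication_factor


rf = 3
q = quorum(rf)
print("quorum for RF=3:", q)
print("QUORUM write + QUORUM read overlap:", read_overlaps_write(q, q, rf))
print("ONE write + ONE read overlap:", read_overlaps_write(1, 1, rf))
```

With a replication factor of 3 the quorum is 2, so QUORUM/QUORUM gives 2 + 2 > 3 and the read is guaranteed to reach a replica holding the write, whereas ONE/ONE gives 1 + 1 = 2, which is not greater than 3, so the read can miss it.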