[jira] [Commented] (CASSANDRA-14406) Transient Replication: Implement cheap quorum write optimizations

Benedict (JIRA) Tue, 04 Sep 2018 04:52:10 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16602945#comment-16602945
 ]


Benedict commented on CASSANDRA-14406:
--------------------------------------

Fixes:
# StorageProxy.mutate would have attempted a standardWritePerformer 
maybeTryAdditionalReplicas for counters     
# assureSufficientLiveReplicas and blockFor were not transient replication (or 
pending replicas) aware
#* in the case of transient replication, this would mean we did not send enough 
initial writes, because we capped ourselves to blockFor recipients
# AbstractWriteResponseHandler
#* was sending to all remaining replicas in case of failure to meet 
consistency, not only those relevant for consistency
#* hasTransientResponse was racy: a transient response could arrive after the 
condition had been checked
#** Have introduced {{Accumulator.snapshot}} to make working with it safely 
more obvious
#** We now take a snapshot and inspect that list to decide whether we have a 
transient response
# sendMessagesToNonLocalDC was asserting no transient replicas - simply removed 
the assertions, as logic is consistent
# Hints were not implemented, though doing so mostly involved filtering them 
out; batch log will be less trivial when implemented, as it currently must hint
# Introduced separate threshold for cheap quorum upgrades
# There was a rare possible race condition when removing transient replication 
from a keyspace, during which period we would not handle transient replicas 
correctly
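
To illustrate the hasTransientResponse race described above, here is a minimal sketch of the snapshot pattern. It is not the actual Cassandra code: {{Response}}, {{ResponseAccumulator}} and {{SnapshotCheck}} are hypothetical names, and a synchronized list stands in for the real {{Accumulator}}. The point is only that the decision is made against one immutable view, so a response arriving afterwards cannot change what was inspected.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a write response; not a Cassandra class.
final class Response
{
    final String endpoint;
    final boolean fromTransient;

    Response(String endpoint, boolean fromTransient)
    {
        this.endpoint = endpoint;
        this.fromTransient = fromTransient;
    }
}

// Hypothetical append-only collection of responses (assumption: the real
// Accumulator is lock-free; a synchronized list keeps this sketch short).
final class ResponseAccumulator
{
    private final List<Response> responses = new ArrayList<>();

    synchronized void add(Response response)
    {
        responses.add(response);
    }

    // Returns an immutable copy; later add() calls do not affect it.
    synchronized List<Response> snapshot()
    {
        return Collections.unmodifiableList(new ArrayList<>(responses));
    }
}

final class SnapshotCheck
{
    // Inspects a single snapshot, so the answer describes exactly the
    // responses that were present when the snapshot was taken.
    static boolean hasTransientResponse(List<Response> snapshot)
    {
        for (Response response : snapshot)
            if (response.fromTransient)
                return true;
        return false;
    }
}
```

A caller would take one snapshot and base every subsequent decision on that same list, rather than re-reading the live collection between checks.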

Nits:
# StorageProxy.mutate used a HashMap, when a List would suffice

Follow-ups pre-4.0:
# We should rename speculative_write_threshold (I thought we agreed on 
transient_write_threshold)?

Follow-ups:
# EACH_QUORUM not implemented for transient replication; must either error or 
implement before release
# We don’t limit our cheap quorum upgrade to the minimum number of additional 
transient replicas, so a single missing response will result in all DCs 
receiving an extra full write mutation, doubling cross-DC traffic for that write
# maybeTryAdditionalReplicas / sendMessagesToNonLocalDC are not DC aware in 
their interactions, so transient writes incur more cross-DC traffic (ideally, 
the proxies would be able to coordinate upgrading to a transient write) 
# we don’t expose metrics around success/failure of cheap quorum
# The transient write count isn’t incremented when we perform a non-additional 
write (i.e. because a full node is down)
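
As a rough sketch of the follow-up about limiting the cheap quorum upgrade, the number of extra writes could be bounded by the number of responses still missing. This is an assumption about one possible approach, not the agreed design; {{CheapQuorumUpgrade}} and {{additionalWritesNeeded}} are hypothetical names, and DC awareness is left out entirely.

```java
// Hypothetical helper; not part of Cassandra.
final class CheapQuorumUpgrade
{
    // Returns how many not-yet-contacted transient replicas to write to:
    // the number of responses still missing for the consistency level,
    // capped by how many such replicas are available.
    static int additionalWritesNeeded(int blockFor,
                                      int responsesReceived,
                                      int uncontactedTransientReplicas)
    {
        int missing = Math.max(0, blockFor - responsesReceived);
        return Math.min(missing, uncontactedTransientReplicas);
    }
}
```

With blockFor of 2, one response received, and three uncontacted transient replicas, this returns 1 instead of contacting all three.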

> Transient Replication: Implement cheap quorum write optimizations
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-14406
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14406
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Coordination
>            Reporter: Ariel Weisberg
>            Assignee: Blake Eggleston
>            Priority: Major
>             Fix For: 4.0
>
>
> Writes should never be sent to transient replicas unless necessary to satisfy 
> the requested consistency level, such as when RF is not sufficient for strong 
> consistency or not enough full replicas are marked as alive.
> If a write doesn't receive sufficient responses in time, additional replicas 
> should be sent the write, similar to Rapid Read Protection.
> Hints should never be written for a transient replica.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
