[ 
https://issues.apache.org/jira/browse/ACCUMULO-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated ACCUMULO-2819:
---------------------------------

    Component/s: replication

> Provide WorkAssigner which is order-aware
> -----------------------------------------
>
>                 Key: ACCUMULO-2819
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2819
>             Project: Accumulo
>          Issue Type: Sub-task
>          Components: replication
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>
> The current WorkAssigner implementation, which uses the DistributedWorkQueue, 
> is great because the Master does not need to know which tservers are 
> available, and any tserver can perform the replication.
> The downside is that data ingested later can be replicated before data that 
> was ingested earlier. For example, say {{table1}} uses {{wal1}} to ingest 
> some data. We record that {{wal1}} has some replication to do but, for 
> whatever reason, we don't get to it. More data is ingested into {{table1}}, 
> and once enough has been ingested it starts using {{wal2}}. Now both 
> {{wal1}} and {{wal2}} have data to be replicated for {{table1}}.
> Using the DistributedWorkQueue, we have no guarantee that {{wal1}} will be 
> replicated before {{wal2}}, which means we might replay a column update for 
> the same row in the wrong order (update from {{wal2}} and then update from 
> {{wal1}}).
> While the DistributedWorkQueue is nice for the reason mentioned above, and 
> for its higher throughput, it has obvious deficiencies depending on the 
> workload and table schema. We need to create a WorkAssigner that is 
> order-aware: it must track the order in which the WALs for a table were 
> minor compacted and ensure that replication occurs in that same order.
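
To make the ordering requirement concrete, here is a minimal sketch of one way an order-aware assigner could behave. It is written in Python for brevity and is not Accumulo code: the class name, its methods, and the rule "at most one WAL in flight per table" are all assumptions made for illustration, and the order of record() calls simply stands in for whatever ordering a real implementation would track.

```python
from collections import defaultdict, deque


class OrderAwareWorkAssigner:
    """Hypothetical sketch; this is not Accumulo's WorkAssigner interface."""

    def __init__(self):
        # table name -> WALs awaiting replication, oldest first
        self._pending = defaultdict(deque)
        # table name -> the WAL currently handed out for that table
        self._in_flight = {}

    def record(self, table, wal):
        """Note that `wal` holds data to replicate for `table`."""
        self._pending[table].append(wal)

    def next_work(self):
        """Return (table, wal) for a table with nothing in flight, else None."""
        for table, wals in self._pending.items():
            if wals and table not in self._in_flight:
                # Only the oldest WAL of a table is ever handed out.
                self._in_flight[table] = wals[0]
                return table, wals[0]
        return None

    def complete(self, table, wal):
        """Mark `wal` as replicated, unblocking the table's next WAL."""
        if self._in_flight.get(table) != wal:
            raise ValueError(f"{wal} is not in flight for {table}")
        del self._in_flight[table]
        self._pending[table].popleft()


assigner = OrderAwareWorkAssigner()
assigner.record("table1", "wal1")
assigner.record("table1", "wal2")
assigner.record("table2", "wal3")

first = assigner.next_work()    # ("table1", "wal1")
second = assigner.next_work()   # ("table2", "wal3"); table1 waits on wal1
third = assigner.next_work()    # None; every table has a WAL in flight
assigner.complete("table1", "wal1")
fourth = assigner.next_work()   # ("table1", "wal2"), only after wal1 is done
```

Different tables can still be replicated in parallel in this sketch; only WALs belonging to the same table are serialized, which is where the throughput cost relative to an unordered queue would come from.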



--
This message was sent by Atlassian JIRA
(v6.2#6252)
