[ https://issues.apache.org/jira/browse/ACCUMULO-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Josh Elser updated ACCUMULO-2819:
---------------------------------
Component/s: replication
> Provide WorkAssigner which is order-aware
> -----------------------------------------
>
> Key: ACCUMULO-2819
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2819
> Project: Accumulo
> Issue Type: Sub-task
> Components: replication
> Reporter: Josh Elser
> Assignee: Josh Elser
>
> The current WorkAssigner implementation, which uses the DistributedWorkQueue,
> is great because it allows the Master to be unaware of which tservers are
> available, and it allows any tserver to perform the replication.
> The downside is that data ingested later can be replicated before data that
> was ingested earlier. For example, say {{table1}} uses {{wal1}} to ingest
> some data. We record that {{wal1}} has some replication to do but, for
> whatever reason, we don't get to it. More data is ingested into {{table1}},
> and once enough has been ingested it starts using {{wal2}}. Now both
> {{wal1}} and {{wal2}} have data to be replicated for {{table1}}.
> Using the DistributedWorkQueue, we have no guarantee that {{wal1}} will be
> replicated before {{wal2}}, which means we might replay a column update for
> the same row in the wrong order (update from {{wal2}} and then update from
> {{wal1}}).
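
The ordering hazard described above can be illustrated with a toy model. This
is only a sketch under a simplifying assumption: the peer is modeled as a plain
dictionary where whichever update is applied last wins. The function name
`replay` and the cell name "cf:cq" are hypothetical and are not part of
Accumulo.

```python
def replay(wals):
    """Apply each WAL's updates in the order given; the last write to a cell wins.

    Toy model only: the peer is a dict keyed by (row, column).
    """
    peer = {}
    for wal in wals:
        for row, column, value in wal:
            peer[(row, column)] = value
    return peer


# Hypothetical WAL contents: both update the same column of the same row.
wal1 = [("row1", "cf:cq", "old")]  # ingested first
wal2 = [("row1", "cf:cq", "new")]  # ingested later

in_order = replay([wal1, wal2])      # wal1 replicated before wal2
out_of_order = replay([wal2, wal1])  # wal2 replicated before wal1
```

In this model, replaying {{wal2}} before {{wal1}} leaves the peer holding the
older value, which is the wrong-order replay the description warns about.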
> While the DistributedWorkQueue is nice for the reason mentioned above, and
> for its higher throughput, it has obvious deficiencies depending on the
> workload and table schema. We need to create a WorkAssigner that is
> order-aware: it should track the order in which the WALs for a table were
> minor compacted and ensure that replication occurs in that same order.
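
One possible shape for an order-aware assigner is sketched below. It is a
minimal illustration, not the Accumulo implementation: the class
`OrderAwareAssigner` and its methods are hypothetical names, and the sketch
assumes WALs are registered per table in the order in which replication must
occur. At most one WAL per table is handed out at a time, and the next one is
released only after the previous one is reported complete.

```python
from collections import deque


class OrderAwareAssigner:
    """Hypothetical sketch: hands out at most one WAL per table at a time."""

    def __init__(self):
        self._pending = {}    # table -> deque of WALs, oldest first
        self._in_flight = {}  # table -> WAL currently being replicated

    def add_work(self, table, wal):
        # Assumption: callers add WALs in the order they must be replicated.
        self._pending.setdefault(table, deque()).append(wal)

    def next_work(self):
        """Return the (table, wal) pairs that may be assigned right now."""
        assigned = []
        for table, queue in self._pending.items():
            # Skip tables that still have an unfinished WAL in flight.
            if table not in self._in_flight and queue:
                wal = queue.popleft()
                self._in_flight[table] = wal
                assigned.append((table, wal))
        return assigned

    def complete(self, table, wal):
        """Mark a WAL as fully replicated so the table's next WAL is released."""
        if self._in_flight.get(table) != wal:
            raise ValueError(f"{wal} is not in flight for {table}")
        del self._in_flight[table]


assigner = OrderAwareAssigner()
assigner.add_work("table1", "wal1")
assigner.add_work("table1", "wal2")
assigner.add_work("table2", "wal3")

first = assigner.next_work()   # one WAL per table
second = assigner.next_work()  # nothing new until a WAL completes
assigner.complete("table1", "wal1")
third = assigner.next_work()   # wal2 is released only now
```

The trade-off in this sketch is that ordering is preserved within a table at
the cost of parallelism: different tables can still replicate concurrently,
but a single table's WALs are processed one at a time.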