[ https://issues.apache.org/jira/browse/ACCUMULO-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Josh Elser updated ACCUMULO-2949:
---------------------------------
Description:
To ensure that WALs are not left in a dangling "open" state WRT replication,
the garbage collector scans the tablets and constructs a view of WALs that are
currently in use. It consults that view to determine which WALs can move to a
"closed" replication state.
This isn't entirely correct because a WAL can "come back" again after being
removed from a Tablet. Consider the following:
# Table has one tablet hosted on one tserver
# Tablet gets some mutations
# Tablet gets MinC
# Tablet removes WAL entry as part of MinC
# WAL is "closed" WRT replication
# Tablet receives more mutations, starts using the same WAL
There are a couple of ways that this could present itself, each of which would
result in re-replication of data we've potentially already sent once. On an
active system, I don't think this is a big concern, and because we already
don't guarantee a "once and only once" replication contract, it isn't critical.
The combiner set on the replication table will also mitigate most of the
re-replication concerns, as those records persist until the entire file is
replicated (which should outlast the file's use on the local system).
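The sequence in the list above can be reproduced with a toy model. This is only a sketch to illustrate the race: the class and method names ({{InferredWalClose}}, {{tabletWrites}}, {{minorCompact}}, {{gcPass}}) are hypothetical and do not correspond to Accumulo's garbage collector or tablet code. It assumes the GC treats "no tablet references this WAL" as "this WAL is finished".

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Toy model (hypothetical, not Accumulo code) of closing WALs by inference. */
class InferredWalClose {
  // WALs currently referenced by some tablet: what a GC scan would observe.
  final Set<String> referencedByTablets = new HashSet<>();
  // WALs the GC has moved to the "closed" replication state.
  final Set<String> closedForReplication = new HashSet<>();

  /** A tablet receives mutations and records a reference to the WAL. */
  void tabletWrites(String wal) {
    referencedByTablets.add(wal);
  }

  /** MinC drops the tablet's reference; the tserver may still have the WAL open. */
  void minorCompact(String wal) {
    referencedByTablets.remove(wal);
  }

  /** GC pass: any known WAL with no tablet reference is assumed to be finished. */
  void gcPass(Set<String> knownWals) {
    for (String wal : knownWals) {
      if (!referencedByTablets.contains(wal)) {
        closedForReplication.add(wal);
      }
    }
  }

  /** True when a WAL is marked "closed" but a tablet is using it again. */
  boolean closedButInUse(String wal) {
    return closedForReplication.contains(wal) && referencedByTablets.contains(wal);
  }

  public static void main(String[] args) {
    InferredWalClose model = new InferredWalClose();
    model.tabletWrites("wal-1");                      // tablet gets some mutations
    model.minorCompact("wal-1");                      // MinC removes the WAL entry
    model.gcPass(Collections.singleton("wal-1"));     // WAL is "closed" WRT replication
    model.tabletWrites("wal-1");                      // more mutations, same WAL
    System.out.println("closed but in use: " + model.closedButInUse("wal-1"));
  }
}
```

In this model the last write leaves the WAL both "closed" and referenced, which is the state that leads to re-replication.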
[~ecn] suggested that we record a "closed" marker for a WAL as part of
{{TabletServerLogger.close()}}, which would remove the need to "guess" when a
WAL will no longer be used.
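A minimal sketch of what an explicit marker buys us, assuming the writer itself records the close and that "closed" is terminal. Everything here is hypothetical ({{ExplicitWalClose}}, its in-memory status map, and its method names stand in for whatever the replication records would actually look like); only the idea of marking the WAL closed at the point the logger closes it comes from the suggestion above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model (hypothetical, not Accumulo code) of an explicit WAL "closed" marker. */
class ExplicitWalClose {
  enum State { OPEN, CLOSED }

  // Hypothetical stand-in for the per-WAL replication status records.
  private final Map<String, State> status = new ConcurrentHashMap<>();

  /** Registers a new WAL as open; has no effect on a WAL that is already known. */
  void open(String wal) {
    status.putIfAbsent(wal, State.OPEN);
  }

  /** Called at the point where the writer closes the file (assumed hook). */
  void close(String wal) {
    status.put(wal, State.CLOSED);
  }

  /** Appends are only accepted while the WAL is open; closed is terminal. */
  void append(String wal) {
    if (status.get(wal) != State.OPEN) {
      throw new IllegalStateException("WAL is not open: " + wal);
    }
  }

  /** Replication can rely on this flag instead of inferring it from tablet references. */
  boolean isClosed(String wal) {
    return status.get(wal) == State.CLOSED;
  }
}
```

With this shape, a minor compaction has no effect on the replication state: the WAL stays open until the writer says otherwise, and a closed WAL cannot "come back".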
If we want to move to explicit "end" tracking (see ACCUMULO-2835), we will need
this implemented.
was:
To ensure that WALs are not left in a dangling "open" state WRT replication,
the garbage collector scans the tablets and constructs a view of WALs that are
currently in use. It consults that view to determine which WALs can move to a
"closed" replication state.
This isn't entirely correct because a WAL can "come back" again after being
removed from a Tablet. Consider the following:
# Table has one tablet hosted on one tserver
# Tablet gets some mutations
# Tablet gets MinC
# Tablet removes WAL entry as part of MinC
# WAL is "closed" WRT replication
# Tablet receives more mutations, starts using the same WAL
There are a couple of ways that this could present itself, each of which would
result in re-replication of data we've potentially already sent once. On an
active system, I don't think this is a big concern, and because we already
don't guarantee a "once and only once" replication contract, it isn't critical.
If we want to move to explicit "end" tracking (see ACCUMULO-2835), we will need
this implemented.
> Write explicit "close" markers for WALs
> ---------------------------------------
>
> Key: ACCUMULO-2949
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2949
> Project: Accumulo
> Issue Type: Improvement
> Components: logger, replication
> Reporter: Josh Elser
> Assignee: Josh Elser
>
> To ensure that WALs are not left in a dangling "open" state WRT replication,
> the garbage collector scans the tablets and constructs a view of WALs that
> are currently in use. It consults that view to determine which WALs can move
> to a "closed" replication state.
> This isn't entirely correct because a WAL can "come back" again after being
> removed from a Tablet. Consider the following:
> # Table has one tablet hosted on one tserver
> # Tablet gets some mutations
> # Tablet gets MinC
> # Tablet removes WAL entry as part of MinC
> # WAL is "closed" WRT replication
> # Tablet receives more mutations, starts using the same WAL
> There are a couple of ways that this could present itself, each of which
> would result in re-replication of data we've potentially already sent once.
> On an active system, I don't think this is a big concern, and because we
> already don't guarantee a "once and only once" replication contract, it isn't
> critical. The combiner set on the replication table will also mitigate most
> of the re-replication concerns, as those records persist until the entire file
> is replicated (which should outlast the file's use on the local system).
> [~ecn] suggested that we record a "closed" marker for a WAL as part of
> {{TabletServerLogger.close()}}, which would remove the need to "guess" when a
> WAL will no longer be used.
> If we want to move to explicit "end" tracking (see ACCUMULO-2835), we will
> need this implemented.