[ https://issues.apache.org/jira/browse/ACCUMULO-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038226#comment-14038226 ]
ASF subversion and git services commented on ACCUMULO-2915:
-----------------------------------------------------------
Commit b062a0bd3ed388f89bc04dfa2903bf3cc951976c in accumulo's branch
refs/heads/master from [~elserj]
[ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=b062a0b ]
ACCUMULO-2925 Create regular Mutations from ServerMutations when applying
replication data on a peer
Mutations do not store unserialized ColumnUpdates; they only generate them
on demand via the getter. This is intended to create an efficient implementation
(in both performance and size) while preserving immutability.
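To make the "serialize first, decode on demand" idea concrete, here is a minimal Java sketch. `LazyUpdatesSketch`, `LazyMutation`, and `Update` are hypothetical names, and the `DataOutputStream` encoding is an assumption for illustration only; it is not Accumulo's `Mutation` class or its wire format. Updates are appended as bytes, and `getUpdates()` decodes a fresh, read-only list on each call, so no caller can alter the stored state.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of a serialized-first mutation.
// This is NOT Accumulo's Mutation class or its wire format.
final class LazyUpdatesSketch {

  // Decoded view of one update; only created when getUpdates() is called.
  static final class Update {
    final String family;
    final String qualifier;
    final String value;

    Update(String family, String qualifier, String value) {
      this.family = family;
      this.qualifier = qualifier;
      this.value = value;
    }
  }

  static final class LazyMutation {
    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    private final DataOutputStream out = new DataOutputStream(bytes);
    private int count = 0;

    // Appends the update in serialized form only; no Update object is kept.
    void put(String family, String qualifier, String value) {
      try {
        out.writeUTF(family);
        out.writeUTF(qualifier);
        out.writeUTF(value);
        count++;
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }

    // Size of the serialized payload in bytes.
    int serializedSize() {
      return bytes.size();
    }

    // Decodes the bytes on every call and returns a fresh, read-only list.
    List<Update> getUpdates() {
      List<Update> updates = new ArrayList<>(count);
      try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        for (int i = 0; i < count; i++) {
          String family = in.readUTF();
          String qualifier = in.readUTF();
          String value = in.readUTF();
          updates.add(new Update(family, qualifier, value));
        }
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
      return Collections.unmodifiableList(updates);
    }
  }

  public static void main(String[] args) {
    LazyMutation m = new LazyMutation();
    m.put("cf", "cq1", "v1");
    m.put("cf", "cq2", "v2");
    System.out.println(m.getUpdates().size() + " updates, " + m.serializedSize() + " bytes");
  }
}
```

The trade-off this models is that reads pay a decoding cost, in exchange for a compact representation that cannot be modified through the objects handed out by the getter.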
Server-assigned timestamps work around this immutability by wrapping normal
Mutations in a ServerMutation and ColumnUpdates with ServerColumnUpdates. By doing
this, ServerMutations can "fake" the timestamp on ColumnUpdates that otherwise
do not have a timestamp set.
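A simplified sketch of the wrapping approach, assuming stripped-down stand-ins: `Update`, `ServerStampedUpdate`, `ClientMutation`, and `ServerStampedMutation` are hypothetical classes that only mimic the roles of ColumnUpdate, ServerColumnUpdate, Mutation, and ServerMutation; the real Accumulo classes have different fields and constructors. The stored updates never change; only the wrapper's getter substitutes the server-assigned timestamp for updates that have none.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for Mutation/ColumnUpdate and their
// server-side wrappers; Accumulo's real classes have different fields and APIs.
final class ServerTimestampSketch {

  static class Update {
    final String column;
    final boolean hasTimestamp;
    final long timestamp;
    final String value;

    Update(String column, boolean hasTimestamp, long timestamp, String value) {
      this.column = column;
      this.hasTimestamp = hasTimestamp;
      this.timestamp = timestamp;
      this.value = value;
    }

    long getTimestamp() {
      return timestamp;
    }
  }

  // Wraps an update and reports the server-assigned timestamp if none was set.
  static final class ServerStampedUpdate extends Update {
    private final long systemTimestamp;

    ServerStampedUpdate(Update base, long systemTimestamp) {
      super(base.column, base.hasTimestamp, base.timestamp, base.value);
      this.systemTimestamp = systemTimestamp;
    }

    @Override
    long getTimestamp() {
      return hasTimestamp ? timestamp : systemTimestamp;
    }
  }

  static class ClientMutation {
    final List<Update> updates = new ArrayList<>();

    void put(String column, String value) {
      updates.add(new Update(column, false, 0L, value));
    }

    void put(String column, long timestamp, String value) {
      updates.add(new Update(column, true, timestamp, value));
    }

    List<Update> getUpdates() {
      return new ArrayList<>(updates);
    }
  }

  // Stored updates stay untouched; only the getter "fakes" missing timestamps.
  static final class ServerStampedMutation extends ClientMutation {
    final long systemTimestamp;

    ServerStampedMutation(ClientMutation original, long systemTimestamp) {
      updates.addAll(original.updates);
      this.systemTimestamp = systemTimestamp;
    }

    @Override
    List<Update> getUpdates() {
      List<Update> wrapped = new ArrayList<>();
      for (Update u : updates) {
        wrapped.add(new ServerStampedUpdate(u, systemTimestamp));
      }
      return wrapped;
    }
  }

  public static void main(String[] args) {
    ClientMutation m = new ClientMutation();
    m.put("cf:a", "no client timestamp");
    m.put("cf:b", 42L, "client timestamp");
    ServerStampedMutation sm = new ServerStampedMutation(m, 1000L);
    for (Update u : sm.getUpdates()) {
      System.out.println(u.column + " -> " + u.getTimestamp());
    }
  }
}
```

Running `main` prints `cf:a -> 1000` and `cf:b -> 42`: the update without a client timestamp reports the server's value, while an explicit client timestamp wins.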
In the context of replication, this is still a problem as all Mutations that are
sent to a peer are ServerMutations (as we read them from a WAL). These Mutations are
deserialized and passed into a BatchWriter to apply to the local instance; however,
the BatchWriter is ignorant of ServerMutations and the special timestamp handling.
When the BatchWriter makes a "copy" of the Mutation (see ACCUMULO-2915), despite
this being a shallow copy, the server-assigned timestamp is lost because a regular
Mutation is created from what was a ServerMutation. Even if the copy could preserve
the timestamp, the TMutation class, which the BatchWriter eventually uses to send
the Mutations to a TabletServer, is also ignorant of the ServerMutation timestamp
without modification of the serialization and the TMutation class.
As such, the only option left is this: when encountering ServerMutations in the
BatchWriterReplicationReplayer code, we *must* create new Mutations, applying the
possibly present server timestamp to each new Mutation we create to ensure that
the timestamp is correctly propagated to this peer.
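The problem and the rebuild step can be sketched together, assuming simplified stand-ins. All class and method names here (`ClientMutation`, `ServerStampedMutation`, `timestampOf`, `rebuild`) are hypothetical; this is not the actual `BatchWriterReplicationReplayer` code, and the TMutation serialization layer is not modeled. A shallow copy through the base-class constructor keeps the raw updates but loses the server-assigned timestamp, while rebuilding writes each update with an explicit timestamp: the update's own when present, otherwise the server's.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the replication fix; not Accumulo's
// Mutation/ServerMutation classes or the BatchWriterReplicationReplayer code.
final class ReplicationRebuildSketch {

  static final class Update {
    final String column;
    final boolean hasTimestamp;
    final long timestamp;
    final String value;

    Update(String column, boolean hasTimestamp, long timestamp, String value) {
      this.column = column;
      this.hasTimestamp = hasTimestamp;
      this.timestamp = timestamp;
      this.value = value;
    }
  }

  static class ClientMutation {
    final List<Update> updates = new ArrayList<>();

    ClientMutation() {}

    // Shallow copy: shares the raw updates but not any subclass state,
    // mirroring how the BatchWriter's copy yields a plain Mutation.
    ClientMutation(ClientMutation other) {
      updates.addAll(other.updates);
    }

    void put(String column, String value) {
      updates.add(new Update(column, false, 0L, value));
    }

    void put(String column, long timestamp, String value) {
      updates.add(new Update(column, true, timestamp, value));
    }

    // Timestamp the receiver would apply; 0 models "no timestamp available".
    long timestampOf(Update u) {
      return u.hasTimestamp ? u.timestamp : 0L;
    }
  }

  // Mutation read back from a WAL, carrying a server-assigned timestamp.
  static final class ServerStampedMutation extends ClientMutation {
    final long systemTimestamp;

    ServerStampedMutation(ClientMutation original, long systemTimestamp) {
      super(original);
      this.systemTimestamp = systemTimestamp;
    }

    @Override
    long timestampOf(Update u) {
      return u.hasTimestamp ? u.timestamp : systemTimestamp;
    }
  }

  // Recreate a plain mutation whose updates all carry explicit timestamps.
  static ClientMutation rebuild(ClientMutation m) {
    if (!(m instanceof ServerStampedMutation)) {
      return m; // no server timestamp to propagate
    }
    ClientMutation rebuilt = new ClientMutation();
    for (Update u : m.updates) {
      rebuilt.put(u.column, m.timestampOf(u), u.value);
    }
    return rebuilt;
  }

  public static void main(String[] args) {
    ClientMutation original = new ClientMutation();
    original.put("cf:a", "v");
    ServerStampedMutation fromWal = new ServerStampedMutation(original, 1000L);

    ClientMutation naiveCopy = new ClientMutation(fromWal);
    ClientMutation rebuilt = rebuild(fromWal);
    System.out.println("naive copy: " + naiveCopy.timestampOf(naiveCopy.updates.get(0)));
    System.out.println("rebuilt:    " + rebuilt.timestampOf(rebuilt.updates.get(0)));
  }
}
```

Because the rebuilt mutation is a plain `ClientMutation` with explicit timestamps, any later copy or serialization step that only understands plain mutations still carries the correct value.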
> Avoid copying all Mutations when using a TabletServerBatchWriter
> ----------------------------------------------------------------
>
> Key: ACCUMULO-2915
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2915
> Project: Accumulo
> Issue Type: Improvement
> Components: client
> Affects Versions: 1.5.0, 1.5.1, 1.6.0, 1.6.1, 1.7.0
> Reporter: William Slacum
> Assignee: William Slacum
> Fix For: 1.5.2, 1.6.1, 1.7.0
>
>
> Currently in the TabletServerBatchWriter, the following behavior is exhibited:
> {code}
> // create a copy of mutation so that after this method returns the user
> // is free to reuse the mutation object, like calling readFields... this
> // is important for the case where a mutation is passed from map to reduce
> // to batch writer... the map reduce code will keep passing the same mutation
> // object into the reduce method
> m = new Mutation(m);
>
> totalMemUsed += m.estimatedMemoryUsed();
> mutations.addMutation(table, m);
> totalAdded++;
> {code}
> This means all data is copied twice when writing. The logic for doing this is
> a bit dubious, since not all clients are going to be subject to MapReduce's
> use of references.
> It'd be good if we provided users with a way of signaling that there's no
> need to copy the mutation payload. [~elserj] suggested creating something
> akin to an {{ImmutableMutation}}, which would help avoid some of the fears the
> BatchWriter attempts to defend against.
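A hypothetical sketch of the suggested opt-out, assuming a writer that skips its defensive copy when handed an immutable type. `MutationView`, `ReusableMutation`, `ImmutableMutation`, and `Writer` are illustrative names, not Accumulo API. A reusable mutation, which a framework such as MapReduce may overwrite in place (as the quoted code comment describes), is still copied; an `ImmutableMutation` has only final fields and no setters, so it can be buffered as-is.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the ImmutableMutation suggestion; not Accumulo API.
final class ImmutableOptOutSketch {

  // Read-only view the writer works with.
  interface MutationView {
    String row();
    String value();
  }

  // Can be overwritten in place (like a reused Writable), so it must be copied.
  static final class ReusableMutation implements MutationView {
    private String row;
    private String value;

    void set(String row, String value) {
      this.row = row;
      this.value = value;
    }

    public String row() { return row; }
    public String value() { return value; }
  }

  // Final fields and no setters: safe to buffer without a defensive copy.
  static final class ImmutableMutation implements MutationView {
    private final String row;
    private final String value;

    ImmutableMutation(String row, String value) {
      this.row = row;
      this.value = value;
    }

    public String row() { return row; }
    public String value() { return value; }
  }

  static final class Writer {
    final List<MutationView> buffered = new ArrayList<>();
    int copiesMade = 0;

    void addMutation(MutationView m) {
      if (!(m instanceof ImmutableMutation)) {
        // Defensive copy, playing the role of the BatchWriter's new Mutation(m).
        m = new ImmutableMutation(m.row(), m.value());
        copiesMade++;
      }
      buffered.add(m);
    }
  }

  public static void main(String[] args) {
    Writer writer = new Writer();
    ReusableMutation reused = new ReusableMutation();
    reused.set("r1", "v1");
    writer.addMutation(reused);                            // copied
    writer.addMutation(new ImmutableMutation("r2", "v2")); // buffered as-is
    reused.set("r3", "v3"); // overwriting the reused object leaves the buffer intact
    System.out.println(writer.copiesMade + " copy; first row " + writer.buffered.get(0).row());
  }
}
```

Using a type check keeps the existing behavior for every current caller and only removes the copy for callers who explicitly opt in; a flag on the writer would be an alternative design with the same effect.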
--
This message was sent by Atlassian JIRA
(v6.2#6252)