Josh Elser created ACCUMULO-2846:
------------------------------------
Summary: Need to re-use DataInputStream for reading files that need replication
Key: ACCUMULO-2846
URL: https://issues.apache.org/jira/browse/ACCUMULO-2846
Project: Accumulo
Issue Type: Sub-task
Components: replication
Reporter: Josh Elser
Assignee: Josh Elser
Fix For: 1.7.0
While running multi-node tests with continuous ingest, I was watching ingest
performance on the peer via the monitor.
I noticed that the ingest rate followed a regular pattern: ingest would spike,
then decrease over a (mostly) fixed interval, flat-line, and then repeat.
I believe each cycle on the ingest graph is the replication of a file from the
primary. The drop in throughput corresponds to the time it takes to re-read the
"prefix" of the file that we have already replicated; because that prefix gets
longer with every pass, the penalty grows. I need to push more logic down into
the AccumuloReplicaSystem so that we can avoid this growing penalty of seeking
over data we don't need to re-process.
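If each pass re-opens the file and seeks past everything already sent, then replicating a file in k batches of b bytes skips b·(0 + 1 + ... + (k-1)) = b·k(k-1)/2 bytes in total, which is quadratic in the number of passes. Re-using an open DataInputStream avoids that. The sketch below illustrates the idea under stated assumptions: `ReplicationReaderCache` and its methods are hypothetical names, and the `opener` function stands in for however the replica system actually opens a WAL file; neither is Accumulo's real API.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch: keeps one open DataInputStream per file so each
 * replication pass continues where the previous one stopped, instead of
 * re-opening the file and skipping over the already-replicated prefix.
 */
final class ReplicationReaderCache {

    /** An open stream plus the byte offset it is currently positioned at. */
    private static final class CachedReader {
        final DataInputStream in;
        long position;
        CachedReader(DataInputStream in) { this.in = in; }
    }

    private final Function<String, InputStream> opener; // assumed file opener
    private final Map<String, CachedReader> readers = new HashMap<>();
    private long bytesSkipped = 0; // bytes consumed only to reach a requested offset

    ReplicationReaderCache(Function<String, InputStream> opener) {
        this.opener = opener;
    }

    /** Reads up to maxBytes starting at offset, re-using the cached stream when possible. */
    public byte[] read(String file, long offset, int maxBytes) throws IOException {
        CachedReader r = readers.get(file);
        if (r == null || r.position > offset) {
            // No stream yet, or we would have to move backwards: open a fresh one.
            if (r != null) r.in.close();
            r = new CachedReader(new DataInputStream(opener.apply(file)));
            readers.put(file, r);
        }
        // Only the gap between the stream's position and the offset is skipped.
        skipFully(r, offset - r.position);

        byte[] buf = new byte[maxBytes];
        int n = 0;
        while (n < maxBytes) {
            int got = r.in.read(buf, n, maxBytes - n);
            if (got < 0) break; // end of file
            n += got;
        }
        r.position += n;
        return Arrays.copyOf(buf, n);
    }

    private void skipFully(CachedReader r, long count) throws IOException {
        long remaining = count;
        while (remaining > 0) {
            long n = r.in.skip(remaining);
            if (n <= 0) {
                // skip() may make no progress; fall back to reading one byte.
                if (r.in.read() < 0) break; // EOF before reaching the offset
                n = 1;
            }
            remaining -= n;
            r.position += n;
            bytesSkipped += n;
        }
    }

    public long bytesSkipped() { return bytesSkipped; }

    /** Drops the cached stream, e.g. once the file is fully replicated. */
    public void close(String file) throws IOException {
        CachedReader r = readers.remove(file);
        if (r != null) r.in.close();
    }
}
```

In this sketch, consecutive passes over the same file skip nothing; only a request for an offset behind the stream's current position forces a re-open and a skip from the start. A real implementation would also need to evict streams for files that are finished or idle so open handles do not accumulate.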
The cost is more complexity in the AccumuloReplicaSystem, but I imagine that
once I write an implementation that replicates to some other system, it will
become more obvious which common pieces can be abstracted into a shared base
class.
--
This message was sent by Atlassian JIRA
(v6.2#6252)