> On March 28, 2014, 6:53 p.m., kturner wrote:
> > docs/src/main/resources/design/ACCUMULO-378-design.mdtext, line 119
> > <https://reviews.apache.org/r/19790/diff/1/?file=539855#file539855line119>
> >
> > Seems like Accumulo should have a public API for querying what needs to
> > be replicated, notifying it when something has been replicated, and methods
> > for importing replicated data. I am thinking of something different than a
> > plugin, more like the import/export table API. How the replication happens
> > is up to the user. We could provide a default implementation that does
> > replication as you mentioned. Some users may want to occasionally
> > replicate large batches using MapReduce. Others may want to continually
> > replicate files using distributed queueing solutions.
> 
> Josh Elser wrote:
>     My initial thoughts were not to provide something at a public API layer, due
>     to the likely desire to integrate WALs as a part of said API. Opening up an
>     API might prove difficult to implement well -- we would have to design
>     something that scales out to adequately support the ingest rates Accumulo
>     will support.
> 
>     Not saying I'm against it, but it would be difficult to get right.
>     Hooking into it would also likely be difficult to implement.
> 
> kturner wrote:
>     I agree, we would not want to expose internals of walogs to users. However,
>     I think this API would just expose URIs that need to be replicated. The user
>     would not have to care about what the actual data pointed to by the URI is.
> 
>     I am going about this all wrong. I should outline what I would like to
>     see Accumulo do instead of some incomplete "how" to do it. Stepping back, I
>     would like to see this feature designed to empower admins.
> 
>     ZFS is a file system I really like that empowers admins. One way it
>     empowers admins is by providing a really flexible, easy-to-use mechanism for
>     replicating file systems. W/ ZFS an admin can do something like the following
>     to initially replicate a file system.
> 
>     # zfs snapshot tank/home@snap1
>     # zfs send tank/home@snap1 | ssh host2 zfs recv newtank/home
> 
>     After some period of time they can easily replicate the changes to the file
>     system w/ the following commands.
> 
>     # zfs snapshot tank/home@snap2
>     # zfs send -i tank/home@snap1 tank/home@snap2 | ssh host2 zfs recv newtank/home
> 
>     What I like about this is that zfs send writes to stdout, so the admin
>     could write to a file, send over the network, write to tape, etc. Whenever
>     and however the admin wants to move the data, the ZFS API makes it super easy
>     for them to do it. Of course we cannot do exactly what ZFS does, but we
>     can make it easy for admins to move data between clusters in different ways
>     and on different schedules.
> 
> Josh Elser wrote:
>     So, wrapping something around (ranges of) WALs and RFiles is definitely
>     desirable here. I believe with that, we can better separate the logic into
>     discrete pieces: 1) Generate data 2) Transmit data 3) Apply data
> 
>     The more agnostic of the underlying data we can make the implementations,
>     the better, most likely. The wrapper around WALs and RFiles would need to
>     support some semantics like ordering (WAL1 needs to be applied before WAL2),
>     verification/validation on the remote side (checksum?), and the ability to
>     efficiently replay this data.
> 
>     Thinking further, you could even generalize the problem of how to get
>     from #1 to #2 as a FIFO queue backed by a table.

Need to call out at least one more step, 4) Report data applied, so the source can GC.


- kturner


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19790/#review38927
-----------------------------------------------------------


On March 28, 2014, 5:54 p.m., kturner wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19790/
> -----------------------------------------------------------
> 
> (Updated March 28, 2014, 5:54 p.m.)
> 
> 
> Review request for accumulo.
> 
> 
> Bugs: ACCUMULO-378
>     https://issues.apache.org/jira/browse/ACCUMULO-378
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> ACCUMULO-378 Design document. Posting for review here, not meant for commit.
> Final version of document should be posted on issue.
> 
> 
> Diffs
> -----
> 
>   docs/src/main/resources/design/ACCUMULO-378-design.mdtext PRE-CREATION
> 
> Diff: https://reviews.apache.org/r/19790/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> kturner
> 
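
To make the four steps from the thread (generate, transmit, apply, report applied) a bit more concrete, here is a rough sketch of how they might fit together. Everything in it is an assumption for illustration only: `ReplicationSketch`, `Unit`, `Source`, `Destination`, and the example URIs are hypothetical names and are not part of any Accumulo API. An in-memory queue stands in for the "FIFO queue backed by a table", a sequence number stands in for the WAL ordering requirement, and a CRC32 stands in for the remote-side verification.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.zip.CRC32;

/** Hypothetical sketch only; none of these types exist in Accumulo. */
public class ReplicationSketch {

    /** Opaque unit of replication: the mover only sees a URI, an order, and a checksum. */
    static final class Unit {
        final long seq;        // ordering: a lower seq must be applied first
        final String uri;      // e.g. where a WAL or RFile lives; contents stay opaque
        final byte[] payload;
        final long checksum;   // lets the remote side validate what it received

        Unit(long seq, String uri, byte[] payload, long checksum) {
            this.seq = seq;
            this.uri = uri;
            this.payload = payload;
            this.checksum = checksum;
        }
    }

    static long crc(byte[] data) {
        CRC32 c = new CRC32();
        c.update(data);
        return c.getValue();
    }

    /** Source cluster: step 1 (generate) and step 4 (accept reports so it can GC). */
    static final class Source {
        // In-memory stand-in for a FIFO queue backed by a table.
        private final Queue<Unit> pending = new ArrayDeque<>();
        private long nextSeq = 0;

        void generate(String uri, byte[] payload) {
            pending.add(new Unit(nextSeq++, uri, payload, crc(payload)));
        }

        Unit peek() {
            return pending.peek();
        }

        /** Step 4: only once the remote reports success may the source drop the unit. */
        void reportApplied(long seq) {
            Unit head = pending.peek();
            if (head != null && head.seq == seq) {
                pending.remove();
            }
        }

        int pendingCount() {
            return pending.size();
        }
    }

    /** Destination cluster: step 3 (apply), in order and only after validation. */
    static final class Destination {
        private long expectedSeq = 0;
        final List<String> applied = new ArrayList<>();

        boolean apply(Unit u) {
            if (u.seq != expectedSeq || crc(u.payload) != u.checksum) {
                return false; // out of order or corrupted: refuse, source keeps it
            }
            applied.add(u.uri);
            expectedSeq++;
            return true;
        }
    }

    /** Step 2 (transmit): a direct call here; could be a file, the network, a batch job. */
    static void replicateAll(Source src, Destination dst) {
        Unit u;
        while ((u = src.peek()) != null) {
            if (!dst.apply(u)) {
                break; // leave the unit queued so it can be retried later
            }
            src.reportApplied(u.seq);
        }
    }

    public static void main(String[] args) {
        Source src = new Source();
        Destination dst = new Destination();
        src.generate("hdfs://source/wal/1", "mutation-a".getBytes(StandardCharsets.UTF_8));
        src.generate("hdfs://source/wal/2", "mutation-b".getBytes(StandardCharsets.UTF_8));
        replicateAll(src, dst);
        System.out.println(dst.applied + " pending=" + src.pendingCount());
    }
}
```

The point of the sketch is that the transmit step never looks inside the payload, so how and when the data moves stays up to the admin, while the source only garbage-collects a unit after the destination has reported it applied.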
