Xi, I just granted you the edit permission.
- Sijie On Fri, Oct 7, 2016 at 10:34 AM, Xi Liu <xi.liu....@gmail.com> wrote: > I still can not edit the wiki. Can any of the pmc members grant me the > permissions? > > - Xi > > On Sat, Sep 17, 2016 at 10:35 PM, Xi Liu <xi.liu....@gmail.com> wrote: > > > Sijie, > > > > I attempted to create a wiki page under that space. I found that I am not > > authorized with edit permission. > > > > Can any of the committers grant me the wiki edit permission? My account > is > > "xi.liu.ant". > > > > - Xi > > > > > > On Tue, Sep 13, 2016 at 9:26 AM, Sijie Guo <si...@apache.org> wrote: > > > >> This sounds interesting ... I will take a closer look and give my > comments > >> later. > >> > >> At the same time, do you mind creating a wiki page to put your idea > there? > >> You can add your wiki page under > >> https://cwiki.apache.org/confluence/display/DL/Project+Proposals > >> > >> You might need to ask in the dev list to grant the wiki edit permissions > >> to > >> you once you have a wiki account. > >> > >> - Sijie > >> > >> > >> On Mon, Sep 12, 2016 at 2:20 AM, Xi Liu <xi.liu....@gmail.com> wrote: > >> > >> > Hello, > >> > > >> > I asked the transaction support in distributedlog user group two > months > >> > ago. I want to raise this up again, as we are looking for using > >> > distributedlog for building a transactional data service. It is a > major > >> > feature that is missing in distributedlog. We have some ideas to add > >> this > >> > to distributedlog and want to know if they make sense or not. If they > >> are > >> > good, we'd like to contribute and develop with the community. > >> > > >> > Here are the thoughts: > >> > > >> > ------------------------------------------------- > >> > > >> > From our understanding, DL can provide "at-least-once" delivery > semantic > >> > (if not, please correct me) but not "exactly-once" delivery semantic. > >> That > >> > means that a message can be delivered one or more times if the reader > >> > doesn't handle duplicates. > >> > > >> > The duplicates come from two places, one is at writer side (this > assumes > >> > using write proxy not the core library), while the other one is at > >> reader > >> > side. > >> > > >> > - writer side: if the client attempts to write a record to the write > >> > proxies and gets a network error (e.g timeouts) then retries, the > >> retrying > >> > will potentially result in duplicates. > >> > - reader side:if the reader reads a message from a stream and then > >> crashes, > >> > when the reader restarts it would restart from last known position > >> (DLSN). > >> > If the reader fails after processing a record and before recording the > >> > position, the processed record will be delivered again. > >> > > >> > The reader problem can be properly addressed by making use of the > >> sequence > >> > numbers of records and doing proper checkpointing. For example, in > >> > database, it can checkpoint the indexed data with the sequence number > of > >> > records; in flink, it can checkpoint the state with the sequence > >> numbers. > >> > > >> > The writer problem can be addressed by implementing an idempotent > >> writer. > >> > However, an alternative and more powerful approach is to support > >> > transactions. > >> > > >> > *What does transaction mean?* > >> > > >> > A transaction means a collection of records can be written > >> transactionally > >> > within a stream or across multiple streams. They will be consumed by > the > >> > reader together when a transaction is committed, or will never be > >> consumed > >> > by the reader when the transaction is aborted. > >> > > >> > The transaction will expose following guarantees: > >> > > >> > - The reader should not be exposed to records written from uncommitted > >> > transactions (mandatory) > >> > - The reader should consume the records in the transaction commit > order > >> > rather than the record written order (mandatory) > >> > - No duplicated records within a transaction (mandatory) > >> > - Allow interleaving transactional writes and non-transactional writes > >> > (optional) > >> > > >> > *Stream Transaction & Namespace Transaction* > >> > > >> > There will be two types of transaction, one is Stream level > transaction > >> > (local transaction), while the other one is Namespace level > transaction > >> > (global transaction). > >> > > >> > The stream level transaction is a transactional operation on writing > >> > records to one stream; the namespace level transaction is a > >> transactional > >> > operation on writing records to multiple streams. > >> > > >> > *Implementation Thoughts* > >> > > >> > - A transaction is consist of begin control record, a series of data > >> > records and commit/abort control record. > >> > - The begin/commit/abort control record is written to a `commit` log > >> > stream, while the data records will be written to normal data log > >> streams. > >> > - The `commit` log stream will be the same log stream for stream-level > >> > transaction, while it will be a *system* stream (or multiple system > >> > streams) for namespace-level transactions. > >> > - The transaction code looks like as below: > >> > > >> > <code> > >> > > >> > Transaction txn = client.transaction(); > >> > Future<DLSN> result1 = txn.write(stream-0, record); > >> > Future<DLSN> result2 = txn.write(stream-1, record); > >> > Future<DLSN> result3 = txn.write(stream-2, record); > >> > Future<Pair<DLSN, DLSN>> result = txn.commit(); > >> > > >> > </code> > >> > > >> > if the txn is committed, all the write futures will be satisfied with > >> their > >> > written DLSNs. if the txn is aborted, all the write futures will be > >> failed > >> > together. there is no partial failure state. > >> > > >> > - The actually data flow will be: > >> > > >> > 1. writer get a transaction id from the owner of the `commit' log > stream > >> > 1. write the begin control record (synchronously) with the transaction > >> id > >> > 2. for each write within the same txn, it will be assigned a local > >> sequence > >> > number starting from 0. the combination of transaction id and local > >> > sequence number will be used later on by the readers to de-duplicate > >> > records. > >> > 3. the commit/abort control record will be written based on the > results > >> > from 2. > >> > > >> > - Application can supply a timeout for the transaction when #begin() a > >> > transaction. The owner of the `commit` log stream can abort > transactions > >> > that never be committed/aborted within their timeout. > >> > > >> > - Failures: > >> > > >> > * all the log records can be simply retried as they will be > >> de-duplicated > >> > probably at the reader side. > >> > > >> > - Reader: > >> > > >> > * Reader can be configured to read uncommitted records or committed > >> records > >> > only (by default read uncommitted records) > >> > * If reader is configured to read committed records only, the read > ahead > >> > cache will be changed to maintain one additional pending committed > >> records. > >> > the pending committed records map is bounded and records will be > dropped > >> > when read ahead is moving. > >> > * when the reader hits a commit record, it will rewind to the begin > >> record > >> > and start reading from there. leveraging the proper read ahead cache > and > >> > pending commit records cache, it would be good for both short > >> transactions > >> > and long transactions. > >> > > >> > - DLSN, SequenceId: > >> > > >> > * We will add a fourth field to DLSN. It is `local sequence number` > >> within > >> > a transaction session. So the new DLSN of records in a transaction > will > >> be > >> > the DLSN of commit control record plus its local sequence number. > >> > * The sequence id will be still the position of the commit record plus > >> its > >> > local sequence number. The position will be advanced with total number > >> of > >> > written records on writing the commit control record. > >> > > >> > - Transaction Group & Namespace Transaction > >> > > >> > using one single log stream for namespace transaction can cause the > >> > bottleneck problem since all the begin/commit/end control records will > >> have > >> > to go through one log stream. > >> > > >> > the idea of 'transaction group' is to allow partitioning the writers > >> into > >> > different transaction groups. > >> > > >> > clients can specify the `group-name` when starting the transaction. if > >> > there is no `group-name` specified, it will use the default `commit` > >> log in > >> > the namespace for creating transactions. > >> > > >> > ------------------------------------------------- > >> > > >> > I'd like to collect feedbacks on this idea. Appreciate any comments > and > >> if > >> > anyone is also interested in this idea, we'd like to collaborate with > >> the > >> > community. > >> > > >> > > >> > - Xi > >> > > >> > > > > >