I also created the proposal page - https://cwiki.apache.org/confluence/display/DL/DP-3+-+DistributedLog+Stream+Operation+Primitives
Let me know if you have any comments. If there are no objections, I'd like
to send out the pull requests soon.

- Gerrit

On Tue, Nov 22, 2016 at 8:11 PM, Gerrit Sundaram <gerritsunda...@gmail.com> wrote:

> Thanks Sijie!
>
> I also created https://issues.apache.org/jira/browse/DL-72 for tracking
> the implementation of the discussion here. I will create the proposal soon.
>
> - Gerrit
>
> On Tue, Nov 22, 2016 at 7:58 PM, Sijie Guo <si...@apache.org> wrote:
>
>> Done. Gerrit, I just granted the permissions to you. Let me know if it
>> works.
>>
>> - Sijie
>>
>> On Tue, Nov 22, 2016 at 7:51 PM, Gerrit Sundaram <gerritsunda...@gmail.com> wrote:
>>
>> > Sijie,
>> >
>> > Sorry for the late response. Here is my wiki account:
>> > https://cwiki.apache.org/confluence/display/~gerritsundaram
>> >
>> > Thanks in advance.
>> >
>> > - Gerrit
>> >
>> > On Thu, Nov 17, 2016 at 10:03 AM, Sijie Guo <sij...@twitter.com.invalid> wrote:
>> >
>> > > Gerrit,
>> > >
>> > > Can you send me your wiki account?
>> > >
>> > > - Sijie
>> > >
>> > > On Thu, Nov 17, 2016 at 1:38 AM, Gerrit Sundaram <gerritsunda...@gmail.com> wrote:
>> > >
>> > > > Can you grant me the permissions for editing the wiki page?
>> > > >
>> > > > - Gerrit
>> > > >
>> > > > On Thu, Nov 17, 2016 at 1:37 AM, Gerrit Sundaram <gerritsunda...@gmail.com> wrote:
>> > > >
>> > > > > On Tue, Nov 15, 2016 at 2:14 AM, Sijie Guo <si...@apache.org> wrote:
>> > > > >
>> > > > >> On Sat, Nov 12, 2016 at 2:30 AM, Gerrit Sundaram <gerritsunda...@gmail.com> wrote:
>> > > > >>
>> > > > >> > On Fri, Nov 11, 2016 at 1:09 PM, Sijie Guo <si...@apache.org> wrote:
>> > > > >> >
>> > > > >> > > I liked this topic. A better name might be 'stream storage primitives',
>> > > > >> > > as we treat DL as a stream storage. Comments inline.
>> > > > >> > >
>> > > > >> > > On Wed, Nov 9, 2016 at 3:09 AM, Gerrit Sundaram <gerritsunda...@gmail.com> wrote:
>> > > > >> > >
>> > > > >> > > > As Sijie suggested in the other email thread, I started this email
>> > > > >> > > > thread for discussing the stream operation primitives.
>> > > > >> > > >
>> > > > >> > > > The stream operations that I am aware of that DL supports are
>> > > > >> > > >
>> > > > >> > > > * Open a distributedlog stream
>> > > > >> > > > * Delete a distributedlog stream
>> > > > >> > > > * List all the distributedlog streams under a namespace
>> > > > >> > > >
>> > > > >> > >
>> > > > >> > > Are you also looking for listing streams under a 'sub-namespace' (or
>> > > > >> > > streams that have a common prefix)? (Based on my understanding of your
>> > > > >> > > proposal, you might need this for a filesystem-like API?)
>> > > > >> > >
>> > > > >> >
>> > > > >> > Yes. However it seems like DL is designed more around a flat namespace
>> > > > >> > with just streams.
>> > > > >>
>> > > > >>
>> > > > >> Ah, yes. The original thought was to tie a namespace to a user or an
>> > > > >> application. Under a namespace, the application can manage the streams
>> > > > >> on its own. So that's why it was designed with a flat namespace.
>> > > > >>
>> > > > >>
>> > > > >> > There is no concept of a 'sub-namespace'. Although I
>> > > > >> > can probably hack it by just naming the streams in a filesystem
>> > > > >> > path-like way.
>> > > > >> >
>> > > > >> > However I am still curious whether you guys want to introduce any sort
>> > > > >> > of naming hierarchy within a namespace. For example, can you have a
>> > > > >> > 'StreamSet', which is a set of streams? (like in a filesystem, a
>> > > > >> > directory has a list of children).
>> > > > >> > If you have a similar hierarchy, it definitely
>> > > > >> > will simplify my work.
>> > > > >> >
>> > > > >>
>> > > > >> In the write proxy, we have a similar concept like 'StreamSet' to group
>> > > > >> some physical DL streams into one virtual stream. However that was mostly
>> > > > >> used for exporting metrics for grouped virtual streams. We don't quite
>> > > > >> emphasize the concept of 'virtual stream' in DL, as we tended to let the
>> > > > >> application decide what the virtual stream looks like.
>> > > > >>
>> > > > >> However, for metadata organization and management, it might make sense
>> > > > >> to think of such a hierarchy.
>> > > > >>
>> > > > >> What do you have in mind about 'StreamSet'? Can you explain a little
>> > > > >> more?
>> > > > >
>> > > > >
>> > > > > I was thinking of a group of streams that might be used for the same
>> > > > > application but store different parts of the data. It is like the
>> > > > > 'virtual' stream that you mentioned.
>> > > > >
>> > > > > - Gerrit
>> > > > >
>> > > > >> > > > * Seal a distributedlog stream
>> > > > >> > > > * Truncate a distributedlog stream
>> > > > >> > > >
>> > > > >> > >
>> > > > >> > > Just to clarify this, 'truncate' in DL trims the head of the
>> > > > >> > > stream, not the tail.
>> > > > >> > > 'truncate' in the filesystem world shrinks a file to a size of
>> > > > >> > > precisely *length* bytes; it truncates the tail.
>> > > > >> > >
>> > > > >> > > Let's make sure we have clarified it and are on the same page.
>> > > > >> > >
>> > > > >> >
>> > > > >> > Yes, we are on the same page.
>> > > > >> >
>> > > > >> > > >
>> > > > >> > > > I am looking for a more filesystem-like API.
>> > > > >> > > > For example,
>> > > > >> > > >
>> > > > >> > > > * Get the status/attributes of a stream (like stat in filesystem)
>> > > > >> > > >
>> > > > >> > >
>> > > > >> > > +1 for stream status/attributes. I think we might actually already have
>> > > > >> > > this in DL, since in kestrel we use that for storing customized metadata.
>> > > > >> > > It might make sense to formalize it into 'stream status'.
>> > > > >> > >
>> > > > >> >
>> > > > >> > Gotcha.
>> > > > >> >
>> > > > >> > > > * Rename a stream
>> > > > >> > > >
>> > > > >> > >
>> > > > >> > > We've talked about this for a while. +1.
>> > > > >> > >
>> > > > >> > > > * Symlink a stream
>> > > > >> > >
>> > > > >> > > Symlinking a stream is probably easy to do. +1. We've thought about that
>> > > > >> > > for having the flexibility to move streams between different storage
>> > > > >> > > backends. Symlink would help this.
>> > > > >> > >
>> > > > >> > > But a more fundamental thought here is symlinks for log segments. So
>> > > > >> > > when a symlinked stream is deleted, the underlying log segments might
>> > > > >> > > not be deleted until their link count decreases to zero.
>> > > > >> > >
>> > > > >> > > >
>> > > > >> > > > Other operations that I think might be useful:
>> > > > >> > > >
>> > > > >> > > > * Split/Fork a stream (it can be useful for dynamic data partitioning)
>> > > > >> > > >
>> > > > >> > >
>> > > > >> > > Splitting and forking a stream sounds interesting. But it sounds like a
>> > > > >> > > more high-level feature rather than a storage primitive. Actually, it
>> > > > >> > > might be a good feature to discuss separately.
>> > > > >> > >
>> > > > >> > > > * Merge/Concat streams
>> > > > >> > > >
>> > > > >> > >
>> > > > >> > > I think there is already one outstanding jira for concatenating two DL
>> > > > >> > > streams. Jia and Arvind are working on that.
>> > > > >> > >
>> > > > >> > > https://issues.apache.org/jira/browse/DL-46
>> > > > >> >
>> > > > >> > I will watch that jira.
>> > > > >> >
>> > > > >> > > >
>> > > > >> > > > The above operations are based on my knowledge about DL. Feel free to
>> > > > >> > > > add more.
>> > > > >> > > >
>> > > > >> > > > - Gerrit
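
For reference, here is a minimal sketch of the path-like naming hack mentioned in the thread: streams stay in a flat namespace, and a 'sub-namespace' listing is emulated by filtering on a name prefix. The list_children helper, the '/' separator, and the sample stream names are all hypothetical and only illustrate the idea; nothing here is an existing DL API.

```python
def list_children(stream_names, prefix, sep="/"):
    """Return the immediate children of `prefix` in a flat list of stream names.

    Hypothetical helper: it treats `sep` as a path separator so that a flat
    namespace can be browsed like a directory tree.
    """
    if not prefix.endswith(sep):
        prefix += sep
    children = set()
    for name in stream_names:
        if name.startswith(prefix):
            rest = name[len(prefix):]
            if rest:
                # Keep only the first path component below the prefix.
                children.add(rest.split(sep, 1)[0])
    return sorted(children)


# Made-up stream names living in one flat namespace.
streams = [
    "app/orders/part-0",
    "app/orders/part-1",
    "app/users/part-0",
    "other/log",
]

print(list_children(streams, "app"))         # ['orders', 'users']
print(list_children(streams, "app/orders"))  # ['part-0', 'part-1']
```

This filters on the client side, so it still scans every stream name in the namespace. A real hierarchy such as the 'StreamSet' idea would presumably avoid that scan, which is part of why native support would simplify the work.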