What about topic-level metrics? Are we going to report metrics at all levels now? Or maybe just at the partition level, and use the monitoring app to aggregate them at different levels (i.e., remove topic metrics completely)?
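If only partition-level metrics were reported, a monitoring app could derive topic- and directory-level numbers by summing each partition's value into every ancestor path. Below is a minimal sketch. It assumes partition paths of the form /dir/.../topic/partition and a metric that can safely be summed, such as a byte or message rate. Averages or percentiles would need a different aggregation. The function name `rollup` and the sample paths are hypothetical.

```python
from collections import defaultdict


def rollup(partition_metrics: dict[str, float]) -> dict[str, float]:
    """Sum per-partition values into every ancestor directory.

    Keys are hypothetical partition paths such as "/dc1/events/clicks/0",
    where the last segment is the partition number. Only valid for
    additive metrics (e.g. bytes in per second).
    """
    totals: dict[str, float] = defaultdict(float)
    for path, value in partition_metrics.items():
        segments = [s for s in path.split("/") if s]
        # Credit the value to "/dc1", "/dc1/events", "/dc1/events/clicks",
        # but not to the partition path itself (the last segment).
        for depth in range(1, len(segments)):
            totals["/" + "/".join(segments[:depth])] += value
    return dict(totals)


if __name__ == "__main__":
    metrics = {
        "/dc1/events/clicks/0": 10.0,
        "/dc1/events/clicks/1": 5.0,
        "/dc1/events/views/0": 2.0,
        "/dc2/events/clicks/0": 1.0,
    }
    for level, total in sorted(rollup(metrics).items()):
        print(level, total)
```

Because every level is computed from the same partition data, "topic metrics" become just one depth of this rollup rather than a separately reported series.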
On Wed, Oct 21, 2015 at 3:47 PM, Ashish Singh <[email protected]> wrote: > On Wed, Oct 21, 2015 at 2:22 PM, Jay Kreps <[email protected]> wrote: > >> Gwen, it's a good question of what the producer semantics are--would we >> only allow you to produce to a partition or first level directory or would >> we hash over whatever subtree you supply? Actually not sure which makes >> more sense... >> >> Ashish, here are some thoughts: >> 1. I think we can do this online. There is a question of what happens to >> readers and writers but presumably it would be the same thing as if that topic >> weren't there. There would be no guarantee this would happen atomically across >> different brokers or clients, though. >> 2. ACLs should work like unix perms, right? > > > Are you suggesting we should move allowed operations to the R, W, X model of > Unix? Currently, we support these operations > <https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/security/auth/Operation.scala#L25> > . > > I think configs would override >> hierarchically, so we would have a full set of configs for each partition >> computed by walking up the tree from the root and taking the first >> override. I think this is what you're describing, right? >> > > Yes. > > 3. Totally agree, no reason to have an arbitrary limit. >> 4. I actually don't think the physical layout on disk should be at all >> connected to the logical directory hierarchy we present. > > > I think it will be useful to have that connection as that will enable users > to encrypt different namespaces with different keys. Thus, one more step > towards a completely multi-tenant system. > > >> That is, whether >> you use RAID or not shouldn't impact the location of a topic in your >> directory structure. > > > Even if we make the physical layout on disk representative of the directory > hierarchy, I think this will not be a concern. Correct me if I am missing > something. > > Not sure if this is what you are saying or not. 
This >> does raise the question of how to do the disk layout. The simplest thing >> would be to keep the flat data directories but make the names of the >> partitions on disk just be logical inode numbers and then have a separate >> mapping of these inodes to logical names stored in ZK with a cache. I think >> this would make things like rename fast and atomic. The downside of this is >> that the 'ls' command will no longer tell you much about the data on a >> broker. >> > > Enabling renaming of topics is definitely something that will be nice to > have; however, with the flat structure we won't be able to encrypt > different directories/namespaces with different keys. Renaming with a > directory hierarchy on disk can also be achieved with logical names, though each dir > will need a logical name. > > >> -Jay >> >> On Wed, Oct 21, 2015 at 12:43 PM, Ashish Singh <[email protected]> >> wrote: >> >> > In the last KIP hangout, the following questions were raised. >> > >> > 1. >> > >> > *Whether or not to support a move command? If yes, how do we support >> it?* >> > I think a *move* command will be essential once we start supporting >> > directories. However, the implementation might be a bit convoluted. A few >> > things it will require are the ability to mark a topic unavailable >> > during >> > the move and an update of the brokers’ metadata cache to reflect the move. >> > 2. >> > >> > *How will ACL/config inheritance work?* >> > Say we have /dc/ns/topic. >> > dc has dc_acl and dc_config. Similarly for ns and topic. >> > To be able to perform an action on /dc/ns/topic, the user must >> have >> > the required perms on dc, ns and topic for that operation. For example, >> > User1 >> > will need DESCRIBE permissions on dc, ns and topic to be able to >> > describe >> > /dc/ns/topic. >> > For configs, configs for /dc/ns/topic will be topic_config + >> ns_config + >> > dc_config, in that order. 
So, if a config is specified for the topic, then >> > that >> > will be used; else its parent (ns) will be checked for that config, >> and >> > so on. >> > 3. >> > >> > *Will supporting an n-deep hierarchy be a concern?* >> > This can be a performance concern; however, it sounds more like a >> misuse >> > of the functionality or bad organization of topics. We can have a >> depth >> > limit, but I am not sure if it is required. >> > 4. >> > >> > *Will we continue to support multiple directories on disk, as >> proposed >> > in KAFKA-188?* >> > Yes, we should be able to support that. It is within those >> directories that >> > namespaces will be created. The heuristics for choosing the least loaded >> > disk/dir will remain the same. >> > 5. >> > >> > *Will it be required to move existing topics from the default directory/ >> > namespace to a particular directory/namespace to enable mirror-maker to >> > replicate topics in that directory/namespace?* >> > I do not think it will be required, as one can simply add /*/* to >> > mirror-maker’s blacklist and this will only capture topics that exist >> in >> > the default namespace. @Joel, does this answer your question? >> > >> > >> > >> > On Fri, Oct 16, 2015 at 6:33 PM, Ashish Singh <[email protected]> >> wrote: >> > >> > > On Thu, Oct 15, 2015 at 1:30 PM, Jiangjie Qin >> <[email protected] >> > > >> > > wrote: >> > >> > >> Hey Jay, >> > >> >> > >> If we allow consumers to subscribe to /*/my-event, does that mean we >> > allow >> > >> consumers to consume across namespaces? >> > > >> > > That is the idea. If a user has permissions, then yes, he should be able >> > to >> > > consume from as many namespaces as he wants. >> > > >> > > >> > >> In that case it seems not >> > >> "hierarchical" but more like name field filtering, i.e. the user can >> > choose >> > >> to consume from topics where datacenter={x,y}, >> > >> topic_name={my-topic1,mytopic2}. Am I understanding right? 
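One reading of the /*/my-event subscription that Becket asks about is segment-wise matching, where * stands for exactly one directory level. Under that reading, a filter like datacenter={x,y} is just a small set of such patterns. The thread does not settle whether this is a glob or a regex (Ashish mentions a regex elsewhere), so the sketch below is only one possible interpretation. `matches` and `subscribe` are hypothetical names.

```python
def matches(pattern: str, topic_path: str) -> bool:
    """Segment-wise match in which '*' stands for exactly one directory level.

    This is an assumed interpretation; the thread does not fix the syntax.
    """
    pat = [s for s in pattern.split("/") if s]
    seg = [s for s in topic_path.split("/") if s]
    if len(pat) != len(seg):
        return False  # '*' never spans more than one level
    return all(p == "*" or p == s for p, s in zip(pat, seg))


def subscribe(pattern: str, topic_paths: list[str]) -> list[str]:
    """Return the topic paths a wildcard subscription would pick up."""
    return sorted(t for t in topic_paths if matches(pattern, t))


if __name__ == "__main__":
    topics = [
        "/chicago-datacenter/my-event",
        "/oregon-datacenter/my-event",
        "/oregon-datacenter/other-event",
        "/my-event",
    ]
    print(subscribe("/*/my-event", topics))
```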
>> > >> >> > > I think it is still hierarchical, but with possible filtering (as >> you >> > > said). >> > > >> > >> >> > >> Thanks, >> > >> >> > >> Jiangjie (Becket) Qin >> > >> >> > >> On Wed, Oct 14, 2015 at 12:49 PM, Jay Kreps <[email protected]> wrote: >> > >> >> > >> > Hey Jason, >> > >> > >> > >> > I actually think this is one of the advantages. The problem we have >> > >> today >> > >> > is that you can't really do bidirectional replication between >> clusters >> > >> > because it would actually be a feedback loop. >> > >> > >> > >> > So the intended use would be that you would have a structure where >> the >> > >> > top-level directory was DIFFERENT but the topic names were the same, >> > so >> > >> if >> > >> > you maintain >> > >> > /chicago-datacenter/actual-topics >> > >> > /oregon-datacenter/actual-topics >> > >> > etc. >> > >> > Then you replicate >> > >> > /chicago-datacenter/* => /oregon-datacenter >> > >> > and >> > >> > /oregon-datacenter/* => /chicago-datacenter >> > >> > >> > >> > People who want the aggregate feed subscribe to /*/my-event. >> > >> > >> > >> > The nice thing about this is that it gives a unified namespace across all >> > >> > locations. >> > >> > >> > >> > This is basically exactly what we do now, but you no longer need to add new >> > >> clusters >> > >> > to get the namespacing. >> > >> > >> > >> > -Jay >> > >> > >> > >> > >> > >> > On Wed, Oct 14, 2015 at 11:24 AM, Jason Gustafson < >> [email protected] >> > > >> > >> > wrote: >> > >> > >> > >> > > Hey Ashish, thanks for the write-up. I think having a namespace >> > >> > capability >> > >> > > is a useful feature for Kafka, in particular with the addition of >> > the >> > >> > > authorization layer. I probably prefer Jay's hierarchical approach >> > if >> > >> > we're >> > >> > > going to embed the namespace in the topic name since it seems more >> > >> > general. 
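Jay's replication layout avoids a feedback loop because each mirror copies only its own datacenter's top-level directory. Data that arrived by mirroring is therefore never sent back. The toy simulation below makes a few assumptions. Each cluster is a dict from topic path to an append-only list of messages, and the mirror remembers how many messages of each path it has already shipped. This is not MirrorMaker's actual API, and `mirror` is a hypothetical helper.

```python
Cluster = dict[str, list[str]]  # topic path -> append-only message log (toy model)


def mirror(src: Cluster, dst: Cluster, local_prefix: str,
           copied: dict[str, int]) -> None:
    """Append new messages from src to dst, but only for src's own directory.

    `copied` remembers how many messages of each path were already mirrored,
    so repeated runs ship only messages that arrived since the last run.
    """
    for path, log in src.items():
        if not path.startswith(local_prefix + "/"):
            continue  # mirrored in from elsewhere; never send it back
        start = copied.get(path, 0)
        dst.setdefault(path, []).extend(log[start:])  # same path on both sides
        copied[path] = len(log)


if __name__ == "__main__":
    chicago: Cluster = {"/chicago-datacenter/my-event": ["c1"]}
    oregon: Cluster = {"/oregon-datacenter/my-event": ["o1"]}
    c2o: dict[str, int] = {}
    o2c: dict[str, int] = {}
    for _ in range(3):  # run both directions repeatedly
        mirror(chicago, oregon, "/chicago-datacenter", c2o)
        mirror(oregon, chicago, "/oregon-datacenter", o2c)
    print(chicago == oregon, chicago)
```

Without the prefix check, the Oregon-to-Chicago mirror would copy /chicago-datacenter/my-event back to Chicago and append duplicates on every pass. That is the feedback loop Jay describes.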
>> > >> > > That said, one advantage of having a namespace independent of the >> > >> topic >> > >> > > name is that it simplifies replication between namespaces a bit >> > since >> > >> you >> > >> > > don't have to parse and rewrite topic names. Assuming that >> > >> hierarchical >> > >> > > topics will happen eventually anyway, I imagine a common pattern >> > >> would be >> > >> > > to preserve the same directory structure in multiple namespaces, >> so >> > >> > having >> > >> > > an easy mechanism for applications to switch between them would be >> > >> nice. >> > >> > > The namespace is kind of analogous to a chroot in this case. Of >> > course >> > >> > you >> > >> > > can achieve the same thing by having a configurable topic prefix, >> > but >> > >> > you >> > >> > > have to do all the topic rewriting, which I'm guessing will be a >> > >> little >> > >> > > annoying to implement in all of the clients and tools. However, >> the >> > >> > > tradeoff (as you mention in the KIP) is that all request schemas >> > have >> > >> to >> > >> > be >> > >> > > updated, which is also annoying. >> > >> > > >> > >> > > -Jason >> > >> > > >> > >> > > On Wed, Oct 14, 2015 at 12:03 AM, Ashish Singh < >> [email protected] >> > > >> > >> > > wrote: >> > >> > > >> > >> > > > On Mon, Oct 12, 2015 at 7:37 PM, Gwen Shapira < >> [email protected]> >> > >> > wrote: >> > >> > > > >> > >> > > > > This works really nicely from the consumer side, but what >> about >> > >> the >> > >> > > > > producer? If there are no more topics, do we allow producing >> to a >> > >> > > > directory >> > >> > > > > and have the Partitioner hash-partition messages between all >> > >> > partitions >> > >> > > > in >> > >> > > > > the multiple levels in a directory? >> > >> > > > > >> > >> > > > Good point. >> > >> > > > >> > >> > > > I am personally in favor of maintaining the current behavior for >> > >> the producer, >> > >> > > > i.e., letting users only produce to a topic. 
This is >> different >> > >> for >> > >> > > > consumers; the suggested behavior is in line with the current >> behavior. >> > >> One >> > >> > > can >> > >> > > > use regex subscription to achieve the same even today. >> > >> > > > >> > >> > > > > >> > >> > > > > Also, I think we want to preserve the consumer terminology of >> > >> > > "subscribe" >> > >> > > > > to topics / directories, but "assign" partitions, since the >> > >> consumer >> > >> > > > > behavior is different in those cases. >> > >> > > > > >> > >> > > > > On Mon, Oct 12, 2015 at 7:16 PM, Jay Kreps <[email protected]> >> > >> wrote: >> > >> > > > > >> > >> > > > > > Okay, this is similar to what I think we have talked about >> > >> before. >> > >> > Let >> > >> > > > me >> > >> > > > > > elaborate on the idea that I think has been floating >> > >> around--it's >> > >> > > > pretty >> > >> > > > > > similar with a few differences. >> > >> > > > > > >> > >> > > > > > I think what you are calling the "default namespace" is >> > >> basically >> > >> > > what >> > >> > > > I >> > >> > > > > > would call the "current working directory", with paths not >> > >> beginning >> > >> > > > with >> > >> > > > > > '/' being interpreted relative to this directory as in the >> fs. >> > >> > > > > > >> > >> > > > > > One thing you have to work out is what levels in this >> > hierarchy >> > >> you >> > >> > > can >> > >> > > > > > actually subscribe to. I think you are assuming only what we >> > >> > > currently >> > >> > > > > > consider a "topic", i.e. the first level of directories but >> > not >> > >> the >> > >> > > > > > partitions or parent dirs, would be subscribable. If you >> think >> > >> > about >> > >> > > > it, >> > >> > > > > > though, that constraint is a bit arbitrary. 
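Jay's "current working directory" idea maps directly onto ordinary POSIX path resolution. The sketch below uses Python's `posixpath` and makes two assumptions. First, names without a leading '/' are joined onto the client's working directory, which defaults to '/' here, so a legacy flat topic name such as my-topic lands at the top level. Second, '..' is allowed; the thread does not discuss it. `resolve` is a hypothetical name.

```python
import posixpath


def resolve(path: str, cwd: str = "/") -> str:
    """Resolve a topic path the way a Unix shell resolves file paths.

    Paths starting with '/' are absolute; anything else is taken relative
    to `cwd` (the assumed per-client "current working directory").
    """
    if not cwd.startswith("/"):
        raise ValueError("cwd must be an absolute path")
    # posixpath.join discards cwd when `path` is absolute; normpath folds
    # away '.', '..' and repeated slashes.
    return posixpath.normpath(posixpath.join(cwd, path))


if __name__ == "__main__":
    print(resolve("pageviews", "/chicago-datacenter/user-events"))
    print(resolve("my-topic"))
```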
>> > >> > > > > > >> > >> > > > > > I'd propose the following semantics instead: >> > >> > > > > > - Subscribing to /a/b/c/0 means subscribing to the 0th >> > >> partition of >> > >> > > > topic >> > >> > > > > > "c" in directory /a/b >> > >> > > > > > - Subscribing to /a/b/c means subscribing to all partitions >> in >> > >> > > > > > topic/directory "c" >> > >> > > > > > - Subscribing to /a/b means subscribing to all partitions in >> > all >> > >> > > > > > topics/subdirectories under /a/b recursively >> > >> > > > > > >> > >> > > > > > Effectively the concept of topics goes away entirely--you >> just >> > >> have >> > >> > > > > > partitions/logs and directories. In this respect, rather than >> > >> adding >> > >> > > new >> > >> > > > > > concepts, this new feature would actually just generalize >> what >> > >> we >> > >> > > have >> > >> > > > > > (which I think is a good thing). >> > >> > > > > > >> > >> > > > > > -Jay >> > >> > > > > > >> > >> > > > > > On Mon, Oct 12, 2015 at 6:24 PM, Ashish Singh < >> > >> [email protected] >> > >> > > >> > >> > > > > wrote: >> > >> > > > > > > >> > >> > > > > > > On Mon, Oct 12, 2015 at 5:42 PM, Jay Kreps < >> > [email protected]> >> > >> > > wrote: >> > >> > > > > > > >> > >> > > > > > > > Great. I definitely would strongly favor carrying over >> > >> users' >> > >> > > > > intuition >> > >> > > > > > > > from FS unless we think we need a very different model. >> > The >> > >> > minor >> > >> > > > > > details >> > >> > > > > > > > like the separator and namespace term will help with >> that. 
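Jay's three subscription rules reduce to one check: a subscription to path P covers every partition whose path equals P or lies beneath it. The sketch below compares whole path segments, so /a/b does not accidentally cover /a/bc. It assumes partitions are given as plain path strings whose last segment is the partition number. `expand_subscription` is a hypothetical name.

```python
def expand_subscription(path: str, partitions: list[str]) -> list[str]:
    """Return every partition at or below `path`, comparing whole segments.

    /a/b/c/0 selects one partition, /a/b/c selects every partition of "c",
    and /a/b selects everything under /a/b recursively.
    """
    target = [s for s in path.split("/") if s]
    selected = []
    for p in partitions:
        segments = [s for s in p.split("/") if s]
        # Segment-wise prefix test (a plain string prefix would let /a/b match /a/bc).
        if segments[: len(target)] == target:
            selected.append(p)
    return sorted(selected)


if __name__ == "__main__":
    parts = ["/a/b/c/0", "/a/b/c/1", "/a/b/d/0", "/a/bc/0", "/a/x/0"]
    for sub in ("/a/b/c/0", "/a/b/c", "/a/b"):
        print(sub, expand_subscription(sub, parts))
```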
>> > >> > > > > > > > >> > >> > > > > > > > Follow-up question: say I have a layout like >> > >> > > > > > > > /chicago-datacenter/user-events/pageviews >> > >> > > > > > > > Can I subscribe to >> > >> > > > > > > > /chicago-datacenter/user-events >> > >> > > > > > > > >> > >> > > > > > > Yes, however they will need a regex like >> > >> > > > > > > /chicago-datacenter/user-events/* >> > >> > > > > > > >> > >> > > > > > > > to get the full firehose of user events from Chicago? >> Can >> > I >> > >> > > > subscribe >> > >> > > > > > to >> > >> > > > > > > > /*/user-events >> > >> > > > > > > > to get user events originating from all datacenters? >> > >> > > > > > > > >> > >> > > > > > > Yes >> > >> > > > > > > >> > >> > > > > > > > >> > >> > > > > > > > (Assuming, for now, that these are all in the same >> > >> cluster...) >> > >> > > > > > > > >> > >> > > > > > > > Also, just to confirm, it sounds from the proposal like >> > >> config >> > >> > > > > > overrides >> > >> > > > > > > > would become fully hierarchical so you can override >> config >> > >> at >> > >> > any >> > >> > > > > > > directory >> > >> > > > > > > > point. This will add complexity in implementation, but I >> > >> think it >> > >> > > will >> > >> > > > > > likely >> > >> > > > > > > > be much more operator-friendly. >> > >> > > > > > > > >> > >> > > > > > > Yes, that is the idea. 
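Both Jay's "override config at any directory point" and Ashish's topic_config + ns_config + dc_config ordering amount to the same rule: the nearest ancestor that sets a key wins. Applying overrides from the root down to the leaf gives that result. The sketch below assumes overrides are stored per directory path, with "/" acting as a cluster-wide level, and that configs are plain string maps. `effective_config` is hypothetical. `retention.ms` and `cleanup.policy` are ordinary Kafka topic configs, used here only as sample keys with illustrative values.

```python
def effective_config(path: str,
                     overrides: dict[str, dict[str, str]],
                     defaults: dict[str, str]) -> dict[str, str]:
    """Nearest-ancestor-wins config resolution for a hierarchical path.

    Overrides are applied from the root ("/") down to `path` itself, so a
    key set on /dc/ns/topic beats the same key on /dc/ns, which beats /dc.
    """
    segments = [s for s in path.split("/") if s]
    config = dict(defaults)
    for depth in range(len(segments) + 1):
        level = "/" + "/".join(segments[:depth])  # "/", "/dc", "/dc/ns", ...
        config.update(overrides.get(level, {}))
    return config


if __name__ == "__main__":
    defaults = {"retention.ms": "604800000", "cleanup.policy": "delete"}
    overrides = {
        "/dc": {"retention.ms": "86400000"},
        "/dc/ns": {"cleanup.policy": "compact"},
        "/dc/ns/topic": {"retention.ms": "3600000"},
    }
    print(effective_config("/dc/ns/topic", overrides, defaults))
    print(effective_config("/dc/ns/other", overrides, defaults))
```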
>> > >> > > > > > > >> > >> > > > > > > > >> > >> > > > > > > > There are about a thousand details to discuss in terms >> of >> > >> how >> > >> > > this >> > >> > > > > > would >> > >> > > > > > > > impact the metadata request, various zk entries, and >> > various >> > >> > > other >> > >> > > > > > > aspects, >> > >> > > > > > > > but probably it makes sense to first agree on how we >> would >> > >> want >> > >> > > it >> > >> > > > to >> > >> > > > > > > work >> > >> > > > > > > > and then start to dive into how to implement that. >> > >> > > > > > > > >> > >> > > > > > > Agreed. >> > >> > > > > > > >> > >> > > > > > > > >> > >> > > > > > > > -Jay >> > >> > > > > > > > >> > >> > > > > > > > On Mon, Oct 12, 2015 at 5:28 PM, Ashish Singh < >> > >> > > [email protected] >> > >> > > > > >> > >> > > > > > > wrote: >> > >> > > > > > > > >> > >> > > > > > > > > Hey Jay, thanks for reviewing the proposal. Answers >> > >> inline. >> > >> > > > > > > > > >> > >> > > > > > > > > On Mon, Oct 12, 2015 at 10:53 AM, Jay Kreps < >> > >> > [email protected]> >> > >> > > > > > wrote: >> > >> > > > > > > > > >> > >> > > > > > > > > > Hey guys, >> > >> > > > > > > > > > >> > >> > > > > > > > > > I think this is an important feature and one we've >> > >> talked >> > >> > > about >> > >> > > > > > for a >> > >> > > > > > > > > > while. I really think trying to invent a new >> > >> nomenclature >> > >> > is >> > >> > > > > going >> > >> > > > > > to >> > >> > > > > > > > > make >> > >> > > > > > > > > > it hard for people to understand, though. As such I >> > >> > recommend >> > >> > > > we >> > >> > > > > > call >> > >> > > > > > > > > > namespaces "directories" and denote them with >> > '/'--this >> > >> > will >> > >> > > > make >> > >> > > > > > the >> > >> > > > > > > > > > feature 1000x more understandable to people. >> > >> > > > > > > > > >> > >> > > > > > > > > Essentially you are suggesting two things here. >> > >> > > > > > > > > 1. 
Use "Directory" instead of "Namespace" as it is >> more >> > >> > > > intuitive. >> > >> > > > > I >> > >> > > > > > > > agree. >> > >> > > > > > > > > 2. Make '/' the delimiter instead of ':'. Fine with me, >> > and >> > >> I >> > >> > > agree >> > >> > > > > that if >> > >> > > > > > we >> > >> > > > > > > > > call these directories, '/' is the way to go. >> > >> > > > > > > > > >> > >> > > > > > > > > I think we should inherit the >> > >> > > > > > > > > > semantics of normal unix fs insofar as it makes >> > sense. >> > >> > > > > > > > > > >> > >> > > > > > > > > > In this approach we get rid of topics entirely; >> > instead >> > >> we >> > >> > > > really >> > >> > > > > > > just >> > >> > > > > > > > > have >> > >> > > > > > > > > > partitions, which are the equivalent of a file and >> > retain >> > >> > > their >> > >> > > > > > > numeric >> > >> > > > > > > > > > names, and the existing topic concept is just the >> > first >> > >> > > > directory >> > >> > > > > > > level, >> > >> > > > > > > > > but >> > >> > > > > > > > > > we generalize to allow arbitrarily many more levels >> of >> > >> > > nesting. >> > >> > > > > > This >> > >> > > > > > > > > allows >> > >> > > > > > > > > > categorization of data, such as >> > >> > > > > > /datacenter1/user-events/page-views/3, >> > >> > > > > > > > and >> > >> > > > > > > > > > you can subscribe, apply configs or permissions at >> any >> > >> > level >> > >> > > of >> > >> > > > > the >> > >> > > > > > > > > > hierarchy. >> > >> > > > > > > > > > >> > >> > > > > > > > > +1. This actually requires just a minor change to >> > the existing >> > >> > > > > proposal, >> > >> > > > > > > > i.e., >> > >> > > > > > > > > "some:namespace:topic" becomes "some/namespace/topic". 
>> > >> > > > > > > > > >> > >> > > > > > > > > > >> > >> > > > > > > > > > I'm actually not 100% sure what the semantics of >> > >> accessing >> > >> > > data >> > >> > > > > in >> > >> > > > > > > > > > differing namespaces are in the current proposal; >> maybe >> > >> you >> > >> > > can >> > >> > > > > > > clarify, >> > >> > > > > > > > > > Ashish? >> > >> > > > > > > > > >> > >> > > > > > > > > I will add more info to the KIP on this; however, I think a >> > >> client >> > >> > > > > should >> > >> > > > > > be >> > >> > > > > > > > > able to access data in any namespace as long as >> > the following >> > >> > > > > conditions >> > >> > > > > > > are >> > >> > > > > > > > > satisfied. >> > >> > > > > > > > > >> > >> > > > > > > > > 1. The namespace the client is trying to access exists. >> > >> > > > > > > > > 2. The client has sufficient permissions on the >> > namespace >> > >> for >> > >> > > > the type >> > >> > > > > of >> > >> > > > > > > > > operation the client is trying to perform on a topic >> > >> within >> > >> > > that >> > >> > > > > > > > namespace. >> > >> > > > > > > > > 3. The client has sufficient permissions on the topic >> > for >> > >> > the type >> > >> > > of >> > >> > > > > > > > operation >> > >> > > > > > > > > the client is trying to perform on that topic. >> > >> > > > > > > > > >> > >> > > > > > > > > If we choose to go with what you suggested earlier, >> i.e. >> > >> just >> > >> > > having a >> > >> > > > > > > > hierarchy >> > >> > > > > > > > > of directories, then step 3 will actually be covered >> in >> > >> step >> > >> > 2. >> > >> > > > > > > > > >> > >> > > > > > > > > In the current proposal, consumers will subscribe to a >> > >> topic >> > >> > > in a >> > >> > > > > > > > namespace >> > >> > > > > > > > > by specifying <namespace>:<topic> as the topic name. >> > They >> > >> can >> > >> > > > > > subscribe >> > >> > > > > > > > to >> > >> > > > > > > > > topics from multiple namespaces. 
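In the directory-only model, Ashish's three conditions collapse into one check over every level of the path, with condition 3 folded into condition 2. Each level must exist, and the user must hold the permission for the requested operation at that level. This matches the earlier example where User1 needs DESCRIBE on dc, ns and topic. The sketch below assumes ACLs are stored as sets of (user, operation) pairs per path. `is_authorized` is hypothetical and is not Kafka's Authorizer interface. DESCRIBE and READ are existing Kafka operation names.

```python
Acls = dict[str, set[tuple[str, str]]]  # path -> {(user, operation), ...} (assumed model)


def is_authorized(user: str, operation: str, path: str,
                  acls: Acls, existing: set[str]) -> bool:
    """Allow `operation` on `path` only if every level exists and grants it."""
    segments = [s for s in path.split("/") if s]
    if not segments:
        return False
    for depth in range(1, len(segments) + 1):
        level = "/" + "/".join(segments[:depth])  # "/dc", "/dc/ns", "/dc/ns/topic"
        if level not in existing:
            return False  # condition 1: the namespace/directory must exist
        if (user, operation) not in acls.get(level, set()):
            return False  # conditions 2 and 3: permission required at each level
    return True


if __name__ == "__main__":
    existing = {"/dc", "/dc/ns", "/dc/ns/topic"}
    acls: Acls = {
        "/dc": {("User1", "DESCRIBE")},
        "/dc/ns": {("User1", "DESCRIBE")},
        "/dc/ns/topic": {("User1", "DESCRIBE"), ("User1", "READ")},
    }
    print(is_authorized("User1", "DESCRIBE", "/dc/ns/topic", acls, existing))
    print(is_authorized("User1", "READ", "/dc/ns/topic", acls, existing))
```

Jay's suggestion that ACLs behave like Unix permissions would differ. Unix requires only a traverse (execute) permission on each parent directory and the requested permission on the leaf. The sketch follows Ashish's stricter same-operation-at-every-level rule.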
>> > >> > > > > > > > > >> > >> > > > > > > > > Let me know if I totally missed your question. >> > >> > > > > > > > > >> > >> > > > > > > > > Since the point of Kafka is sharing data I think it is >> > >> really >> > >> > > > > > > > > > important that the grouping be just for >> > >> > > > > > > > > convenience/permissions/config/etc >> > >> > > > > > > > > > and that it remain possible to access multiple >> > >> > > > > > directories/namespaces >> > >> > > > > > > > > from >> > >> > > > > > > > > > the same client. >> > >> > > > > > > > > > >> > >> > > > > > > > > Totally agree with you. >> > >> > > > > > > > > >> > >> > > > > > > > > > >> > >> > > > > > > > > > -Jay >> > >> > > > > > > > > > >> > >> > > > > > > > > > On Fri, Oct 9, 2015 at 6:32 PM, Ashish Singh < >> > >> > > > > [email protected]> >> > >> > > > > > > > > wrote: >> > >> > > > > > > > > > >> > >> > > > > > > > > > > Hey Guys, >> > >> > > > > > > > > > > >> > >> > > > > > > > > > > I just created KIP-37 for adding namespaces to >> > Kafka. >> > >> > > > > > > > > > > >> > >> > > > > > > > > > > KIP-37 >> > >> > > > > > > > > > > < >> > >> > > > > > > > > > > >> > >> > > > > > > > > > >> > >> > > > > > > > > >> > >> > > > > > > > >> > >> > > > > > > >> > >> > > > > > >> > >> > > > > >> > >> > > > >> > >> > > >> > >> > >> > >> >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-37+-+Add+Namespaces+to+Kafka >> > >> > > > > > > > > > > > >> > >> > > > > > > > > > > tracks the proposal. >> > >> > > > > > > > > > > >> > >> > > > > > > > > > > The idea is to make Kafka support multi-tenancy >> via >> > >> > > > namespaces. >> > >> > > > > > > > > > > >> > >> > > > > > > > > > > Feedback and comments are welcome. 
>> > >> > > > > > > > > > > >> > >> > > > > > > > > > > -- >> > >> > > > > > > > > > > >> > >> > > > > > > > > > > Regards, >> > >> > > > > > > > > > > Ashish >> > >> > > > > > > > > > > >> > >> > > > > > > > > > >> > >> > > > > > > > > >> > >> > > > > > > > > >> > >> > > > > > > > > >> > >> > > > > > > > > -- >> > >> > > > > > > > > >> > >> > > > > > > > > Regards, >> > >> > > > > > > > > Ashish >> > >> > > > > > > > > >> > >> > > > > > > > >> > >> > > > > > > >> > >> > > > > > > >> > >> > > > > > > >> > >> > > > > > > -- >> > >> > > > > > > >> > >> > > > > > > Regards, >> > >> > > > > > > Ashish >> > >> > > > > > > >> > >> > > > > > >> > >> > > > > >> > >> > > > >> > >> > > > >> > >> > > > >> > >> > > > -- >> > >> > > > >> > >> > > > Regards, >> > >> > > > Ashish >> > >> > > > >> > >> > > >> > >> > >> > >> >> > > >> > > >> > > >> > > -- >> > > >> > > Regards, >> > > Ashish >> > > >> > >> > >> > >> > -- >> > >> > Regards, >> > Ashish >> > >> > > > > -- > > Regards, > Ashish
