Re: [DISCUSS] KIP-37 - Add namespaces in Kafka

Ashish Singh Wed, 21 Oct 2015 15:47:50 -0700

On Wed, Oct 21, 2015 at 2:22 PM, Jay Kreps <[email protected]> wrote:


> Gwen, It's a good question of what the producer semantics are--would we
> only allow you to produce to a partition or first level directory or would
> we hash over whatever subtree you supply? Actually not sure which makes
> more sense...
>
> Ashish, here are some thoughts:
> 1. I think we can do this online. There is a question of what happens to
> readers and writers but presumably it would be the same thing as if that
> topic weren't there. There would be no guarantee this would happen
> atomically across different brokers or clients, though.
> 2. ACLs should work like unix perms, right?


Are you suggesting we should move the allowed operations to the R, W, X model
of unix? Currently, we support these operations:
<https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/security/auth/Operation.scala#L25>
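
For reference, the ACL inheritance rule quoted further down in this thread (a user needs the operation on every level of the path) can be sketched roughly as follows. This is illustrative Python, not Kafka code: `ancestors`, `is_allowed`, and the `acls` dictionary are hypothetical names, and an operation is treated as an opaque string such as "DESCRIBE".

```python
def ancestors(path):
    """Return every level of a path, e.g. '/dc/ns/topic' ->
    ['/dc', '/dc/ns', '/dc/ns/topic']."""
    parts = [p for p in path.split("/") if p]
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]


def is_allowed(acls, user, operation, path):
    """Allow the operation only if the user holds it on every level.

    `acls` is a hypothetical in-memory stand-in for the ACL store:
    a dict mapping a path to a set of (user, operation) pairs.
    """
    levels = ancestors(path)
    if not levels:
        # An empty path or bare '/' names nothing to act on.
        return False
    return all((user, operation) in acls.get(level, set()) for level in levels)


# Example: User1 holds DESCRIBE on dc, ns and topic.
acls = {
    "/dc": {("User1", "DESCRIBE")},
    "/dc/ns": {("User1", "DESCRIBE")},
    "/dc/ns/topic": {("User1", "DESCRIBE")},
}
print(is_allowed(acls, "User1", "DESCRIBE", "/dc/ns/topic"))
```

Under unix perms the intermediate directories would need only search (x) permission rather than the operation itself, which is where the two models differ.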

> I think configs would override
> hierarchically (so we would have a full set of configs for each partition
> computed by walking up the tree from the root and taking the first
> override). I think this is what you're describing, right?
>

Yes.
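
A rough sketch of that lookup, assuming overrides are stored per path. This is illustrative Python only: `resolve_config` and the `overrides` dictionary are hypothetical names rather than anything in Kafka, and the config keys are used purely as sample data.

```python
def resolve_config(overrides, path, defaults=None):
    """Compute the effective config for a path such as '/dc/ns/topic'.

    `overrides` maps a path to the configs set at exactly that level.
    Levels are applied from the top-level directory down to the leaf, so a
    value set on a more specific level replaces the one inherited from its
    parent. The result is the same as checking the topic first, then ns,
    then dc, and taking the first override found.
    """
    resolved = dict(defaults or {})
    parts = [p for p in path.split("/") if p]
    for i in range(1, len(parts) + 1):
        level = "/" + "/".join(parts[:i])
        resolved.update(overrides.get(level, {}))
    return resolved


overrides = {
    "/dc": {"retention.ms": 1000, "cleanup.policy": "delete"},
    "/dc/ns": {"retention.ms": 2000},
    "/dc/ns/topic": {},
}
effective = resolve_config(overrides, "/dc/ns/topic")
print(effective)
```

Here the topic sets nothing itself, so it picks up `retention.ms` from ns and `cleanup.policy` from dc.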

> 3. Totally agree no reason to have an arbitrary limit.
> 4. I actually don't think the physical layout on disk should be at all
> connected to the logical directory hierarchy we present.


I think it will be useful to have that connection, as it will enable users
to encrypt different namespaces with different keys. That would be one more
step towards a completely multi-tenant system.


> That is, whether
> you use RAID or not shouldn't impact the location of a topic in your
> directory structure.


Even if we make the physical layout on disk mirror the directory hierarchy,
I think this will not be a concern. Correct me if I am missing something.

> Not sure if this is what you are saying or not. This
> does raise the question of how to do the disk layout. The simplest thing
> would be to keep the flat data directories but make the names of the
> partitions on disk just be logical inode numbers and then have a separate
> mapping of these inodes to logical names stored in ZK with a cache. I think
> this would make things like rename fast and atomic. The downside of this is
> that the 'ls' command will no longer tell you much about the data on a
> broker.
>

Enabling renaming of topics is definitely something that will be nice to
have; however, with the flat structure we won't be able to encrypt different
directories/namespaces with different keys. Fast renames can still be
achieved with a directory hierarchy on disk by using logical names, though
each dir will then need a logical name as well.
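
To make the flat layout Jay describes concrete, here is a minimal sketch of a logical-name table. It is illustrative Python with hypothetical names throughout (`LogicalNameTable`, `data_dir`, the `/data` root); a plain dict stands in for the mapping that would live in ZK, and only a single partition path is renamed, not a whole subtree.

```python
import itertools


class LogicalNameTable:
    """Maps logical partition paths to opaque inode numbers.

    On-disk directories are named after the inode, so a rename only
    rewrites the mapping and never moves data.
    """

    def __init__(self):
        self._next_inode = itertools.count(1)
        self._by_name = {}

    def create(self, name):
        if name in self._by_name:
            raise ValueError(f"{name} already exists")
        inode = next(self._next_inode)
        self._by_name[name] = inode
        return inode

    def rename(self, old, new):
        if new in self._by_name:
            raise ValueError(f"{new} already exists")
        # pop() raises KeyError before anything is changed if `old` is unknown.
        self._by_name[new] = self._by_name.pop(old)

    def data_dir(self, name, root="/data"):
        # The physical location depends only on the inode, not on the name.
        return f"{root}/{self._by_name[name]}"


table = LogicalNameTable()
table.create("/dc/ns/topic/0")
before = table.data_dir("/dc/ns/topic/0")
table.rename("/dc/ns/topic/0", "/dc2/ns/topic/0")
after = table.data_dir("/dc2/ns/topic/0")
print(before == after)
```

The sketch also shows the trade-off: because the physical directory carries no namespace information, per-namespace encryption cannot be keyed off the on-disk path.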


> -Jay
>
> On Wed, Oct 21, 2015 at 12:43 PM, Ashish Singh <[email protected]>
> wrote:
>
> > In the last KIP hangout the following questions were raised.
> >
> >    1. *Whether or not to support a move command? If yes, how do we
> >    support it?*
> >    I think a *move* command will be essential once we start supporting
> >    directories. However, the implementation might be a bit convoluted. A
> >    few things it will require are the ability to mark a topic unavailable
> >    during the move and to update the brokers’ metadata cache to reflect
> >    the move.
> >
> >    2. *How will ACL/config inheritance work?*
> >    Say we have /dc/ns/topic.
> >    dc has dc_acl and dc_config. Similarly for ns and topic.
> >    To perform an action on /dc/ns/topic, the user must have the required
> >    perms on dc, ns and topic for that operation. For example, User1 will
> >    need DESCRIBE permissions on dc, ns and topic to be able to describe
> >    /dc/ns/topic.
> >    For configs, the configs for /dc/ns/topic will be topic_config +
> >    ns_config + dc_config, in that order. So, if a config is specified for
> >    the topic then that will be used, else its parent (ns) will be checked
> >    for that config, and so on.
> >
> >    3. *Will supporting n-deep hierarchy be a concern?*
> >    This can be a performance concern; however, it sounds more like misuse
> >    of the functionality or bad organization of topics. We can have a depth
> >    limit, but I am not sure if it is required.
> >
> >    4. *Will we continue to support multi-directory on disk, that was
> >    proposed in KAFKA-188?*
> >    Yes, we should be able to support that. It is within those directories
> >    that namespaces will be created. The heuristics for choosing the least
> >    loaded disk/dir will remain the same.
> >
> >    5. *Will it be required to move existing topics from default directory/
> >    namespace to a particular directory/namespace to enable mirror-maker
> >    to replicate topics in that directory/namespace?*
> >    I do not think it will be required, as one can simply add /*/* to
> >    mirror-maker’s blacklist and this will only capture topics that exist
> >    in the default namespace. @Joel, does this answer your question?
> >
> > 
> >
> > On Fri, Oct 16, 2015 at 6:33 PM, Ashish Singh <[email protected]>
> wrote:
> >
> > > On Thu, Oct 15, 2015 at 1:30 PM, Jiangjie Qin
> <[email protected]
> > >
> > > wrote:
> > >
> > >> Hey Jay,
> > >>
> > >> If we allow consumer to subscribe to /*/my-event, does that mean we
> > allow
> > >> consumer to consume cross namespaces?
> > >
> > > That is the idea. If a user has permissions then yes, he should be able
> > to
> > > consume from as many namespaces as he wants.
> > >
> > >
> > >> In that case it seems not
> > >> "hierarchical" but more like a name field filtering. i.e. user can
> > choose
> > >> to consume from topic where datacenter={x,y},
> > >> topic_name={my-topic1,mytopic2}. Am I understanding right?
> > >>
> > > I think it is still hierarchical, however with possible filtering (as
> you
> > > said).
> > >
> > >>
> > >> Thanks,
> > >>
> > >> Jiangjie (Becket) Qin
> > >>
> > >> On Wed, Oct 14, 2015 at 12:49 PM, Jay Kreps <[email protected]> wrote:
> > >>
> > >> > Hey Jason,
> > >> >
> > >> > I actually think this is one of the advantages. The problem we have
> > >> today
> > >> > is that you can't really do bidirectional replication between
> clusters
> > >> > because it would actually be a feedback loop.
> > >> >
> > >> > So the intended use would be that you would have a structure where
> the
> > >> > top-level directory was DIFFERENT but the topic names were the same,
> > so
> > >> if
> > >> > you maintain
> > >> >   /chicago-datacenter/actual-topics
> > >> >   /oregon-datacenter/actual-topics
> > >> >   etc.
> > >> > Then you replicate
> > >> >   /chicago-datacenter/* => /oregon-datacenter
> > >> > and
> > >> >   /oregon-datacenter/* => /chicago-datacenter
> > >> >
> > >> > People who want the aggregate feed subscribe to /*/my-event.
> > >> >
> > >> > The nice thing about this is it gives a unified namespace across all
> > >> > locations.
> > >> >
> > >> > Basically exactly what we do now but you no longer need to add new
> > >> clusters
> > >> > to get the namespacing.
> > >> >
> > >> > -Jay
> > >> >
> > >> >
> > >> > On Wed, Oct 14, 2015 at 11:24 AM, Jason Gustafson <
> [email protected]
> > >
> > >> > wrote:
> > >> >
> > >> > > Hey Ashish, thanks for the write-up. I think having a namespace
> > >> > capability
> > >> > > is a useful feature for Kafka, in particular with the addition of
> > the
> > >> > > authorization layer. I probably prefer Jay's hierarchical approach
> > if
> > >> > we're
> > >> > > going to embed the namespace in the topic name since it seems more
> > >> > general.
> > >> > > That said, one advantage of having a namespace independent of the
> > >> topic
> > >> > > name is that it simplifies replication between namespaces a bit
> > since
> > >> you
> > >> > > don't have to parse and rewrite topic names. Assuming that
> > >> hierarchical
> > >> > > topics will happen eventually anyway, I imagine a common pattern
> > >> would be
> > >> > > to preserve the same directory structure in multiple namespaces,
> so
> > >> > having
> > >> > > an easy mechanism for applications to switch between them would be
> > >> nice.
> > >> > > The namespace is kind of analogous to a chroot in this case. Of
> > course
> > >> > you
> > >> > > can achieve the same thing by having a configurable topic prefix,
> > just
> > >> > you
> > >> > > have to do all the topic rewriting, which I'm guessing will be a
> > >> little
> > >> > > annoying to implement in all of the clients and tools. However,
> the
> > >> > > tradeoff (as you mention in the KIP) is that all request schemas
> > have
> > >> to
> > >> > be
> > >> > > updated, which is also annoying.
> > >> > >
> > >> > > -Jason
> > >> > >
> > >> > > On Wed, Oct 14, 2015 at 12:03 AM, Ashish Singh <
> [email protected]
> > >
> > >> > > wrote:
> > >> > >
> > >> > > > On Mon, Oct 12, 2015 at 7:37 PM, Gwen Shapira <
> [email protected]>
> > >> > wrote:
> > >> > > >
> > >> > > > > This works really nicely from the consumer side, but what
> about
> > >> the
> > >> > > > > producer? If there are no more topics,do we allow producing
> to a
> > >> > > > directory
> > >> > > > > and have the Partitioner hash-partition messages between all
> > >> > partitions
> > >> > > > in
> > >> > > > > the multiple levels in a directory?
> > >> > > > >
> > >> > > > Good point.
> > >> > > >
> > >> > > > I am personally in favor of maintaining current behavior for
> > >> producer,
> > >> > > > i.e., letting users only produce to a topic. This is
> different
> > >> for
> > >> > > > consumers, the suggested behavior is in line with current
> behavior.
> > >> One
> > >> > > can
> > >> > > > use regex subscription to achieve the same even today.
> > >> > > >
> > >> > > > >
> > >> > > > > Also, I think we want to preserve the consumer terminology of
> > >> > > "subscribe"
> > >> > > > > to topics / directories, but "assign" partitions - since the
> > >> consumer
> > >> > > > > behavior is different in those cases.
> > >> > > > >
> > >> > > > > On Mon, Oct 12, 2015 at 7:16 PM, Jay Kreps <[email protected]>
> > >> wrote:
> > >> > > > >
> > >> > > > > > Okay this is similar to what I think we have talked about
> > >> before.
> > >> > Let
> > >> > > > me
> > >> > > > > > elaborate on the idea that I think has been floating
> > >> around--it's
> > >> > > > pretty
> > >> > > > > > similar with a few differences.
> > >> > > > > >
> > >> > > > > > I think what you are calling the "default namespace" is
> > >> basically
> > >> > > what
> > >> > > > I
> > >> > > > > > would call the "current working directory" with paths not
> > >> beginning
> > >> > > > with
> > >> > > > > > '/' being interpreted relative to this directory as in the
> fs.
> > >> > > > > >
> > >> > > > > > One thing you have to work out is what levels in this
> > hierarchy
> > >> you
> > >> > > can
> > >> > > > > > actually subscribe to. I think you are assuming only what we
> > >> > > currently
> > >> > > > > > consider a "topic", i.e. the first level of directories but
> > not
> > >> the
> > >> > > > > > partitions or parent dirs, would be subscribable. If you
> think
> > >> > about
> > >> > > > it,
> > >> > > > > > though, that constraint is a bit arbitrary.
> > >> > > > > >
> > >> > > > > > I'd propose instead the semantics that:
> > >> > > > > > - Subscribing to /a/b/c/0 means subscribing to the 0th
> > >> partition of
> > >> > > > topic
> > >> > > > > > "c" in directory /a/b
> > >> > > > > > - Subscribing to /a/b/c means subscribing to all partitions
> in
> > >> > > > > > topic/directory "c"
> > >> > > > > > - Subscribing to /a/b means subscribing to all partitions in
> > all
> > >> > > > > > topics/subdirectories under a/b recursively
> > >> > > > > >
> > >> > > > > > Effectively the concept of topics goes away entirely--you
> just
> > >> have
> > >> > > > > > partitions/logs and directories. In this respect rather than
> > >> adding
> > >> > > new
> > >> > > > > > concepts this new feature would actually just generalize
> what
> > >> we
> > >> > > have
> > >> > > > > > (which I think is a good thing).
> > >> > > > > >
> > >> > > > > > -Jay
> > >> > > > > >
> > >> > > > > > On Mon, Oct 12, 2015 at 6:24 PM, Ashish Singh <
> > >> [email protected]
> > >> > >
> > >> > > > > wrote:
> > >> > > > > >
> > >> > > > > > > On Mon, Oct 12, 2015 at 5:42 PM, Jay Kreps <
> > [email protected]>
> > >> > > wrote:
> > >> > > > > > >
> > >> > > > > > > > Great. I definitely would strongly favor carrying over
> > >> user's
> > >> > > > > intuition
> > >> > > > > > > > from FS unless we think we need a very different model.
> > The
> > >> > minor
> > >> > > > > > details
> > >> > > > > > > > like the separator and namespace term will help with
> that.
> > >> > > > > > > >
> > >> > > > > > > > Follow-up question, say I have a layout like
> > >> > > > > > > >    /chicago-datacenter/user-events/pageviews
> > >> > > > > > > > Can I subscribe to
> > >> > > > > > > >    /chicago-datacenter/user-events
> > >> > > > > > > >
> > >> > > > > > > Yes, however they will need a regex like
> > >> > > > > > > /chicago-datacenter/user-events/*
> > >> > > > > > >
> > >> > > > > > > > to get the full firehose of user events from chicago?
> Can
> > I
> > >> > > > subscribe
> > >> > > > > > to
> > >> > > > > > > >    /*/user-events
> > >> > > > > > > > to get user events originating from all datacenters?
> > >> > > > > > > >
> > >> > > > > > > Yes
> > >> > > > > > >
> > >> > > > > > > >
> > >> > > > > > > > (Assuming, for now, that these are all in the same
> > >> cluster...)
> > >> > > > > > > >
> > >> > > > > > > > Also, just to confirm, it sounds from the proposal like
> > >> config
> > >> > > > > > overrides
> > >> > > > > > > > would become fully hierarchical so you can override
> config
> > >> at
> > >> > any
> > >> > > > > > > directory
> > >> > > > > > > > point. This will add complexity in implementation but I
> > >> think
> > >> > > will
> > >> > > > > > likely
> > >> > > > > > > > be much more operator friendly.
> > >> > > > > > > >
> > >> > > > > > > Yes, that is the idea.
> > >> > > > > > >
> > >> > > > > > > >
> > >> > > > > > > > There are about a thousand details to discuss in terms
> of
> > >> how
> > >> > > this
> > >> > > > > > would
> > >> > > > > > > > impact the metadata request, various zk entries, and
> > various
> > >> > > other
> > >> > > > > > > aspects,
> > >> > > > > > > > but probably it makes sense to first agree on how we
> would
> > >> want
> > >> > > it
> > >> > > > to
> > >> > > > > > > work
> > >> > > > > > > > and then start to dive into how to implement that.
> > >> > > > > > > >
> > >> > > > > > > Agreed.
> > >> > > > > > >
> > >> > > > > > > >
> > >> > > > > > > > -Jay
> > >> > > > > > > >
> > >> > > > > > > > On Mon, Oct 12, 2015 at 5:28 PM, Ashish Singh <
> > >> > > [email protected]
> > >> > > > >
> > >> > > > > > > wrote:
> > >> > > > > > > >
> > >> > > > > > > > > Hey Jay, thanks for reviewing the proposal. Answers
> > >> inline.
> > >> > > > > > > > >
> > >> > > > > > > > > On Mon, Oct 12, 2015 at 10:53 AM, Jay Kreps <
> > >> > [email protected]>
> > >> > > > > > wrote:
> > >> > > > > > > > >
> > >> > > > > > > > > > Hey guys,
> > >> > > > > > > > > >
> > >> > > > > > > > > > I think this is an important feature and one we've
> > >> talked
> > >> > > about
> > >> > > > > > for a
> > >> > > > > > > > > > while. I really think trying to invent a new
> > >> nomenclature
> > >> > is
> > >> > > > > going
> > >> > > > > > to
> > >> > > > > > > > > make
> > >> > > > > > > > > > it hard for people to understand, though. As such I
> > >> > recommend
> > >> > > > we
> > >> > > > > > call
> > >> > > > > > > > > > namespaces "directories" and denote them with
> > '/'--this
> > >> > will
> > >> > > > make
> > >> > > > > > the
> > >> > > > > > > > > > feature 1000x more understandable to people.
> > >> > > > > > > > >
> > >> > > > > > > > > Essentially you are suggesting two things here.
> > >> > > > > > > > > 1. Use "Directory" instead of "Namespace" as it is
> more
> > >> > > > intuitive.
> > >> > > > > I
> > >> > > > > > > > agree.
> > >> > > > > > > > > 2. Make '/' as delimiter instead of ':'. Fine with me
> > and
> > >> I
> > >> > > agree
> > >> > > > > if
> > >> > > > > > we
> > >> > > > > > > > > call these directories, '/' is the way to go.
> > >> > > > > > > > >
> > >> > > > > > > > > I think we should inherit the
> > >> > > > > > > > > > semantics of normal unix fs in so far as it makes
> > sense.
> > >> > > > > > > > > >
> > >> > > > > > > > > > In this approach we get rid of topics entirely,
> > instead
> > >> we
> > >> > > > really
> > >> > > > > > > just
> > >> > > > > > > > > have
> > >> > > > > > > > > > partitions which are the equivalent of a file and
> > retain
> > >> > > their
> > >> > > > > > > numeric
> > >> > > > > > > > > > names, and the existing topic concept is just the
> > first
> > >> > > > directory
> > >> > > > > > > level
> > >> > > > > > > > > but
> > >> > > > > > > > > > we generalize to allow arbitrarily many more levels
> of
> > >> > > nesting.
> > >> > > > > > This
> > >> > > > > > > > > allows
> > >> > > > > > > > > > categorization of data, such as
> > >> > > > > > /datacenter1/user-events/page-views/3
> > >> > > > > > > > and
> > >> > > > > > > > > > you can subscribe, apply configs or permissions at
> any
> > >> > level
> > >> > > of
> > >> > > > > the
> > >> > > > > > > > > > hierarchy.
> > >> > > > > > > > > >
> > >> > > > > > > > > +1. This actually requires just a minor change to
> > existing
> > >> > > > > proposal,
> > >> > > > > > > > i.e.,
> > >> > > > > > > > > "some:namespace:topic" becomes "some/namespace/topic".
> > >> > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > > > I'm actually not 100% sure what the semantics of
> > >> accessing
> > >> > > data
> > >> > > > > in
> > >> > > > > > > > > > differing namespaces is in the current proposal,
> maybe
> > >> you
> > >> > > can
> > >> > > > > > > clarify
> > >> > > > > > > > > > Ashish?
> > >> > > > > > > > >
> > >> > > > > > > > > I will add more info to KIP on this, however I think a
> > >> client
> > >> > > > > should
> > >> > > > > > be
> > >> > > > > > > > > able to access data in any namespace as long as
> > following
> > >> > > > > conditions
> > >> > > > > > > are
> > >> > > > > > > > > satisfied.
> > >> > > > > > > > >
> > >> > > > > > > > > 1. Namespace, the client is trying to access, exists.
> > >> > > > > > > > > 2. The client has sufficient permissions on the
> > namespace
> > >> for
> > >> > > > type
> > >> > > > > of
> > >> > > > > > > > > operation the client is trying to perform on a topic
> > >> within
> > >> > > that
> > >> > > > > > > > namespace.
> > >> > > > > > > > > 3. The client has sufficient permissions on the topic
> > for
> > >> > type
> > >> > > of
> > >> > > > > > > > operation
> > >> > > > > > > > > the client is trying to perform on that topic.
> > >> > > > > > > > >
> > >> > > > > > > > > If we choose to go with what you suggested earlier
> that
> > >> just
> > >> > > have
> > >> > > > > > > > hierarchy
> > >> > > > > > > > > of directories, then step 3 will actually be covered
> in
> > >> step
> > >> > 2.
> > >> > > > > > > > >
> > >> > > > > > > > > In the current proposal, consumers will subscribe to a
> > >> topic
> > >> > > in a
> > >> > > > > > > > namespace
> > >> > > > > > > > > by specifying <namespace>:<topic> as the topic name.
> > They
> > >> can
> > >> > > > > > subscribe
> > >> > > > > > > > to
> > >> > > > > > > > > topics from multiple namespaces.
> > >> > > > > > > > >
> > >> > > > > > > > > Let me know if I totally missed your question.
> > >> > > > > > > > >
> > >> > > > > > > > > Since the point of Kafka is sharing data I think it is
> > >> really
> > >> > > > > > > > > > important that the grouping be just for
> > >> > > > > > > > > convenience/permissions/config/etc
> > >> > > > > > > > > > and that it remain possible to access multiple
> > >> > > > > > directories/namespaces
> > >> > > > > > > > > from
> > >> > > > > > > > > > the same client.
> > >> > > > > > > > > >
> > >> > > > > > > > > Totally agree with you.
> > >> > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > > > -Jay
> > >> > > > > > > > > >
> > >> > > > > > > > > > On Fri, Oct 9, 2015 at 6:32 PM, Ashish Singh <
> > >> > > > > [email protected]>
> > >> > > > > > > > > wrote:
> > >> > > > > > > > > >
> > >> > > > > > > > > > > Hey Guys,
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > I just created KIP-37 for adding namespaces to
> > Kafka.
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > KIP-37
> > >> > > > > > > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-37+-+Add+Namespaces+to+Kafka>
> > >> > > > > > > > > > > tracks the proposal.
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > The idea is to make Kafka support multi-tenancy
> via
> > >> > > > namespaces.
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > Feedback and comments are welcome.
> > >> > > > > > > > > > > 
> > >> > > > > > > > > > > --
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > Regards,
> > >> > > > > > > > > > > Ashish



-- 

Regards,
Ashish
