One way to get around this conflict could be to replace "." with "_" and "_" with "__".
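
A minimal sketch of that substitution, in Scala to match the Topic code quoted further down. The names MetricNameSketch and toMetricName are hypothetical and not part of Kafka; this only illustrates the mapping itself, not how Kafka actually builds its metric names.

```scala
// Hypothetical helper, not Kafka code: illustrates the proposed substitution.
object MetricNameSketch {
  // Escape existing underscores first, then map dots to a single underscore.
  // The order matters: mapping dots first would also double the underscores
  // that came from dots.
  def toMetricName(topic: String): String =
    topic.replace("_", "__").replace(".", "_")

  def main(args: Array[String]): Unit = {
    println(toMetricName("kafka.lab.2")) // kafka_lab_2
    println(toMetricName("kafka_lab_2")) // kafka__lab__2
    println(toMetricName("a._b"))        // a___b
    println(toMetricName("a_.b"))        // a___b
  }
}
```

This keeps "kafka.lab.2" and "kafka_lab_2" apart, but names with adjacent separators can still collide: "a._b" and "a_.b" both come out as "a___b", so the mapping is not fully reversible.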

On Sat, Jul 11, 2015 at 10:33 AM, Todd Palino <tpal...@gmail.com> wrote:
> I tend to agree with this as a compromise at this point. The reality is that 
> this is technical debt that has built up in the project, and it does not go 
> away by documenting it, and it will only get worse.
>
> As pointed out, eliminating either character at this point is going to cause 
> problems for someone. And unfortunately, Guozhang, converting to __ doesn't 
> really solve the problem either because that is still a valid topic name that 
> could collide. It's less likely, but all it does is move the debt around a 
> little.
>
> -Todd
>
>> On Jul 11, 2015, at 10:16 AM, Brock Noland <br...@apache.org> wrote:
>>
>> On Sat, Jul 11, 2015 at 12:54 AM, Ewen Cheslack-Postava
>> <e...@confluent.io> wrote:
>>> On Fri, Jul 10, 2015 at 4:41 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
>>>
>>>> Yeah, I have an actual customer who ran into this. Unfortunately,
>>>> inconsistencies in the way things are named are pretty common - just
>>>> look at Kafka's many CLI options.
>>>>
>>>> I don't think that supporting both and pointing at the docs with "I
>>>> told you so" when our metrics break is a good solution.
>>>
>>> I agree, especially since we don't *already* have something in the docs
>>> indicating this will be an issue. I was flippant about the situation
>>> because I *wish* there was more careful consideration + naming policy in
>>> place, but I realize that doesn't always happen in practice. I guess I need
>>> to take Compatibility Czar more seriously :)
>>>
>>> I think the obvious practical options are as follows:
>>>
>>> 1. Kill support for "_". Piss off the entire set of people who currently
>>> use "_" anywhere in topic names.
>>> 2. Kill support for ".". Piss off the entire set of people who currently
>>> use "." anywhere in topic names.
>>> 3. Tell people they need to be careful about this issue. Piss off the set
>>> of people who use both "_" and "." *and* happen to have conflicting topic
>>> names. They will have some pain when they discover the issue and have to
>>> figure out how to move one of those topics over to a non-conflicting name.
>>> I'm going to claim that this group must be an *extremely* small fraction of
>>> users, which doesn't make it better to allow things to break for them, but
>>> at least gives us an idea of the scale of impact.
>>>
>>> (One other alternative suggested earlier was encoding metric names to
>>> account for differences; given the metric renaming mess in the last
>>> release, I'm extremely hesitant to suggest anything of the sort...)
>>>
>>> None of the options are ideal, but to me, 3 seems like the least painful,
>>> both for us and for the vast majority of users. It seems to me that the
>>> number of users that would complain about (1) or (2) drastically outweighs
>>> the number affected by (3).
>>>
>>> At this point, I don't think it's practical to keep switching the rules
>>> about which characters are allowed and which aren't because the previous
>>> attempts haven't been successful -- it seems the rules have changed
>>> multiple times, whether intentionally or accidentally, such that any more
>>> changes will cause problems. At this point, I think we just need to accept
>>> being liberal in accepting the range of topic names that have been
>>> permitted so far and make the best of the situation, even if it means only
>>> being able to warn people of conflicts.
>>>
>>> Here's another alternative: how about being liberal with topic name
>>> characters, but upon topic creation we convert the name to the metric name
>>> and fail if there's a conflict with another topic? This is relatively
>>> expensive (requires getting the metric name of all other topics), but it
>>> avoids the bad situation we're encountering here (conflicting metrics),
>>> avoids getting into a persistent conflict (we kill topic creation when we
>>> detect the issue rather than noticing it when the metrics conflict
>>> happens), and keeps the vast majority of existing users happy (both _ and .
>>> work in topic names as long as you don't create topics with conflicting
>>> metric names).
>>>
>>> There are definitely details to be worked out (auto topic creation?), but
>>> it seems like a more realistic solution than to start disallowing _ or . in
>>> topic names.
>>
>> I was thinking the same. Allow a.b or a_b but not a.b and a_b. This
>> seems like it will impact a trivial number of users and keep both the
>> "." and "_" camps happy.
>>
>>>
>>> -Ewen
>>>
>>>
>>>>
>>>> On Fri, Jul 10, 2015 at 4:33 PM, Ewen Cheslack-Postava
>>>> <e...@confluent.io> wrote:
>>>>> I figure you'll probably see complaints no matter what change you make.
>>>>> Gwen, given that you raised this, another important question might be how
>>>>> many people you see using *both*. I'm guessing this question came up
>>>>> because you actually saw a conflict? But I'd imagine (or at least hope)
>>>>> that most organizations are mostly consistent about naming topics -- they
>>>>> standardize on one or the other.
>>>>>
>>>>> Since there's no "right" way to name them, I'd just leave it supporting
>>>>> both and document the potential conflict in metrics. And if people use both
>>>>> naming schemes, they probably deserve to suffer for their inconsistency :)
>>>>>
>>>>> -Ewen
>>>>>
>>>>>> On Fri, Jul 10, 2015 at 3:28 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
>>>>>
>>>>>> I find dots more common in my customer base, so I will definitely feel
>>>>>> the pain of removing them.
>>>>>>
>>>>>> However, "." are already used in metrics, file names, directories, etc
>>>>>> - so if we keep the dots, we need to keep code that translates them
>>>>>> and document the translation. Just banning "." seems more natural.
>>>>>> Also, as Grant mentioned, we'll probably have our own special usage
>>>>>> for "." down the line.
>>>>>>
>>>>>>> On Fri, Jul 10, 2015 at 2:12 PM, Todd Palino <tpal...@gmail.com> wrote:
>>>>>>> I absolutely disagree with #2, Neha. That will break a lot of
>>>>>>> infrastructure within LinkedIn. That said, removing "." might break other
>>>>>>> people as well, but I think we should have a clearer idea of how much usage
>>>>>>> there is on either side.
>>>>>>>
>>>>>>> -Todd
>>>>>>>
>>>>>>>
>>>>>>>> On Fri, Jul 10, 2015 at 2:08 PM, Neha Narkhede <n...@confluent.io> wrote:
>>>>>>>
>>>>>>>> "." seems natural for grouping topic names. +1 for 2) going forward only
>>>>>>>> without breaking previously created topics with "_" though that might
>>>>>>>> require us to patch the code somewhat awkwardly till we phase it out a
>>>>>>>> couple (purposely left vague to stay out of Ewen's wrath :-)) versions
>>>>>>>> later.
>>>>>>>>
>>>>>>>> On Fri, Jul 10, 2015 at 2:02 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
>>>>>>>>
>>>>>>>>> I don't think we should break existing topics. Just disallow new
>>>>>>>>> topics going forward.
>>>>>>>>>
>>>>>>>>> Agree that having both is horrible, but we should have a solution that
>>>>>>>>> fails when you run "kafka-topics.sh --create", not when you configure
>>>>>>>>> Ganglia.
>>>>>>>>>
>>>>>>>>> Gwen
>>>>>>>>>
>>>>>>>>> On Fri, Jul 10, 2015 at 1:53 PM, Jay Kreps <j...@confluent.io> wrote:
>>>>>>>>>> Unfortunately '.' is pretty common too. I agree that it is perverse, but
>>>>>>>>>> people seem to do it. Breaking all the topics with '.' in the name seems
>>>>>>>>>> like it could be worse than combining metrics for people who have a
>>>>>>>>>> 'foo_bar' AND 'foo.bar' (and after all, having both is DEEPLY perverse,
>>>>>>>>>> no?).
>>>>>>>>>>
>>>>>>>>>> Where is our Dean of Compatibility, Ewen, on this?
>>>>>>>>>>
>>>>>>>>>> -Jay
>>>>>>>>>>
>>>>>>>>>> On Fri, Jul 10, 2015 at 1:32 PM, Todd Palino <tpal...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> My selfish point of view is that we do #1, as we use "_" extensively in
>>>>>>>>>>> topic names here :) I also happen to think it's the right choice,
>>>>>>>>>>> specifically because "." has more special meanings, as you noted.
>>>>>>>>>>>
>>>>>>>>>>> -Todd
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jul 10, 2015 at 1:30 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Unintentional side effect from allowing IP addresses in consumer client
>>>>>>>>>>>> IDs :)
>>>>>>>>>>>>
>>>>>>>>>>>> So the question is, what do we do now?
>>>>>>>>>>>>
>>>>>>>>>>>> 1) disallow "."
>>>>>>>>>>>> 2) disallow "_"
>>>>>>>>>>>> 3) find a reversible way to encode "." and "_" that won't break existing
>>>>>>>>>>>> metrics
>>>>>>>>>>>> 4) all of the above?
>>>>>>>>>>>>
>>>>>>>>>>>> btw. it looks like "." and ".." are currently valid. Topic names are
>>>>>>>>>>>> used for directories, right? this sounds like fun :)
>>>>>>>>>>>>
>>>>>>>>>>>> I vote for option #1, although if someone has a good idea for #3 it
>>>>>>>>>>>> will be even better.
>>>>>>>>>>>>
>>>>>>>>>>>> Gwen
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jul 10, 2015 at 1:22 PM, Grant Henke <ghe...@cloudera.com> wrote:
>>>>>>>>>>>>> Found it was added here: https://issues.apache.org/jira/browse/KAFKA-697
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Jul 10, 2015 at 3:18 PM, Todd Palino <tpal...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> This was definitely changed at some point after KAFKA-495. The question is
>>>>>>>>>>>>>> when and why.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here's the relevant code from that patch:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ===================================================================
>>>>>>>>>>>>>> --- core/src/main/scala/kafka/utils/Topic.scala (revision 1390178)
>>>>>>>>>>>>>> +++ core/src/main/scala/kafka/utils/Topic.scala (working copy)
>>>>>>>>>>>>>> @@ -21,24 +21,21 @@
>>>>>>>>>>>>>> import util.matching.Regex
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> object Topic {
>>>>>>>>>>>>>> +  val legalChars = "[a-zA-Z0-9_-]"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Todd
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Jul 10, 2015 at 1:02 PM, Grant Henke <ghe...@cloudera.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> kafka.common.Topic shows that currently period is a valid character and I
>>>>>>>>>>>>>>> have verified I can use kafka-topics.sh to create a new topic with a
>>>>>>>>>>>>>>> period.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK currently uses
>>>>>>>>>>>>>>> Topic.validate before writing to Zookeeper.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Should period character support be removed? I was under the same impression
>>>>>>>>>>>>>>> as Gwen, that a period was used by many as a way to "group" topics.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The code is pasted below since it's small:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> object Topic {
>>>>>>>>>>>>>>>   val legalChars = "[a-zA-Z0-9\\._\\-]"
>>>>>>>>>>>>>>>   private val maxNameLength = 255
>>>>>>>>>>>>>>>   private val rgx = new Regex(legalChars + "+")
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   val InternalTopics = Set(OffsetManager.OffsetsTopicName)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   def validate(topic: String) {
>>>>>>>>>>>>>>>     if (topic.length <= 0)
>>>>>>>>>>>>>>>       throw new InvalidTopicException("topic name is illegal, can't be empty")
>>>>>>>>>>>>>>>     else if (topic.equals(".") || topic.equals(".."))
>>>>>>>>>>>>>>>       throw new InvalidTopicException("topic name cannot be \".\" or \"..\"")
>>>>>>>>>>>>>>>     else if (topic.length > maxNameLength)
>>>>>>>>>>>>>>>       throw new InvalidTopicException("topic name is illegal, can't be longer than " + maxNameLength + " characters")
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     rgx.findFirstIn(topic) match {
>>>>>>>>>>>>>>>       case Some(t) =>
>>>>>>>>>>>>>>>         if (!t.equals(topic))
>>>>>>>>>>>>>>>           throw new InvalidTopicException("topic name " + topic + " is illegal, contains a character other than ASCII alphanumerics, '.', '_' and '-'")
>>>>>>>>>>>>>>>       case None => throw new InvalidTopicException("topic name " + topic + " is illegal,  contains a character other than ASCII alphanumerics, '.', '_' and '-'")
>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Jul 10, 2015 at 2:50 PM, Todd Palino <tpal...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I had to go look this one up again to make sure -
>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-495
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The only valid characters for topic names are alphanumeric, underscore, and
>>>>>>>>>>>>>>>> dash. A period is not supposed to be a valid character to use. If you're
>>>>>>>>>>>>>>>> seeing them, then one of two things has happened:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1) You have topic names that are grandfathered in from before that patch
>>>>>>>>>>>>>>>> 2) The patch is not working properly and there is somewhere in the broker
>>>>>>>>>>>>>>>> that the standard is not being enforced.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Todd
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Jul 10, 2015 at 12:13 PM, Brock Noland <br...@apache.org> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, Jul 10, 2015 at 11:34 AM, Gwen Shapira <gshap...@cloudera.com> wrote:
>>>>>>>>>>>>>>>>>> Hi Kafka Fans,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> If you have one topic named "kafka_lab_2" and the other named
>>>>>>>>>>>>>>>>>> "kafka.lab.2", the topic level metrics will be named kafka_lab_2 for
>>>>>>>>>>>>>>>>>> both, effectively making it impossible to monitor them properly.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The reason this happens is that using "." in topic names is pretty
>>>>>>>>>>>>>>>>>> common, especially as a way to group topics into data centers,
>>>>>>>>>>>>>>>>>> relevant apps, etc - basically a work-around to our current lack of
>>>>>>>>>>>>>>>>>> name spaces. However, most metric monitoring systems use "." to
>>>>>>>>>>>>>>>>>> annotate hierarchy, so to avoid issues around metric names, Kafka
>>>>>>>>>>>>>>>>>> replaces the "." in the name with an underscore.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This generates good metric names, but creates the problem with name
>>>>>>>>>>>>>>>>>> collisions.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I'm wondering if it makes sense to simply limit the range of
>>>>>>>>>>>>>>>>>> characters permitted in a topic name and disallow "_"? Obviously
>>>>>>>>>>>>>>>>>> existing topics will need to remain as is, which is a bit awkward.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Interesting problem! Many if not most users I personally am aware of
>>>>>>>>>>>>>>>>> use "_" as a separator in topic names. I am sure that many users would
>>>>>>>>>>>>>>>>> be quite surprised by this limitation. With that said, I am sure
>>>>>>>>>>>>>>>>> they'd transition accordingly.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> If anyone has better backward-compatible solutions to this, I'm all
>>>>>>>>>>>>>>>>>> ears :)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Gwen
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Grant Henke
>>>>>>>>>>>>>>> Solutions Consultant | Cloudera
>>>>>>>>>>>>>>> ghe...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Grant Henke
>>>>>>>>>>>>> Solutions Consultant | Cloudera
>>>>>>>>>>>>> ghe...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Thanks,
>>>>>>>> Neha
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Thanks,
>>>>> Ewen
>>>
>>>
>>>
>>> --
>>> Thanks,
>>> Ewen
