Unintentional side effect from allowing IP addresses in consumer client IDs :)

So the question is, what do we do now?

1) disallow "."
2) disallow "_"
3) find a reversible way to encode "." and "_" that won't break existing metrics
4) all of the above?
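
For reference, the collision these options are meant to address can be sketched in a few lines. This is only an illustration, assuming metric names are derived by replacing "." with "_" as described in the original message; the object and method names are hypothetical and are not taken from the Kafka code base.

```scala
object MetricNameCollisions {
  // Assumption: the metric name is the topic name with every "." replaced
  // by "_", as described in the original message of this thread.
  def metricName(topic: String): String = topic.replace('.', '_')

  // Groups topics by their derived metric name and keeps only the metric
  // names that are shared by more than one topic.
  def collisions(topics: Seq[String]): Map[String, Seq[String]] =
    topics.distinct.groupBy(metricName).filter { case (_, ts) => ts.size > 1 }
}
```

With the topics "kafka_lab_2" and "kafka.lab.2", both map to the metric name kafka_lab_2, so collisions reports them together under that one name.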

btw. it looks like "." and ".." are currently valid. Topic names are
used for directories, right? this sounds like fun :)

I vote for option #1, although if someone has a good idea for #3 it
will be even better.
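
To show what "reversible" could mean for #3, here is a rough sketch, not a proposal. It assumes a hypothetical escape scheme in which "_" is the escape character ("_" becomes "_u" and "." becomes "_d"); none of these names exist in Kafka. Note that it changes the metric names of every topic containing "_", so it does not meet the "won't break existing metrics" part; it only shows that the mapping can be inverted without ambiguity.

```scala
object ReversibleMetricName {
  // Hypothetical scheme: '_' acts as the escape character.
  //   "_" -> "_u", "." -> "_d"; every other character is copied unchanged.
  // The output contains no ".", so it is safe for hierarchical metric systems.
  def encode(topic: String): String =
    topic.map {
      case '_' => "_u"
      case '.' => "_d"
      case c   => c.toString
    }.mkString

  // Inverse of encode. Returns None when the input is not a valid encoding,
  // i.e. when a '_' is not followed by 'u' or 'd'.
  def decode(name: String): Option[String] = {
    @annotation.tailrec
    def loop(rest: List[Char], acc: StringBuilder): Option[String] = rest match {
      case Nil                => Some(acc.toString)
      case '_' :: 'u' :: tail => loop(tail, acc.append('_'))
      case '_' :: 'd' :: tail => loop(tail, acc.append('.'))
      case '_' :: _           => None
      case c :: tail          => loop(tail, acc.append(c))
    }
    loop(name.toList, new StringBuilder)
  }
}
```

Under this scheme "kafka.lab.2" and "kafka_lab_2" encode to different names, and decode(encode(t)) returns Some(t) for any topic t.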

Gwen



On Fri, Jul 10, 2015 at 1:22 PM, Grant Henke <ghe...@cloudera.com> wrote:
> Found it was added here: https://issues.apache.org/jira/browse/KAFKA-697
>
> On Fri, Jul 10, 2015 at 3:18 PM, Todd Palino <tpal...@gmail.com> wrote:
>
>> This was definitely changed at some point after KAFKA-495. The question is
>> when and why.
>>
>> Here's the relevant code from that patch:
>>
>> ===================================================================
>> --- core/src/main/scala/kafka/utils/Topic.scala (revision 1390178)
>> +++ core/src/main/scala/kafka/utils/Topic.scala (working copy)
>> @@ -21,24 +21,21 @@
>>  import util.matching.Regex
>>
>>  object Topic {
>> +  val legalChars = "[a-zA-Z0-9_-]"
>>
>>
>>
>> -Todd
>>
>>
>> On Fri, Jul 10, 2015 at 1:02 PM, Grant Henke <ghe...@cloudera.com> wrote:
>>
>> > kafka.common.Topic shows that currently period is a valid character and I
>> > have verified I can use kafka-topics.sh to create a new topic with a
>> > period.
>> >
>> >
>> > AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK currently uses
>> > Topic.validate before writing to Zookeeper.
>> >
>> > Should period character support be removed? I was under the same
>> > impression as Gwen, that a period was used by many as a way to "group"
>> > topics.
>> >
>> > The code is pasted below since it's small:
>> >
>> > object Topic {
>> >   val legalChars = "[a-zA-Z0-9\\._\\-]"
>> >   private val maxNameLength = 255
>> >   private val rgx = new Regex(legalChars + "+")
>> >
>> >   val InternalTopics = Set(OffsetManager.OffsetsTopicName)
>> >
>> >   def validate(topic: String) {
>> >     if (topic.length <= 0)
>> >       throw new InvalidTopicException("topic name is illegal, can't be empty")
>> >     else if (topic.equals(".") || topic.equals(".."))
>> >       throw new InvalidTopicException("topic name cannot be \".\" or \"..\"")
>> >     else if (topic.length > maxNameLength)
>> >       throw new InvalidTopicException("topic name is illegal, can't be longer than " + maxNameLength + " characters")
>> >
>> >     rgx.findFirstIn(topic) match {
>> >       case Some(t) =>
>> >         if (!t.equals(topic))
>> >           throw new InvalidTopicException("topic name " + topic + " is illegal, contains a character other than ASCII alphanumerics, '.', '_' and '-'")
>> >       case None => throw new InvalidTopicException("topic name " + topic + " is illegal,  contains a character other than ASCII alphanumerics, '.', '_' and '-'")
>> >     }
>> >   }
>> > }
>> >
>> > On Fri, Jul 10, 2015 at 2:50 PM, Todd Palino <tpal...@gmail.com> wrote:
>> >
>> > > I had to go look this one up again to make sure -
>> > > https://issues.apache.org/jira/browse/KAFKA-495
>> > >
>> > > The only valid characters for topic names are alphanumerics, underscore,
>> > > and dash. A period is not supposed to be a valid character. If you're
>> > > seeing periods, then one of two things has happened:
>> > >
>> > > 1) You have topic names that are grandfathered in from before that patch
>> > > 2) The patch is not working properly, and somewhere in the broker the
>> > > standard is not being enforced.
>> > >
>> > > -Todd
>> > >
>> > >
>> > > On Fri, Jul 10, 2015 at 12:13 PM, Brock Noland <br...@apache.org> wrote:
>> > >
>> > > > On Fri, Jul 10, 2015 at 11:34 AM, Gwen Shapira <gshap...@cloudera.com> wrote:
>> > > > > Hi Kafka Fans,
>> > > > >
>> > > > > If you have one topic named "kafka_lab_2" and the other named
>> > > > > "kafka.lab.2", the topic-level metrics will be named kafka_lab_2
>> > > > > for both, effectively making it impossible to monitor them properly.
>> > > > >
>> > > > > The reason this happens is that using "." in topic names is pretty
>> > > > > common, especially as a way to group topics into data centers,
>> > > > > relevant apps, etc. - basically a work-around for our current lack of
>> > > > > namespaces. However, most metric monitoring systems use "." to
>> > > > > annotate hierarchy, so to avoid issues around metric names, Kafka
>> > > > > replaces the "." in the name with an underscore.
>> > > > >
>> > > > > This generates good metric names, but creates a problem with name
>> > > > > collisions.
>> > > > >
>> > > > > I'm wondering if it makes sense to simply limit the range of
>> > > > > characters permitted in a topic name and disallow "_"? Obviously
>> > > > > existing topics will need to remain as is, which is a bit awkward.
>> > > >
>> > > > Interesting problem! Many if not most users I personally am aware of
>> > > > use "_" as a separator in topic names. I am sure that many users
>> > > > would be quite surprised by this limitation. With that said, I am
>> > > > sure they'd transition accordingly.
>> > > >
>> > > > >
>> > > > > If anyone has better backward-compatible solutions to this, I'm all
>> > > > > ears :)
>> > > > >
>> > > > > Gwen
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> > Grant Henke
>> > Solutions Consultant | Cloudera
>> > ghe...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>> >
>>
