Re: Keeping topic names out of the code?

Chris Riccomini Mon, 08 Sep 2014 15:55:15 -0700

Hey Guys,

One stub ticket for this is here:


  https://issues.apache.org/jira/browse/SAMZA-40

It's been around a while, but hasn't had any work done on it. Perhaps we
should discuss and see if we can get some consensus on what needs to be
done.

At its core, I think Samza's config currently involves several things:

1. Defining config for a job.
2. Passing this config from run-job.sh to the AM and SamzaContainer.
3. Exposing configs programmatically to the end user (via the Config
object).
4. Wiring classes together.

Things we could also look at:

* Hierarchical configs.
* Wiring frameworks.
* DSLs.
* What we like from other frameworks.

Depending on how far we want to go, this change could be transparent to
end users, or be a very big breaking change. If it's a big breaking
change, we could discuss whether we want to wait for a major revision (a
post-1.0, 2.X release) rather than doing it in a (0.X) release. At this
point, I'm mostly just interested in thinking through, and coming up with,
the direction that config should go in.

Cheers,
Chris

On 9/8/14 3:43 PM, "Yan Fang" <[email protected]> wrote:

>yes, actually when I am using Samza, I add this config in the profile to
>make my life easier. It is also true that Samza's config system is already
>very complicated :( -- I feel the same way... It is now actually very
>convenient for experienced users but may scare new users away...
>
>Fang, Yan
>[email protected]
>+1 (206) 849-4108
>
>On Mon, Sep 8, 2014 at 3:27 PM, Chris Riccomini <
>[email protected]> wrote:
>
>> Hey Roger,
>>
>> Funnily enough, we actually used to have this feature in Samza 0.6.0,
>> before it was open sourced. We called them "logical streams". The main
>> reason that we removed them was really about usability. Samza's configs
>> are already overly complicated (at least, I feel that way), and adding
>> an extra level of indirection was leading to a lot of confusion from
>> developers.
>>
>> You can still proxy this by using job-specific config, and a variable:
>>
>>   val edits = config.getString("edits")
>>
>> And then define:
>>
>>   edits=kafka.edit-stream-name
>>
>> In config. That seems a bit clunky, though.
>>
>> Taking a step back, I think Samza's config system is due for a revisit.
>> David Chen was actually just discussing this with me today, and I
>> expect a JIRA on it to pop up sometime soon. If we simplified things,
>> it might be possible to add this feature back in.
>>
>> Cheers,
>> Chris
>>
>> On 9/8/14 2:51 PM, "Roger Hoover" <[email protected]> wrote:
>>
>> >"It might be a huge deal."  I mean "it might *not* be a huge deal."
>> >
>> >On Mon, Sep 8, 2014 at 2:50 PM, Roger Hoover <[email protected]>
>> >wrote:
>> >
>> >> Hi,
>> >>
>> >> Wondering if people with more experience with Samza think it would
>> >> be a good idea to keep topic names out of the code.  You might want
>> >> to be able to change topics by editing the config instead of having
>> >> to recompile the job.
>> >>
>> >> Maybe introduce an indirection so that output streams have names?
>> >>
>> >> Config:
>> >> #Define an input named "raw" which maps to Kafka topic "wikipedia-raw"
>> >> task.inputs.kafka.raw=wikipedia-raw
>> >> #Use raw as an input
>> >> task.inputs=kafka.raw
>> >> #Define an output named "edits" which maps to Kafka topic "wikipedia-edits"
>> >> task.outputs.kafka.edits=wikipedia-edits
>> >>
>> >> Task code:
>> >>
>> >> //Input stream would be called "raw" here instead of "wikipedia-raw"
>> >> String stream =
>> >> envelope.getSystemStreamPartition().getSystemStream().getStream();
>> >> if (stream.equals("raw")) {
>> >>   processRawMsg(envelope, collector, coordinator);
>> >> }
>> >>
>> >> //Send messages to locally named topic "edits"
>> >> collector.send(new OutgoingMessageEnvelope(new SystemStream("kafka",
>> >> "edits"), parsedJsonObject));
>> >>
>> >> Thoughts?  It might be a huge deal.  I just found myself copy and
>> >> pasting names a lot across config and code files while writing some
>> >> test jobs.
>> >>
>> >> Cheers,
>> >>
>> >> Roger
>> >>
>>
>>
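
The workaround mentioned in the thread (keep a job-specific key such as
"edits" in config and read the real stream name from it at runtime) could
look roughly like the sketch below. It is only an illustration under stated
assumptions: a plain java.util.Map stands in for Samza's Config object, the
class names LogicalStreams and StreamRef are hypothetical and not part of
Samza, and the "system.stream" value is assumed to split at the first dot.

```java
import java.util.HashMap;
import java.util.Map;

public class LogicalStreams {
    /** Hypothetical stand-in for a (system, stream) pair; not Samza's SystemStream. */
    static final class StreamRef {
        final String system;
        final String stream;

        StreamRef(String system, String stream) {
            this.system = system;
            this.stream = stream;
        }
    }

    /**
     * Looks up a logical name (e.g. "edits") in config and splits its value,
     * assumed to be of the form "system.stream", at the first dot.
     */
    static StreamRef resolve(Map<String, String> config, String logicalName) {
        String value = config.get(logicalName);
        if (value == null) {
            throw new IllegalArgumentException(
                "No config entry for logical stream: " + logicalName);
        }
        int dot = value.indexOf('.');
        if (dot <= 0 || dot == value.length() - 1) {
            throw new IllegalArgumentException(
                "Expected system.stream but got: " + value);
        }
        return new StreamRef(value.substring(0, dot), value.substring(dot + 1));
    }

    public static void main(String[] args) {
        // Equivalent of the config line: edits=kafka.edit-stream-name
        Map<String, String> config = new HashMap<>();
        config.put("edits", "kafka.edit-stream-name");

        StreamRef edits = resolve(config, "edits");
        System.out.println(edits.system + " / " + edits.stream);
    }
}
```

In a real task, the resolved pair would be used to build the output stream
once at startup, so the topic name lives only in config and changing it does
not require recompiling the job.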
