Hey Guys,

One stub ticket for this is here:
https://issues.apache.org/jira/browse/SAMZA-40

It's been around a while, but hasn't had any work done on it. Perhaps we
should discuss and see if we can get some consensus on what needs to be
done.

At its core, I think Samza's config currently involves several things:

1. Defining config for a job.
2. Passing this config from run-job.sh to the AM and SamzaContainer.
3. Exposing configs programmatically to the end user (via the Config
   object).
4. Wiring classes together.

Stuff we could also look at:

* Hierarchical configs.
* Wiring frameworks.
* DSLs.
* What we like from other frameworks.

Depending on how far we want to go, this change could be transparent to
end users, or be a very big breaking change. If it's a big breaking
change, we could discuss whether we want to wait for a major revision (a
post-1.0, 2.X release) rather than doing it in a (0.X) release.

At this point, I'm mostly just interested in thinking through, and coming
up with, the direction that config should go in.

Cheers,
Chris

On 9/8/14 3:43 PM, "Yan Fang" <[email protected]> wrote:

>yes, actually when I am using Samza, I add this config in the profile to
>make my life easier. It is also true that Samza's config system is already
>very complicated :( -- I feel the same way... It is now actually very
>convenient for experienced users but may scare new users away...
>
>Fang, Yan
>[email protected]
>+1 (206) 849-4108
>
>On Mon, Sep 8, 2014 at 3:27 PM, Chris Riccomini <
>[email protected]> wrote:
>
>> Hey Roger,
>>
>> Funnily enough, we actually used to have this feature in Samza 0.6.0,
>> before it was open sourced. We called them "logical streams". The main
>> reason that we removed them was really about usability. Samza's configs
>> are already overly complicated (at least, I feel that way), and adding an
>> extra level of indirection was leading to a lot of confusion from
>> developers.
>>
>> You can still proxy this by using job-specific config, and a variable:
>>
>> val edits = config.getString("edits")
>>
>> And then define:
>>
>> edits=kafka.edit-stream-name
>>
>> In config. That seems a bit clunky, though.
>>
>> Taking a step back, I think Samza's config system is due for a revisit.
>> David Chen was actually just discussing this with me today, and I expect a
>> JIRA on it to pop up sometime soon. If we simplified things, it might be
>> possible to add this feature back in.
>>
>> Cheers,
>> Chris
>>
>> On 9/8/14 2:51 PM, "Roger Hoover" <[email protected]> wrote:
>>
>> >"It might be a huge deal." I mean "it might not* be a huge deal."
>> >
>> >On Mon, Sep 8, 2014 at 2:50 PM, Roger Hoover <[email protected]>
>> >wrote:
>> >
>> >> Hi,
>> >>
>> >> Wondering if people with more experience with Samza think it would be a
>> >> good idea to keep topic names out of the code. You might want to be able
>> >> to change topics by editing the config instead of having to recompile the
>> >> job.
>> >>
>> >> Maybe introduce an indirection so that output streams have names?
>> >>
>> >> Config:
>> >> #Define an input named "raw" which maps to Kafka topic "wikipedia-raw"
>> >> task.inputs.kafka.raw=wikipedia-raw
>> >> #Use raw as an input
>> >> task.inputs=kafka.raw
>> >> #Define an output named "edits" which maps to Kafka topic "wikipedia-edits"
>> >> task.outputs.kafka.edits=wikipedia-edits
>> >>
>> >> Task code:
>> >>
>> >> //Input stream would be called "raw" here instead of "wikipedia-raw"
>> >> String stream =
>> >>     envelope.getSystemStreamPartition().getSystemStream().getStream();
>> >> if (stream.equals("raw")) {
>> >>   processRawMsg(envelope, collector, coordinator);
>> >> }
>> >>
>> >> //Send messages to locally named topic "edits"
>> >> collector.send(new OutgoingMessageEnvelope(new SystemStream("kafka",
>> >>     "edits"), parsedJsonObject));
>> >>
>> >> Thoughts? It might be a huge deal.
>> >> I just found myself copying and pasting names a lot across config and
>> >> code files while writing some test jobs.
>> >>
>> >> Cheers,
>> >>
>> >> Roger
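
For reference, the indirection discussed in this thread (logical stream names resolved from config rather than hard-coded topic names) could be sketched in plain Java roughly as follows. This is only an illustration under stated assumptions, not Samza's API: the StreamAliases class is hypothetical, the task.<direction>.<system>.<name> key layout is taken from Roger's proposal and is not an existing Samza config key, and java.util.Properties stands in for Samza's Config object.

```java
import java.util.Properties;

/**
 * Hypothetical helper (not part of Samza): maps a logical stream name used in
 * task code to the physical topic name defined in config.
 */
public class StreamAliases {
    private final Properties config;

    public StreamAliases(Properties config) {
        this.config = config;
    }

    /**
     * Looks up the key task.<direction>.<system>.<logicalName>, e.g.
     * task.outputs.kafka.edits, and returns the topic it maps to.
     * The key layout is an assumption borrowed from the proposal in this thread.
     */
    public String resolve(String direction, String system, String logicalName) {
        String key = "task." + direction + "." + system + "." + logicalName;
        String topic = config.getProperty(key);
        if (topic == null) {
            // Fail loudly so a missing mapping is caught at startup, not at send time.
            throw new IllegalArgumentException("No stream mapping configured for " + key);
        }
        return topic;
    }

    public static void main(String[] args) {
        // Properties stands in for the job's config here.
        Properties config = new Properties();
        config.setProperty("task.inputs.kafka.raw", "wikipedia-raw");
        config.setProperty("task.outputs.kafka.edits", "wikipedia-edits");

        StreamAliases aliases = new StreamAliases(config);
        System.out.println(aliases.resolve("inputs", "kafka", "raw"));    // prints wikipedia-raw
        System.out.println(aliases.resolve("outputs", "kafka", "edits")); // prints wikipedia-edits
    }
}
```

With something like this, task code would only ever mention "raw" and "edits", and changing a topic would mean editing the config rather than recompiling the job. It does add the extra level of indirection that was mentioned above as a usability concern.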
