[
https://issues.apache.org/jira/browse/SAMZA-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14167459#comment-14167459
]
Chris Riccomini commented on SAMZA-40:
--------------------------------------
Samza's deployment model closely follows Hadoop's--it only knows about jobs.
It is up to other tooling built on top of Samza to provide topology
abstractions (just like Oozie, Azkaban, etc.). This assumption has been baked in
from the beginning. The reason for not wanting topologies is that they don't
model how things really work:
# In theory, a bunch of jobs are wired together in a topology, and they all
know about each other. In practice, we're talking about multi-subscriber
streams that connect the jobs. Anyone may consume from or produce to these
streams (including non-Samza jobs). So, even if you have a topology defined, it doesn't
always reflect reality.
# Many topologies have jobs owned by different developers or teams. This is
problematic, as it forces a shared code base (and usually a shared deployment
schedule), which might not be desirable.
# Topologies tend to force a deployment model where multiple jobs are deployed
at once, which is not desirable.
Anecdotally, I've spoken to more than one person who's used another stream
processing framework that uses topologies, and they've ended up just writing
one job per topology to circumvent the problems that I described above.
As for preventing/catching mismatches in errors/partitioning, I think this
is one of the things that a layer on top of Samza should provide (e.g. a SQL
layer). There is probably also some opportunity to address this within a single
job's config (e.g. defining a join job, and validating that all partitions for
all input streams match), but I haven't thought much about that part of it.
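To make the join-job idea a bit more concrete, here is a rough sketch of what
such an up-front check might look like. It is written in Python purely for
brevity and is not Samza code: validate_join_inputs, PartitionMismatchError,
and the stream names are all hypothetical, and how the partition counts are
looked up is left out.

```python
from typing import Mapping


class PartitionMismatchError(ValueError):
    """Raised when a join job's input streams disagree on partition count."""


def validate_join_inputs(partition_counts: Mapping[str, int]) -> int:
    """Check that every input stream has the same number of partitions.

    partition_counts maps a stream name to its partition count. Fetching
    those counts (e.g. from the messaging system's metadata) is assumed to
    happen elsewhere and is outside this sketch.

    Returns the partition count shared by all inputs.
    """
    if not partition_counts:
        raise ValueError("a join job needs at least one input stream")

    distinct = set(partition_counts.values())
    if len(distinct) > 1:
        # Report every stream so the mismatch is easy to spot at deploy time.
        details = ", ".join(
            f"{name}={count}" for name, count in sorted(partition_counts.items())
        )
        raise PartitionMismatchError(
            f"input streams have differing partition counts: {details}"
        )
    return distinct.pop()
```

Calling validate_join_inputs({"page-views": 8, "ad-clicks": 8}) would return
the shared count, while mismatched counts would fail before the job starts
rather than somewhere at runtime.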
> Refactor Samza configuration
> ----------------------------
>
> Key: SAMZA-40
> URL: https://issues.apache.org/jira/browse/SAMZA-40
> Project: Samza
> Issue Type: Bug
> Components: container
> Affects Versions: 0.6.0
> Reporter: Chris Riccomini
> Labels: project
>
> Samza's configuration system has several problems that we need to resolve.
> * Want to auto-generate documentation based off of configuration.
> * Should support global defaults for a config property. Right now, we do
> config.getFoo.getOrElse() everywhere.
> * Should validate config up front, rather than throwing runtime exceptions
> randomly throughout the code.
> * We are mixing wiring and configuration together. How do other systems
> handle this?
> * We have fragmented configuration (anybody can define configuration). How do
> other systems handle this?
> * How to handle undefined configuration? How to make this interoperable with
> both Java and Scala (i.e. should we support Option in Scala)?
> * Should remain immutable.
> * Should remove implicits. They're just confusing.
> * Do we want to support complex types (list, map) for values, not just String?
> We need a design proposal for this.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)