[
https://issues.apache.org/jira/browse/SAMZA-82?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820718#comment-13820718
]
Jakob Homan commented on SAMZA-82:
----------------------------------
Passing something like SAMZA_STREAM_PARTITIONS seems like the best approach.
I'd rather we knew as soon as possible what the actual topic-partitions we're
dealing with are, rather than having a huge set of potential topic-partition
pairs floating through the code paths. The sooner we determine the actual work
to be done, the better, particularly as we do more work in the job assignment
phase.
I was concerned about how large the SAMZA_STREAM_PARTITIONS env variable
would be for jobs with large numbers of topics and/or partitions, but there
doesn't seem to actually be a [practical limit on their
size|http://stackoverflow.com/questions/1078031/what-is-the-maximum-size-of-an-environment-variable-value].
Just the same, it may be best to do some type of RLE on the variable, e.g.
{noformat}SAMZA_STREAM_PARTITIONS=foo.bar:0,2,foo.baz:0{noformat}
or
{noformat}SAMZA_STREAM_PARTITIONS=foo.(bar:0,2)(.baz:0){noformat}
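To make the first encoding concrete, here is a rough sketch of how it might be written and read back. It assumes a token containing a colon starts a new stream, and a bare number adds another partition to the most recent stream. The StreamPartitions object and its encode/decode methods are hypothetical names for illustration, not existing Samza code.

```scala
// Hypothetical helper, not part of Samza: a sketch of the
// "foo.bar:0,2,foo.baz:0" encoding for SAMZA_STREAM_PARTITIONS.
object StreamPartitions {

  /** Encodes (stream, partitions) pairs, e.g. "foo.bar:0,2,foo.baz:0". */
  def encode(assignments: Seq[(String, Seq[Int])]): String =
    assignments
      .filter { case (_, partitions) => partitions.nonEmpty }
      .map { case (stream, partitions) => s"$stream:${partitions.mkString(",")}" }
      .mkString(",")

  /** Decodes the compact form back into (stream, partitions) pairs, in order. */
  def decode(encoded: String): Seq[(String, Seq[Int])] = {
    val tokens = encoded.split(",").map(_.trim).filter(_.nonEmpty)
    tokens.foldLeft(Vector.empty[(String, Vector[Int])]) { (acc, token) =>
      token.lastIndexOf(':') match {
        case -1 =>
          // A bare number: one more partition for the most recent stream.
          require(acc.nonEmpty, s"partition '$token' appears before any stream name")
          val (stream, partitions) = acc.last
          acc.init :+ (stream -> (partitions :+ token.toInt))
        case i =>
          // "name:partition" starts a new stream entry.
          acc :+ (token.substring(0, i) -> Vector(token.substring(i + 1).toInt))
      }
    }
  }
}
```

A container could then read its actual assignment with something like sys.env.get("SAMZA_STREAM_PARTITIONS").map(StreamPartitions.decode), so only the topic-partitions it really owns reach the rest of the code.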
> Not use maximum number of partitions when initializing streams
> --------------------------------------------------------------
>
> Key: SAMZA-82
> URL: https://issues.apache.org/jira/browse/SAMZA-82
> Project: Samza
> Issue Type: Bug
> Components: kafka
> Affects Versions: 0.7.0
> Reporter: Jakob Homan
> Assignee: Jakob Homan
> Fix For: 0.7.0
>
>
> Util.scala:
> {code} /**
> * Uses config to create SystemAdmin classes for all input stream systems to
> * get each input stream's partition count, then returns the maximum count.
> * An input stream with two partitions, and a second input stream with four
> * partitions would result in this method returning 4.
> */
> def getMaxInputStreamPartitions(config: Config) = {
> {code}
> This approach works if all the streams have the same number of partitions,
> but is inefficient for other cases and, where the underlying system gets
> cranky about being asked about non-existing partitions, fails. We should
> eagerly figure out the correct number of partitions for each topic and pass
> that information from there.