[ https://issues.apache.org/jira/browse/SAMZA-971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348943#comment-15348943 ]
Yi Pan (Data Infrastructure) commented on SAMZA-971:
----------------------------------------------------
+1 on SAMZA-971-2.patch.
> JobCoordinator/JobModelManager does not need to fetch offset for all stream
> partitions
> --------------------------------------------------------------------------------------
>
> Key: SAMZA-971
> URL: https://issues.apache.org/jira/browse/SAMZA-971
> Project: Samza
> Issue Type: Bug
> Reporter: Navina Ramesh
> Assignee: Navina Ramesh
> Labels: newbie
> Fix For: 0.10.1
>
> Attachments: SAMZA-971-0.patch, SAMZA-971-1.patch, SAMZA-971-2.patch
>
>
> JobCoordinator / JobModelManager does not need to fetch offset for all stream
> partitions. It only needs the partition count for each stream in order to
> distribute them among tasks.
> The impact of fetching offsets is that when many topic partitions are being
> consumed, it takes longer for the Samza job to boot up. If the
> yarn-am-liveness timeout is set lower than the time the AM takes to boot
> up, then the RM kills the application. Such a job may never be able to
> start up.
> The main problem here is the generic interface in SystemAdmin -
> getSystemStreamMetadata for fetching partition count AND offset information.
> If we have separate interfaces for fetching each of these pieces of
> information, it will provide more granular control over fetching only the
> required information. A similar approach was used in SAMZA-882 to detect
> partition count changes in the input streams.
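
For illustration only, the separation described above might be sketched as follows. Every name here (PartitionCountAdmin, OffsetAdmin, PartitionAssigner) is hypothetical and is not Samza's actual SystemAdmin API or the content of the attached patches. The round-robin assignment is a stand-in for whatever grouping the job really uses. The point is only that task assignment can be computed from partition counts, without any per-partition offset lookup.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interfaces; not Samza's real SystemAdmin API.
interface PartitionCountAdmin {
    // Cheap call: returns only the number of partitions per stream.
    Map<String, Integer> getPartitionCounts(List<String> streams);
}

interface OffsetAdmin {
    // Potentially expensive call: one offset lookup per partition.
    // Deliberately unused below, since assignment does not need offsets.
    Map<Integer, Long> getNewestOffsets(String stream, int partitionCount);
}

class PartitionAssigner {
    // Distributes "stream-partition" ids round-robin over taskCount tasks,
    // using partition counts only.
    static Map<Integer, List<String>> assign(
            PartitionCountAdmin admin, List<String> streams, int taskCount) {
        Map<Integer, List<String>> tasks = new LinkedHashMap<>();
        for (int t = 0; t < taskCount; t++) {
            tasks.put(t, new ArrayList<>());
        }
        Map<String, Integer> counts = admin.getPartitionCounts(streams);
        int next = 0;
        for (String stream : streams) {
            int count = counts.get(stream);
            for (int p = 0; p < count; p++) {
                tasks.get(next % taskCount).add(stream + "-" + p);
                next++;
            }
        }
        return tasks;
    }
}
```

With this split, a coordinator-like caller would depend only on the count interface at startup, and offset lookups would be left to the components that actually consume the streams.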
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)