[
https://issues.apache.org/jira/browse/SAMZA-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Navina Ramesh updated SAMZA-971:
--------------------------------
Description:
JobCoordinator / JobModelManager does not need to fetch offsets for all stream
partitions. It only needs the partition count for each stream in order to
distribute the partitions among tasks.
Fetching offsets slows job startup: when many topic partitions are being
consumed, the Samza job takes longer to boot up. If the yarn-am-liveness
timeout is set lower than the time the AM needs to boot up, the RM kills the
application. Such a job may never be able to start up.
The main problem is the generic SystemAdmin interface,
getSystemStreamMetadata, which fetches partition count AND offset information
together. Separate interfaces for each piece of information would give more
granular control, so that only the required information is fetched. A similar
approach was used in SAMZA-882 to detect partition count changes in the input
streams.
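
A minimal sketch of the proposed split, for illustration only. The names
PartitionCountAdmin, getPartitionCounts, and assign are hypothetical and are
not taken from the Samza code base, and the round-robin grouping is an
assumption standing in for whatever grouper the job is configured with. The
point is that task assignment can be computed from partition counts alone,
without any offset lookup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PartitionCountSketch {
    /** Hypothetical narrow interface: returns partition counts only, never offsets. */
    interface PartitionCountAdmin {
        Map<String, Integer> getPartitionCounts(List<String> streamNames);
    }

    /**
     * Distributes "stream-partition" ids over taskCount tasks by partition id
     * modulo taskCount. Only the counts are needed; no offsets are consulted.
     */
    public static Map<Integer, List<String>> assign(Map<String, Integer> counts, int taskCount) {
        Map<Integer, List<String>> tasks = new TreeMap<>();
        for (int t = 0; t < taskCount; t++) {
            tasks.put(t, new ArrayList<>());
        }
        // Iterate streams in sorted order so the result is deterministic.
        for (Map.Entry<String, Integer> e : new TreeMap<>(counts).entrySet()) {
            for (int p = 0; p < e.getValue(); p++) {
                tasks.get(p % taskCount).add(e.getKey() + "-" + p);
            }
        }
        return tasks;
    }

    public static void main(String[] args) {
        // Stub admin with made-up counts; a real one would query the system's metadata.
        PartitionCountAdmin admin = names -> {
            Map<String, Integer> counts = new HashMap<>();
            for (String name : names) {
                counts.put(name, 4);
            }
            return counts;
        };
        List<String> streams = new ArrayList<>();
        streams.add("page-views");
        streams.add("clicks");
        System.out.println(assign(admin.getPartitionCounts(streams), 2));
    }
}
```

With an interface of this shape, the coordinator's startup cost would depend
on the number of streams rather than on one offset fetch per partition.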
> JobCoordinator/JobModelManager does not need to fetch offset for all stream
> partitions
> --------------------------------------------------------------------------------------
>
> Key: SAMZA-971
> URL: https://issues.apache.org/jira/browse/SAMZA-971
> Project: Samza
> Issue Type: Bug
> Reporter: Navina Ramesh
> Assignee: Navina Ramesh
> Labels: newbie
> Fix For: 0.10.1
>