[ 
https://issues.apache.org/jira/browse/SAMZA-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shanthoosh Venkataraman updated SAMZA-2161:
-------------------------------------------
    Description: 
Currently, the metadata of a Samza job is stored in a Kafka topic called the 
coordinator stream.

In the samza-yarn ApplicationMaster, the same coordinator stream is read twice 
as part of the startup sequence. This duplicate read unnecessarily prolongs the 
startup time of the ApplicationMaster and delays container allocation. As a 
side effect, input stream processing can be delayed substantially, depending on 
the size of the coordinator stream.

To mitigate this problem, two utility classes in Samza, namely 
`ChangelogPartitionManager` and `Config`, should be moved to use the 
MetadataStore abstraction. This ticket tracks the work involved in the 
migration.

  was:
Currently, the metadata of a Samza job is stored in a Kafka topic called the 
coordinator stream.

In the samza-yarn ApplicationMaster, the same coordinator stream is read twice 
as part of the startup sequence. This duplicate read unnecessarily prolongs the 
startup time of the ApplicationMaster and delays container allocation. As a 
side effect, input stream processing can be delayed substantially, depending on 
the size of the coordinator stream.

 

To mitigate this problem, two utility classes in Samza, namely 
`ChangelogPartitionManager` and `Config`, should be moved to use the 
MetadataStore abstraction. This ticket tracks the work involved in the 
migration.


> Move ChangelogPartitionManager and CoordinatorStream ConfigReader to 
> MetadataStore
> ----------------------------------------------------------------------------------
>
>                 Key: SAMZA-2161
>                 URL: https://issues.apache.org/jira/browse/SAMZA-2161
>             Project: Samza
>          Issue Type: Improvement
>            Reporter: Shanthoosh Venkataraman
>            Assignee: Shanthoosh Venkataraman
>            Priority: Major
>
> Currently, the metadata of a Samza job is stored in a Kafka topic called the 
> coordinator stream.
> In the samza-yarn ApplicationMaster, the same coordinator stream is read 
> twice as part of the startup sequence. This duplicate read unnecessarily 
> prolongs the startup time of the ApplicationMaster and delays container 
> allocation. As a side effect, input stream processing can be delayed 
> substantially, depending on the size of the coordinator stream.
> To mitigate this problem, two utility classes in Samza, namely 
> `ChangelogPartitionManager` and `Config`, should be moved to use the 
> MetadataStore abstraction. This ticket tracks the work involved in the 
> migration.
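
The idea behind the migration can be illustrated with a minimal sketch. It is 
not Samza's actual code: `SimpleMetadataStore`, `CountingCoordinatorStreamStore`, 
`CachedMetadataStore`, `StartupSketch`, and the `config:` / `changelog:` key 
prefixes are all hypothetical names chosen for illustration. The sketch assumes 
that both the config reader and the changelog-partition reader can be served 
from one shared store, so the expensive underlying read happens only once.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a metadata store abstraction (not Samza's interface).
interface SimpleMetadataStore {
    Map<String, String> all();
}

// Simulates an expensive full read of the coordinator stream and counts reads.
class CountingCoordinatorStreamStore implements SimpleMetadataStore {
    final AtomicInteger reads = new AtomicInteger();
    private final Map<String, String> messages;

    CountingCoordinatorStreamStore(Map<String, String> messages) {
        this.messages = new HashMap<>(messages);
    }

    @Override
    public Map<String, String> all() {
        reads.incrementAndGet(); // each call models one full scan of the topic
        return new HashMap<>(messages);
    }
}

// Reads the underlying store once and serves every later call from memory.
class CachedMetadataStore implements SimpleMetadataStore {
    private final SimpleMetadataStore delegate;
    private Map<String, String> snapshot;

    CachedMetadataStore(SimpleMetadataStore delegate) {
        this.delegate = delegate;
    }

    @Override
    public synchronized Map<String, String> all() {
        if (snapshot == null) {
            snapshot = delegate.all();
        }
        return Collections.unmodifiableMap(snapshot);
    }
}

// Two startup steps that both go through the same store abstraction.
class StartupSketch {
    // Collects entries under the hypothetical "config:" prefix.
    static Map<String, String> readConfig(SimpleMetadataStore store) {
        Map<String, String> config = new HashMap<>();
        for (Map.Entry<String, String> e : store.all().entrySet()) {
            if (e.getKey().startsWith("config:")) {
                config.put(e.getKey().substring("config:".length()), e.getValue());
            }
        }
        return config;
    }

    // Collects task-to-partition entries under the hypothetical "changelog:" prefix.
    static Map<String, Integer> readChangelogPartitions(SimpleMetadataStore store) {
        Map<String, Integer> partitions = new HashMap<>();
        for (Map.Entry<String, String> e : store.all().entrySet()) {
            if (e.getKey().startsWith("changelog:")) {
                partitions.put(e.getKey().substring("changelog:".length()),
                               Integer.parseInt(e.getValue()));
            }
        }
        return partitions;
    }
}
```

If both `StartupSketch` methods are handed the same `CachedMetadataStore`, the 
counter on the underlying store stays at one however many readers there are. 
That is the effect the ticket is after for the ApplicationMaster startup path.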



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
