[ 
https://issues.apache.org/jira/browse/SAMZA-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Pan (Data Infrastructure) updated SAMZA-1670:
------------------------------------------------
    Fix Version/s: 0.15.0

> When fetching a newest offset for a partition, also prefetch and cache the 
> newest offsets for other partitions on the container
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SAMZA-1670
>                 URL: https://issues.apache.org/jira/browse/SAMZA-1670
>             Project: Samza
>          Issue Type: Improvement
>            Reporter: Cameron Lee
>            Priority: Major
>              Labels: metadata
>             Fix For: 0.15.0
>
>
> ExtendedSystemAdmin.getNewestOffset currently works on only one 
> system-stream-partition at a time. As an optimization, when one 
> system-stream-partition needs a newest offset, a batch call can also fetch 
> (and cache) the newest offsets for the other partitions on the same 
> container.
> This can help to reduce the call volume to system admins to get newest offset 
> metadata. This can also help reduce contention on system admins when metadata 
> is needed by multiple threads at the same time.
> *Proposed approach:*
> Add a new getNewestOffset API to StreamMetadataCache. Have the cache keep 
> track of all system-stream-partitions that have asked for newest offsets 
> before, and when a system-stream-partition needs newest offset metadata, 
> check if there are any other stale entries and fetch those as well. This also 
> requires adding a getNewestOffsets batch call to ExtendedSystemAdmin. The 
> benefit here is that StreamMetadataCache is already reused by multiple tasks, 
> but the disadvantage is that it has to keep track of new state.
> *Alternative approach:*
> Collect all system-stream-partitions that will need newest offset metadata at 
> setup, and then make the batch call whenever any of those partitions needs 
> metadata and cache the metadata. The benefit for this approach is that no 
> state needs to be built up, as it is known at setup, but it might be unclean 
> to do the initial collection and keep track of it. For example, it might be 
> necessary to store container-granular information inside partition-granular 
> objects (e.g. TaskStorageManager).
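
For illustration only, the proposed approach could be sketched roughly as below. This is not Samza code: the class name NewestOffsetCache, the TTL-based staleness rule, the injected clock, and the use of a plain function in place of the proposed ExtendedSystemAdmin.getNewestOffsets batch call are all assumptions made for the example. The cache remembers every partition that has asked for a newest offset, and when the requested entry is missing or stale it refreshes all missing or stale tracked entries in one batch call.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch of a newest-offset cache that batches refreshes.
 * K stands in for a system-stream-partition; offsets are plain strings.
 */
final class NewestOffsetCache<K> {
  private static final class Entry {
    final String offset;
    final long fetchedAtMs;

    Entry(String offset, long fetchedAtMs) {
      this.offset = offset;
      this.fetchedAtMs = fetchedAtMs;
    }
  }

  // Assumption: stands in for a batch getNewestOffsets call on the system admin.
  private final Function<Set<K>, Map<K, String>> batchFetcher;
  private final long ttlMs;
  private final LongSupplier clock;
  // State built up over time: every partition that has ever asked for a newest offset.
  private final Set<K> tracked = new HashSet<>();
  private final Map<K, Entry> cache = new HashMap<>();

  NewestOffsetCache(Function<Set<K>, Map<K, String>> batchFetcher, long ttlMs, LongSupplier clock) {
    this.batchFetcher = batchFetcher;
    this.ttlMs = ttlMs;
    this.clock = clock;
  }

  private boolean needsRefresh(K partition, long nowMs) {
    Entry entry = cache.get(partition);
    return entry == null || nowMs - entry.fetchedAtMs >= ttlMs;
  }

  /**
   * Returns the newest offset for the partition. On a miss or stale entry, every
   * tracked partition that also needs a refresh is fetched in the same batch call.
   * Synchronized so concurrent callers wait for one fetch instead of issuing their own.
   * Returns null if the batch call does not return an offset for the partition.
   */
  synchronized String getNewestOffset(K partition) {
    tracked.add(partition);
    long nowMs = clock.getAsLong();
    if (!needsRefresh(partition, nowMs)) {
      return cache.get(partition).offset;
    }
    Set<K> toFetch = new HashSet<>();
    for (K candidate : tracked) {
      if (needsRefresh(candidate, nowMs)) {
        toFetch.add(candidate);
      }
    }
    Map<K, String> fetched = batchFetcher.apply(toFetch);
    for (Map.Entry<K, String> result : fetched.entrySet()) {
      cache.put(result.getKey(), new Entry(result.getValue(), nowMs));
    }
    return fetched.get(partition);
  }
}
```

In this sketch the tracked set is the "new state" mentioned as the disadvantage of the proposed approach: the first request for each partition still triggers its own call, and batching only pays off once several partitions are known and expire together.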



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
