[ https://issues.apache.org/jira/browse/SAMZA-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yi Pan (Data Infrastructure) updated SAMZA-1670:
------------------------------------------------
    Fix Version/s: 0.15.0

> When fetching a newest offset for a partition, also prefetch and cache the
> newest offsets for other partitions on the container
> ---------------------------------------------------------------------------
>
>                 Key: SAMZA-1670
>                 URL: https://issues.apache.org/jira/browse/SAMZA-1670
>             Project: Samza
>          Issue Type: Improvement
>            Reporter: Cameron Lee
>            Priority: Major
>              Labels: metadata
>             Fix For: 0.15.0
>
>
> ExtendedSystemAdmin.getNewestOffset currently works on only one
> system-stream-partition at a time. As an optimization, when one
> system-stream-partition needs a newest offset, a batch call can be used
> to also fetch (and cache) the newest offsets for the other partitions on
> the same container.
> This reduces the volume of calls to system admins for newest offset
> metadata. It also reduces contention on system admins when multiple
> threads need metadata at the same time.
>
> *Proposed approach:*
> Add a new getNewestOffset API to StreamMetadataCache. Have the cache keep
> track of all system-stream-partitions that have previously asked for
> newest offsets. When a system-stream-partition needs newest offset
> metadata, check whether any other tracked entries are stale and fetch
> those in the same call. This also requires adding a getNewestOffsets
> batch call to ExtendedSystemAdmin. The benefit is that
> StreamMetadataCache is already shared by multiple tasks; the disadvantage
> is that it has to keep track of new state.
>
> *Alternative approach:*
> Collect all system-stream-partitions that will need newest offset
> metadata at setup. Then, whenever any of those partitions needs metadata,
> make the batch call and cache the result. The benefit of this approach is
> that no state needs to be built up, because the full set is known at
> setup. However, doing the initial collection and keeping track of it
> might be unclean. For example, it might be necessary to store
> container-granular information inside partition-granular objects
> (e.g. TaskStorageManager).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
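
The proposed approach in the issue description can be illustrated with a minimal Java sketch. It is not Samza code: BatchOffsetFetcher and NewestOffsetCache are hypothetical stand-ins for the proposed ExtendedSystemAdmin.getNewestOffsets batch call and the new StreamMetadataCache API, partitions and offsets are plain strings rather than Samza types, and the time-to-live staleness rule is an assumption, since the issue does not define when an entry counts as stale.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for the proposed getNewestOffsets batch call. */
interface BatchOffsetFetcher {
    Map<String, String> getNewestOffsets(Set<String> partitions);
}

/** Hypothetical cache that remembers which partitions have asked for offsets. */
final class NewestOffsetCache {
    private static final class Entry {
        final String offset;
        final long fetchedAtMs;

        Entry(String offset, long fetchedAtMs) {
            this.offset = offset;
            this.fetchedAtMs = fetchedAtMs;
        }
    }

    private final BatchOffsetFetcher fetcher;
    private final long ttlMs;          // assumed staleness threshold
    private final LongSupplier clock;  // injectable so the sketch can be tested
    private final Map<String, Entry> entries = new HashMap<>();
    private final Set<String> tracked = new HashSet<>();

    NewestOffsetCache(BatchOffsetFetcher fetcher, long ttlMs, LongSupplier clock) {
        this.fetcher = fetcher;
        this.ttlMs = ttlMs;
        this.clock = clock;
    }

    private boolean isStale(Entry entry, long nowMs) {
        return entry == null || nowMs - entry.fetchedAtMs >= ttlMs;
    }

    /** Synchronized so that concurrent callers share one batch fetch. */
    synchronized String getNewestOffset(String partition) {
        tracked.add(partition);
        long nowMs = clock.getAsLong();
        Entry cached = entries.get(partition);
        if (!isStale(cached, nowMs)) {
            return cached.offset;
        }
        // Refresh every tracked partition whose entry is missing or stale,
        // not only the one that was requested.
        Set<String> toFetch = new HashSet<>();
        for (String p : tracked) {
            if (isStale(entries.get(p), nowMs)) {
                toFetch.add(p);
            }
        }
        Map<String, String> fetched = fetcher.getNewestOffsets(toFetch);
        for (Map.Entry<String, String> f : fetched.entrySet()) {
            entries.put(f.getKey(), new Entry(f.getValue(), nowMs));
        }
        Entry fresh = entries.get(partition);
        return fresh == null ? null : fresh.offset;
    }
}
```

In this sketch the first request for a partition triggers a fetch for that partition alone. Once several partitions are tracked, one request after the entries expire refreshes all of them in a single batch call, which is the reduction in call volume the issue is after. The tracked set is the "new state" the issue lists as the disadvantage of this approach.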