[
https://issues.apache.org/jira/browse/SAMZA-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16246401#comment-16246401
]
ASF GitHub Bot commented on SAMZA-1480:
---------------------------------------
Github user asfgit closed the pull request at:
https://github.com/apache/samza/pull/350
> TaskStorageManager improperly initializes changelog consumer position when
> restoring a store from disk
> ------------------------------------------------------------------------------------------------------
>
> Key: SAMZA-1480
> URL: https://issues.apache.org/jira/browse/SAMZA-1480
> Project: Samza
> Issue Type: Bug
> Affects Versions: 0.10.0
> Reporter: Jake Maes
> Assignee: Jake Maes
> Priority: Trivial
> Fix For: 0.14.0
>
>
> For the Host Affinity state restore, an OFFSET file is written to disk on
> each commit. This file contains the offset of the most recently written
> changelog event, which is also reflected in the on-disk state. When the
> container is restarted, it restores the on-disk store and then replays the
> changelog from the offset recorded in the OFFSET file in order to restore any
> changelog events that were produced when the job ran on a different host.
> http://samza.apache.org/learn/documentation/0.13/yarn/yarn-host-affinity.html
> When TaskStorageManager initializes the consumer, it uses the offset from the
> OFFSET file, which is already reflected in the state.
> Instead, it should use the SystemAdmin.getOffsetsAfter() method to get the
> next offset to consume. This will avoid the replay of 1 extra message for
> state restore.
> It should then use SystemAdmin.offsetComparator() to pick the larger of the
> next offset (calculated above) and the oldest offset (according to the
> metadata). This is necessary for changelogs configured with TTL retention
> rather than infinite retention, where the offset from the OFFSET file may no
> longer be valid.
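
The selection rule described in the issue can be sketched roughly as follows. This is only an illustration, not the code from the pull request: the OffsetAdmin interface is a hypothetical stand-in for the two SystemAdmin operations named above (it is not Samza's actual API), and the NUMERIC implementation assumes offsets are decimal strings, as they would be for a Kafka-like changelog.

```java
// Hypothetical sketch; none of these types come from Samza itself.
class RestoreOffsetSketch {

    /** Stand-in for the two SystemAdmin operations the issue refers to. */
    interface OffsetAdmin {
        /** Returns the offset that immediately follows the given one. */
        String getOffsetAfter(String offset);

        /** Negative, zero, or positive, like Comparator.compare. */
        int compareOffsets(String a, String b);
    }

    /** Assumption: offsets are decimal numbers encoded as strings. */
    static final OffsetAdmin NUMERIC = new OffsetAdmin() {
        @Override
        public String getOffsetAfter(String offset) {
            return String.valueOf(Long.parseLong(offset) + 1);
        }

        @Override
        public int compareOffsets(String a, String b) {
            return Long.compare(Long.parseLong(a), Long.parseLong(b));
        }
    };

    /**
     * Picks the offset at which the changelog consumer should start.
     *
     * @param fileOffset   offset read from the OFFSET file, or null if absent
     * @param oldestOffset oldest offset still available per stream metadata
     */
    static String startingOffset(OffsetAdmin admin, String fileOffset, String oldestOffset) {
        if (fileOffset == null) {
            // No OFFSET file: nothing on disk to trust, so replay from the oldest offset.
            return oldestOffset;
        }
        // The OFFSET file's offset is already reflected in the store, so skip past it.
        String next = admin.getOffsetAfter(fileOffset);
        // With TTL retention the next offset may have expired; fall back to the oldest.
        return admin.compareOffsets(next, oldestOffset) >= 0 ? next : oldestOffset;
    }
}
```

For example, with an OFFSET file value of "41" and an oldest available offset of "10", the sketch starts the consumer at "42", so message 41 is not replayed. If the oldest available offset were "100" instead, it would start at "100".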