[
https://issues.apache.org/jira/browse/SOLR-8709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15185385#comment-15185385
]
Joel Bernstein edited comment on SOLR-8709 at 3/8/16 6:06 PM:
--------------------------------------------------------------
A little more detail on the design. A new "retentionWindow" parameter will be
added to the TopicStream to define the window of time for holding processed
version numbers. This retentionWindow should be larger than the soft commit
window. A TreeMap will be used to hold all the version numbers that have been
processed. This TreeMap will be trimmed to hold only version numbers within the
retention window. A PostFilter will be added that checks whether a document
is within the retentionWindow but is not in the TreeMap. This should catch any
out-of-order version numbers. The contents of the TreeMap will be persisted as
part of the checkpoint.
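
A minimal sketch of the trimming and lookup described above, not the actual
TopicStream code. The class name ProcessedVersionWindow and its methods are
hypothetical, and the sketch assumes the retention window can be expressed in
the same units as the version numbers (that is, that versions grow roughly
with time). Persisting the map in the checkpoint is left out.

```java
import java.util.TreeMap;

/** Hypothetical helper, not part of Solr: tracks processed versions inside a retention window. */
class ProcessedVersionWindow {
    // Assumption: the window is given in the same units as the version numbers.
    private final long retentionWindow;
    // Keys are processed version numbers; the value is only a placeholder.
    private final TreeMap<Long, Boolean> processed = new TreeMap<>();

    ProcessedVersionWindow(long retentionWindow) {
        this.retentionWindow = retentionWindow;
    }

    /** Lower bound of the window, measured back from the highest processed version. */
    long windowStart() {
        return processed.isEmpty() ? Long.MIN_VALUE : processed.lastKey() - retentionWindow;
    }

    /** Records a processed version, then drops versions that fell out of the window. */
    void markProcessed(long version) {
        processed.put(version, Boolean.TRUE);
        // headMap is a live view of the keys strictly below windowStart,
        // so clearing it trims the underlying map.
        processed.headMap(windowStart()).clear();
    }

    /** The check a PostFilter-style component could apply: inside the window but never processed. */
    boolean isMissed(long version) {
        return version >= windowStart() && !processed.containsKey(version);
    }

    int size() {
        return processed.size();
    }
}
```

With a window of 10 and processed versions 100, 102 and 105, a late-arriving
version 103 would be reported as missed, while 102 would not. Once a much
newer version is processed, the older entries are trimmed and anything below
the new window start is no longer checked.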
> Account for out-of-order version numbers in the TopicStream
> -----------------------------------------------------------
>
> Key: SOLR-8709
> URL: https://issues.apache.org/jira/browse/SOLR-8709
> Project: Solr
> Issue Type: Bug
> Reporter: Joel Bernstein
>
> Currently the TopicStream can miss documents if version numbers are received
> out-of-order. The TopicStream sorts on version number so it will only miss
> out-of-order versions that span commit boundaries.
> In order to resolve this issue we can adopt an approach that keeps a set of
> the last N version numbers sent for each Topic. As the documents are scanned
> we can check for documents within this time window that do not appear in the
> sent set. These documents can then be sent.