[ https://issues.apache.org/jira/browse/APEXMALHAR-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15607771#comment-15607771 ]
ASF GitHub Bot commented on APEXMALHAR-2309:
--------------------------------------------
Github user asfgit closed the pull request at:
https://github.com/apache/apex-malhar/pull/464
> TimeBasedDedupOperator marks new tuples as duplicates if expired tuples exist
> -----------------------------------------------------------------------------
>
> Key: APEXMALHAR-2309
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2309
> Project: Apache Apex Malhar
> Issue Type: Bug
> Affects Versions: 3.5.0
> Reporter: Francis Fernandes
> Assignee: Francis Fernandes
> Fix For: 3.6.0
>
>
> The deduper marks valid tuples outside the expiry window as duplicates.
> Consider the following configuration (number of buckets = 1):
> {code}
> <property>
> <name>dt.application.DedupTestApp.operator.Deduper.prop.expireBefore</name>
> <value>10</value>
> </property>
> <property>
> <name>dt.application.DedupTestApp.operator.Deduper.prop.bucketSpan</name>
> <value>10</value>
> </property>
> {code}
> The data piped in is:
> {code}
> "10",1474614305000,"Test"
> "11",1474614315000,"Test"
> "10",1474614325000,"Test"
> {code}
> The 3rd tuple is valid since it is outside the expiry window, but it is
> marked as a duplicate because the first tuple, although expired, is
> still present in Bucket.flash.
> The issue happens when the expiry duration is less than the checkpointing
> duration.
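
The scenario above can be reproduced in isolation with a small sketch. This is not the Malhar implementation: the class DedupExpirySketch, the method dedup, and the plain HashMap standing in for the in-memory bucket are all hypothetical. It also assumes that expireBefore is given in seconds and that an entry counts as expired once its time is at or before (latest tuple time - expireBefore). With checkExpiry set to false, a key that is still in memory is always treated as a duplicate, which is the reported behaviour. With checkExpiry set to true, expired entries are ignored.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class DedupExpirySketch {
    // expireBefore = 10 from the report, assumed to be in seconds.
    static final long EXPIRE_BEFORE_MS = 10_000L;

    /**
     * Returns, for each tuple in arrival order, whether it is flagged as a duplicate.
     * checkExpiry = false: any key found in memory is a duplicate (reported behaviour).
     * checkExpiry = true: entries at or before the expiry point are ignored.
     */
    static boolean[] dedup(String[] keys, long[] timesMs, boolean checkExpiry) {
        // Hypothetical stand-in for the in-memory part of the bucket.
        Map<String, Long> seen = new HashMap<>();
        boolean[] duplicate = new boolean[keys.length];
        long latest = Long.MIN_VALUE;
        for (int i = 0; i < keys.length; i++) {
            latest = Math.max(latest, timesMs[i]);
            long expiryPoint = latest - EXPIRE_BEFORE_MS;
            Long previous = seen.get(keys[i]);
            boolean live = previous != null && (!checkExpiry || previous > expiryPoint);
            duplicate[i] = live;
            if (!live) {
                // First sighting, or the old entry expired: record the new time.
                seen.put(keys[i], timesMs[i]);
            }
        }
        return duplicate;
    }

    public static void main(String[] args) {
        String[] keys = {"10", "11", "10"};
        long[] times = {1474614305000L, 1474614315000L, 1474614325000L};
        // Prints [false, false, true]: the 3rd tuple is wrongly flagged.
        System.out.println(Arrays.toString(dedup(keys, times, false)));
        // Prints [false, false, false]: the expired entry is ignored.
        System.out.println(Arrays.toString(dedup(keys, times, true)));
    }
}
```

When the 3rd tuple arrives, the expiry point is 1474614325000 - 10000 = 1474614315000, so the first tuple (1474614305000) has already expired and should not take part in the comparison.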
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)