[
https://issues.apache.org/jira/browse/APEXMALHAR-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15601113#comment-15601113
]
ASF GitHub Bot commented on APEXMALHAR-2309:
--------------------------------------------
GitHub user francisf reopened a pull request:
https://github.com/apache/apex-malhar/pull/464
APEXMALHAR-2309 Comparing times for newer tuples with existing key
@bhupeshchawda please review.
Marks a tuple as unique if the time found for the key in asyncEvents is
earlier than the current tuple's time.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/francisf/apex-malhar APEXMALHAR-2309_Deduper_valid_as_duplicates
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/apex-malhar/pull/464.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #464
----
commit c56e5c36c46f90fb0fee7cb6558bf860dbf6e181
Author: francisf <[email protected]>
Date: 2016-10-21T13:08:39Z
APEXMALHAR-2309 Comparing times for newer tuples with existing key
----
> TimeBasedDedupOperator marks new tuples as duplicates if expired tuples exist
> -----------------------------------------------------------------------------
>
> Key: APEXMALHAR-2309
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2309
> Project: Apache Apex Malhar
> Issue Type: Bug
> Affects Versions: 3.5.0
> Reporter: Francis Fernandes
> Assignee: Francis Fernandes
>
> The deduper marks valid tuples outside the expiry window as duplicates.
> Consider the following configuration (number of buckets = 1):
> {code}
> <property>
>   <name>dt.application.DedupTestApp.operator.Deduper.prop.expireBefore</name>
>   <value>10</value>
> </property>
> <property>
>   <name>dt.application.DedupTestApp.operator.Deduper.prop.bucketSpan</name>
>   <value>10</value>
> </property>
> {code}
> The data piped in is:
> {code}
> "10",1474614305000,"Test"
> "11",1474614315000,"Test"
> "10",1474614325000,"Test"
> {code}
> The 3rd tuple is valid since it is outside the expiry window. However, it is
> marked as a duplicate because the first tuple, although expired, is still
> present in Bucket.flash.
> The issue occurs when the expiry duration is less than the checkpointing
> duration.
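To make the timing in the report concrete, the following is a minimal sketch of the window arithmetic only. It assumes that expireBefore is given in seconds and that the second field of each record is an epoch timestamp in milliseconds. The class ExpiryWindowSketch and the method isExpired are hypothetical names used for illustration; they are not part of TimeBasedDedupOperator.

```java
import java.util.concurrent.TimeUnit;

// Illustrative only: a hypothetical helper, not code from Apex Malhar.
class ExpiryWindowSketch {
    // Assumption: expireBefore is in seconds (value taken from the reported configuration).
    static final long EXPIRE_BEFORE_SECONDS = 10;

    /**
     * Returns true if a tuple with event time earlierMillis lies outside the
     * expiry window when measured against a tuple with event time laterMillis.
     * Both arguments are assumed to be epoch milliseconds.
     */
    static boolean isExpired(long earlierMillis, long laterMillis) {
        long gapSeconds = TimeUnit.MILLISECONDS.toSeconds(laterMillis - earlierMillis);
        return gapSeconds > EXPIRE_BEFORE_SECONDS;
    }
}
```

With the sample data, isExpired(1474614305000L, 1474614325000L) evaluates to true: the two tuples with key "10" are 20 seconds apart, which is beyond the 10-second window, so the first tuple should no longer block the third.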
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)