[ https://issues.apache.org/jira/browse/APEXMALHAR-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15601113#comment-15601113 ]

ASF GitHub Bot commented on APEXMALHAR-2309:
--------------------------------------------

GitHub user francisf reopened a pull request:

    https://github.com/apache/apex-malhar/pull/464

    APEXMALHAR-2309 Comparing times for newer tuples with existing key

    @bhupeshchawda please review.
    Marking a tuple as unique if the time found for its key in asyncEvents is
    earlier than the current tuple's time.
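
Read literally, the rule in the PR description amounts to the check below. This is a hypothetical sketch, not the code in the pull request: `asyncEvents` is modelled as a plain `Map` from key to the last recorded event time (epoch millis), and treating a never-seen key as unique is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the rule stated in the PR description; not the
// operator's actual code. asyncEvents is modelled as key -> event time.
public class PrRuleSketch {
    private final Map<String, Long> asyncEvents = new HashMap<>();

    /** Returns true if the tuple should be emitted as unique. */
    public boolean isUnique(String key, long tupleTime) {
        Long timeFound = asyncEvents.get(key);
        // Assumption: a key with no recorded time is unique as well.
        boolean unique = timeFound == null || timeFound < tupleTime;
        if (unique) {
            // Remember the newest time seen for this key.
            asyncEvents.put(key, tupleTime);
        }
        return unique;
    }

    public static void main(String[] args) {
        PrRuleSketch sketch = new PrRuleSketch();
        System.out.println(sketch.isUnique("10", 1474614305000L)); // true
        System.out.println(sketch.isUnique("11", 1474614315000L)); // true
        System.out.println(sketch.isUnique("10", 1474614325000L)); // true: earlier time found
        System.out.println(sketch.isUnique("10", 1474614325000L)); // false: same time replayed
    }
}
```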

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/francisf/apex-malhar APEXMALHAR-2309_Deduper_valid_as_duplicates

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/apex-malhar/pull/464.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #464
    
----
commit c56e5c36c46f90fb0fee7cb6558bf860dbf6e181
Author: francisf <[email protected]>
Date:   2016-10-21T13:08:39Z

    APEXMALHAR-2309 Comparing times for newer tuples with existing key

----


> TimeBasedDedupOperator marks new tuples as duplicates if expired tuples exist
> -----------------------------------------------------------------------------
>
>                 Key: APEXMALHAR-2309
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2309
>             Project: Apache Apex Malhar
>          Issue Type: Bug
>    Affects Versions: 3.5.0
>            Reporter: Francis Fernandes
>            Assignee: Francis Fernandes
>
> The deduper marks valid tuples outside the expiry window as duplicates. 
> Consider the following configuration (number of buckets = 1):
> {code}
>   <property>
>     <name>dt.application.DedupTestApp.operator.Deduper.prop.expireBefore</name>
>     <value>10</value>
>   </property>
>   <property>
>     <name>dt.application.DedupTestApp.operator.Deduper.prop.bucketSpan</name>
>     <value>10</value>
>   </property>
> {code}
> The data piped in is:
> {code}
> "10",1474614305000,"Test"
> "11",1474614315000,"Test"
> "10",1474614325000,"Test"
> {code}
> The 3rd tuple is valid since it falls outside the expiry window of the 1st
> tuple with the same key. However, it is marked as a duplicate because the
> 1st tuple, although expired, is still present in Bucket.flash.
> The issue occurs when the expiry duration is shorter than the checkpointing
> duration.
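
The scenario above can be reproduced with a simplified, hypothetical model; it is not the TimeBasedDedupOperator implementation. It assumes tuple times are epoch milliseconds, expireBefore is in seconds, there is a single bucket, and bucket-span granularity is ignored. A map stands in for Bucket.flash, where an entry lingers after it has expired. The presence-only check reproduces the reported behaviour; the time-aware check only treats an earlier tuple as a duplicate source while it is still inside the expiry window.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the issue's scenario; not the operator's code.
// Assumes epoch-millisecond tuple times and expireBefore in seconds.
public class DedupScenarioSketch {
    private final long expireBeforeMillis;
    // Stands in for Bucket.flash: entries stay here even after they expire.
    private final Map<String, Long> flash = new HashMap<>();
    private final boolean compareTimes;

    public DedupScenarioSketch(long expireBeforeSeconds, boolean compareTimes) {
        this.expireBeforeMillis = expireBeforeSeconds * 1000L;
        this.compareTimes = compareTimes;
    }

    /** Returns true if the tuple is reported as a duplicate. */
    public boolean isDuplicate(String key, long time) {
        Long previous = flash.get(key);
        boolean duplicate;
        if (previous == null) {
            duplicate = false;
        } else if (compareTimes) {
            // Only an earlier tuple still inside the expiry window counts.
            duplicate = time - previous < expireBeforeMillis;
        } else {
            // Reported behaviour: mere presence of the key counts,
            // even though that earlier tuple has already expired.
            duplicate = true;
        }
        if (!duplicate) {
            flash.put(key, time);
        }
        return duplicate;
    }

    public static void main(String[] args) {
        String[] keys = {"10", "11", "10"};
        long[] times = {1474614305000L, 1474614315000L, 1474614325000L};
        for (boolean compare : new boolean[] {false, true}) {
            DedupScenarioSketch dedup = new DedupScenarioSketch(10, compare);
            StringBuilder out = new StringBuilder(compare ? "time-aware: " : "presence-only: ");
            for (int i = 0; i < keys.length; i++) {
                // U = unique, D = duplicate
                out.append(dedup.isDuplicate(keys[i], times[i]) ? "D" : "U");
            }
            System.out.println(out);
        }
    }
}
```

Running `main` prints `presence-only: UUD` followed by `time-aware: UUU`: the 3rd tuple arrives 20 000 ms after the 1st, which is beyond the 10 000 ms window under these assumptions.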



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
