[jira] [Commented] (NIFI-3644) Add DetectDuplicateUsingHBase processor

ASF GitHub Bot (JIRA) Wed, 24 May 2017 10:53:18 -0700

    [ https://issues.apache.org/jira/browse/NIFI-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16023328#comment-16023328 ]


ASF GitHub Bot commented on NIFI-3644:
--------------------------------------

Github user bbende commented on the issue:

    https://github.com/apache/nifi/pull/1645
  
    Sorry for taking so long to get back to this...
    
    I tested this using PutDistributedMapCache and FetchDistributedMapCache, 
and noticed the value coming back from fetch wasn't exactly what I had stored. 
    
    In HBaseRowHandler we had:
    `lastResultBytes = resultCell.getValueArray()`
    
    And we need:
    `lastResultBytes = Arrays.copyOfRange(resultCell.getValueArray(), resultCell.getValueOffset(), resultCell.getValueLength() + resultCell.getValueOffset());`
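    
    To illustrate why the copy is needed: a cell's value is a slice of a larger 
backing array, so `getValueArray()` alone returns extra bytes around the value. 
The sketch below is not the NiFi or HBase code; `FakeCell` and `copyValue` are 
hypothetical stand-ins that only mirror the three accessor names used above, 
and the byte layout is invented for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ValueCopySketch {

    // Hypothetical stand-in for an HBase Cell: the value is a slice
    // of a larger backing array shared with other parts of the cell.
    static final class FakeCell {
        final byte[] backing;
        final int valueOffset;
        final int valueLength;

        FakeCell(byte[] backing, int valueOffset, int valueLength) {
            this.backing = backing;
            this.valueOffset = valueOffset;
            this.valueLength = valueLength;
        }

        byte[] getValueArray() { return backing; }      // whole backing array, not just the value
        int getValueOffset()   { return valueOffset; }  // index where the value starts
        int getValueLength()   { return valueLength; }  // number of value bytes
    }

    // Copies only the value bytes out of the backing array.
    static byte[] copyValue(FakeCell cell) {
        int start = cell.getValueOffset();
        return Arrays.copyOfRange(cell.getValueArray(), start, start + cell.getValueLength());
    }

    public static void main(String[] args) {
        // Assumed layout for illustration: row "row1", family "cf1",
        // qualifier "q1", then the value "hello" starting at index 9.
        byte[] backing = "row1cf1q1hello".getBytes(StandardCharsets.UTF_8);
        FakeCell cell = new FakeCell(backing, 9, 5);

        String wrong = new String(cell.getValueArray(), StandardCharsets.UTF_8);
        String right = new String(copyValue(cell), StandardCharsets.UTF_8);

        System.out.println("getValueArray() alone: " + wrong);
        System.out.println("copied range:          " + right);
    }
}
```

    With a real HBase `Cell`, `CellUtil.cloneValue(cell)` should give the same 
result as the explicit `Arrays.copyOfRange` call, if that helper is available 
in the HBase version in use.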
    
    I made a commit here that includes the change:
    
https://github.com/bbende/nifi/commit/dc8f14d95d6cdbab2aa6e815269fe0d98faa2fe6
    
    I also moved MockHBaseClientService into its own class so it can be used 
by both tests, so that we don't have to duplicate that code.
    
    Everything else looks good so I will go ahead and merge these changes 
together (your commit then mine). 
    
    Thanks again for contributing, and sorry for the delay.



> Add DetectDuplicateUsingHBase processor
> ---------------------------------------
>
>                 Key: NIFI-3644
>                 URL: https://issues.apache.org/jira/browse/NIFI-3644
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Bjorn Olsen
>            Priority: Minor
>
> The DetectDuplicate processor makes use of a distributed map cache for 
> maintaining a list of unique file identifiers (such as hashes).
> The distributed map cache functionality could be provided by an HBase table, 
> which then allows for reliably storing a huge volume of file identifiers and 
> auditing information. The downside of this approach is of course that HBase 
> is required.
> Storing the unique file identifiers in a reliable, query-able manner along 
> with some audit information is of benefit to several use cases.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
