[jira] Commented: (CHUKWA-338) duplicate suppression in archiver

Jerome Boulon (JIRA) Mon, 13 Jul 2009 13:59:41 -0700

    [ https://issues.apache.org/jira/browse/CHUKWA-338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730510#action_12730510 ]


Jerome Boulon commented on CHUKWA-338:
--------------------------------------

Ari,
Yes, a secondary sort (grouping comparator) will solve the issue, but I'm not
sure all current adaptors are in line with the concept of a virtual offset, so
that would be the first thing to validate.
Also, if you have more than one value for the same key, you may want to
double-check that they actually have the same size and content, to make sure
it's a real duplicate and not an issue with the virtual offset, especially
after rotation.

Since, in my mind, the archiver is a background process, it should not be too
costly to always distinguish real duplicates from false duplicates (same
SequenceId but different content).
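The content check suggested above could look like the following sketch,
assuming the chunks of one stream reach the reducer already sorted by sequence
id. `ArchivedChunk` and `DuplicateFilter` are hypothetical names, and the
policy of keeping (and flagging) chunks that share a sequence id but differ in
content is an assumed choice, not something decided in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical chunk: the sequence id (virtual offset) and raw payload.
final class ArchivedChunk {
    final long seqId;
    final byte[] data;

    ArchivedChunk(long seqId, byte[] data) {
        this.seqId = seqId;
        this.data = data;
    }
}

final class DuplicateFilter {
    static final class Result {
        // Chunks that will be written to the archive.
        final List<ArchivedChunk> kept = new ArrayList<>();
        // Same seqId as an earlier chunk but different bytes: a "false
        // duplicate", likely a virtual-offset problem (e.g. after rotation).
        final List<ArchivedChunk> conflicts = new ArrayList<>();
        // Byte-identical repeats that were suppressed.
        int droppedDuplicates = 0;
    }

    // Assumes the input is already sorted by seqId, as a secondary sort
    // would deliver it to a single reduce call.
    static Result filter(List<ArchivedChunk> sortedBySeqId) {
        Result r = new Result();
        // Distinct payloads kept so far for the current seqId.
        List<ArchivedChunk> sameId = new ArrayList<>();
        for (ArchivedChunk c : sortedBySeqId) {
            if (sameId.isEmpty() || c.seqId != sameId.get(0).seqId) {
                // First chunk with a new seqId: always keep it.
                sameId.clear();
                sameId.add(c);
                r.kept.add(c);
                continue;
            }
            boolean realDuplicate = false;
            for (ArchivedChunk k : sameId) {
                // Cheap size check first, then full content comparison.
                if (k.data.length == c.data.length && Arrays.equals(k.data, c.data)) {
                    realDuplicate = true;
                    break;
                }
            }
            if (realDuplicate) {
                r.droppedDuplicates++;
            } else {
                r.conflicts.add(c);
                r.kept.add(c);
                sameId.add(c);
            }
        }
        return r;
    }
}
```

Only byte-identical repeats are dropped; anything else with a colliding
SequenceId is kept and reported, since dropping it could lose data if the
virtual offset is wrong.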

> duplicate suppression in archiver
> ---------------------------------
>
>                 Key: CHUKWA-338
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-338
>             Project: Hadoop Chukwa
>          Issue Type: New Feature
>          Components: Data Processors
>            Reporter: Ari Rabkin
>            Assignee: Ari Rabkin
>             Fix For: 0.3.0
>
>         Attachments: archiveDupSuppress.patch
>
>
> Right now, Archiver uses an identity reducer.
> It should be straightforward to write a custom reducer that does duplicate 
> detection and suppression if we get multiple chunks with the same key.

-- 
This message is automatically generated by JIRA.
