[ 
https://issues.apache.org/jira/browse/NUTCH-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892074#action_12892074
 ] 

Chris A. Mattmann commented on NUTCH-677:
-----------------------------------------

Odd. OK, the patch now appears to work. I wonder if a stale temp directory 
was interfering before. Looking at the patch, it doesn't matter whether any 
mergeFilters have been defined (my original guess as to why it wasn't 
working). If no mergeFilters are defined, SegmentMergeFilters returns true, 
so SegmentMerger does _not_ break out of the function, which preserves its 
old behavior. So this patch works great. Thanks! I'll commit it to trunk, 
and then backport to branch-1.2 and nutchbase shortly...
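
For illustration, here is a minimal sketch of that control flow. It uses 
simplified stand-in names (EntryFilter, acceptAll, merge) and a plain 
metadata map, all of which are assumptions made for the example; it is not 
the code from the patch and does not use the real Nutch classes or 
signatures.

```java
import java.util.List;
import java.util.Map;

public class MergeFilterSketch {

    /** Hypothetical stand-in for a merge filter; not the real Nutch interface. */
    interface EntryFilter {
        boolean accept(String url, Map<String, String> parseMeta);
    }

    /**
     * Runs every configured filter. An entry is kept only if all filters
     * accept it. With no filters configured the loop body never runs,
     * so the method returns true.
     */
    static boolean acceptAll(List<EntryFilter> filters, String url,
                             Map<String, String> parseMeta) {
        for (EntryFilter filter : filters) {
            if (!filter.accept(url, parseMeta)) {
                return false;
            }
        }
        return true;
    }

    /** Hypothetical merge step: bail out early only when the chain rejects. */
    static String merge(List<EntryFilter> filters, String url,
                        Map<String, String> parseMeta) {
        if (!acceptAll(filters, url, parseMeta)) {
            return "skipped";
        }
        return "merged";
    }

    public static void main(String[] args) {
        // Example filter on parse-time metadata (the "lang" key is made up).
        EntryFilter englishOnly = (url, meta) -> "en".equals(meta.get("lang"));
        Map<String, String> meta = Map.of("lang", "de");

        // No filters defined: the entry is merged as before.
        System.out.println(merge(List.of(), "http://example.com/", meta));
        // One filter defined that rejects the entry: it is skipped.
        System.out.println(merge(List.of(englishOnly), "http://example.com/", meta));
    }
}
```

With an empty filter list the entry goes through the merge unchanged, which 
is why an installation with no filters configured should see no difference.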

> Segment merge filtering based on segment content
> ------------------------------------------------
>
>                 Key: NUTCH-677
>                 URL: https://issues.apache.org/jira/browse/NUTCH-677
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 0.9.0
>            Reporter: Marcin Okraszewski
>            Assignee: Chris A. Mattmann
>         Attachments: MergeFilter.patch, MergeFilter_for_1.0.patch, 
> NUTCH-677.Mattmann.071410.patch.txt, SegmentMergeFilter.java, 
> SegmentMergeFilter.java, SegmentMergeFilters.java, SegmentMergeFilters.java
>
>
> I needed segment filtering based on metadata detected during the parse 
> phase. Unfortunately, the current URL-based filtering does not allow for 
> this. So I have created a new SegmentMergeFilter extension, which receives 
> the segment entry being merged and decides whether it should be included. 
> Even though I needed only ParseData for my purpose, I made it more general, 
> so the filter receives all merged data.
> The attached patch is for version 0.9, which I use. Unfortunately, I didn't 
> have time to check how it fits the trunk version. Sorry :(

