[ https://issues.apache.org/jira/browse/NUTCH-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892074#action_12892074 ]
Chris A. Mattmann commented on NUTCH-677:
-----------------------------------------
Odd. OK, the patch now appears to work. I wonder if I had some temp directory
that was messing things up before. Looking at the patch, it doesn't matter if
no mergeFilters have been defined yet (which was my original thought as to
why it wasn't working). If there aren't any mergeFilters defined,
SegmentMergeFilters returns true, so SegmentMerger does _not_ break out of the
function and carries on as it did before the patch. So this patch works
great. Thanks! I'll commit it to trunk, and then backport to branch-1.2 and
nutchbase shortly...
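
The no-filter case can be sketched roughly as below. This is a simplified
stand-in, not the patch itself: MergeFilter, MergeFilterChain and merge() are
hypothetical names, and entries are plain strings here rather than the
segment data the real Nutch classes work with.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MergeFilterChainSketch {

    /** Hypothetical stand-in for a merge filter: true means "keep the entry". */
    interface MergeFilter {
        boolean filter(String entry);
    }

    /** Hypothetical stand-in for the class that runs all configured filters. */
    static class MergeFilterChain {
        private final List<MergeFilter> filters;

        MergeFilterChain(List<MergeFilter> filters) {
            this.filters = filters;
        }

        /** Returns false as soon as one filter rejects; true if none do. */
        boolean filter(String entry) {
            for (MergeFilter f : filters) {
                if (!f.filter(entry)) {
                    return false;
                }
            }
            // With no filters configured the loop body never runs,
            // so every entry is accepted.
            return true;
        }
    }

    /** Simplified merge step: an entry is dropped only if the chain rejects it. */
    static List<String> merge(List<String> entries, MergeFilterChain chain) {
        List<String> merged = new ArrayList<>();
        for (String entry : entries) {
            if (!chain.filter(entry)) {
                continue; // rejected by some filter, skip this entry
            }
            merged.add(entry);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList("http://a.example/", "http://b.example/");
        MergeFilterChain noFilters = new MergeFilterChain(new ArrayList<>());
        // With an empty chain, the merged list has the same entries as the input.
        System.out.println(merge(entries, noFilters));
    }
}
```

With an empty filter list the merge step behaves exactly as if no filtering
code existed, which is why an unconfigured installation is unaffected.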
> Segment merge filtering based on segment content
> ------------------------------------------------
>
> Key: NUTCH-677
> URL: https://issues.apache.org/jira/browse/NUTCH-677
> Project: Nutch
> Issue Type: Improvement
> Affects Versions: 0.9.0
> Reporter: Marcin Okraszewski
> Assignee: Chris A. Mattmann
> Attachments: MergeFilter.patch, MergeFilter_for_1.0.patch,
> NUTCH-677.Mattmann.071410.patch.txt, SegmentMergeFilter.java,
> SegmentMergeFilter.java, SegmentMergeFilters.java, SegmentMergeFilters.java
>
>
> I needed segment filtering based on metadata detected during the parse
> phase. Unfortunately, the current URL-based filtering does not allow for
> this, so I have created a new SegmentMergeFilter extension which receives
> the segment entry being merged and decides whether it should be included.
> Even though I needed only ParseData for my purpose, I have made it more
> general, so the filter receives all merged data.
> The attached patch is for version 0.9, which I use. Unfortunately I didn't
> have time to check how it fits the trunk version. Sorry :(
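
A filter of the kind described above might look something like the sketch
below. It does not use the Nutch API: SegmentEntry, EntryFilter,
LanguageFilter and the "lang" metadata key are all hypothetical, chosen only
to show a keep-or-drop decision made from parse-time metadata rather than
from the URL.

```java
import java.util.HashMap;
import java.util.Map;

public class ParseMetaFilterSketch {

    /** Hypothetical stand-in for one segment entry being merged. */
    static class SegmentEntry {
        final String url;
        final Map<String, String> parseMeta; // metadata detected while parsing

        SegmentEntry(String url, Map<String, String> parseMeta) {
            this.url = url;
            this.parseMeta = parseMeta;
        }
    }

    /** Hypothetical filter contract: true keeps the entry, false drops it. */
    interface EntryFilter {
        boolean filter(SegmentEntry entry);
    }

    /** Keeps only entries whose (assumed) "lang" metadata matches. */
    static class LanguageFilter implements EntryFilter {
        private final String wanted;

        LanguageFilter(String wanted) {
            this.wanted = wanted;
        }

        @Override
        public boolean filter(SegmentEntry entry) {
            // The URL plays no part in the decision; only parse metadata does.
            return wanted.equals(entry.parseMeta.get("lang"));
        }
    }

    public static void main(String[] args) {
        Map<String, String> meta = new HashMap<>();
        meta.put("lang", "pl");
        SegmentEntry entry = new SegmentEntry("http://example.org/", meta);

        System.out.println(new LanguageFilter("pl").filter(entry)); // kept
        System.out.println(new LanguageFilter("en").filter(entry)); // dropped
    }
}
```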