[
https://issues.apache.org/jira/browse/NUTCH-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris A. Mattmann resolved NUTCH-677.
-------------------------------------
Fix Version/s: 1.2
Resolution: Fixed
- Okey dokey. I applied this patch to the current trunk (r978988) and
backported it to the 1.2 branch (r978989). I'm hesitant to mark it for 2.0
though since when I tried to apply it to the Nutchbase branch, I noticed
SegmentMerger.java is gone, so not sure this patch is applicable there, or if
it is, then I don't know the Nutch 2.0 equivalent of SegmentMerger.
Anyhoo, thanks very much for the patch, Marcin! The functionality will be there
in the 1.2 release for sure. If one of the Nutchbasers wants to port it to 2.0,
by all means (but please don't reopen this issue; file a new one). Thanks!
> Segment merge filtering based on segment content
> ------------------------------------------------
>
> Key: NUTCH-677
> URL: https://issues.apache.org/jira/browse/NUTCH-677
> Project: Nutch
> Issue Type: Improvement
> Affects Versions: 0.9.0
> Reporter: Marcin Okraszewski
> Assignee: Chris A. Mattmann
> Fix For: 1.2
>
> Attachments: MergeFilter.patch, MergeFilter_for_1.0.patch,
> NUTCH-677.Mattmann.071410.patch.txt, SegmentMergeFilter.java,
> SegmentMergeFilter.java, SegmentMergeFilters.java, SegmentMergeFilters.java
>
>
> I needed segment filtering based on metadata detected during the parse phase.
> Unfortunately, the current URL-based filtering does not allow for this, so I
> created a new SegmentMergeFilter extension that receives each segment entry
> being merged and decides whether it should be included. Even though I needed
> only ParseData for my purpose, I made it more general, so the filter receives
> all of the merged data.
> The attached patch is for version 0.9, which I use. Unfortunately, I didn't
> have time to check how it fits the trunk version. Sorry :(
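
For readers unfamiliar with the idea, here is a minimal, self-contained sketch of a merge-time filter that decides per entry whether it is kept, based on parse metadata rather than the URL. It is not the code from the attached patch: MergedEntry, MergeFilterSketch, LanguageMergeFilter, MergeFilterDemo and the "lang" metadata key are all hypothetical names invented for illustration, and the real SegmentMergeFilter extension point may have a different signature.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one merged segment entry; not a Nutch class.
final class MergedEntry {
    final String url;
    final Map<String, String> parseMeta; // metadata detected during the parse phase

    MergedEntry(String url, Map<String, String> parseMeta) {
        this.url = url;
        this.parseMeta = parseMeta;
    }
}

// Hypothetical filter contract: return true to keep the entry in the merged segment.
interface MergeFilterSketch {
    boolean filter(MergedEntry entry);
}

// Example filter: keep only entries whose parse metadata has lang=en (assumed key).
final class LanguageMergeFilter implements MergeFilterSketch {
    @Override
    public boolean filter(MergedEntry entry) {
        return "en".equals(entry.parseMeta.get("lang"));
    }
}

final class MergeFilterDemo {
    // Applies every filter in order; an entry is dropped as soon as one filter rejects it.
    static List<MergedEntry> merge(List<MergedEntry> entries,
                                   List<MergeFilterSketch> filters) {
        List<MergedEntry> kept = new ArrayList<>();
        for (MergedEntry entry : entries) {
            boolean keep = true;
            for (MergeFilterSketch f : filters) {
                if (!f.filter(entry)) {
                    keep = false;
                    break;
                }
            }
            if (keep) {
                kept.add(entry);
            }
        }
        return kept;
    }
}
```

The point of the design is that the decision uses data that exists only after parsing, which a URL-based filter cannot see; passing the whole merged entry rather than just the parse data keeps the filter general.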