Elisabeth,

Great. Could you attach your patch to the original issue in JIRA instead and
check the box "Grant license to ASF for inclusion in ASF works"?

Julien

On 21 September 2011 16:47, Elisabeth Adler <[email protected]> wrote:

> Hi,
>
> Based on the suggestions/code from
> https://issues.apache.org/jira/browse/NUTCH-585, I have created a plugin to
> blacklist or whitelist HTML elements. The motivation was to avoid indexing
> header/footer/navigation, so the user gets only relevant results and a page
> does not match merely because the term shows up in the navigation.
>
> The elements to be parsed (or not) can be defined by using CSS-like
> selectors. A new field called "strippedContent" is available in the index
> which can be used for searching. Links are still crawled and parsed from the
> "content" field, allowing all pages to be parsed. The full documentation is
> in the README.txt in the patch.
>
> The patch can be found at:
> http://www.scintillation.at/files/nutwe03mnyzwb/blacklist_whitelist_plugin.patch
>
> Maybe it is of help to someone :)
> Best,
> Elisabeth
>



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
