[ http://issues.apache.org/jira/browse/NUTCH-421?page=all ]

Alan Tanaman updated NUTCH-421:
-------------------------------

    Description: 
I've tested a patch for org.apache.nutch.indexer.IndexingFilters that lets the
user specify the order in which the indexing filters run, via a new
indexingfilter.order property. This is needed when a filter relies on document
fields generated by an earlier filter as input for generating further fields.
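
To illustrate why the running order matters, here is a minimal sketch. It does not use the real Nutch IndexingFilter interface: the document is modelled as a plain map, and the two filters (ADD_TITLE, ADD_LOWERCASE_TITLE) and the field names are hypothetical, chosen only for this example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Consumer;

/** Illustration only: hypothetical filters acting on a map-based document. */
class FilterDependencySketch {

    // Hypothetical first filter: adds a "title" field.
    static final Consumer<Map<String, String>> ADD_TITLE =
        doc -> doc.put("title", "Nutch Home");

    // Hypothetical second filter: derives "title_lc" from "title".
    // If "title" has not been generated yet, it has nothing to work with.
    static final Consumer<Map<String, String>> ADD_LOWERCASE_TITLE = doc -> {
        String title = doc.get("title");
        if (title != null) {
            doc.put("title_lc", title.toLowerCase(Locale.ROOT));
        }
    };

    /** Applies the filters to a fresh, empty document in the given order. */
    static Map<String, String> run(List<Consumer<Map<String, String>>> filters) {
        Map<String, String> doc = new LinkedHashMap<>();
        for (Consumer<Map<String, String>> filter : filters) {
            filter.accept(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        // Producer first: the derived field is generated.
        System.out.println(run(Arrays.asList(ADD_TITLE, ADD_LOWERCASE_TITLE)));
        // Consumer first: the derived field is missing.
        System.out.println(run(Arrays.asList(ADD_LOWERCASE_TITLE, ADD_TITLE)));
    }
}
```

With the dependent filter run first, the derived field is never generated, which is the situation an explicit order is meant to avoid.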

As suggested elsewhere, I based this on the urlfilter.order functionality:

<property>
  <name>indexingfilter.order</name>
  <value>org.apache.nutch.indexer.basic.BasicIndexingFilter 
org.apache.nutch.indexer.more.MoreIndexingFilter</value>
  <description>The order in which indexing filters are applied.
  If empty, all available indexing filters (as dictated by the properties
  plugin.includes and plugin.excludes above) are loaded and applied in
  system-defined order. If not empty, only the named filters are loaded
  and applied in the given order. For example, if this property has the value:
  org.apache.nutch.indexer.basic.BasicIndexingFilter org.apache.nutch.indexer.more.MoreIndexingFilter
  then BasicIndexingFilter is applied first, and MoreIndexingFilter second.
  Because an indexing filter may rely on fields generated by an earlier
  filter, the ordering can affect the end result, not only performance.
  </description>
</property>
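
As a rough sketch of the behaviour described in the property above (this is not the patch itself, and the class and method names are hypothetical), the order could be resolved from the property value along these lines. Skipping a named filter that is not among the available ones is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

/** Hypothetical sketch; not the real org.apache.nutch.indexer.IndexingFilters. */
class FilterOrderSketch {

    /**
     * Returns the filter class names to apply, in order.
     *
     * @param available     class names of the filters found via the plugin system
     * @param orderProperty raw value of indexingfilter.order (may be null or blank)
     */
    static List<String> resolveOrder(Collection<String> available, String orderProperty) {
        List<String> ordered = new ArrayList<>();
        if (orderProperty == null || orderProperty.trim().isEmpty()) {
            // Empty property: keep every available filter, in the order given.
            ordered.addAll(available);
            return ordered;
        }
        // Class names are separated by whitespace, which includes line breaks.
        for (String name : orderProperty.trim().split("\\s+")) {
            // Assumption: names that are unavailable or repeated are skipped.
            if (available.contains(name) && !ordered.contains(name)) {
                ordered.add(name);
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<String> available = Arrays.asList(
            "org.apache.nutch.indexer.more.MoreIndexingFilter",
            "org.apache.nutch.indexer.basic.BasicIndexingFilter");
        String order = "org.apache.nutch.indexer.basic.BasicIndexingFilter \n"
            + "org.apache.nutch.indexer.more.MoreIndexingFilter";
        // BasicIndexingFilter is listed first, MoreIndexingFilter second.
        System.out.println(resolveOrder(available, order));
    }
}
```

Splitting on whitespace means a value wrapped over two lines, as in the property above, is read the same way as a single-line value.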

Patch will be attached to this issue by 29/12/06

  was:
I've tested a patch for org.apache.nutch.indexer.IndexingFilters that lets the
user specify the order in which the indexing filters run, via a new
indexingfilter.order property. This is needed when a filter relies on document
fields generated by an earlier filter as input for generating further fields.

As suggested elsewhere, I based this on the urlfilter.order functionality:

<property>
  <name>indexingfilter.order</name>
  <value>org.apache.nutch.indexer.basic.BasicIndexingFilter 
org.apache.nutch.indexer.more.MoreIndexingFilter</value>
  <description>The order in which indexing filters are applied.
  If empty, all available indexing filters (as dictated by the properties
  plugin.includes and plugin.excludes above) are loaded and applied in
  system-defined order. If not empty, only the named filters are loaded
  and applied in the given order. For example, if this property has the value:
  org.apache.nutch.indexer.basic.BasicIndexingFilter org.apache.nutch.indexer.more.MoreIndexingFilter
  then BasicIndexingFilter is applied first, and MoreIndexingFilter second.
  Because an indexing filter may rely on fields generated by an earlier
  filter, the ordering can affect the end result, not only performance.
  </description>
</property>




> Allow predeterminate running order of index filters
> ---------------------------------------------------
>
>                 Key: NUTCH-421
>                 URL: http://issues.apache.org/jira/browse/NUTCH-421
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>    Affects Versions: 0.8.1
>         Environment: All
>            Reporter: Alan Tanaman
>            Priority: Minor
>
> I've tested a patch for org.apache.nutch.indexer.IndexingFilters that lets
> the user specify the order in which the indexing filters run, via a new
> indexingfilter.order property. This is needed when a filter relies on
> document fields generated by an earlier filter as input for generating
> further fields.
> As suggested elsewhere, I based this on the urlfilter.order functionality:
> <property>
>   <name>indexingfilter.order</name>
>   <value>org.apache.nutch.indexer.basic.BasicIndexingFilter 
> org.apache.nutch.indexer.more.MoreIndexingFilter</value>
>   <description>The order in which indexing filters are applied.
>   If empty, all available indexing filters (as dictated by the properties
>   plugin.includes and plugin.excludes above) are loaded and applied in
>   system-defined order. If not empty, only the named filters are loaded
>   and applied in the given order. For example, if this property has the value:
>   org.apache.nutch.indexer.basic.BasicIndexingFilter org.apache.nutch.indexer.more.MoreIndexingFilter
>   then BasicIndexingFilter is applied first, and MoreIndexingFilter second.
>   Because an indexing filter may rely on fields generated by an earlier
>   filter, the ordering can affect the end result, not only performance.
>   </description>
> </property>
> Patch will be attached to this issue by 29/12/06

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
