[ 
https://issues.apache.org/jira/browse/NUTCH-809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235468#comment-13235468
 ] 

Julien Nioche commented on NUTCH-809:
-------------------------------------

Hi Lewis

bq. Can you confirm what you would like to see added to the wiki?, I will try 
my best to get this added, are you referring to the [0]? 

Nope. I meant replacing the wiki page written by Elizabeth with instructions on 
what to do to get the metatags parsed and indexed. What I committed relies on 
another plugin for indexing metadata whereas the old one had its own indexer 
etc...

bq. Also I thought the best thing to do regarding porting to Nutchgora is just 
to add it to the ever growing NUTCH-1104 list, so I have done so. If and when 
this is required over there someone can duly oblige

good thinking

bq. Regarding adding fields to Solr I assume you mean schema and 
solr-mapping.xml?

yes, this will be needed if we want this to be on by default which I think is a 
good idea
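
As a rough sketch of what that could look like, assuming the two fields are 
named 'keywords' and 'description' as in the issue description and that 
'string' and 'text' field types exist in the Solr schema (both are assumptions 
to check against your own setup), schema.xml would need something like:

{code:xml}
<!-- assumed field names and types; keywords is multivalued -->
<field name="description" type="text" stored="true" indexed="true"/>
<field name="keywords" type="string" stored="true" indexed="true" multiValued="true"/>
{code}

and the Solr mapping file would map the Nutch fields onto them:

{code:xml}
<!-- source = Nutch field name, dest = Solr field name (assumed identical here) -->
<field dest="description" source="description"/>
<field dest="keywords" source="keywords"/>
{code}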

bq. Finally can you expand on 'activate by default', what exactly is it that 
is not activated by default? I read your README.txt but I can't see any mention 
of it in there.

Plugins have to be listed in plugin.includes in order to be used. Thinking 
about it, it would be good to declare a dependency on index-metatags so that 
the latter is activated automatically (assuming plugin.auto-activation = true).
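
For illustration, activating the plugin explicitly in nutch-site.xml might look 
like the sketch below. The plugin list is only an assumed example of a typical 
setup, not the shipped default, and index-metatags is the name used above 
rather than a verified plugin id:

{code:xml}
<property>
  <name>plugin.includes</name>
  <!-- assumed plugin list; only parse-metatags and the indexing plugin matter here -->
  <value>protocol-http|urlfilter-regex|parse-(html|metatags)|index-(basic|metatags)|scoring-opic</value>
</property>
{code}

Declaring the dependency would then presumably mean adding an import to the 
requires section of the parse-metatags plugin.xml, along these lines:

{code:xml}
<requires>
  <import plugin="nutch-extensionpoints"/>
  <!-- hypothetical: pulls in the indexing plugin when auto-activation is on -->
  <import plugin="index-metatags"/>
</requires>
{code}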

Thanks

Julien

                
> Parse-metatags plugin
> ---------------------
>
>                 Key: NUTCH-809
>                 URL: https://issues.apache.org/jira/browse/NUTCH-809
>             Project: Nutch
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 1.4, nutchgora
>            Reporter: Julien Nioche
>            Assignee: Julien Nioche
>             Fix For: 1.5
>
>         Attachments: NUTCH-809-trunk.patch, NUTCH-809.patch, 
> NUTCH-809_metatags_1.3.patch, metatags-plugin+tutorial.zip
>
>
> h2. Parse-metatags plugin
> The parse-metatags plugin consists of a HTMLParserFilter which takes as 
> parameter a list of metatag names with '*' as default value. The values are 
> separated by ';'.
> In order to extract the values of the metatags description and keywords, you 
> must specify in nutch-site.xml
> {code:xml}
> <property>
>   <name>metatags.names</name>
>   <value>description;keywords</value>
> </property>
> {code}
> The MetatagIndexer uses the output of the parsing above to create two fields 
> 'keywords' and 'description'. Note that keywords is multivalued.
> The query-basic plugin is used to include these fields in the search e.g. in 
> nutch-site.xml
> {code:xml}
> <property>
>   <name>query.basic.description.boost</name>
>   <value>2.0</value>
> </property>
> <property>
>   <name>query.basic.keywords.boost</name>
>   <value>2.0</value>
> </property>
> {code}
> This code has been developed by DigitalPebble Ltd and offered to the 
> community by ANT.com

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
