Hmmm, using a clean, current Nutch I can get it to work with:
<configuration>
 <property>
   <name>http.agent.name</name>
   <value>NutchTest</value>
 </property>
 <property>
   <name>index.parse.md</name>
   <value>h1,h2</value>
 </property>
 <property>
   <name>plugin.includes</name>
   <value>headings|protocol-http|parse-tika|index-metadata</value>
 </property>
</configuration>

$ bin/nutch indexchecker https://nutch.apache.org/
digest :        13584e71e6e09a71071936feb97892b8
h1 :    Apache Nutch™
id :    https://nutch.apache.org/

Can you check your configuration? Is a plugin name misspelled? Is the
headings plugin active during fetch/parse? Is the index-metadata plugin
active?
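
For reference, here is a sketch of nutch-site.xml for indexing all six heading
levels, extending the config above. It assumes the headings plugin stores each
heading in the parse metadata under the bare tag name (h1, h2, ...), which is
why the metatag. prefix never matches. It also assumes the plugin's
`headings` property selects which levels are extracted (as far as I recall it
defaults to h1,h2 in nutch-default.xml) and that `headings.multivalued` keeps
every heading of a level. Please verify both property names against
nutch-default.xml of your Nutch version.

```xml
<configuration>
 <property>
   <name>http.agent.name</name>
   <value>NutchTest</value>
 </property>
 <!-- headings runs as a parse filter; index-metadata copies parse metadata
      into index fields. Add your other plugins (urlfilter, indexer, ...) here. -->
 <property>
   <name>plugin.includes</name>
   <value>headings|protocol-http|parse-tika|index-metadata</value>
 </property>
 <!-- Assumption: headings plugin property listing the tags to extract -->
 <property>
   <name>headings</name>
   <value>h1,h2,h3,h4,h5,h6</value>
 </property>
 <!-- Assumption: keep all headings per level, to match multiValued="true"
      in schema.xml -->
 <property>
   <name>headings.multivalued</name>
   <value>true</value>
 </property>
 <!-- Bare heading keys, no metatag. prefix -->
 <property>
   <name>index.parse.md</name>
   <value>h1,h2,h3,h4,h5,h6</value>
 </property>
</configuration>
```

With the fields already named h1 to h6, the metatag.hN rename rules in
index-writers.xml should no longer be needed.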

Regards,
Markus


On Mon, 31 Oct 2022 at 14:14, Mike <mz579...@gmail.com> wrote:

> Hello Markus!
>
> Thank you for taking care of my problem!
>
> I removed metatag.h# from index.parse.md, but nutch indexchecker still
> does not show me the fields.
>
> On Mon, 31 Oct 2022 at 12:56, Markus Jelsma <markus.jel...@openindex.io>
> wrote:
>
> > Hello Mike,
> >
> > Please remove the metatag.* prefix in the index.parse.md config and I
> > think you should be fine.
> >
> > Regards,
> > Markus
> >
> > On Mon, 31 Oct 2022 at 12:32, Mike <mz579...@gmail.com> wrote:
> >
> > > Yes, sorry, I also forgot to post this setting:
> > >
> > > <property>
> > >    <name>index.parse.md</name>
> > >    <value>metatag.description,metatag.keywords,metatag.rating,metatag.h1,metatag.h2,metatag.h3,metatag.h4,metatag.h5,metatag.h6</value>
> > >    <description>
> > >    Comma-separated list of keys to be taken from the parse metadata to
> > >    generate fields. Can be used e.g. for 'description' or 'keywords'
> > >    provided that these values are generated by a parser (see
> > >    parse-metatags plugin).
> > >    </description>
> > > </property>
> > >
> > > The Nutch parsechecker shows me the fields, but the indexchecker doesn't.
> > >
> > > On Mon, 31 Oct 2022 at 04:51, Mike <mz579...@gmail.com> wrote:
> > >
> > > > Hello!
> > > >
> > > > I've tried everything and set everything up to get the Nutch headings
> > > > plugin working:
> > > >
> > > > nutch-site.xml
> > > >
> > > > <property>
> > > >   <name>plugin.includes</name>
> > > >   <value>protocol-okhttp|...|parse-(html|tika|text|metatags)|index-(basic|anchor|more|metadata)|...|headings|nutch-extensionpoints</value>
> > > > </property>
> > > >
> > > > schema.xml
> > > >
> > > > <!-- fields for the headings plugin -->
> > > > <field name="h1" type="text_general" stored="true" indexed="true" multiValued="true"/>
> > > > <field name="h2" type="text_general" stored="true" indexed="true" multiValued="true"/>
> > > > <field name="h3" type="text_general" stored="true" indexed="true" multiValued="true"/>
> > > > <field name="h4" type="text_general" stored="true" indexed="true" multiValued="true"/>
> > > > <field name="h5" type="text_general" stored="true" indexed="true" multiValued="true"/>
> > > > <field name="h6" type="text_general" stored="true" indexed="true" multiValued="true"/>
> > > >
> > > > index-writers.xml
> > > >   <mapping>
> > > >       <rename>
> > > >         <field source="metatag.h1" dest="h1"/>
> > > >         <field source="metatag.h2" dest="h2"/>
> > > >         <field source="metatag.h3" dest="h3"/>
> > > >         <field source="metatag.h4" dest="h4"/>
> > > >         <field source="metatag.h5" dest="h5"/>
> > > >         <field source="metatag.h6" dest="h6"/>
> > > >       </rename>
> > > > ...
> > > >
> > > > After indexing to Solr there are no HTML heading tags in my Solr index.
> > > > What's missing?
> > > >
> > > > thanks!
> > > >
> > >
> >
>
