Hmmm, using a clean, current Nutch I can get it to work with:

<configuration>
  <property>
    <name>http.agent.name</name>
    <value>NutchTest</value>
  </property>
  <property>
    <name>index.parse.md</name>
    <value>h1,h2</value>
  </property>
  <property>
    <name>plugin.includes</name>
    <value>headings|protocol-http|parse-tika|index-metadata</value>
  </property>
</configuration>

$ bin/nutch indexchecker https://nutch.apache.org/
digest : 13584e71e6e09a71071936feb97892b8
h1 : Apache Nutch™
id : https://nutch.apache.org/

Can you check your configuration? Is a plugin name misspelled? Is the headings plugin active during fetch/parse? Is the index-metadata plugin active?

Regards,
Markus

On Mon, 31 Oct 2022 at 14:14, Mike <mz579...@gmail.com> wrote:

> Hello Markus!
>
> Thank you for taking care of my problem!
>
> I removed the metatag.h# from index.parse.md, but nutch indexchecker
> still does not show me the fields.
>
> On Mon, 31 Oct 2022 at 12:56, Markus Jelsma <markus.jel...@openindex.io> wrote:
>
> > Hello Mike,
> >
> > Please remove the metatag.* prefix in the index.parse.md config and I
> > think you should be fine.
> >
> > Regards,
> > Markus
> >
> > On Mon, 31 Oct 2022 at 12:32, Mike <mz579...@gmail.com> wrote:
> >
> > > Yes, sorry, I also forgot to post this setting:
> > >
> > > <property>
> > >   <name>index.parse.md</name>
> > >   <value>metatag.description,metatag.keywords,metatag.rating,metatag.h1,metatag.h2,metatag.h3,metatag.h4,metatag.h5,metatag.h6</value>
> > >   <description>
> > >     Comma-separated list of keys to be taken from the parse metadata to
> > >     generate fields.
> > >     Can be used e.g. for 'description' or 'keywords' provided that these
> > >     values are generated
> > >     by a parser (see parse-metatags plugin)
> > >   </description>
> > > </property>
> > >
> > > The Nutch parsechecker shows me the fields but the indexchecker doesn't.
> > >
> > > On Mon, 31 Oct 2022 at 04:51, Mike <mz579...@gmail.com> wrote:
> > >
> > > > Hello!
> > > >
> > > > I've tried everything and set everything up to get the Nutch headings
> > > > plugin working:
> > > >
> > > > nutch-site.xml
> > > >
> > > > <property>protocol-okhttp
> > > > <name>
> > > >
> > > > <value>protocol-okhttp|...|parse-(html|tika|text|metatags)|index-(basic|anchor|more|metadata)|...|headings|nutch-extensionpoints</value>
> > > > </property>
> > > >
> > > > schema.xml
> > > >
> > > > <!-- fields for the headings plugin -->
> > > > <field name="h1" type="text_general" stored="true" indexed="true" multiValued="true"/>
> > > > <field name="h2" type="text_general" stored="true" indexed="true" multiValued="true"/>
> > > > <field name="h3" type="text_general" stored="true" indexed="true" multiValued="true"/>
> > > > <field name="h4" type="text_general" stored="true" indexed="true" multiValued="true"/>
> > > > <field name="h5" type="text_general" stored="true" indexed="true" multiValued="true"/>
> > > > <field name="h6" type="text_general" stored="true" indexed="true" multiValued="true"/>
> > > >
> > > > index-writers.xml
> > > >
> > > > <mapping>
> > > >   <rename>
> > > >     <field source="metatag.h1" dest="h1"/>
> > > >     <field source="metatag.h2" dest="h2"/>
> > > >     <field source="metatag.h3" dest="h3"/>
> > > >     <field source="metatag.h4" dest="h4"/>
> > > >     <field source="metatag.h5" dest="h5"/>
> > > >     <field source="metatag.h6" dest="h6"/>
> > > >   </rename>
> > > >   ...
> > > >
> > > > After indexing to Solr there are no HTML heading tags in my Solr index.
> > > > What's missing?
> > > >
> > > > Thanks!
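
The checks suggested in this thread (a stray metatag. prefix on heading keys in index.parse.md, and whether the headings and index-metadata plugins are really enabled) can be scripted roughly as below. This is only a sketch under assumptions: the SAMPLE config and the helper names load_properties, prefixed_heading_keys and is_included are made up for illustration, and it assumes plugin.includes is applied as a regular expression that must match the whole plugin id, with Python's regex syntax standing in for Java's.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample; in practice, read the text of conf/nutch-site.xml instead.
SAMPLE = """<configuration>
  <property>
    <name>index.parse.md</name>
    <value>metatag.description,metatag.h1,h2</value>
  </property>
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|parse-(html|tika)|index-(basic|metadata)|headings</value>
  </property>
</configuration>"""

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
PREFIX = "metatag."


def load_properties(xml_text):
    """Return a {name: value} dict from a Hadoop-style <configuration> file."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name:
            props[name] = (prop.findtext("value") or "").strip()
    return props


def prefixed_heading_keys(props):
    """List index.parse.md keys like 'metatag.h1' that still carry the prefix."""
    keys = [k.strip() for k in props.get("index.parse.md", "").split(",")]
    return [k for k in keys
            if k.startswith(PREFIX) and k[len(PREFIX):] in HEADINGS]


def is_included(props, plugin_id):
    """Assumption: a plugin is active when plugin.includes matches its whole id."""
    pattern = props.get("plugin.includes", "")
    return bool(pattern) and re.fullmatch(pattern, plugin_id) is not None


if __name__ == "__main__":
    props = load_properties(SAMPLE)
    print("prefixed heading keys:", prefixed_heading_keys(props))
    for plugin in ("headings", "index-metadata"):
        print(plugin, "included:", is_included(props, plugin))
```

To try it against a real setup, replace SAMPLE with the contents of your own nutch-site.xml. Any key reported by prefixed_heading_keys is a candidate for dropping the metatag. prefix, as advised earlier in the thread.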