Re: How should the headings plugin be configured?

2022-10-31 Thread Mike
Thanks Markus, I'm not sure where the error was, but I reinstalled Nutch and it works with your setup.

On Mon, 31 Oct 2022 at 14:36, Markus Jelsma <markus.jel...@openindex.io> wrote:
> Hmmm, using a clean current Nutch I can get it to work with:
>
> http.agent.name
> NutchTest

2022-10-31 Thread Markus Jelsma
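Hmmm, using a clean current Nutch I can get it to work with:

  <property>
    <name>http.agent.name</name>
    <value>NutchTest</value>
  </property>
  <property>
    <name>index.parse.md</name>
    <value>h1,h2</value>
  </property>
  <property>
    <name>plugin.includes</name>
    <value>headings|protocol-http|parse-tika|index-metadata</value>
  </property>

  $ bin/nutch indexchecker https://nutch.apache.org/
  digest :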
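Markus's working configuration consists of three properties (http.agent.name, index.parse.md and plugin.includes) of the kind that normally live in conf/nutch-site.xml. As a minimal sketch, Python's standard xml.etree.ElementTree can emit them in the usual <configuration>/<property>/<name>/<value> layout. The script and the build_nutch_site helper are hypothetical illustrations, not part of Nutch; only the property names and values are taken from the thread.

```python
import xml.etree.ElementTree as ET

# The three properties from Markus's working setup (copied from the thread).
PROPERTIES = {
    "http.agent.name": "NutchTest",
    "index.parse.md": "h1,h2",
    "plugin.includes": "headings|protocol-http|parse-tika|index-metadata",
}


def build_nutch_site(properties: dict[str, str]) -> str:
    """Return a nutch-site.xml-style document with one <property> per entry."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print in place; needs Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_nutch_site(PROPERTIES))
```

Generating the file rather than hand-editing it is only a convenience; pasting the same three <property> blocks into an existing nutch-site.xml has the same effect.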

2022-10-31 Thread Mike
Hello Markus! Thank you for taking care of my problem! I removed metatag.h# from index.parse.md, but nutch indexchecker still does not show me the fields.

On Mon, 31 Oct 2022 at 12:56, Markus Jelsma <markus.jel...@openindex.io> wrote:
> Hello Mike,
>
> Please remove the metatag.*

2022-10-31 Thread Markus Jelsma
Hello Mike,

Please remove the metatag.* prefix from the index.parse.md config and I think you should be fine.

Regards,
Markus

On Mon, 31 Oct 2022 at 12:32, Mike wrote:
> Yes, sorry, I also forgot to post this setting:
>
> index.parse.md
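Markus's advice amounts to dropping the metatag. prefix from the heading keys in index.parse.md. A minimal sketch in Python, assuming (as the value h1,h2 in his working setup suggests) that only the h1 to h6 keys come from the headings plugin, so metatag.description, metatag.keywords and metatag.rating are left untouched. strip_heading_prefix is a hypothetical helper, not a Nutch tool.

```python
import re

# Mike's original index.parse.md value, copied from the thread.
ORIGINAL = ("metatag.description,metatag.keywords,metatag.rating,"
            "metatag.h1,metatag.h2,metatag.h3,metatag.h4,metatag.h5,metatag.h6")

# Assumption: only heading keys (h1..h6) come from the headings plugin,
# so only those lose the "metatag." prefix.
HEADING_KEY = re.compile(r"metatag\.(h[1-6])")


def strip_heading_prefix(value: str) -> str:
    """Rewrite metatag.hN entries to plain hN in a comma-separated key list."""
    keys = [k.strip() for k in value.split(",") if k.strip()]
    fixed = []
    for key in keys:
        m = HEADING_KEY.fullmatch(key)
        fixed.append(m.group(1) if m else key)  # keep non-heading keys as-is
    return ",".join(fixed)


if __name__ == "__main__":
    # prints metatag.description,metatag.keywords,metatag.rating,h1,h2,h3,h4,h5,h6
    print(strip_heading_prefix(ORIGINAL))
```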

2022-10-31 Thread Mike
Yes, sorry, I also forgot to post this setting:

  <property>
    <name>index.parse.md</name>
    <value>metatag.description,metatag.keywords,metatag.rating,metatag.h1,metatag.h2,metatag.h3,metatag.h4,metatag.h5,metatag.h6</value>
    <description>Comma-separated list of keys to be taken from the parse metadata to generate fields. Can be used
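Why this setting produced no heading fields can be illustrated with a deliberately simplified model: the listed keys are looked up verbatim in the parse metadata, so metatag.h1 never matches a heading stored under plain h1. This is a hypothetical Python model, not Nutch's Java index-metadata code, and the page values are made up; it assumes, as Markus's working value h1,h2 suggests, that the headings plugin stores headings under plain h1/h2 keys.

```python
# Hypothetical, simplified model of how a configured key list selects fields;
# this is NOT Nutch's Java code, only an illustration of exact key matching.
def fields_to_index(parse_meta: dict[str, str], index_parse_md: str) -> dict[str, str]:
    """Keep only the configured keys that exist verbatim in the parse metadata."""
    wanted = [k.strip() for k in index_parse_md.split(",") if k.strip()]
    return {k: parse_meta[k] for k in wanted if k in parse_meta}


# Made-up parse metadata, assuming the headings plugin stores plain h1/h2 keys.
PARSE_META = {"h1": "Welcome", "h2": "News"}

# Mike's value from the thread and the value from Markus's working setup.
MIKE_VALUE = ("metatag.description,metatag.keywords,metatag.rating,"
              "metatag.h1,metatag.h2,metatag.h3,metatag.h4,metatag.h5,metatag.h6")
MARKUS_VALUE = "h1,h2"

if __name__ == "__main__":
    print(fields_to_index(PARSE_META, MIKE_VALUE))    # {} : no key matches
    print(fields_to_index(PARSE_META, MARKUS_VALUE))  # {'h1': 'Welcome', 'h2': 'News'}
```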

2022-10-31 Thread Markus Jelsma
Hello Mike,

I think it should work just fine with it enabled in plugin.includes. You can check Nutch's parser output with:

  $ bin/nutch parsechecker <url>

You should see one or more h# output fields present. You can then use the index-metadata plugin to map the parser output fields to the
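Whether a plugin is actually enabled depends on plugin.includes, which nutch-default.xml describes as a regular expression naming the plugin directories to include. The sketch below assumes the whole plugin id has to match that expression and models this with Python's re.fullmatch; Python and Java regex syntax agree for a simple alternation like this one, but not in every case. is_plugin_included is a hypothetical helper, useful for catching something like a misspelled plugin id before running parsechecker.

```python
import re

# Markus's plugin.includes value, copied from the thread.
PLUGIN_INCLUDES = "headings|protocol-http|parse-tika|index-metadata"


def is_plugin_included(plugin_id: str, includes: str) -> bool:
    """Sketch: True if the whole plugin id matches the plugin.includes regex."""
    return re.fullmatch(includes, plugin_id) is not None


if __name__ == "__main__":
    # "heading" (a typo for "headings") does not match the full expression.
    for pid in ("headings", "index-metadata", "heading"):
        print(pid, is_plugin_included(pid, PLUGIN_INCLUDES))
```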