Thanks Markus, I'm not sure where the error was but I reinstalled Nutch and
it works with your setup.
On Mon, 31 Oct 2022 at 14:36, Markus Jelsma <
markus.jel...@openindex.io> wrote:
Hmmm, using a clean, current Nutch I can get it to work with:
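<property>
  <name>http.agent.name</name>
  <value>NutchTest</value>
</property>
<property>
  <name>index.parse.md</name>
  <value>h1,h2</value>
</property>
<property>
  <name>plugin.includes</name>
  <value>headings|protocol-http|parse-tika|index-metadata</value>
</property>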
$ bin/nutch indexchecker https://nutch.apache.org/
digest :
Hello Markus!
Thank you for taking care of my problem!
I removed the metatag.h# from index.parse.md, but nutch indexchecker still
does not show me the fields.
On Mon, 31 Oct 2022 at 12:56, Markus Jelsma <
markus.jel...@openindex.io> wrote:
Hello Mike,
Please remove the metatag.* prefix in the index.parse.md config and I think
you should be fine.
Regards,
Markus
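For illustration, a sketch of what the suggested change might look like in
nutch-site.xml. It assumes that only the heading keys (h1 to h6) lose the
metatag. prefix, matching the h1,h2 value in the working setup, and that the
description, keywords and rating keys come from a separate meta tag parser
and keep their prefix. That split is an interpretation, not something the
thread confirms.

```xml
<!-- Assumed fix: heading keys are listed without the metatag. prefix. -->
<!-- Keeping the prefix on the three meta tag keys is an assumption. -->
<property>
  <name>index.parse.md</name>
  <value>metatag.description,metatag.keywords,metatag.rating,h1,h2,h3,h4,h5,h6</value>
</property>
```

The headings and index-metadata plugins also need to be listed in
plugin.includes for these fields to appear in the indexchecker output.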
On Mon, 31 Oct 2022 at 12:32, Mike wrote:
Yes, sorry, I also forgot to post this setting:
<name>index.parse.md</name>
<value>metatag.description,metatag.keywords,metatag.rating,metatag.h1,metatag.h2,metatag.h3,metatag.h4,metatag.h5,metatag.h6</value>
Comma-separated list of keys to be taken from the parse metadata to
generate fields.
Can be used
Hello Mike,
I think it should be working just fine with it enabled in
plugin.includes. You can check Nutch's parser output by using:
$ bin/nutch parsechecker
You should see one or more h# output fields present. You can then use the
index-metadata plugin to map the parser output fields to the