Siet is not crawling

2023-01-30 Thread Raj Chidara
Hello, Nutch is not able to crawl this site. Are there any Nutch configuration changes required for it? https://www.ich.org/ Thanks and regards, Raj Chidara

Re[2]: Siet is not crawling

2023-01-30 Thread Raj Chidara
Hello Markus, Sorry for the duplicate question. I added the selenium plugin in conf/nutch-default.xml and included the following plugin.includes

Re: Re[2]: Siet is not crawling

2023-01-30 Thread Markus Jelsma
Yes, remove the other protocol-* plugins from the configuration. With all three active, it is not always predictable which one will do the work.

On Mon, 30 Jan 2023 at 12:50, Raj Chidara wrote:
>
> Hello Markus
> Sorry for duplicate question. I added selenium plugin in
>
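The advice in this thread (use a JavaScript-capable protocol plugin and leave no other protocol-* plugin active) can be sketched as a plugin.includes override. This is only an illustration, not the configuration Raj actually used: every plugin listed apart from protocol-selenium is an assumption based on commonly used Nutch plugins, and the override is placed in conf/nutch-site.xml, where local overrides of nutch-default.xml normally go.

```xml
<!-- conf/nutch-site.xml (sketch; adjust to your own setup) -->
<!-- Exactly one protocol-* plugin is listed, so there is no ambiguity
     about which plugin fetches the pages. -->
<!-- Assumption: all plugins other than protocol-selenium are
     illustrative and should match what your crawl really needs. -->
<property>
  <name>plugin.includes</name>
  <value>protocol-selenium|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
```

The value is a regular expression over plugin ids, so protocol-http and protocol-htmlunit are excluded simply by not appearing in it. The Selenium plugin also needs a working browser driver, which is not shown here.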

Re: Siet is not crawling

2023-01-30 Thread Markus Jelsma
Hello Raj, I think the same question about the same site was asked here some time ago. Anyway, this site loads its content via JavaScript. You will need a protocol plugin that supports it, either protocol-htmlunit or protocol-selenium, instead of protocol-http or any other. Change the

Re: Re[2]: Siet is not crawling

2023-01-30 Thread Steven Zhu
Already unsubscribed. Why do I still get this email? Thanks, Steven

On Mon, Jan 30, 2023 at 7:06 AM Markus Jelsma wrote:
> Yes, remove the other protocol-* plugins from the configuration. With all
> three active it is not always determined which one is going to do the work.
>
> On Mon 30 Jan.