Hello Raj,

I see. Unfortunately, turning on protocol plugins that support JavaScript,
such as HtmlUnit or Selenium, does not always solve the problem.

Maybe you can ask at the Selenium project about this. They are the experts
on that particular problem.

Regards,
Markus

On Tue, 1 Aug 2023 at 19:38, Raj Chidara <raj.chid...@ddismart.com> wrote:

> Hello Markus
>   I have now removed all other protocol-* plugins and kept only
> protocol-selenium. It now crawls a few pages. However, no content is
> read from the pages: every page shows only the text *Home*.
>
> Thanks and Regards
> Raj Chidara
>
>
>
> ---- On Mon, 30 Jan 2023 18:35:06 +0530 *Markus Jelsma
> <markus.jel...@openindex.io>* wrote ---
>
> Yes, remove the other protocol-* plugins from the configuration. With all
> three active, it is not well defined which one is going to do the
> work.
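>
> A minimal sketch of what that could look like, assuming the override is
> placed in conf/nutch-site.xml (the usual place for local settings, rather
> than editing conf/nutch-default.xml) and that the remaining plugins stay as
> in the value quoted further down. Only the protocol-* entries change:
>
> ```xml
> <property>
>   <name>plugin.includes</name>
>   <!-- protocol-http and protocol-httpclient removed; protocol-selenium is the only protocol plugin left -->
>   <value>protocol-selenium|urlfilter-(regex|validator)|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
> </property>
> ```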
>
> On Mon, 30 Jan 2023 at 12:50, Raj Chidara <raj.chid...@ddismart.com> wrote:
>
>
> >
> > Hello Markus
> > Sorry for the duplicate question. I added the Selenium plugin in
> > conf/nutch-default.xml and included the following:
> >
> > <name>plugin.includes</name>
> > <value>protocol-http|protocol-httpclient|protocol-selenium|urlfilter-(regex|validator)|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
> >
> >
> > The site is still not being crawled. Are there any additional steps
> > required to install Selenium? Please advise.
> >
> >
> > Thanks and Regards
> >
> > Raj Chidara
> >
> > ----- Original Message -----
> > From: Markus Jelsma (markus.jel...@openindex.io)
> > Date: 30-01-2023 16:26
> > To: user@nutch.apache.org
> > Subject: Re: Siet is not crawling
> >
> > Hello Raj,
> >
> > I think the same question about the same site was asked here some time
> > ago. Anyway, this site loads its content via JavaScript. You will need a
> > protocol plugin that supports it, either protocol-htmlunit or
> > protocol-selenium, instead of protocol-http or any other.
> >
> > Change the configuration for plugin.includes, and it should work.
> >
> > Markus
> >
> > On Mon, 30 Jan 2023 at 10:39, Raj Chidara <raj.chid...@ddismart.com> wrote:
> >
> > >
> > > Hello,
> > >
> > > Nutch is not able to crawl this site. Are there any Nutch configuration
> > > changes required for this site?
> > >
> > > https://www.ich.org/
> > >
> > >
> > > Thanks and Regards
> > >
> > > Raj Chidara
> > >