Hi,

I am still unable to crawl websites that require authentication. I am using Nutch 1.10.
While crawling I get the following warnings, and I still cannot identify what is going wrong. The httpclient-auth.xml file is at the following link:
https://gist.github.com/tizyninan/4412936795b02bbe9cee

INFO conf.Configuration: found resource httpclient-auth.xml at jar:file:../target/NutchCrawler-1.0-SNAPSHOT.jar!/httpclient-auth.xml
INFO fetcher.Fetcher: -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
WARN httpclient.Http: Bad auth conf file: Element <loginPostData> not recognized in httpclient-auth.xml - expected <authscope>
WARN httpclient.Http: Bad auth conf file: Element <additionalPostHeaders> not recognized in httpclient-auth.xml - expected <authscope>
WARN httpclient.Http: Bad auth conf file: Element <removedFormFields> not recognized in httpclient-auth.xml - expected <authscope>

Looking forward to your help.

Thanks,
Tizy

On Thu, Mar 19, 2015 at 6:47 AM, Mohammed Omer <[email protected]> wrote:

> Edit: The first link should be
> https://www.mikeash.com/getting_answers.html
>
> Thank you,
>
> Mo
>
> On Wed, Mar 18, 2015 at 8:16 PM, Mohammed Omer <[email protected]> wrote:
>
> > Tizy, in order to help debug your error, you'll need to provide additional
> > information. Check out this link for what's generally needed when trying to
> > debug over chat/email: http://www.mikeash.com/getting_answers
> >
> > The error seems to say that httpclient.Http doesn't like the auth conf
> > file you provided. Can you post it and any other relevant changes you've
> > made to a http://gist.github.com file, and post it here?
> >
> > Thank you,
> >
> > Mo
> >
> > On Fri, Mar 13, 2015 at 12:43 AM, Tizy Ninan <[email protected]> wrote:
> >
> >> Hi Lewis,
> >>
> >> Thank you for the reply.
> >>
> >> I tried by providing the parameters specified in the httpclient-auth.xml
> >> template file. But while crawling I am getting the following warnings.
> >>
> >> WARN httpclient.Http: Bad auth conf file: root element <credentials> found in httpclient-auth.xml - must be <auth-configuration>
> >> WARN httpclient.Http: Bad auth conf file: Element <loginPostData> not recognized in httpclient-auth.xml - expected <credentials>
> >> WARN httpclient.Http: Bad auth conf file: Element <additionalPostHeaders> not recognized in httpclient-auth.xml - expected <credentials>
> >>
> >> The httpclient-auth.xml file is placed in the conf folder. The version of
> >> nutch used is nutch 1.10 (trunk).
> >>
> >> Could you please explain what could be wrong?
> >>
> >> Thanks,
> >> Tizy
> >>
> >> On Fri, Mar 13, 2015 at 1:26 AM, Lewis John Mcgibbney <[email protected]> wrote:
> >>
> >> > Hi Tizy,
> >> >
> >> > On Thu, Mar 12, 2015 at 12:20 AM, <[email protected]> wrote:
> >> >
> >> > > Is there any detailed step by step explanation on how to implement
> >> > > HTTPPostAuthentication on Nutch 1.10.?
> >> >
> >> > https://github.com/apache/nutch/blob/trunk/conf/httpclient-auth.xml.template#L61-L105
> >> > https://wiki.apache.org/nutch/HttpPostAuthentication
> >> > HTH
> >> > Lewis
> >>
> >> --
> >> Thanks and Regards,
> >> Tizy

--
Thanks and Regards,
Tizy
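
The two sets of warnings quoted in this thread imply the nesting the parser expects: a root <auth-configuration>, <credentials> elements beneath it, and <authscope> elements beneath each <credentials>. The Python sketch below re-creates only those checks so that a file can be tested without running a crawl. It is not Nutch's code. `check_auth_conf` is a hypothetical helper, accepting `<default>` as a scope element is an assumption, and so is the rule that a non-blank `authMethod` attribute on <credentials> switches parsing to form authentication and skips the <authscope> check.

```python
import xml.etree.ElementTree as ET

# Element names taken from the warnings quoted in this thread.
ROOT_TAG = "auth-configuration"
CRED_TAG = "credentials"
# "authscope" comes from the warnings; "default" is an assumption.
SCOPE_TAGS = {"authscope", "default"}


def check_auth_conf(xml_text):
    """Return a list of warnings mirroring the nesting rules the log lines imply."""
    warnings = []
    root = ET.fromstring(xml_text)

    if root.tag != ROOT_TAG:
        warnings.append(
            f"root element <{root.tag}> found - must be <{ROOT_TAG}>"
        )

    for cred in root:
        if cred.tag != CRED_TAG:
            warnings.append(
                f"Element <{cred.tag}> not recognized - expected <{CRED_TAG}>"
            )
            continue

        # Assumption (not confirmed in this thread): a non-blank authMethod
        # attribute means form authentication, so form fields are allowed.
        if cred.get("authMethod", "").strip():
            continue

        for child in cred:
            if child.tag not in SCOPE_TAGS:
                warnings.append(
                    f"Element <{child.tag}> not recognized - expected <authscope>"
                )

    return warnings


if __name__ == "__main__":
    sample = (
        "<auth-configuration><credentials>"
        "<loginPostData/><additionalPostHeaders/><removedFormFields/>"
        "</credentials></auth-configuration>"
    )
    for line in check_auth_conf(sample):
        print("WARN", line)
```

Under these assumptions, the earlier warnings correspond to a file whose root element is <credentials>, and the later ones to a <credentials> element with no `authMethod` attribute. Whether that matches the file in the gist would still need to be confirmed against it.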

