Thanks, Tizy. Adding Tyler to this in case he didn’t see it. Tyler, is this what you were running into? Thoughts?

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Tizy Ninan <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, April 7, 2015 at 5:11 AM
To: "[email protected]" <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Re: HTTP Post Authentication

>Hi,
>
>I am still not able to crawl websites requiring authentication.
>The version of Nutch used is 1.10.
>
>While crawling I am getting the following warnings and still not able to
>identify what is going wrong.
>Please find the httpclient-auth.xml file in the following link.
> https://gist.github.com/tizyninan/4412936795b02bbe9cee
>
>INFO conf.Configuration: found resource httpclient-auth.xml at jar:file:../target/NutchCrawler-1.0-SNAPSHOT.jar!/httpclient-auth.xml
>INFO fetcher.Fetcher: -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
>WARN httpclient.Http: Bad auth conf file: Element <loginPostData> not recognized in httpclient-auth.xml - expected <authscope>
>WARN httpclient.Http: Bad auth conf file: Element <additionalPostHeaders> not recognized in httpclient-auth.xml - expected <authscope>
>WARN httpclient.Http: Bad auth conf file: Element <removedFormFields> not recognized in httpclient-auth.xml - expected <authscope>
>
>Looking forward for help.
>
>Thanks,
>Tizy
>
>On Thu, Mar 19, 2015 at 6:47 AM, Mohammed Omer <[email protected]> wrote:
>
>Edit: The first link should be
>https://www.mikeash.com/getting_answers.html
>
>Thank you,
>
>Mo
>
>On Wed, Mar 18, 2015 at 8:16 PM, Mohammed Omer <[email protected]> wrote:
>
>> Tizy, in order to help debug your error, you'll need to provide additional
>> information. Check out this link for what's generally needed when trying to
>> debug over chat/email:
>> http://www.mikeash.com/getting_answers
>>
>> The error seems to say that httpclient.Http doesn't like the auth conf
>> file you provided. Can you post it and any other relevant changes you've
>> made to a http://gist.github.com file, and post it here?
>>
>> Thank you,
>>
>> Mo
>>
>> On Fri, Mar 13, 2015 at 12:43 AM, Tizy Ninan <[email protected]> wrote:
>>
>>> Hi Lewis,
>>>
>>> Thank you for the reply.
>>>
>>> I tried by providing the parameters specified in the httpclient-auth.xml
>>> template file. But while crawling I am getting the following warnings.
>>>
>>> WARN httpclient.Http: Bad auth conf file: root element <credentials> found in httpclient-auth.xml - must be <auth-configuration>
>>> WARN httpclient.Http: Bad auth conf file: Element <loginPostData> not recognized in httpclient-auth.xml - expected <credentials>
>>> WARN httpclient.Http: Bad auth conf file: Element <additionalPostHeaders> not recognized in httpclient-auth.xml - expected <credentials>
>>>
>>> The httpclient-auth.xml file is placed in the conf folder. The version of
>>> nutch used is nutch 1.10 (trunk).
>>>
>>> Could you please explain what could be wrong?
>>>
>>> Thanks,
>>> Tizy
>>>
>>> On Fri, Mar 13, 2015 at 1:26 AM, Lewis John Mcgibbney <[email protected]> wrote:
>>>
>>> > Hi Tizy,
>>> >
>>> > On Thu, Mar 12, 2015 at 12:20 AM, <[email protected]> wrote:
>>> >
>>> > > Is there any detailed step by step explanation on how to implement
>>> > > HTTPPostAuthentication on Nutch 1.10.?
>>> >
>>> > https://github.com/apache/nutch/blob/trunk/conf/httpclient-auth.xml.template#L61-L105
>>> > https://wiki.apache.org/nutch/HttpPostAuthentication
>>> > HTH
>>> > Lewis
>>>
>>> --
>>> Thanks and Regards,
>>> Tizy
>
>--
>Thanks and Regards,
>Tizy
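
For anyone comparing their own file against the warnings quoted in this thread, here is a rough Python sketch. It is not Nutch code and does not reproduce Nutch's parser. It assumes only what the two sets of WARN lines imply: the root must be <auth-configuration>, its children must be <credentials>, and their children must be <authscope>. The function name check_auth_conf and the sample XML strings are made up for illustration; they are not the contents of the gist linked above.

```python
import xml.etree.ElementTree as ET

# Element expected at each nesting depth, as implied by the WARN lines in
# this thread. This is an assumption read off the log text, not Nutch's source.
EXPECTED_BY_DEPTH = ["auth-configuration", "credentials", "authscope"]


def check_auth_conf(xml_text):
    """Return messages for elements that break the nesting assumed above."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != EXPECTED_BY_DEPTH[0]:
        problems.append(
            f"root element <{root.tag}> found - must be <{EXPECTED_BY_DEPTH[0]}>"
        )

    def visit(parent, depth):
        # Children of <authscope> and deeper are not checked in this sketch.
        if depth >= len(EXPECTED_BY_DEPTH):
            return
        expected = EXPECTED_BY_DEPTH[depth]
        for child in parent:
            if child.tag != expected:
                problems.append(
                    f"Element <{child.tag}> not recognized - expected <{expected}>"
                )
            else:
                visit(child, depth + 1)

    visit(root, 1)
    return problems


# Hypothetical file shaped like the April 7 report: POST-auth elements
# placed directly under <credentials>.
SAMPLE = """
<auth-configuration>
  <credentials>
    <loginPostData/>
    <additionalPostHeaders/>
    <removedFormFields/>
  </credentials>
</auth-configuration>
"""

for message in check_auth_conf(SAMPLE):
    print("WARN (sketch):", message)
```

With the sample above, the sketch flags the same three elements as the April 7 log. Whether the parser actually in use should accept those elements is a separate question that this check cannot answer.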

