Hi Lewis,

According to the documentation (in the file httpclient-auth.xml.template):
    loginFormId - the <form id="$formId" attribute value (or the 'name' attribute if no form is referenced by 'id' attribute)

So I'm pretty sure I had it right, since the page's HTML source contains:

    <form accept-charset="UTF-8" action="/login" id="login" method="post">

However, after following your suggestion and changing it to 'username', I'm now getting this:

$ bin/nutch parsechecker https://urs.earthdata.nasa.gov
fetching: https://urs.earthdata.nasa.gov
http.proxy.host = null
http.proxy.port = 8080
http.timeout = 12000
http.content.limit = -1
http.agent = AlmohsinNutch....
http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
No form element found with 'id' = username, trying 'name'.
No form element found with 'name' = username
Failed to get protocol output
java.lang.RuntimeException: java.lang.IllegalArgumentException: No form exists: username
    at org.apache.nutch.protocol.httpclient.Http.resolveCredentials(Http.java:470)
    at org.apache.nutch.protocol.httpclient.Http.getResponse(Http.java:171)
    at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:206)
    at org.apache.nutch.parse.ParserChecker.run(ParserChecker.java:136)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.parse.ParserChecker.main(ParserChecker.java:244)
Caused by: java.lang.IllegalArgumentException: No form exists: username
    at org.apache.nutch.protocol.httpclient.HttpFormAuthentication.getLoginFormParams(HttpFormAuthentication.java:183)
    at org.apache.nutch.protocol.httpclient.HttpFormAuthentication.login(HttpFormAuthentication.java:95)
    at org.apache.nutch.protocol.httpclient.Http.resolveCredentials(Http.java:468)
    ... 5 more
Fetch failed with protocol status: exception(16), lastModified=0: java.lang.RuntimeException: java.lang.IllegalArgumentException: No form exists: username

Best regards,
Mohammad Al-Mohsin

On Wed, Feb 18, 2015 at 12:57 PM, Lewis John Mcgibbney <[email protected]> wrote:

> Hi
>
> On Tue, Feb 17, 2015 at 12:25 PM, <[email protected]> wrote:
>
>> No form element found with 'id' = login, trying 'name'.
>> No form element found with 'name' = login
>
> The form element for id is not 'login', it is 'username'.
> The form element for the password is 'password'.
>
>> I was also wondering whether *loginUrl* should be set to the URL of the page
>> containing the auth form (https://urs.earthdata.nasa.gov) or to the form
>> action URL where the data are actually posted (https://urs.earthdata.nasa.gov/login).
>> The documentation says (loginUrl - the URL containing the actual <form>), but is
>> that really the case?
>
> Yes it is.
>
>> I am using the latest Nutch 1.10 trunk version, which includes the NUTCH-827v3
>> patch <https://issues.apache.org/jira/browse/NUTCH-827>, on the latest OS X
>> Yosemite (10.10.2).
>
> Great. Please try my above suggestion and it will work.
>
>> Please let me know if I'm missing something!
>>
>> Thanks
> Lewis
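For illustration only: the loginFormId rule quoted from httpclient-auth.xml.template (match a <form> by its 'id', then fall back to its 'name') can be sketched in Java as below. LoginFormLookup and its methods are hypothetical names; this is not Nutch's HttpFormAuthentication code, and the regex stands in for a real HTML parser, so it only copes with simple markup with double-quoted attributes. The sample HTML assumes, as Lewis's reply suggests, that 'username' and 'password' are field names inside the form. Under that assumption, looking up 'username' fails in both passes because it names an <input>, not a <form>, which matches the two log lines above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not Nutch's implementation): resolve a login form by
 * matching <form> elements on 'id' first, then on 'name'. A regex is used
 * instead of an HTML parser, so only simple, double-quoted markup is handled.
 */
public class LoginFormLookup {

    // Matches an opening <form ...> tag and captures its attribute text.
    private static final Pattern FORM_TAG =
            Pattern.compile("<form\\b([^>]*)>", Pattern.CASE_INSENSITIVE);

    // Returns the value of a double-quoted attribute, or null if it is absent.
    static String attribute(String attrs, String name) {
        Matcher m = Pattern.compile("\\b" + Pattern.quote(name) + "\\s*=\\s*\"([^\"]*)\"",
                Pattern.CASE_INSENSITIVE).matcher(attrs);
        return m.find() ? m.group(1) : null;
    }

    // Returns the first <form> tag whose attribute 'attr' equals 'value'.
    static Optional<String> findFormBy(String html, String attr, String value) {
        Matcher m = FORM_TAG.matcher(html);
        while (m.find()) {
            if (value.equals(attribute(m.group(1), attr))) {
                return Optional.of(m.group());
            }
        }
        return Optional.empty();
    }

    // loginFormId semantics as documented: try 'id', then fall back to 'name'.
    static Optional<String> findLoginForm(String html, String loginFormId) {
        Optional<String> byId = findFormBy(html, "id", loginFormId);
        if (byId.isPresent()) {
            return byId;
        }
        System.out.println("No form element found with 'id' = " + loginFormId + ", trying 'name'.");
        Optional<String> byName = findFormBy(html, "name", loginFormId);
        if (!byName.isPresent()) {
            System.out.println("No form element found with 'name' = " + loginFormId);
        }
        return byName;
    }

    public static void main(String[] args) {
        // Hypothetical sample markup modelled on the <form> tag quoted in the email.
        String html = "<form accept-charset=\"UTF-8\" action=\"/login\" id=\"login\" method=\"post\">"
                + "<input id=\"username\" name=\"username\" type=\"text\">"
                + "<input id=\"password\" name=\"password\" type=\"password\">"
                + "</form>";
        System.out.println(findLoginForm(html, "login").isPresent());    // true
        System.out.println(findLoginForm(html, "username").isPresent()); // false: an <input>, not a <form>
    }
}
```

In this sketch, loginFormId identifies the <form> itself; the individual fields such as 'username' and 'password' would be supplied separately as post data rather than through loginFormId.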

