Hi Lewis,

According to the documentation (in the file httpclient-auth.xml.template):

loginFormId - the <form id="$formId" attribute value (or the 'name'
attribute if no form is referenced by 'id' attribute)

So I'm pretty sure I got it right, as the page's HTML source contains:
<form accept-charset="UTF-8" action="/login" id="login" method="post">
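
To show how I read that documentation, here is a minimal Python sketch of an "id first, then name" form lookup. It is only an illustration, not Nutch's actual code: the FormFinder class and find_form function are hypothetical names, and the two <input> elements are assumed from the field names you mentioned, not copied from the real page.

```python
from html.parser import HTMLParser


class FormFinder(HTMLParser):
    """Collects the attributes of every <form> tag in a page."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.forms.append(dict(attrs))


def find_form(html, form_id):
    """Return the attributes of the matching form: try 'id' first, then 'name'."""
    finder = FormFinder()
    finder.feed(html)
    for attr in ("id", "name"):
        for form in finder.forms:
            if form.get(attr) == form_id:
                return form
    return None


# The <form> tag is the one from the page source; the inputs are assumed.
PAGE = """
<form accept-charset="UTF-8" action="/login" id="login" method="post">
  <input name="username" type="text">
  <input name="password" type="password">
</form>
"""

print(find_form(PAGE, "login"))     # finds the <form> by its id
print(find_form(PAGE, "username"))  # None: 'username' names an input, not a form
```

Under this reading, "login" identifies the form, while "username" would only ever match an input field inside it.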


However, after following your suggestion, I'm now getting this:

$ bin/nutch parsechecker https://urs.earthdata.nasa.gov
fetching: https://urs.earthdata.nasa.gov
http.proxy.host = null
http.proxy.port = 8080
http.timeout = 12000
http.content.limit = -1
http.agent = AlmohsinNutch....
http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
*No form element found with 'id' = username, trying 'name'.*
*No form element found with 'name' = username*
Failed to get protocol output
java.lang.RuntimeException: java.lang.IllegalArgumentException: No form exists: username
at org.apache.nutch.protocol.httpclient.Http.resolveCredentials(Http.java:470)
at org.apache.nutch.protocol.httpclient.Http.getResponse(Http.java:171)
at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:206)
at org.apache.nutch.parse.ParserChecker.run(ParserChecker.java:136)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.parse.ParserChecker.main(ParserChecker.java:244)
Caused by: java.lang.IllegalArgumentException: No form exists: username
at org.apache.nutch.protocol.httpclient.HttpFormAuthentication.getLoginFormParams(HttpFormAuthentication.java:183)
at org.apache.nutch.protocol.httpclient.HttpFormAuthentication.login(HttpFormAuthentication.java:95)
at org.apache.nutch.protocol.httpclient.Http.resolveCredentials(Http.java:468)
... 5 more
Fetch failed with protocol status: exception(16), lastModified=0: java.lang.RuntimeException: *java.lang.IllegalArgumentException: No form exists: username*

Best regards,
Mohammad Al-Mohsin

On Wed, Feb 18, 2015 at 12:57 PM, Lewis John Mcgibbney <
[email protected]> wrote:

> Hi
>
> On Tue, Feb 17, 2015 at 12:25 PM, <[email protected]>
> wrote:
>
>>
>> No form element found with 'id' = login, trying 'name'.
>> No form element found with 'name' = login
>>
>
> The form element for id is not 'login', it is 'username'.
> The form element for the password is 'password'
>
>
>> I was also wondering whether *loginUrl* should be set to the URL of the
>> page containing the auth form (https://urs.earthdata.nasa.gov) or to the
>> form action URL where the data are actually posted
>> (https://urs.earthdata.nasa.gov/login). The documentation says (loginUrl
>> - the URL containing the actual <form>), but is that really the case?
>>
>
> Yes it is
>
>
>> I am using the latest Nutch 1.10 trunk version, which includes the
>> NUTCH-827v3 patch <https://issues.apache.org/jira/browse/NUTCH-827>, on
>> the latest OS X Yosemite (10.10.2).
>>
>
> Great. Please try my suggestion above and it will work.
>
>> Please let me know if I'm missing something!
>>
>>
>>
>> Thanks
> Lewis
>
