Hi, I am trying to develop a custom crawler in Java, using Nutch v1.9, to crawl websites that require form-based authentication. I am following Nutch's HttpPostAuthentication feature to implement it.
The login parameters required for authentication, such as the HTML form id and the login POST data (username, password), are specified as key-value pairs in a configuration file. What is required to identify the HTML login form: the id or the name of the form? Thanks, Tizy
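P.S. To make the question concrete, here is a minimal sketch of the kind of lookup I have in mind. It is plain Java with a regex, not Nutch's actual implementation: the class name LoginFormLocator, the method findForm, and the "try id first, then fall back to name" rule are all assumptions of mine for illustration only.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: locates a login form in raw HTML
// by a configured identifier, preferring the id attribute over name.
public class LoginFormLocator {

    // Matches an opening <form ...> tag and captures its attribute text.
    private static final Pattern FORM_TAG =
            Pattern.compile("<form\\b([^>]*)>", Pattern.CASE_INSENSITIVE);

    // Returns the quoted value of the named attribute, if present.
    // The lookbehind avoids matching attributes such as data-id.
    static Optional<String> attribute(String attrs, String name) {
        Pattern p = Pattern.compile(
                "(?<![\\w-])" + Pattern.quote(name)
                        + "\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(attrs);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(m.group(1) != null ? m.group(1) : m.group(2));
    }

    // Returns the opening tag of the first form whose id equals key;
    // if no id matches, the first form whose name equals key; else empty.
    static Optional<String> findForm(String html, String key) {
        Optional<String> byName = Optional.empty();
        Matcher m = FORM_TAG.matcher(html);
        while (m.find()) {
            String attrs = m.group(1);
            if (attribute(attrs, "id").filter(key::equals).isPresent()) {
                return Optional.of(m.group());
            }
            if (!byName.isPresent()
                    && attribute(attrs, "name").filter(key::equals).isPresent()) {
                byName = Optional.of(m.group());
            }
        }
        return byName;
    }

    public static void main(String[] args) {
        String html = "<form name=\"search\"></form>"
                + "<form id=\"login\" action=\"/auth\"></form>";
        System.out.println(findForm(html, "login").orElse("no matching form"));
    }
}
```

A real crawler would use a proper HTML parser rather than a regex; the sketch only shows the two ways a configured identifier could be matched against a form.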

