Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "AdvancedAjaxInteraction" page has been changed by LewisJohnMcgibbney:
https://wiki.apache.org/nutch/AdvancedAjaxInteraction?action=diff&rev1=1&rev2=2

  == Let's Begin with a Scenario ==

- xyz
+ So let's say that, as a Nutch crawl administrator, your client has tasked you with the following: '''''"Get me domain specific material from a database such as NTIS"'''''. (NTIS, the National Technical Information Service, serves as the largest central resource for government-funded scientific, technical, engineering, and business-related information available today.)
+
+ What this really translates to is the following:
+  * use Nutch to log in to a database which requires [[https://wiki.apache.org/nutch/HttpPostAuthentication|HTTP POST authentication]]
+  * follow the redirect to the database's landing query form
+  * submit a query to the form, which will return a ranked list of search results for the given query
+  * interpret the JavaScript for each result in the ranked list
+  * use an [[http://nutch.apache.org/apidocs/apidocs-1.9/index.html?org/apache/nutch/parse/HtmlParseFilter.html|HtmlParseFilter]] to obtain high-level article/document content
+  * submit a GET request to invoke JavaScript which will return a PDF of the full textual content for this document
+  * return the full document (PDF) content and metadata along with the HTML parse filter data

  == Related Development Issues ==

