HtmlParser plugin - page title extraction
-----------------------------------------

                 Key: NUTCH-750
                 URL: https://issues.apache.org/jira/browse/NUTCH-750
             Project: Nutch
          Issue Type: Improvement
    Affects Versions: 1.0.0
            Reporter: Alexey Torochkov
            Priority: Minor
             Fix For: 1.1

A small improvement: try to extract the <title> tag from the body if it doesn't exist in the head.
In the current version, DOMContentUtils skips everything after <body> in its getTitle() method.
The attached patch makes this behavior configurable (by default nothing changes), so the parser can cope with webmasters' mistakes.
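
To illustrate the behavior described above, here is a minimal sketch in Java. It is not the attached patch and not the actual DOMContentUtils code. The class name TitleSketch, the method findTitle, and the boolean parameter searchBody are hypothetical names chosen for this example. The sketch walks a DOM tree depth-first and, when searchBody is false, skips the <body> subtree, which approximates the current behavior.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration only; names are not taken from Nutch.
class TitleSketch {

    /**
     * Depth-first search for the first <title> element.
     * When searchBody is false, the <body> subtree is skipped, so a
     * misplaced <title> inside the body is never found.
     * Returns null if no title is found.
     */
    static String findTitle(Node node, boolean searchBody) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            String name = node.getNodeName();
            if ("body".equalsIgnoreCase(name) && !searchBody) {
                return null; // do not descend into <body>
            }
            if ("title".equalsIgnoreCase(name)) {
                return node.getTextContent().trim();
            }
        }
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            String title = findTitle(child, searchBody);
            if (title != null) {
                return title;
            }
        }
        return null;
    }

    /** Parses well-formed XHTML into a DOM; used only to drive the example. */
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse example markup", e);
        }
    }

    public static void main(String[] args) {
        // A page where the webmaster put <title> in the body by mistake.
        Document page = parse(
            "<html><head></head><body><title>Misplaced</title><p>text</p></body></html>");
        System.out.println(findTitle(page, false)); // body skipped
        System.out.println(findTitle(page, true));  // body searched
    }
}
```

Because the walk is depth-first and <head> precedes <body>, a title in the head still takes precedence when searchBody is true. Keeping the flag off by default matches the issue's statement that the default behavior does not change.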