[
https://issues.apache.org/jira/browse/NUTCH-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Work on NUTCH-1209 started by Chris A. Mattmann.
> Output from ParserChecker Url missing a newline
> -----------------------------------------------
>
> Key: NUTCH-1209
> URL: https://issues.apache.org/jira/browse/NUTCH-1209
> Project: Nutch
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.4
> Environment: While testing this:
> http://www.mail-archive.com/[email protected]/msg04688.html
> Reporter: Chris A. Mattmann
> Assignee: Chris A. Mattmann
> Priority: Trivial
> Fix For: 1.5
>
>
> While working on:
> http://www.mail-archive.com/[email protected]/msg04688.html
> I found that ParserChecker omits the newline after the URL in its report.
> For example, running:
> {noformat}
> ./bin/nutch org.apache.nutch.parse.ParserChecker
> http://vault.fbi.gov/watergate/watergate-summary-part-01-of-02/view
> {noformat}
> produces:
> {noformat}
> fetching: http://vault.fbi.gov/watergate/watergate-summary-part-01-of-02/view
> parsing: http://vault.fbi.gov/watergate/watergate-summary-part-01-of-02/view
> contentType: application/xhtml+xml
> ---------
> Url
> ---------------
> http://vault.fbi.gov/watergate/watergate-summary-part-01-of-02/view---------
> ParseData
> ---------
> Version: 5
> ...snip
> {noformat}
> Note that there is no newline between *view* and the ParseData header's ---------.
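The symptom above can be reproduced in isolation with a minimal sketch. This is not the actual ParserChecker source: the class `UrlSectionSketch`, the helper `section`, and the `newlineAfterBody` flag are hypothetical names used only to show how printing the URL without a trailing newline makes the next section's separator run onto the same line.

```java
public class UrlSectionSketch {

    // Hypothetical helper (an assumption, not Nutch code): renders one report
    // section in the layout shown in the output above.
    static String section(String title, String body, boolean newlineAfterBody) {
        StringBuilder sb = new StringBuilder();
        sb.append("---------\n");
        sb.append(title).append("\n");
        sb.append("---------------\n");
        sb.append(body);
        if (newlineAfterBody) {
            // Terminating the body keeps the next separator on its own line.
            sb.append("\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String url = "http://vault.fbi.gov/watergate/watergate-summary-part-01-of-02/view";
        String parseDataHeader = "---------\nParseData\n---------\n";

        // Without the newline: the URL and the next separator share a line.
        System.out.print(section("Url", url, false) + parseDataHeader);
        System.out.println();

        // With the newline: the separator starts on a fresh line.
        System.out.print(section("Url", url, true) + parseDataHeader);
    }
}
```

Under these assumptions, the first report shows the URL immediately followed by the dashes, as in the output quoted above, while the second keeps them on separate lines.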
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira