[
https://issues.apache.org/jira/browse/NUTCH-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161061#comment-16161061
]
ASF GitHub Bot commented on NUTCH-2397:
---------------------------------------
sebastian-nagel closed pull request #198: NUTCH-2397: Parser to add paragraph
line breaks
URL: https://github.com/apache/nutch/pull/198
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> Parser to add paragraph line breaks
> -----------------------------------
>
> Key: NUTCH-2397
> URL: https://issues.apache.org/jira/browse/NUTCH-2397
> Project: Nutch
> Issue Type: Improvement
> Components: parser
> Affects Versions: 2.3.1, 1.13
> Reporter: Sebastian Nagel
> Priority: Minor
> Fix For: 2.4, 1.14
>
>
> (initially reported with patch/pull-request by Vipul Behl, see
> [#190|https://github.com/apache/nutch/pull/190])
> The parsers (parse-tika and parse-html) could be improved to add line breaks
> between paragraphs instead of writing the whole document on a single line.
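For illustration only, the requested behaviour can be sketched with the JDK's built-in DOM parser. This is not the code from the pull request and does not use the parse-html or parse-tika plugin APIs; the class name `ParagraphBreaks`, the method `extractText`, and the `BLOCK` element list are all hypothetical, and the input is assumed to be well-formed XHTML.

```java
import java.io.StringReader;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Nutch: extracts text from well-formed
// XHTML and ends each block-level element with a line break.
final class ParagraphBreaks {

  // Assumed (incomplete) set of elements treated as paragraph boundaries.
  private static final Set<String> BLOCK =
      Set.of("p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6");

  static String extractText(String xhtml) throws Exception {
    DocumentBuilder builder =
        DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
    StringBuilder sb = new StringBuilder();
    walk(doc.getDocumentElement(), sb);
    return sb.toString().trim();
  }

  private static void walk(Node node, StringBuilder sb) {
    if (node.getNodeType() == Node.TEXT_NODE) {
      // Collapse whitespace inside a text node. Simplification: adjacent
      // text nodes are always joined with one space.
      String text = node.getNodeValue().replaceAll("\\s+", " ").trim();
      if (!text.isEmpty()) {
        if (sb.length() > 0 && sb.charAt(sb.length() - 1) != '\n') {
          sb.append(' ');
        }
        sb.append(text);
      }
      return;
    }
    for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
      walk(c, sb);
    }
    // After a block-level element, start a new line (at most one break).
    if (node.getNodeType() == Node.ELEMENT_NODE
        && BLOCK.contains(node.getNodeName().toLowerCase(Locale.ROOT))
        && sb.length() > 0
        && sb.charAt(sb.length() - 1) != '\n') {
      sb.append('\n');
    }
  }
}
```

With this sketch, `<html><body><p>First paragraph.</p><p>Second <b>bold</b> one.</p></body></html>` yields two lines, "First paragraph." and "Second bold one.", rather than one run-on line. Inline elements such as `b` do not trigger a break.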
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)