Sebastian Nagel created NUTCH-2397:
--------------------------------------
Summary: Parser to add paragraph line breaks
Key: NUTCH-2397
URL: https://issues.apache.org/jira/browse/NUTCH-2397
Project: Nutch
Issue Type: Improvement
Components: parser
Affects Versions: 1.13, 2.3.1
Reporter: Sebastian Nagel
Priority: Minor
Fix For: 2.4, 1.14
(initially reported with patch/pull-request by Vipul Behl, see
[#190|https://github.com/apache/nutch/pull/190])
The parser plugins (parse-tika and parse-html) could be improved to add line
breaks between paragraphs, instead of writing the whole document text into a
single line.
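
As a rough illustration of the idea (not the patch from the pull request and
not Nutch's actual parser code), the following minimal sketch walks a DOM tree
and emits a line break after each block-level element. The class name
ParagraphBreaks, the method extractText and the list of block elements are
assumptions made up for this example; it uses only the JDK's XML parser and
therefore expects well-formed XHTML input.

```java
import java.io.StringReader;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only (not part of Nutch).
class ParagraphBreaks {

  // Assumed set of elements that should end with a line break.
  private static final Set<String> BLOCK_ELEMENTS =
      Set.of("p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6");

  // Parses well-formed XHTML and returns its text, one block per line.
  static String extractText(String xhtml) {
    try {
      Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xhtml)))
          .getDocumentElement();
      StringBuilder sb = new StringBuilder();
      walk(root, sb);
      return sb.toString().trim();
    } catch (Exception e) {
      throw new IllegalArgumentException("Input is not well-formed XML", e);
    }
  }

  // Depth-first traversal: append text nodes, then break after block elements.
  private static void walk(Node node, StringBuilder sb) {
    if (node.getNodeType() == Node.TEXT_NODE) {
      // Collapse runs of whitespace inside a text node.
      String text = node.getNodeValue().replaceAll("\\s+", " ").trim();
      if (!text.isEmpty()) {
        // Simplification: separate adjacent text nodes with a single space,
        // unless we are at the start of a new line.
        if (sb.length() > 0 && sb.charAt(sb.length() - 1) != '\n') {
          sb.append(' ');
        }
        sb.append(text);
      }
      return;
    }
    for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
      walk(c, sb);
    }
    if (node.getNodeType() == Node.ELEMENT_NODE
        && BLOCK_ELEMENTS.contains(node.getNodeName().toLowerCase(Locale.ROOT))) {
      // Avoid emitting consecutive blank lines for nested block elements.
      if (sb.length() > 0 && sb.charAt(sb.length() - 1) != '\n') {
        sb.append('\n');
      }
    }
  }
}
```

Calling ParagraphBreaks.extractText on a document with two <p> elements would
return two lines of text rather than one. A real implementation would also
need to handle malformed HTML and inline elements that should not introduce
extra spaces.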