Sebastian Nagel created NUTCH-2397:
--------------------------------------

             Summary: Parser to add paragraph line breaks
                 Key: NUTCH-2397
                 URL: https://issues.apache.org/jira/browse/NUTCH-2397
             Project: Nutch
          Issue Type: Improvement
          Components: parser
    Affects Versions: 1.13, 2.3.1
            Reporter: Sebastian Nagel
            Priority: Minor
             Fix For: 2.4, 1.14


(initially reported with patch/pull-request by Vipul Behl, see 
[#190|https://github.com/apache/nutch/pull/190])

The parser plugins (parse-tika and parse-html) could be improved to add line
breaks between paragraphs, instead of writing the text of the whole document
on a single line.
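
To illustrate the idea, here is a minimal sketch of text extraction that emits a line break after each block-level element. It is not the code from the pull request and not Nutch's actual parser code: the class name ParagraphBreaks, the BLOCK set, and the method names are hypothetical, and the input is assumed to be well-formed XHTML so that the JDK's XML parser can read it.

```java
import java.io.StringReader;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only (not part of Nutch).
class ParagraphBreaks {

  // Assumed set of elements that should end a line; the real list may differ.
  private static final Set<String> BLOCK =
      Set.of("p", "div", "h1", "h2", "h3", "li", "br");

  // Depth-first walk: append text nodes, then a newline after block elements.
  static void collect(Node node, StringBuilder sb) {
    if (node.getNodeType() == Node.TEXT_NODE) {
      String text = node.getNodeValue().trim().replaceAll("\\s+", " ");
      if (!text.isEmpty()) {
        // Simplification: always separate adjacent text chunks with one space.
        if (sb.length() > 0 && sb.charAt(sb.length() - 1) != '\n') {
          sb.append(' ');
        }
        sb.append(text);
      }
      return;
    }
    for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
      collect(c, sb);
    }
    boolean isBlock = node.getNodeType() == Node.ELEMENT_NODE
        && BLOCK.contains(node.getNodeName().toLowerCase(Locale.ROOT));
    // Avoid emitting consecutive newlines for nested block elements.
    if (isBlock && sb.length() > 0 && sb.charAt(sb.length() - 1) != '\n') {
      sb.append('\n');
    }
  }

  // Parses well-formed XHTML and returns its text, one paragraph per line.
  static String extract(String xhtml) throws Exception {
    Node root = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new InputSource(new StringReader(xhtml)))
        .getDocumentElement();
    StringBuilder sb = new StringBuilder();
    collect(root, sb);
    return sb.toString().trim();
  }
}
```

Under these assumptions, calling ParagraphBreaks.extract on a document with two p elements returns two lines of text instead of one.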

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
