Philip-Daniel Beck created UIMA-3512:
----------------------------------------
Summary: Add additional engine parameter for Ruta HtmlConverter to
configure linebreak replacement.
Key: UIMA-3512
URL: https://issues.apache.org/jira/browse/UIMA-3512
Project: UIMA
Issue Type: Improvement
Components: ruta
Affects Versions: 2.1.1ruta
Reporter: Philip-Daniel Beck
Fix For: 2.1.1ruta
When converting an HTML file to plain text with the HtmlConverter engine in
Ruta, the boolean engine parameter "replaceLinebreaks" decides whether
linebreaks in the text are replaced. If set to true, all linebreaks are kept in
the document. If set to false, all linebreaks are deleted, so the last word of
a line and the first word of the next line are joined without any whitespace
in between. It would often be better to replace each linebreak with a
whitespace. To make this configurable, an additional engine parameter that
defines the String each linebreak is replaced with would be useful.
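
To illustrate the requested behavior, here is a minimal sketch in plain Java.
It does not use the Ruta or UIMA API. The class name, the method name, and the
parameter name "linebreakReplacement" are assumptions made for this example
only, not the names of existing engine parameters.

```java
import java.util.regex.Matcher;

// Hypothetical sketch, not part of the Ruta HtmlConverter implementation.
final class LinebreakReplacementSketch {

    // Replaces every linebreak in the text with the given replacement String.
    // "linebreakReplacement" is an assumed name for the proposed parameter.
    static String replaceLinebreaks(String text, String linebreakReplacement) {
        // \R matches any linebreak sequence (\r\n, \n, \r, ...) in Java 8+.
        // quoteReplacement keeps '$' and '\' in the replacement literal.
        return text.replaceAll("\\R", Matcher.quoteReplacement(linebreakReplacement));
    }

    public static void main(String[] args) {
        String text = "first line\nsecond line";
        // Deleting linebreaks joins the neighboring words.
        System.out.println(replaceLinebreaks(text, ""));   // first linesecond line
        // Replacing with a whitespace keeps the words separated.
        System.out.println(replaceLinebreaks(text, " "));  // first line second line
    }
}
```

With an empty replacement String the sketch reproduces the word-joining
problem described above. A single whitespace as replacement avoids it.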