Philip-Daniel Beck created UIMA-3512:
----------------------------------------

             Summary: Add additional engine parameter for Ruta HtmlConverter to 
configure linebreak replacement.
                 Key: UIMA-3512
                 URL: https://issues.apache.org/jira/browse/UIMA-3512
             Project: UIMA
          Issue Type: Improvement
          Components: ruta
    Affects Versions: 2.1.1ruta
            Reporter: Philip-Daniel Beck
             Fix For: 2.1.1ruta


When converting an HTML file to plain text with the HtmlConverter engine in 
Ruta, the boolean engine parameter "replaceLinebreaks" decides whether 
linebreaks in the text are replaced. If set to true, all linebreaks are kept 
in the document. If set to false, all linebreaks are deleted, so the last word 
of a line and the first word of the next line are joined without any 
whitespace in between. It would often be better to replace each linebreak with 
a whitespace character. To make this configurable, an additional engine 
parameter that defines the String each linebreak is replaced with would be 
useful.
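
To illustrate the requested behavior, here is a minimal Java sketch. It is not 
the HtmlConverter implementation; the class name, the method name and the 
"replacement" argument are hypothetical and only stand in for the proposed 
String parameter. An empty replacement reproduces the deletion described 
above, while a single space keeps adjacent words apart.

```java
import java.util.regex.Matcher;

public class LinebreakReplacementSketch {

    // Hypothetical helper, not part of the Ruta API: replaces every linebreak
    // (\r\n, \r or \n) in the text with the given replacement string.
    static String replaceLinebreaks(String text, String replacement) {
        // quoteReplacement keeps characters such as '$' or '\' literal.
        return text.replaceAll("\\r\\n|\\r|\\n", Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String text = "first line\nsecond line";
        // Empty replacement: the words are joined without whitespace.
        System.out.println(replaceLinebreaks(text, ""));  // first linesecond line
        // Single space: the words stay separated.
        System.out.println(replaceLinebreaks(text, " ")); // first line second line
    }
}
```

With a default of the empty string, such a parameter would presumably leave 
existing pipelines unchanged.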



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
