To comment on the following update, log in, then open the issue:
http://www.openoffice.org/issues/show_bug.cgi?id=75525
                 Issue #|75525
                 Summary|Bad document correction -line breaks removal
               Component|Word processor
                 Version|1.0.0
                Platform|All
                     URL|
              OS/Version|All
                  Status|UNCONFIRMED
       Status whiteboard|
                Keywords|
              Resolution|
              Issue type|FEATURE
                Priority|P3
            Subcomponent|editing
             Assigned to|mru
             Reported by|tuharsky





------- Additional comments from [EMAIL PROTECTED] Mon Mar 19 14:49:40 +0000 2007 -------
This is related to Issue 75524.

One option of the "Bad Document Correcting Tool" could be the intelligent removal
of unnecessary line breaks, as a sub-option of the general BDC Tool.

Purpose:
Some users type text on a PC the same way they did on mechanical typewriters:
they insert a line break at the end of every line. Such a document is impossible
to format; one must delete the line breaks manually. Moreover, if the document
went through some printer-aided reformatting, the situation is even worse: a
single line of text continues onto the next line (a single word or a few words)
and THEN the line break suddenly follows. The next line behaves similarly, and
so on.

I'm talking about the same effect as in mail clients that insert line breaks
automatically. Then you open the mail in another mail client, forward it, etc.
In the end, you have the ugly, corrupted text formatting mentioned above.

So the option should offer a convenient way to automatically remove such
mis-broken lines. An algorithm needs to be devised for proper detection of
mis-broken lines; for a start, a simple set of rules could do:


1, A text section should be considered "intended consistent" if it contains no
empty line. In other words, even if the text contains line breaks, it is
considered "should be consistent" as long as it doesn't contain an ENTIRELY
EMPTY line; the text between two empty lines is considered a single consistent
block.
The "line mis-breaks" should be removed on this general basis, with more
fine-tuned heuristic rules as follows:

2, A line is considered "intentionally ended" (with a line break that should
remain untouched) if its length is less than, say, 3/4 of the full line
length.

3, If the defined "intended consistent" block contains lines that are just a
few (up to, say, 20) letters longer than the full line length (so that just a
few characters end up on the next line and are then followed by a line break),
that break is considered a probable line mis-break.

4, Lines that begin with bullets or numbering are considered intentionally
(regularly) ended, so the line break at the end of such a line should remain
untouched.

5, If a whole line is set in a different font than the majority of the
"intended consistent" block, the probability of a line mis-break is smaller; the
line could also be some kind of header.
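
A rough sketch of how rules 1 to 4 might be combined, written in Python purely
for illustration. Everything in it is an assumption and not part of
OpenOffice.org: the function names (split_blocks, keeps_break,
remove_misbreaks), the bullet pattern, and the choice to take the longest line
of a block as the "full line length". Rule 3 is read here as "a fragment of up
to 20 characters that follows a full line is wrapped overflow, so its break is
removed too". Rule 5 is left out because plain text carries no font
information.

```python
import re

# Assumed bullet/numbering pattern: "-", "*", a bullet character, or "1." / "1)" / "1,"
BULLET_RE = re.compile(r"^\s*(?:[-*\u2022]|\d+[.),])\s+")


def split_blocks(text):
    """Rule 1: the text between entirely empty lines forms one block."""
    blocks, current = [], []
    for line in text.split("\n"):
        if line.strip():
            current.append(line.rstrip())
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def keeps_break(lines, i, full, short_ratio=0.75, overflow=20):
    """Return True if the line break after lines[i] looks intentional."""
    line = lines[i]
    if BULLET_RE.match(line):
        return True                      # rule 4: list items keep their break
    if len(line) >= short_ratio * full:
        return False                     # line reaches the margin: mis-break
    # The line is short. Rule 3 (as interpreted here): a tiny fragment
    # right after a full line is overflow from that line, not a real ending.
    if i > 0 and len(line) <= overflow and len(lines[i - 1]) >= short_ratio * full:
        return False
    return True                          # rule 2: short line ended on purpose


def remove_misbreaks(text, full_width=None):
    """Join lines whose breaks look unintentional; blocks stay separated."""
    result = []
    for lines in split_blocks(text):
        # Assumption: without a known page width, use the longest line.
        full = full_width or max(len(l) for l in lines)
        out = lines[0]
        for i in range(len(lines) - 1):
            out += ("\n" if keeps_break(lines, i, full) else " ") + lines[i + 1]
        result.append(out)
    return "\n\n".join(result)
```

Calling remove_misbreaks() on a pasted mail body would return the text with the
suspected mis-breaks replaced by spaces, while empty lines and list items are
left alone. The 3/4 ratio and the 20-character limit are the "say" values from
the rules above and would need tuning on real documents.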


Please add more rules if you wish.

In general, the function would analyse the text block, or the whole document if
selected, and remove the line breaks that are suspected of being "unintentional"
or "mis-used".

---------------------------------------------------------------------
Please do not reply to this automatically generated notification from
Issue Tracker. Please log onto the website and enter your comments.
http://qa.openoffice.org/issue_handling/project_issues.html#notification

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

