Hello all,

I tracked down bugs 10374, 2106 and 6042. The last one was caused by a simple, easy-to-fix mistake in the hyphenation framework. Bug 10374 is unfortunately a duplicate of 2106, not of 6042, and it is a bit more interesting. It is caused by the parser delivering character references as separate character chunks. This creates multiple FOText children of the block (FObjMixed) for consecutive text, which interferes badly with line breaking and hyphenation.

Take the word "eXtensible", with the "X" written as a character reference, and room on the line up to the "l". The word is split into three FOText objects:

  e
  X
  tensible

Each of them is delivered separately to the line layout algorithm. The "e" and the "X" do not fill the line, but they are not words either, so they are appended to the pendingAreas vector. The "tensible" then overflows the line and is passed to hyphenation; let's say it is hyphenated as "tensi-ble". The "tensi-" is appended without flushing the pending areas, which are then put first into the next line.

I made two changes. The first puts a StringBuffer into FObjMixed to accumulate consecutive addCharacters() events. This fixes the problem with character references, but not

  e<fo:inline>X</fo:inline>tensible

(also reported somewhere in Bugzilla as a problem). The second flushes pending areas in addWord(). This fixes the lost-characters problem, but it *still* does not correctly hyphenate words split across inline FOs: only the chunk actually overflowing the line is considered for hyphenation.
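For illustration, here is a minimal sketch of the buffering idea behind the first change. It is a simplified model, not FOP's actual FObjMixed code. The class MixedContentSketch, its methods addChild(), end() and getChildren(), and the nested InlineChild placeholder are all hypothetical. A String stands in for an FOText child. Consecutive character chunks are collected in a StringBuffer and flushed as one text child when a non-text child arrives or the element ends. The second case in main() shows the remaining limitation: an fo:inline in the middle of a word still splits the text into separate runs.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified stand-in for a mixed-content FO such as FObjMixed.
 * Not FOP's API: it only illustrates buffering consecutive character events
 * so that adjacent text ends up in a single text child.
 */
final class MixedContentSketch {
    /** Hypothetical placeholder for a non-text child such as an fo:inline. */
    static final class InlineChild {
        @Override
        public String toString() {
            return "fo:inline";
        }
    }

    // Children in document order: a String stands in for an FOText node.
    private final List<Object> children = new ArrayList<>();
    // Accumulates consecutive character chunks delivered by the parser.
    private final StringBuffer pendingText = new StringBuffer();

    /** Receives one character chunk, e.g. "e", "X" or "tensible". */
    void addCharacters(char[] data, int start, int length) {
        pendingText.append(data, start, length);
    }

    /** A non-text child ends the current text run, so flush the buffer first. */
    void addChild(Object child) {
        flushText();
        children.add(child);
    }

    /** Called at the end of the element; flushes any trailing text. */
    void end() {
        flushText();
    }

    private void flushText() {
        if (pendingText.length() > 0) {
            children.add(pendingText.toString());
            pendingText.setLength(0);
        }
    }

    List<Object> getChildren() {
        return children;
    }

    public static void main(String[] args) {
        // "eXtensible" reported by the parser as three chunks ends up as one text child.
        MixedContentSketch block = new MixedContentSketch();
        block.addCharacters("e".toCharArray(), 0, 1);
        block.addCharacters("X".toCharArray(), 0, 1);
        block.addCharacters("tensible".toCharArray(), 0, 8);
        block.end();
        System.out.println(block.getChildren()); // prints [eXtensible]

        // e<fo:inline>X</fo:inline>tensible: the inline child still splits the word.
        MixedContentSketch withInline = new MixedContentSketch();
        withInline.addCharacters("e".toCharArray(), 0, 1);
        withInline.addChild(new InlineChild());
        withInline.addCharacters("tensible".toCharArray(), 0, 8);
        withInline.end();
        System.out.println(withInline.getChildren()); // prints [e, fo:inline, tensible]
    }
}
```

Flushing on every non-text child keeps text and inline children in document order. Merging text across inline boundaries for hyphenation would have to happen later, during line layout.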
More problems I noted:
- white space is handled inconsistently
- line break detection relies on white space only
- word detection for hyphenation relies on white space and wrongly assumes there is a white space before the word passed to doHyphenation()
- the LinkSet is not considered for hyphenated word parts in addWord(), nor for page-number-citation or fo:character
- the same applies to most of overlining, line-through and vertical alignment
- characters are copied into FOText and then copied *twice* in LineArea.layout(), once purely for hyphenation. During layout, the character data is held in memory at least three times, possibly four (counting the parser buffer)

Questions:
- Is it still worth doing major hacks in LineArea.java?
- Should we consider using Unicode break properties for line break opportunity detection?
- How should words for hyphenation be detected?
- What happens to line breaks and word detection in case of
  * inline graphics and other definitely non-text inlines
  * inline foreign elements, like formulas
  * inline-containers containing blocks, especially blocks with text only
- Are there script or language dependencies to consider for line break and word detection?
- At which point should white-space-collapse, linefeed-treatment etc. be considered? Possibilities:
  * while creating FOText
  * while feeding it into the line area
  * during line area layout

Considering white-space-collapse during FOText creation causes problems with successive spaces in different inline FOs. There are additional issues with consecutive spaces which have already been discussed here, in particular how

  foo <fo:inline text-decoration="underline"> bar</fo:inline>

should be handled. Will this result in two consecutive spaces, one of them underlined? Has this issue been resolved in the meantime?

J.Pietschmann