On Thu, 3 Nov 2005 06:03 am, J.Pietschmann wrote:
> Manuel Mall wrote:
> > a) Yes UAX#14 always breaks at the end of a sequence of spaces
> > b) But it also says that it assumes any trailing spaces in a line
> >    are being removed
> > This "conflicts" with XSL-FO which can force spaces being retained
> > therefore adjustments to the algorithm are necessary to cater for
> > that.
>
> Computing line breaking opportunities and discarding whitespace at
> the end (or beginning) of a line are different matters. If whitespace
> has to be retained, trailing spaces after a non-space string may
> simply mean the previous line breaking opportunity has to be used,
> because otherwise the string including the trailing spaces will
> overflow the line area. The trailing whitespace may also influence
> text justification.
>

Hmm, to me it appears that UNICODE and XSL-FO have slightly different
models when it comes to white space in the context of line breaking,
and that is what is causing the discussion here. In UNICODE everything
is based simply on the properties of the codepoint in question and its
neighbours. In XSL-FO one can change the behaviour of a codepoint by
setting the white-space-related XSL-FO properties. That is not a
concept within UNICODE. If you want to retain white space in UNICODE
you use a different codepoint. If you want to retain a space in XSL-FO
you could use a different codepoint, but more likely you set an XSL-FO
property if you want this applied widely in your document.

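To make the interaction concrete, here is a rough Python sketch. It is
not FOP code and only a crude stand-in for UAX#14: it looks at U+0020
only and measures width in characters. The function names and the
retain_trailing_spaces flag are made up for illustration; the flag
stands in for whatever XSL-FO white space properties are in effect.

```python
def break_opportunities(text):
    """Offsets where a break is allowed: directly after each run of U+0020.

    Simplified assumption: only the space rule is modelled, nothing else
    from the UAX#14 pair table.
    """
    return [i for i in range(1, len(text))
            if text[i - 1] == " " and text[i] != " "]


def first_line(text, width, retain_trailing_spaces):
    """Pick the longest first line that fits into `width` characters.

    With retain_trailing_spaces=False the trailing spaces are discarded
    before measuring (the assumption UAX#14 makes). With True they count
    towards the line width (spaces forced to be retained by XSL-FO).
    """
    candidates = break_opportunities(text) + [len(text)]
    lines = []
    for pos in candidates:
        line = text[:pos]
        lines.append(line if retain_trailing_spaces else line.rstrip(" "))
    fitting = [line for line in lines if len(line) <= width]
    # If nothing fits, fall back to the shortest candidate and overflow.
    return fitting[-1] if fitting else lines[0]


text = "aaa  bbb  ccc"
discarded = first_line(text, 8, retain_trailing_spaces=False)
retained = first_line(text, 8, retain_trailing_spaces=True)
```

The break opportunities are the same in both calls (offsets 5 and 10).
With trailing spaces discarded the first line is "aaa  bbb"; with them
retained, "aaa  bbb  " is 10 characters wide and no longer fits, so the
previous opportunity is used and the first line is "aaa  ".
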
If we want to 'marry' UNICODE line breaking with XSL-FO white space
handling, we have this interaction to consider. One possible solution
would be to replace spaces (U+0020) with different codepoints that
mimic the behaviour imposed by whatever XSL-FO white space handling
properties are in effect. But I am not sure whether this can be done in
all cases. Otherwise we may have to modify the UNICODE line breaking
algorithm to cater for the XSL-FO white space specialities.

> J.Pietschmann

Manuel
