Re: Status of UAX#14?

Manuel Mall Mon, 27 Feb 2006 06:03:33 -0800

On Monday 27 February 2006 21:33, Jeremias Maerki wrote:
> On 27.02.2006 12:36:58 Manuel Mall wrote:
> > On Monday 27 February 2006 18:55, Jeremias Maerki wrote:
> > > What's the status of UAX#14? Has anybody had time to work on
> > > that yet? I'm asking because I'm considering hacking in
> > > support for the fixed width spaces (U+2000..U+200A). One of my
> > > clients asks for that but I can't allocate enough time right now
> > > to do the whole thing, unfortunately.
> >
> > I don't think UAX#14 will happen in a hurry. However, in
> > http://wiki.apache.org/xmlgraphics-fop/LineBreaking I do describe
> > possible handling of fixed width spaces. The main decision, and
> > that has little to do with UAX#14, is whether these spaces are to
> > be treated like white space when it comes to line breaks or like
> > non-breakable spaces. If one follows the XSL-FO spec to the letter,
> > these spaces are not white space and therefore are not removed
> > around a line break. I have no idea what actual user expectations
> > are when it comes to these spaces. Would authors (especially in
> > non-English / non-Latin languages) expect these spaces to be
> > removed around a line break or not? The relevant Knuth sequences
> > which need to be generated depend on that decision: is the space
> > removable or not when a break occurs?
>
> I think we're talking about two different removals here, right? One
> is about the FO white-space-affecting properties. I think these do
> not affect special Unicode spaces (only XML white space, see below).
> The other is line-breaking, where I think the space that makes up
> the break possibility is removed (except in the case of tagged PDF
> where the space will need to be preserved for the structure info)
> but not any of the other "special" spaces in the vicinity. At least,
> that would be my expectation and my interpretation.
>


Removal of spaces around formatter line breaks is also covered by the
spec. The property suppress-at-line-break controls it. Check its
definition of "auto": the fixed width spaces are explicitly excluded.
So, contrary to my initial post, there is no ambiguity in the spec.
Fixed width spaces are not removed unless the user explicitly sets the
suppress-at-line-break property. As we do not yet support the
suppress-at-line-break property, the only Knuth sequences which need to
be generated are for non-elastic, non-removable spaces. That should be
reasonably straightforward.
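
To illustrate, here is a minimal sketch of the two kinds of element
sequences (Python for brevity; none of these names are FOP classes).
It assumes the usual Knuth box/glue/penalty model, in which glue is
discarded at a break while a box is not. Whether a break is permitted
after a fixed width space at all is an assumption here, exposed as the
hypothetical allow_break parameter.

```python
from dataclasses import dataclass

# U+2000..U+200A, the fixed width spaces discussed in this thread.
FIXED_WIDTH_SPACES = {chr(cp) for cp in range(0x2000, 0x200B)}


@dataclass(frozen=True)
class Box:
    width: int  # never discarded, no stretch or shrink


@dataclass(frozen=True)
class Glue:
    width: int
    stretch: int
    shrink: int  # discarded when a break is taken here


@dataclass(frozen=True)
class Penalty:
    width: int
    penalty: int  # 0 means a neutral, legal break point


def elements_for_space(ch, width, stretch=0, shrink=0, allow_break=True):
    """Return illustrative Knuth elements for a single space character."""
    if ch == "\u0020":
        # Ordinary space: elastic and removable at a line break.
        return [Glue(width, stretch, shrink)]
    if ch in FIXED_WIDTH_SPACES:
        # Fixed width space: non-elastic and non-removable, so a box.
        elements = [Box(width)]
        if allow_break:
            # Assumption: a break opportunity follows the space.
            elements.append(Penalty(0, 0))
        return elements
    raise ValueError("not a space handled by this sketch: %r" % ch)
```

With this, elements_for_space("\u2003", 1000) yields a box followed by
a zero penalty, so the em space keeps its width on whichever line it
ends up.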

Interestingly enough, this means that by default, independent of any
other white space handling properties, U+0020 (space) is always(!)
removed around formatter generated line breaks. Need to think about
that a bit more.
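
The rule as I read it can be written down in a few lines. This is only
a sketch of my reading of the spec, not FOP code: the function name is
made up, and "inherit" is assumed to have been resolved before the
call.

```python
def is_suppressed_at_line_break(ch, suppress_at_line_break="auto"):
    """Decide whether a character next to a formatter break is dropped."""
    if suppress_at_line_break == "suppress":
        return True
    if suppress_at_line_break == "retain":
        return False
    if suppress_at_line_break == "auto":
        # Only U+0020 is removed by default; fixed width spaces stay.
        return ch == "\u0020"
    raise ValueError("unexpected value: %r" % suppress_at_line_break)
```

So with the default, U+0020 goes and U+2003 (em space) stays, and the
em space is only removed if the author asks for "suppress".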

> I've just gone through the FO spec again searching for "white" and
> it seems to me that the spec makes a rather clear distinction
> between white-space in terms of the XML spec and general
> white-space.
>
> > I am also uncertain how these spaces interact with line
> > justification. They are by definition not elastic. So if you have
> > only a fixed width space between two words, this is not an
> > inter-word gap that can be used for justification.
>
> Yes.
>
> > Therefore any calculation which relies on the number of words on
> > a line to determine the number of inter-word gaps, and from that
> > the per-gap justification amount, will need to be adjusted so
> > that it does not count inter-word gaps which contain only fixed
> > width spaces. On the other hand, they are still word boundaries
> > for the purpose of finding words for hyphenation.
>
> Yes.
>
> But is there really a problem when it comes to adjusting inter-word
> gaps? That's already handled by the right element list for all the
> different cases, right? At least, I don't see where exactly you're
> uncertain. The fixed width spaces just don't contribute any
> stretch/shrink to inter-word gaps.
>

Yes, the Knuth algorithm will take the stretch/shrink into account when
doing its optimal line breaking, but it will not tell you what the
final inter-word gap is. That is, I think, computed separately, based
on the number of words found, with some fine tuning. This is where you
may (or may not) run into trouble.
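
A small sketch of the concern, in Python and with made-up names (I have
not checked how the current code actually computes this): if the
per-gap amount is derived from the word count, a gap holding only a
fixed width space gets counted although it cannot absorb anything.

```python
def per_gap_adjustment(gaps, natural_width, line_width):
    """Spread the leftover width over the elastic inter-word gaps only.

    gaps is a list of strings, one per inter-word gap, each holding the
    space characters found between two adjacent words.
    """
    # A gap is adjustable only if it contains an ordinary U+0020.
    elastic_gaps = sum(1 for gap in gaps if "\u0020" in gap)
    if elastic_gaps == 0:
        return 0.0
    return (line_width - natural_width) / elastic_gaps
```

For a line with four words and gaps [" ", "\u2003", " "], a shortfall
of 1000 units gives 500 per elastic gap. Dividing by the word count
minus one would give about 333 and leave the line short.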

> I'll look into the fixed width spaces. So, thanks for your fast
> answer and the valuable pointer to the Wiki. In case I don't manage
> to do this cleanly, the least I can do is make sure we don't get ugly
> "#" in the output because the renderers don't know about the special
> spaces. This will also help for when someone has time to go towards
> UAX#14.
>
> Jeremias Maerki

Manuel
