Re: DO NOT REPLY [Bug 38507] - Non-breaking space in PDF title output

Manuel Mall Mon, 06 Feb 2006 16:11:32 -0800

On Tuesday 07 February 2006 01:11, Andreas L Delmelle wrote:
> On Feb 6, 2006, at 08:17, Manuel Mall wrote:
> >> [ME:]
> >
> > <snip/>
> >
> >> A preserved carriage return can be treated the same way as a
> >> linefeed, under the very exceptional condition that it survives
> >> white-space handling:
> >>   * white-space-treatment="ignore-if-*"
> >>   * the CR does not follow/precede a linefeed
> >>   * it is the first character in a sequence of whitespace, so
> >>     it survives white-space-collapse
> >
> > Shouldn't a CR always survive whitespace handling?
>
> Not always:
> If white-space-treatment="preserve" then any XML whitespace other
> than a linefeed is converted into a normal space. IMO, the editors
> put it this way because of the possibility of Windows-specific line
> endings, where a CR is followed by a linefeed.
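As a rough illustration of the "preserve" rule as stated here, the Python sketch below maps every XML whitespace character except linefeed to U+0020. The function name apply_preserve_treatment is hypothetical and not part of FOP. It works on a plain string rather than on FO character flow objects, and it ignores linefeed-treatment and white-space-collapse, so treat it as a model of this one rule only.

```python
# XML whitespace characters, per production [3] S of the XML spec.
XML_WHITESPACE = {"\u0020", "\u0009", "\u000D", "\u000A"}


def apply_preserve_treatment(chars: str) -> str:
    """Hypothetical model of white-space-treatment="preserve":
    every XML whitespace character except linefeed (U+000A)
    is converted to a normal space (U+0020)."""
    return "".join(
        " " if c in XML_WHITESPACE and c != "\n" else c
        for c in chars
    )


# The tab and the CR become spaces; the linefeed is kept.
print(repr(apply_preserve_treatment("a\tb\r\nc")))  # 'a b \nc'
```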
>
> > For starters, it is fairly difficult to get a CR out of an XML
> > parser.
>
> Difficult? It's simply a characters event, just like any other...
>


From the XML spec:

<quote>
S (white space) consists of one or more space (#x20) characters, 
carriage returns, line feeds, or tabs.
White Space
[3]     S          ::=          (#x20 | #x9 | #xD | #xA)+

Note:

The presence of #xD in the above production is maintained purely for 
backward compatibility with the First Edition. As explained in 2.11 
End-of-Line Handling, all #xD characters literally present in an XML 
document are either removed or replaced by #xA characters before any 
other processing is done. The only way to get a #xD character to match 
this production is to use a character reference in an entity value 
literal.

...

2.11 End-of-Line Handling

XML parsed entities are often stored in computer files which, for 
editing convenience, are organized into lines. These lines are 
typically separated by some combination of the characters CARRIAGE 
RETURN (#xD) and LINE FEED (#xA).

To simplify the tasks of applications, the XML processor MUST behave as 
if it normalized all line breaks in external parsed entities (including 
the document entity) on input, before parsing, by translating both the 
two-character sequence #xD #xA and any #xD that is not followed by #xA 
to a single #xA character.
</quote>
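The end-of-line rule quoted above is simple enough to model directly. A minimal Python sketch, assuming the input is already a decoded string (normalize_line_ends is a hypothetical name; a real parser does this internally on input, before any other processing):

```python
import re


def normalize_line_ends(text: str) -> str:
    """Model of the quoted section 2.11 rule: translate each CR LF pair,
    and each CR not followed by LF, into a single LF."""
    # "\r\n?" matches a CR plus an optional following LF, so a CR LF pair
    # is consumed as one match and a lone CR as another.
    return re.sub(r"\r\n?", "\n", text)


print(repr(normalize_line_ends("a\r\nb\rc")))  # 'a\nb\nc'
```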

To me this means that unless the document contains a character reference
such as &#xD; (for example via an entity <!ENTITY cr "&#xD;" > that is
later referenced as &cr;), you never get a CR out of an XML parser (even
on Windows).
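This is easy to check against an actual parser. A small sketch using Python's standard xml.etree.ElementTree (backed by expat), assuming the document is fed in as a byte string: literal CR LF pairs and lone CRs in the raw input come out as LF, while a character reference &#xD; written in element content comes out as a CR, because character references are expanded after end-of-line handling.

```python
import xml.etree.ElementTree as ET

# Raw document bytes with a literal CR LF pair and a lone CR.
raw = b"<p>line1\r\nline2\rline3</p>"
print(repr(ET.fromstring(raw).text))  # 'line1\nline2\nline3'

# A character reference is not subject to end-of-line normalization.
ref = b"<p>before&#xD;after</p>"
print(repr(ET.fromstring(ref).text))  # 'before\rafter'
```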

>
> Cheers,
>
> Andreas

Regards

Manuel
