Re: Latest FOP schema

Peter B. West Mon, 13 May 2002 08:49:00 -0700

Joerg,

Comments below.

Joerg Pietschmann wrote:

>"Arved Sandstrom" <Arved_37@nnnn> wrote:
>  
>
>>I think the predominant opinion is (assume all of this fits on one page) -
>>
>>a normal block area (generated by the outer block) that contains:
>>
>>one or more line areas for "level_0_text fills to position A";
>>then a block area with one or more line areas for "level_1_text positioned
>>at A fills to position B";
>>finally more line areas for "more level_0_text positioned at B".
>>
>>Note that if your example had been
>>
>><fo:block>
>>        level_0_text fills to position A<fo:block>
>>                level_1_text positioned at A fills to position B
>>        </fo:block>more level_0_text positioned at B
>></fo:block>
>>
>>then it would still be the same.
>>    
>>
>
>As a side note, assuming western language and script and hyphenation
>off, if the example had been
>
> <fo:block>
>         level_0_text fills to position
>          A<fo:block>level_1_text positioned at A fills to position B
>         </fo:block>more level_0_text positioned at B
> </fo:block>
>
>it is probably illegal, according to 4.7.2, Point 3. I suppose
>it would be illegal to have a line break within the word
> "Alevel_1_text"
>here. The problem here is, where do I get the rules whether a line
>break is permitted somewhere for a certain language and script? And
>how is this supposed to deal with "out of context" stuff like product
>numbers or artificial DB keys or programming language identifiers
>containing underlines and dashes, and with non-breaking spaces, odd
>symbols, and character abuse (uppercase greek omega instead of Ohm
>sign)? Again, I suppose the burden has to be put on the user who
>has to ensure everything is correct, including changing the current
>language for quotes, nested if necessary, and specifying a language
>for product numbers and programming language ids. Umm, something
>looking like
>  ..., ISBN <fo:inline language="x-isbn">0-201-48345-9</fo:inline>...
>and
>  the <fo:inline language="x-Java">org.apache.fop.render.pdf.Font</fo:inline>
> class implements the <fo:inline
>  language="x-Java">org.apache.fop.layout.FontMetric</fo:inline>
> interface ...
>
>This would eliminate some keep-together stuff, I guess, but most
>probably requires a mechanism to teach the processor line breaking
>rules for user defined languages.
>
><DumbQuestions>
>- Is the interpretation reasonable? (I don't ask about correctness...:-)
>- Can the redesigned FOP deal with the "Alevel_1_text" above, I mean
>  will it raise an error or warning?
>- Can/should FOP deal with user supplied word/line breaking rules?
></DumbQuestions>
>
>Note that the same applies to the recently heavily discussed problem
>of a block level element inside an fo:inline, according to 4.7.3, in
>particular point 3.
>  
>

My take on this would be that the fo:block, by definition, breaks the 
line. The question of whether this is an allowable place for a line 
break is pre-empted by the user's assertion that it is. In these 
circumstances, point 3 does not come into play.

The larger question of where line breaks are allowed in specific 
languages and scripts is addressed by Unicode/ISO 10646. See, e.g., 
<http://www.unicode.org/Public/UNIDATA/UnicodeData.html>. Unicode 
characters are assigned character properties in the Unicode Character 
Database (UCD), the individual files of which are available under 
<ftp://www.unicode.org/Public/UNIDATA/> or 
<http://www.unicode.org/Public/UNIDATA/>. These cover such categories as 
Case, Numeric Value, Dashes, Line Breaking and Spaces. This will be the 
mechanism for teaching FOP how to handle line-breaking. Hyphenation will 
also, I imagine, have to take account of the UCD.
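
As a rough illustration of the kind of lookup involved, here is a
sketch in Python using its standard unicodedata module, which is built
from the UCD. This is not anything FOP does; the helper name describe
is made up for the example. Note also that this module exposes the
general category, numeric value and name, but not the Line Breaking
property itself, which would have to be read from the UCD files
directly.

```python
import unicodedata

def describe(ch):
    """Return a few UCD properties for a single character (hypothetical helper)."""
    return {
        "name": unicodedata.name(ch, "<unnamed>"),
        # General category: Lu/Ll = upper/lower case letter, Nd = decimal digit,
        # Pd = dash punctuation, Zs = space separator.
        "category": unicodedata.category(ch),
        # Numeric value, or None when the character has none.
        "numeric": unicodedata.numeric(ch, None),
    }

# A letter in each case, a digit, two dashes and two spaces.
for ch in ["A", "a", "5", "-", "\u2013", " ", "\u00a0"]:
    print(f"U+{ord(ch):04X}", describe(ch))
```

The two dashes report the same category, as do the two spaces, so the
general category alone does not say where a line may break.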

The "special languages" that you mentioned would probably be dealt with
by taking advantage of the sets of semantically differentiated
characters which tend to share the same glyph. The most immediate
example is the set of spacing characters, which includes U+0020 space
and U+00A0 non-breaking space. There is also a non-breaking hyphen, and
raised dots with different semantics. So my bottom line there would be
that, at least for the immediate future, we should concentrate on
implementing the UCD as fully as possible where it impacts on layout,
and let users with special requirements work out how to express them in
Unicode.
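
To make the "same glyph, different semantics" point concrete, here is
another Python sketch, again with the standard unicodedata module. The
helper name is_no_break_variant is made up for the example, and testing
for the <noBreak> decomposition tag is only one way of spotting such
characters, not a full line-breaking rule.

```python
import unicodedata

def is_no_break_variant(ch):
    """True if the UCD decomposition tags ch as a <noBreak> form of another character."""
    return unicodedata.decomposition(ch).startswith("<noBreak>")

# Pairs that usually share a glyph but differ in line-breaking semantics.
pairs = [
    ("\u0020", "\u00a0"),  # SPACE / NO-BREAK SPACE
    ("\u2010", "\u2011"),  # HYPHEN / NON-BREAKING HYPHEN
]
for plain, variant in pairs:
    print(unicodedata.name(plain), is_no_break_variant(plain))
    print(unicodedata.name(variant), is_no_break_variant(variant))

# The Ohm sign carries a canonical mapping to the Greek capital omega,
# so normalization folds the two together.
print(unicodedata.normalize("NFC", "\u2126") == "\u03a9")
```

A user who wants a product number kept on one line could, on this
approach, write it with U+2011 rather than U+002D and need no special
language setting.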

I have a copy of Version 2.0 of The Unicode Standard, and only regret 
not having Version 3. I can heartily recommend the book, if only for the 
pleasure of the fonts.

Peter
