Hi Craig,

I think the main reason is that the hyphenation rules are based on words. 
While ThisIsAnExtremelyLongWordFullOfPlacesWeCanHyphenate may pass as a word, 
www.health4all.greatshapetoday.com.au most likely does not. If you have a 
limited set of URLs and a limited set of languages, you could extend the 
hyphenation rules. Alternatively, you could add a zero-width space (zwsp, 
U+200B) before and after the dots. Note that even greatshapetoday is not a 
known English word, so it is most likely not covered by the hyphenation rules 
either.
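
For the zwsp route, a minimal sketch in Python might look like the following. 
It assumes the ad text can be pre-processed as plain strings before the XSL-FO 
is generated. The function name add_break_hints and the min_length threshold 
are made up for illustration and are not part of FOP. It inserts U+200B after 
each dot or slash inside long whitespace-delimited tokens, on the assumption 
that FOP treats the zero-width space as a permitted break point.

```python
import re

ZWSP = "\u200b"  # ZERO WIDTH SPACE, assumed to act as a break opportunity

# Zero-width position right after a "." or "/" and before an ordinary character.
_BREAK_POINT = re.compile(r"(?<=[./])(?=[^./])")


def add_break_hints(text, min_length=20):
    """Insert ZWSP after dots and slashes in long URL-like tokens.

    min_length is a hypothetical threshold: shorter tokens are left alone
    because they are unlikely to overflow a column.
    """
    def hint(match):
        token = match.group(0)
        # Skip short tokens and tokens without an interior dot.
        if len(token) < min_length or "." not in token.strip("."):
            return token
        return _BREAK_POINT.sub(ZWSP, token)

    # Process each run of non-whitespace separately; whitespace is untouched.
    return re.sub(r"\S+", hint, text)


if __name__ == "__main__":
    sample = "Visit www.health4all.greatshapetoday.com.au today"
    # Show the inserted characters as visible escapes.
    print(add_break_hints(sample).encode("unicode_escape").decode("ascii"))
```

Inserting the hint only after the separators means a dot stays at the end of 
the line rather than starting the next one. In the FO source itself the same 
character can be written as the numeric reference &#x200B;.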

Regards,

Georg Datterl

------ Contact ------

Georg Datterl

Geneon media solutions gmbh
Gutenstetter Straße 8a
90449 Nürnberg

HRB Nürnberg: 17193
Managing Director: Yong-Harry Steiert

Tel.: 0911/36 78 88 - 26
Fax: 0911/36 78 88 - 20

www.geneon.de

Other members of the Willmy MediaGroup:

IRS Integrated Realization Services GmbH:    www.irs-nbg.de
Willmy PrintMedia GmbH:                      www.willmy.de
Willmy Consult & Content GmbH:               www.willmycc.de

-----Original Message-----
From: Craig Ringer [mailto:[email protected]]
Sent: Thursday, 19 January 2012 03:27
To: [email protected]
Subject: Long URLs appear exempt from word wrap, overflow i-p-d into next column

Hi

I'm in the later stages of building a classified ad pagination system that 
uses Apache FOP for its layout backend.

I'm running into issues where long URLs aren't getting broken across lines when 
they're too long to fit on a line. They're just spilling out of the i-p-d of 
the block they're contained by and overlaying text in the next column.

I'm using fop 1.0 and for testing am using only the built-in fonts.
Output is direct to PDF from fop, though the same issue can be reproduced with 
PostScript output. Links to test case files are at the bottom of this email.

This does not happen with other very long words, e.g. 
ThisIsAnExtremelyLongWordFullOfPlacesWeCanHyphenate, which is hyphenated after 
"Word" in my test and prints just fine.

In the area tree the problem URLs look normal and aren't annotated with any 
sort of special markup, but they aren't broken between lines even when they 
don't fit. FOP would know this by the time it outputs the area tree, because 
it has the required font and device metrics.

The issue is not specific to any particular font: it affects the default 
sans-serif font as well as the font I'm using, Myriad Pro SemiCondensed. The 
samples linked to below have been generated using only fop's built-in fonts to 
make sure it's easy for others to run them, so please forgive the crappy 
appearance.

As you can see in the sample PDF, the URLs
  www.health4all.greatshapetoday.com.au
(p1, 3rd col, 2nd below "health and beauty" heading) and
  www.perthrentalapartments.com.au
(p2, 1st col, 5th below "to let" heading) are overflowing their available i-p-d.

The area tree output for these URLs is the same as any other non-problem
line:

<lineArea><text><word offset="0">the.long.url.</word></text></lineArea>

I haven't declared anything special about these URLs and would expect them to 
be either broken or "squished" into the line, preferably the latter (though I 
don't think fop supports horizontal scaling of text yet).

I'm wondering if fop is detecting that these are URLs and applying some special 
formatting rules, since they seem to be hyperlinked in Acrobat.
Is it that, or are URLs something the hyphenation/breaks/layout algorithm just 
doesn't cope well with?

Has anyone else run into similar issues here? If so, have you found any 
workarounds or solutions?

For now I'm probably going to look for long URLs in the input text and add 
zero-width spaces (or some similar nonprinting char) to them at 
promising-looking points, so fop has something to break on. I'd love a better 
solution to what must be a relatively common problem, though.


In case it matters, the problem URLs are in a <fo:block> in cells of a 1-col 
table flowed into columns. The text of interest is located in a path like this:

fo:root/fo:page-sequence/fo:flow/fo:block[id=columnInnerBlock]
  /fo:table/fo:table-body/fo:table-row/fo:table-cell/fo:block

The outer fo:block[id=columnInnerBlock] is used to generate some borders around 
the columns; it only contains tables and never any raw text content.


XSL-FO source (please ignore the missing image in the header; it doesn't affect 
the layout):

http://www.postnewspapers.com.au/~craig/webfiles/testcases/fop-break-url/fop-break-url.xml

The problem areas are line 387 and line 1003, the URLs noted above.

PDF generated with "fop test.xsl test.pdf". I usually use an embedded fop 
instance that generates an area tree which I post-process and feed back into 
fop, but for the purposes of this test case I've used a standalone fop.

http://www.postnewspapers.com.au/~craig/webfiles/testcases/fop-break-url/fop-break-url.pdf

--
Craig Ringer

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

