Hi,

I'm in the later stages of building a classified ad pagination system that uses Apache FOP as its layout backend.
I'm running into a problem where long URLs aren't broken across lines when they're too long to fit on one. They just spill out of the inline-progression-dimension (i-p-d) of the block that contains them and overlay text in the next column.

I'm using fop 1.0. Output is direct to PDF from fop, though the same issue can be reproduced with PostScript output. Links to test case files are at the bottom of this email.

This does not happen with other very long words, e.g. ThisIsAnExtremelyLongWordFullOfPlacesWeCanHyphenate, which is hyphenated after "Word" in my test and prints just fine.

In the area tree the problem URLs look normal and aren't annotated with any sort of special markup, but they aren't broken between lines even though they don't fit. Fop should know they don't fit by the time it outputs the area tree, because it has the required font and device metrics.

The issue is not specific to any particular font. It affects the default sans-serif font as well as the font I'm actually using, Myriad Pro SemiCondensed. The samples linked below were generated using only fop's built-in fonts so that they're easy for others to run, so please forgive the crappy appearance.

As you can see in the sample PDF, these two URLs overflow their available i-p-d:

  www.health4all.greatshapetoday.com.au
    (p1, 3rd col, 2nd below "health and beauty" heading)
  www.perthrentalapartments.com.au
    (p2, 1st col, 5th below "to let" heading)

The area tree output for these URLs is the same as for any other non-problem line:

  <lineArea><text><word offset="0">the.long.url.</word></text></lineArea>

I haven't declared anything special about these URLs and would expect them to be either broken or "squished" into the line, preferably the latter (though I don't think fop supports horizontal scaling of text yet).

I'm wondering if fop is detecting that these are URLs and applying some special formatting rules, since they seem to be hyperlinked in Acrobat.
Is it that, or are URLs something the hyphenation/breaks/layout algorithm just doesn't cope well with? Has anyone else run into similar issues here? If so, have you found any workarounds or solutions?

For now I'm probably going to look for long URLs in the input text and add zero-width spaces (or some similar non-printing character) to them at promising-looking points, so fop has something to break on. I'd love a better solution to what must be a relatively common problem, though.

In case it matters, the problem URLs are in an <fo:block> in the cells of a 1-column table flowed into columns. The text of interest is located at a path like this:

  fo:root/fo:page-sequence/fo:flow/fo:block[id=columnInnerBlock]/fo:table/fo:table-body/fo:table-row/fo:table-cell/fo:block

The outer fo:block[id=columnInnerBlock] is used to generate some borders around the columns. It only contains tables and never any raw text content.

XSL-FO source (please ignore the missing image in the header, as it doesn't affect the layout):

  http://www.postnewspapers.com.au/~craig/webfiles/testcases/fop-break-url/fop-break-url.xml

The problem areas are line 387 and line 1003, which contain the URLs noted above.

PDF generated with "fop test.xsl test.pdf". I usually use an embedded fop instance that generates an area tree, which I post-process and feed back into fop, but for the purposes of this test case I've used a standalone fop:

  http://www.postnewspapers.com.au/~craig/webfiles/testcases/fop-break-url/fop-break-url.pdf

-- 
Craig Ringer
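
For illustration, here is a rough sketch of the zero-width-space workaround described above, applied to the ad text before it is turned into XSL-FO. It is only one possible approach, not a tested fix: the function name, the URL-detection regex, the 20-character threshold and the choice of break characters are all assumptions made for this example, and it also assumes that fop treats U+200B as a permitted line-break point, which should be checked against the fop version in use.

```python
import re

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE

# Assumption: only tokens at least this long need help breaking.
MIN_LENGTH = 20

# Assumption: a "URL" is any whitespace-free run starting with
# http://, https:// or www. (a deliberately crude heuristic).
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)

# Offer a break after these separators, but only when a word character
# follows, so "://" and trailing punctuation are left untouched.
BREAK_AFTER = re.compile(r"([./\-_?&=])(?=\w)")


def add_break_opportunities(text: str) -> str:
    """Insert ZWSP after separators inside long URLs found in text."""

    def soften(match: re.Match) -> str:
        url = match.group(0)
        if len(url) < MIN_LENGTH:
            return url  # short URLs are left exactly as they were
        return BREAK_AFTER.sub(lambda m: m.group(1) + ZWSP, url)

    return URL_PATTERN.sub(soften, text)


if __name__ == "__main__":
    sample = "Call now or see www.health4all.greatshapetoday.com.au today"
    # Show the inserted characters as "|" so they are visible.
    print(add_break_opportunities(sample).replace(ZWSP, "|"))
```

Because the inserted character has no width, the printed URL should look unchanged when it fits on one line. Note that text copied out of the PDF may carry the extra characters along with it.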
