Glen Mazza wrote: I'm not sure why it wouldn't--as a whitespace removal algorithm should be able to take into account line breaks as well. But even if it doesn't account for line breaks, you should still see a reduction in the number of TLM instances created, as the FOText instances white-space remove themselves into extinction. It's just that the reduction would not be as large as desired.
A further optimization might be to do all this before the Block is even parsed into FOText and Inline objects, as many spaces-only objects would end up not even needing to be created.
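To make the pre-parse idea a bit more concrete, here is a minimal sketch in Python (not FOP code). It assumes the block content arrives as a list of raw text chunks, and `normalize_chunks` is a hypothetical name. It collapses whitespace across chunk boundaries and simply never emits a chunk that ends up empty, which is the "spaces-only objects never get created" effect. It does not handle whitespace around line breaks or trim trailing whitespace.

```python
def normalize_chunks(chunks):
    """Collapse whitespace runs across chunk boundaries.

    Chunks that contain nothing after collapsing are dropped, so no
    object would ever need to be created for them.
    """
    result = []
    prev_was_space = True  # treat the start of the block like preceding whitespace
    for chunk in chunks:
        out = []
        for ch in chunk:
            if ch in " \t\r\n":
                # keep only the first whitespace character of a run, as a space
                if not prev_was_space:
                    out.append(" ")
                prev_was_space = True
            else:
                out.append(ch)
                prev_was_space = False
        if out:
            result.append("".join(out))
    return result


if __name__ == "__main__":
    print(normalize_chunks(["  Hello ", "   ", "\n  world", "  \n"]))
```

In this toy input the second chunk is whitespace-only and follows a chunk that already ends in a space, so it disappears entirely.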
This will not account for spaces to be removed around line breaks. But then, proper TR14 line breaking needs a previous character LB property and a whitespace status too, so this can be combined.
I'm not sure what you're referring to here--the TR at http://www.unicode.org/unicode/reports/tr14/ doesn't appear to mention a "whitespace status" or LB "property" per se. But I believe this is minor to your point below.
The processing would be roughly as follows:
*for* word *in* text (separated by whitespace)
  normalize the whitespace (optimize normalization away for some whitespace status)
Hmmm...not that big a deal to me, but I would be inclined to keep the whitespace removal out of the LayoutManagers, because it is fo:block specific (depending on the whitespace removal property) as to whether or not to even remove whitespace to begin with. It would appear ideal to keep this business logic out of the Layout Manager classes--instead just send it whitespace-normalized (or not normalized, depending on the removal property) text, and have TLM process either equivalently.
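A rough sketch of that separation, in Python rather than FOP's Java, and with made-up names: `prepare_text` is the only place that looks at the property (modelled here as a boolean loosely named after the XSL-FO white-space-collapse property), and `text_layout` is a hypothetical stand-in for the layout step that treats whatever it receives the same way.

```python
import re


def prepare_text(text, white_space_collapse):
    """Apply (or skip) whitespace collapsing before layout ever sees the text."""
    if white_space_collapse:
        return re.sub(r"[ \t\r\n]+", " ", text)
    return text


def text_layout(text):
    """Stand-in for the layout step: splits into word and whitespace runs.

    It has no knowledge of the property; it processes both kinds of
    input in exactly the same way.
    """
    return re.findall(r"\S+|\s+", text)


if __name__ == "__main__":
    print(text_layout(prepare_text("a   b", True)))
    print(text_layout(prepare_text("a   b", False)))
```

The point is only where the decision lives: the layout function has no branch on the property at all.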
Another issue, maybe just hairsplitting in this case, is that if it is a "word" that you're extracting in your for-loop, you can't subsequently normalize the whitespace around it, because, by definition, you've just taken a "word". To generalize what you're saying, I think you mean, "each word with assorted whitespace around it"--but that may be tough to precisely define within a for-loop.
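For what it's worth, "each word with the whitespace in front of it" can be pinned down with a regular expression. The following is only an illustrative Python sketch with a hypothetical function name, not something taken from FOP: each item is a (leading whitespace, word) pair, and whatever whitespace trails the last word is returned separately.

```python
import re


def words_with_leading_space(text):
    """Split text into (leading_whitespace, word) pairs plus trailing whitespace."""
    pairs = [(m.group(1), m.group(2)) for m in re.finditer(r"(\s*)(\S+)", text)]
    # whatever whitespace remains after the last word (possibly empty)
    trailing = re.search(r"\s*\Z", text).group(0)
    return pairs, trailing


if __name__ == "__main__":
    print(words_with_leading_space("  foo   bar "))
```

With the whitespace attached to each word like this, the loop body has something to normalize.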
This seems to make sense. (Although this TR is rather sleep-inducing for me, at least--we may need to have someone else implement it! ;)

  calculate TR14 breaks at the beginning of the word
  *for* TR14 break possibilities *in* word
    *if* line full
      check hyphenations
      return previous break possibility
  *end for*
*end for*
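Here is how I read that loop, as a small runnable Python sketch. Big caveats: `break_opportunities` is a toy stand-in that only allows a break after a hyphen inside a word (real TR14 has a full table of line-break classes), widths are plain character counts, the "check hyphenations" step is left out, and both function names are made up for illustration.

```python
def break_opportunities(word):
    """Toy stand-in for TR14: offsets after each inner hyphen, plus the word end."""
    offsets = [i + 1 for i, ch in enumerate(word)
               if ch == "-" and 0 < i < len(word) - 1]
    offsets.append(len(word))
    return offsets


def fill_lines(text, width):
    """Greedy line filling: when a piece does not fit, end the line at the
    previous break possibility and start a new line with that piece."""
    lines, current = [], ""
    for word in text.split():          # words separated by whitespace
        start = 0
        for end in break_opportunities(word):
            piece = word[start:end]
            # a single space goes before a word, not before a later piece of it
            sep = " " if current and start == 0 else ""
            if current and len(current) + len(sep) + len(piece) > width:
                lines.append(current)  # line full: break at previous possibility
                current = piece
            else:
                current += sep + piece
            start = end
    if current:
        lines.append(current)
    return lines


if __name__ == "__main__":
    for line in fill_lines("a well-known line-breaking example", 10):
        print(line)
```

A piece longer than the width simply overflows its line here; that is where the hyphenation check from the pseudocode would have to come in.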