Re: [Bug 27901] - [PATCH] TextCharIterator.remove() does not work properly

Glen Mazza Sat, 17 Apr 2004 16:10:04 -0700

Hello Joerg, welcome back.

J.Pietschmann wrote:

Glen Mazza wrote:

A further optimization might be to do all this before
the Block is even parsed into FOText and Inline
objects, as many spaces-only objects would end up not
even needing to be created.

This will not account for spaces to be removed around line
breaks.

I'm not sure why it wouldn't--a whitespace removal algorithm should be able to take line breaks into account as well. But even if it doesn't account for line breaks, you should still see a reduction in the number of TLM instances created, as the FOText instances white-space remove themselves into extinction. It's just that the reduction would not be as large as desired.

But then, proper TR14 line breaking needs a previous character LB property and a whitespace status too, so this can be combined.

I'm not sure what you're referring to here--the TR at http://www.unicode.org/unicode/reports/tr14/ doesn't appear to mention a "whitespace status" or LB "property" per se. But I believe this is minor to your point below.

The processing would be
roughly as follows:

*for* word *in* text (separated by whitespace)
   normalize the whitespace (optimize normalization away
    for some whitespace status).

Hmmm...not that big a deal to me, but I would be inclined to keep the whitespace removal out of the LayoutManagers, because whether to remove whitespace at all is fo:block specific (it depends on the whitespace removal property). It would appear ideal to keep this business logic out of the Layout Manager classes--instead just send them whitespace-normalized (or not normalized, depending on the removal property) text, and have TLM process either equivalently.
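
A rough sketch of that separation, in Java. The class and method names are hypothetical and are not FOP's actual API; the boolean merely stands in for the block's whitespace removal property, and "normalize" is simplified here to collapsing each run of whitespace into one space. The caller decides whether to normalize, so the layout code receives plain text either way:

```java
public class WhitespaceSketch {

    /**
     * Hypothetical helper, not part of FOP.
     * If collapse is true, every run of whitespace becomes a single space;
     * otherwise the text is returned unchanged.
     */
    static String normalize(String text, boolean collapse) {
        if (!collapse) {
            return text; // property says: leave whitespace alone
        }
        StringBuilder out = new StringBuilder(text.length());
        boolean inRun = false; // true while we are inside a whitespace run
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                if (!inRun) {
                    out.append(' '); // emit one space per run
                }
                inRun = true;
            } else {
                out.append(c);
                inRun = false;
            }
        }
        return out.toString();
    }
}
```

Whatever consumes the result (TLM, in this discussion) would not need to know which branch was taken.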

Another issue, maybe just hairsplitting in this case, is that if it is a "word" that you're extracting in your for-loop, you can't subsequently normalize the whitespace around it, because, by definition, you've just taken a "word". To generalize what you're saying, I think you mean, "each word with assorted whitespace around it"--but that may be tough to precisely define within a for-loop.

   calculate TR14 breaks at the beginning of the word
   *for* TR14 break possibilities *in* word
     *if* line full
        check hyphenations
        return previous break possibility
   *end for*
 *end for*

This seems to make sense. (Although this TR is rather sleep-inducing for me, at least--we may need to have someone else implement it! ;)
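
For illustration only, the overall shape of that loop might look like the Java sketch below. It makes some loud simplifications: the only break possibilities are the whitespace gaps between words (not real TR14 classes), hyphenation is skipped entirely, and "line full" is measured in characters rather than in layout units. The names are hypothetical, not FOP classes:

```java
import java.util.ArrayList;
import java.util.List;

public class LineBreakSketch {

    /**
     * Hypothetical greedy line filler, not FOP code.
     * Splits text into words at whitespace and starts a new line
     * whenever adding the next word would exceed maxWidth characters.
     */
    static List<String> breakLines(String text, int maxWidth) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only when the input has no words
            }
            // "line full": fall back to the previous break possibility,
            // which here is simply the end of the current line.
            if (line.length() > 0
                    && line.length() + 1 + word.length() > maxWidth) {
                lines.add(line.toString());
                line.setLength(0);
            }
            if (line.length() > 0) {
                line.append(' '); // whitespace already normalized to one space
            }
            line.append(word);
        }
        if (line.length() > 0) {
            lines.add(line.toString());
        }
        return lines;
    }
}
```

With a width of 10, "the quick brown fox" comes out as two lines, "the quick" and "brown fox". A word longer than the width is left on a line of its own, which is where the hyphenation check would come in.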

Glen
