On Jan 15, 2007, at 21:25, J.Pietschmann wrote:

Andreas L Delmelle wrote:
BTW: I took a very quick look, and does anyone know if there is a good reason why Hyphenation.word is a String?

The hyphenator interface goes through several wrapping layers,
probably due to the usual "take working code and wrap it to fit
the caller" method.

Looks that way...
Traced it down, and in TextLM.getWordChars() we get

  sbChars.append(new String(textArray, ai.iStartIndex,
                          ai.iBreakIndex - ai.iStartIndex));

Not really sure what would be most efficient:
- a void method appending to a parameter StringBuffer
- a method returning a copy of the char[] from index to index...
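
To make the two options concrete, here is a rough sketch of both. The class and method names are made up for illustration and are not existing FOP code; only the textArray/start/break parameters mirror the snippet above.

```java
public class WordCharsSketch {

    // Option 1: append the range straight to a caller-supplied buffer.
    // StringBuffer.append(char[], int, int) copies the chars without
    // creating an intermediate String.
    public static void appendWordChars(StringBuffer target, char[] textArray,
                                       int startIndex, int breakIndex) {
        target.append(textArray, startIndex, breakIndex - startIndex);
    }

    // Option 2: return a fresh copy of the range [startIndex, breakIndex).
    public static char[] copyWordChars(char[] textArray,
                                       int startIndex, int breakIndex) {
        char[] copy = new char[breakIndex - startIndex];
        System.arraycopy(textArray, startIndex, copy, 0, copy.length);
        return copy;
    }

    public static void main(String[] args) {
        char[] text = "soft hyphen".toCharArray();

        StringBuffer sb = new StringBuffer();
        appendWordChars(sb, text, 5, 11);
        System.out.println(sb);                                     // hyphen

        System.out.println(new String(copyWordChars(text, 0, 4)));  // soft
    }
}
```

Either way the intermediate new String(...) disappears; the difference is only whether the caller or the callee owns the destination.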

Given that every String ultimately has a backing char[](*) anyway, I'd say that we can safely return the copy, and remove the overhead of

StringBuffer.append(new String(char[])).toString().toCharArray()

Hmmm... Put it like that, and this would almost be one for the Daily WTF! 8-)

(*) which, BTW, answers the question of why there are twice as many char[] instances as text nodes in the snapshot Richard posted earlier in the thread about memory issues. Sure, there are some 39K text nodes in the document, but there are most likely at least as many non-interned property values (cf. the number of String instances)...

This always seemed overly complicated to me. I tried
to come up with a comprehensive API for hyphenation (which would
also be applicable to spelling and other similar tasks). Unfortunately,
there doesn't seem to be any usable standard, all APIs I've seen
are very specific or simply horrible. Any simplification is certainly

A quick-and-dirty hack to make the Hyphenator return a Hyphenation as I described earlier on (a hyph-point for the SHY, and the rest as two separately hyphenated words) doesn't seem too hard to pull off, but it would be an exception for the SHY only. For a more comprehensive approach, I currently don't know enough about hyphenation basics, I'm afraid...


