On Jan 15, 2007, at 21:25, J.Pietschmann wrote:
Andreas L Delmelle wrote:
BTW: I took a very quick look, and does anyone know if there is a
good reason why Hyphenation.word is a String?
The hyphenator interface goes through several wrapping layers,
probably due to the usual "take working code and wrap it to fit
the caller" method.
Looks that way...
Traced it down, and in TextLM.getWordChars() we get
sbChars.append(new String(textArray, ai.iStartIndex,
ai.iBreakIndex - ai.iStartIndex));
Not really sure what would be most efficient:
- a void method appending to a parameter StringBuffer
- a method returning a copy of the char[] from index to index...
Seeing that every String ultimately has a backing char[](*) anyway, I'd
say that we can safely return the copy, and remove the overhead of
StringBuffer.append(new String(char[])).toString().toCharArray()
Hmmm... Put it like that, and this would almost be one for the Daily
WTF! 8-)
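
To make the two options above concrete, here is a rough sketch of both. The class and method names are made up for illustration and are not the actual TextLM code; the parameters only mirror textArray, ai.iStartIndex and ai.iBreakIndex from the snippet above.

```java
public class WordCharsSketch {

    // Option 1 (hypothetical): append the slice straight to a buffer supplied
    // by the caller. StringBuffer.append(char[], int, int) copies the chars
    // directly, so no intermediate String is created.
    public static void appendWordChars(StringBuffer target, char[] textArray,
                                       int startIndex, int breakIndex) {
        target.append(textArray, startIndex, breakIndex - startIndex);
    }

    // Option 2 (hypothetical): return a fresh copy of the slice as a char[],
    // so that no String or StringBuffer is involved at all.
    public static char[] copyWordChars(char[] textArray,
                                       int startIndex, int breakIndex) {
        char[] copy = new char[breakIndex - startIndex];
        System.arraycopy(textArray, startIndex, copy, 0, copy.length);
        return copy;
    }

    public static void main(String[] args) {
        char[] text = "some hyphenation test".toCharArray();

        StringBuffer sb = new StringBuffer();
        appendWordChars(sb, text, 5, 16);
        System.out.println(sb);

        char[] word = copyWordChars(text, 5, 16);
        System.out.println(new String(word));
    }
}
```

Either way there is exactly one copy of the characters per word, instead of the several made by the append/toString/toCharArray chain.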
(*) which, BTW, answers the question about the number of char[]
instances being twice that of the text-nodes in the document in the
snapshot posted by Richard earlier on in the thread about memory issues.
Sure, there are some 39K text-nodes in the document, but there are most
likely at least as many non-interned property values (cf. the number of
String instances)...
This has always seemed overly complicated to me. I tried
to come up with a comprehensive API for hyphenation (which would
also be applicable to spelling and other similar tasks).
Unfortunately, there doesn't seem to be any usable standard; all APIs
I've seen are very specific or simply horrible. Any simplification is
certainly welcome.
A quick-and-dirty hack to make the Hyphenator return a Hyphenation as
I described earlier on --hyph-point for the SHY and the rest as two
separate hyphenated words-- doesn't seem too hard to pull off, but it
would be an exception for the SHY only. For a more comprehensive
approach, I currently don't know enough about hyphenation basics, I'm
afraid...
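
Just to illustrate one possible reading of the SHY special case: split the word at the soft hyphen (U+00AD), treat that position as the hyphenation point, and leave the two halves to be hyphenated as separate words. All names below are hypothetical; this is not the existing Hyphenator or Hyphenation API.

```java
public class SoftHyphenSketch {

    // SOFT HYPHEN (SHY), U+00AD
    static final char SHY = '\u00AD';

    // Hypothetical helper: splits the word at its first soft hyphen.
    // Returns { before, after } with the SHY itself dropped, or a
    // one-element array holding the unchanged word if there is no SHY.
    public static String[] splitAtSoftHyphen(String word) {
        int p = word.indexOf(SHY);
        if (p < 0) {
            return new String[] { word };
        }
        return new String[] { word.substring(0, p), word.substring(p + 1) };
    }

    public static void main(String[] args) {
        String[] parts = splitAtSoftHyphen("hyphen" + SHY + "ation");
        // Each part could then be handed to the regular hyphenator on its own.
        for (String part : parts) {
            System.out.println(part);
        }
    }
}
```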
Cheers,
Andreas