Re: [l10n-dev] Re: [gsl-dev] BreakIterator and Hyphenation

Eike Rathke Wed, 21 Sep 2005 11:54:48 -0700

Hi Rajeev,

On Wed, Sep 14, 2005 at 17:56:55 +0200, Christof Pintaske forwarded:

> Rajeev J Sebastian wrote:
> >I am trying to implement a BreakIterator/Hyphenator for my script.

Btw, which script is it?

> >The "hyphenator" is purely algorithmic, and doesn't require any
> >external dictionaries. So I thought of reimplementing getLineBreak.

Why would you want to reimplement getLineBreak()? Isn't overriding
makeIndex() sufficient? What makes your case different from the
approach used in BreakIterator_hi::makeIndex() of breakiterator_hi.cxx,
for example?
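
The idea behind the makeIndex() approach is that the script-specific
subclass only computes cell (grapheme cluster) boundaries and the
inherited line-break code uses them. A minimal, self-contained sketch of
what such an index might hold follows. It does not use the OOo types:
CellIndex, makeCellIndex() and isCombiningMark() are hypothetical names,
and the clustering rule (only U+0300 to U+036F attach to the preceding
character) is a stand-in for whatever rules a real script needs.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a script's clustering rule: only the Combining
// Diacritical Marks block attaches to the preceding base character.
bool isCombiningMark(char16_t c) {
    return c >= 0x0300 && c <= 0x036F;
}

// Hypothetical cell index, loosely modeled on per-character
// "next cell" / "previous cell" tables.
struct CellIndex {
    std::vector<std::int32_t> next;      // next[i]: end (exclusive) of the cell containing i
    std::vector<std::int32_t> previous;  // previous[i]: start of the cell containing i
};

// Build the index for the whole text in a single pass.
CellIndex makeCellIndex(const std::u16string& text) {
    const std::int32_t len = static_cast<std::int32_t>(text.size());
    CellIndex idx{std::vector<std::int32_t>(len), std::vector<std::int32_t>(len)};
    std::int32_t start = 0;
    while (start < len) {
        // A cell is one base character plus any following combining marks.
        std::int32_t end = start + 1;
        while (end < len && isCombiningMark(text[end])) {
            ++end;
        }
        for (std::int32_t i = start; i < end; ++i) {
            idx.previous[i] = start;
            idx.next[i] = end;
        }
        start = end;
    }
    return idx;
}
```

For u"a\u0301b" (a, combining acute, b) this yields previous = {0, 0, 2}
and next = {2, 2, 3}: position 1 belongs to the cell starting at 0.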

> >Although I have successfully subclassed BreakIterator_CTL, I am not able 
> >to return the right results. When I set the LineBreakResults.breakType to 
> >BreakType::Hyphenation, OOo (1.9.116) just crashes.

If you do it similarly to what is done in
BreakIterator_CTL::getLineBreak(), it sounds less like the assignment to
LineBreakResults.breakType is crashing and more like an out-of-bounds
access to the previousCellIndex[] array.

What do you do exactly?

> >The value with which it
> >works is BreakType::WordBoundary, but this always breaks on a word
> >boundary and never at the correct hyphenation point.
> >
> >I am assuming that it is because I don't correctly initialize the 
> >lbr.rHyphenatedWord.
> >
> >At this point, I am stuck. What value do I copy into the rHyphenatedWord?

This doesn't sound like you reimplemented
BreakIterator_CTL::getLineBreak() but
BreakIterator_Unicode::getLineBreak() instead. To me this sounds wrong,
but you may have your reasons for doing so. What are you trying to
achieve?

Note that LineBreakResults.rHyphenatedWord is a reference to an
_interface_ type of CSS::linguistic2::XHyphenatedWord, not a string or
such. This is normally handled in BreakIterator_Unicode::getLineBreak()
called by BreakIterator_CTL::getLineBreak(). Here again my question: why
would you want to reimplement everything?

> >In my script, there is no hyphen to denote a line break; the line just
> >breaks after the grapheme. So do I insert ZWSP at the appropriate position
> >or some other Unicode character? How would I do that in code?

You don't. BreakIterator_CTL::getLineBreak() and
BreakIterator_Unicode::getLineBreak() take care of that, and if you just
implement makeIndex() correctly and then assign the previous cell
boundary to LineBreakResults.breakIndex, it should work out of the box.
Otherwise, please be more specific about why you want to reimplement
everything.
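
The division of labour described above can be modeled like this: generic
code proposes a break position, and the script-specific part only moves
it back to the start of the cell it falls into. This is a simplified,
self-contained sketch, not the OOo implementation. LineBreakResult,
BreakKind and snapToCellStart() are hypothetical stand-ins, and the cell
table is passed in directly rather than built by makeIndex(). It also
shows a bounds check; a missing check of this kind is one plausible way
to read past the end of previousCellIndex[] when the break lies at the
very end of the text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical break kinds; not the real css::i18n::BreakType constants.
enum class BreakKind { WordBoundary, Hyphenation };

// Hypothetical, simplified result; the real LineBreakResults has more members.
struct LineBreakResult {
    std::int32_t breakIndex;
    BreakKind kind;
};

// Move a proposed break back to the start of the cell containing it, so a
// line never breaks inside a cluster. The kind is passed through unchanged.
// A position outside the table (e.g. breakIndex == text length) is left as
// is instead of being used as an index, which would read past the end.
LineBreakResult snapToCellStart(LineBreakResult proposed,
                                const std::vector<std::int32_t>& previousCellIndex) {
    const std::int32_t pos = proposed.breakIndex;
    if (pos >= 0 && static_cast<std::size_t>(pos) < previousCellIndex.size()) {
        proposed.breakIndex = previousCellIndex[pos];
    }
    return proposed;
}
```

With a cell table of {0, 0, 2} (cells [0, 2) and [2, 3)), a proposed
break at 1 becomes 0, while a break at 3 (the end of the text) is left
alone.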

> >An example code to copy the word and insert the appropriate break 
> >character at any position would be greatly appreciated.

You don't copy the word and insert characters. The application decides
according to the LineBreakResults what it has to do at the position of
the linebreak.
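
To make the point concrete: the consumer of the break results only needs
the positions. It slices the original text there and, for a script
without a visible hyphen, renders nothing extra at the break. A minimal
sketch, assuming the break positions have already been obtained;
splitAtBreaks() is a hypothetical helper and not how the OOo layout code
is structured.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split text into lines at the given ascending break positions. The text
// itself is never modified: no hyphen, ZWSP or other character is inserted.
std::vector<std::u16string> splitAtBreaks(const std::u16string& text,
                                          const std::vector<std::size_t>& breaks) {
    std::vector<std::u16string> lines;
    std::size_t start = 0;
    for (std::size_t pos : breaks) {
        // Skip breaks that would produce an empty line or fall outside the text.
        if (pos <= start || pos >= text.size()) {
            continue;
        }
        lines.push_back(text.substr(start, pos - start));
        start = pos;
    }
    lines.push_back(text.substr(start));  // the remainder forms the last line
    return lines;
}
```

For example, splitting u"abcdef" at {2, 4} yields "ab", "cd" and "ef",
which concatenate back to the original text unchanged.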

> >Also if possible, a link to the API documentation for BreakIterator would 
> >be very helpful.

http://api.openoffice.org/docs/common/ref/com/sun/star/i18n/XBreakIterator.html

  Eike

-- 
 OOo/SO Calc core developer. Number formatter bedevilled I18N transpositionizer.
 GnuPG key 0x293C05FD:  997A 4C60 CE41 0149 0DB3  9E96 2F1A D073 293C 05FD
