I'm now using the excellent Highlighter from within Solr and it works very well, except that the generated fragments sometimes begin with bad-looking characters (the "." from the end of the previous sentence, or a ")", "/10", etc.). The same is true for the fragment ends. I looked through both the dev and user Lucene lists in search of a better Fragmenter class, but it seems that there's none right now (just the simple and null fragmenters).

To me the 'simple' fragmenter is a bit too simple; has anyone had success in implementing a more intelligent one? I have no Java coding experience, sadly, so I don't know where to begin on this one. I don't think fancy phrase recognition is needed; just a better boundary algorithm (avoid beginning or ending fragments with bad-looking characters) and the addition of "..." at the beginning and end of the fragment if a phrase was cut by fragmentation.
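
To make the idea concrete, here is a rough sketch of the kind of boundary clean-up I mean, applied to the fragment string after highlighting. It is not a Lucene Fragmenter and uses no Lucene or Solr API; the class name FragmentTidy, the method tidy, and the two "cut" flags are all hypothetical, and the choice of which characters count as good-looking is only a guess.

```java
// Hypothetical post-processing helper (not part of Lucene or Solr).
public final class FragmentTidy {
    private FragmentTidy() {}

    /**
     * Trims bad-looking characters from both ends of a fragment and adds
     * "..." where the caller says the surrounding phrase was cut.
     * The flags cutAtStart / cutAtEnd are assumptions: the caller must
     * already know whether fragmentation split a phrase at that end.
     */
    public static String tidy(String fragment, boolean cutAtStart, boolean cutAtEnd) {
        int start = 0;
        int end = fragment.length();
        // Skip leading whitespace, stray punctuation, closing brackets, etc.
        while (start < end && !isGoodStart(fragment.charAt(start))) {
            start++;
        }
        // Skip trailing whitespace and dangling separators.
        while (end > start && !isGoodEnd(fragment.charAt(end - 1))) {
            end--;
        }
        String core = fragment.substring(start, end);
        if (core.isEmpty()) {
            return core; // nothing worth showing
        }
        StringBuilder sb = new StringBuilder();
        if (cutAtStart) {
            sb.append("...");
        }
        sb.append(core);
        if (cutAtEnd) {
            sb.append("...");
        }
        return sb.toString();
    }

    // A fragment may start with a letter, a digit, an opening bracket or
    // quote, or '<' (so that highlight markup such as <em> is preserved).
    private static boolean isGoodStart(char c) {
        return Character.isLetterOrDigit(c) || c == '<' || c == '(' || c == '"';
    }

    // A fragment may end with a letter, a digit, a closing bracket or quote,
    // '>' (closing highlight markup), or sentence-ending punctuation.
    private static boolean isGoodEnd(char c) {
        return Character.isLetterOrDigit(c) || c == '>' || c == ')' || c == '"'
                || c == '.' || c == '!' || c == '?';
    }
}
```

For example, FragmentTidy.tidy(". ) the <em>gene</em> was expressed in", true, true) would drop the leading ". ) " and wrap the rest in "...". A real solution would probably have to pick better break points inside the Fragmenter itself rather than patch the string afterwards, and a fragment starting with "/10" would still begin with "10" under these rules.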

Also, is it required that the highlighted field is 'stored'? I'm pretty sure it is, but just want confirmation.

Thanks,

--
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212

