On Sat, Sep 7, 2013 at 8:39 AM, Robert Muir <[email protected]> wrote:
> On Sat, Sep 7, 2013 at 7:44 AM, Benson Margulies <[email protected]> wrote:
>> In Japanese, compounds are just decompositions of the input string. In
>> other languages, compounds can manufacture entire tokens from thin
>> air. In those cases, it's something of a question how to decide on the
>> offsets. I think that you're right, eventually, insofar as there's
>> some offset in the original that might as well be blamed for any given
>> component.
>>
>
> Why change the offsets then? Offsets are for highlighting. Let the
> whole compound be highlighted when it's a match in search results. It's
> transparent and totally accurate as to what is happening: this is why
> we do highlighting, so that the user can make a relevance assessment
> about the document, not to try to assist the end user to debug the
> analysis chain or anything like that.

Thanks, that's very helpful. I spend all my time crawling around the
underside of this stuff and I lack perspective.

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
