Hi,

On Wed, Sep 2, 2009 at 2:40 PM, David Causse <[email protected]> wrote:
> If I use Tika to parse HTML and feed the parsed String to a Lucene
> analyzer, what about the offset information for KWIC and for returning to
> the original text (like the Google cache view)? How can I keep track of
> the offsets between the Tika parser and the Lucene analyzer?

Currently Tika doesn't expose that information, but the Tika Parser API
was designed with such use in mind, so it will be possible to add the
offset information. Please file a Tika feature request [1] for this.

[1] https://issues.apache.org/jira/browse/TIKA
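To illustrate the kind of offset information discussed above, here is a
minimal, hypothetical sketch in Java. It does not use Tika or Lucene: the
class OffsetMappedText and its methods are invented for illustration, and
its naive tag stripper ignores entities, comments, and script/style content.
For each character of the extracted text it records that character's offset
in the original HTML, so that a token span reported by an analyzer (for
example the start/end offsets from Lucene's OffsetAttribute) could be mapped
back to the source for KWIC or a cache-style highlighted view.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not a Tika API): extracted text plus a per-character
 * map back to offsets in the original HTML source.
 */
public final class OffsetMappedText {
    private final String text;
    // sourceOffsets[i] is the offset in the source of text.charAt(i)
    private final int[] sourceOffsets;

    private OffsetMappedText(String text, int[] sourceOffsets) {
        this.text = text;
        this.sourceOffsets = sourceOffsets;
    }

    /** Naive extractor: drops everything between '<' and '>' (assumption: no entity decoding). */
    public static OffsetMappedText fromHtml(String html) {
        StringBuilder out = new StringBuilder();
        List<Integer> offsets = new ArrayList<>();
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                inTag = true;
            } else if (c == '>' && inTag) {
                inTag = false;
            } else if (!inTag) {
                out.append(c);   // keep the visible character
                offsets.add(i);  // remember where it came from
            }
        }
        int[] arr = new int[offsets.size()];
        for (int i = 0; i < arr.length; i++) {
            arr[i] = offsets.get(i);
        }
        return new OffsetMappedText(out.toString(), arr);
    }

    public String text() {
        return text;
    }

    /** Maps a [start, end) span in the extracted text to a [start, end) span in the source. */
    public int[] toSourceSpan(int start, int end) {
        if (start < 0 || end > text.length() || start >= end) {
            throw new IllegalArgumentException("invalid span: " + start + ".." + end);
        }
        return new int[] { sourceOffsets[start], sourceOffsets[end - 1] + 1 };
    }

    public static void main(String[] args) {
        String html = "<p>Hello <b>world</b></p>";
        OffsetMappedText t = fromHtml(html);
        int start = t.text().indexOf("world");
        int[] span = t.toSourceSpan(start, start + "world".length());
        System.out.println(t.text());
        System.out.println(html.substring(span[0], span[1]));
    }
}
```

Note that a token spanning markup (e.g. "wo<b>rld") maps to a source span
that includes the tag; a real implementation inside the parser would need to
decide how to report such cases.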

BR,

Jukka Zitting
