No, you can't do this with any existing analyzers I know of. Part of the problem here is that there's no good generic way to KNOW what a page and line are.
Have you investigated payloads? I'm not sure that's a good fit for this particular problem, but it might be worth investigating. Best Erick On Tue, Aug 3, 2010 at 10:58 AM, arun r <arun....@gmail.com> wrote: > hi all, > I am new to Lucene. I am trying to use Lucene to generate > data for a document classifier. I need to generate wordno, lineno, > pageno for each term/phrase. I was able to use SpanQuery/SpanNearQuery > to get the wordno (span.start()) for the term/phrase. To get pageno > and lineno, a custom Analyzer needs to be written ? Can the Analyzer > be made to recognize and newline and page feed characters and keep > track of lineno and pageno for the tokens ? > > Is it possible with existing Lucene Analyzer ? > > Thanks, > Arun > > -- > Where there is a will, there is a way ! > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > >