Hi Kendall,

"Position" and "Offset" are often confused in Lucene ;)

Lucene uses offset to track what you referred to ("(character, not byte)
offset into a text file", or into an indexed string).

Lucene uses position to track the Nth token: position 0 is the first token,
position 1 is the second token, etc.  Since most tokens are longer than one
character, offsets grow faster than positions.  Tokens need not form a
purely linear sequence: they can form a graph when multi-token synonyms
are applied.
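
To make the distinction concrete, here is a minimal sketch in plain Java.
It is a toy whitespace tokenizer, not Lucene's analysis chain; the names
TokenDemo, Token and tokenize are hypothetical and only for illustration.
As with Lucene's offsets, the end offset here is one past the token's last
character.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only: a whitespace tokenizer, NOT Lucene's analyzers.
final class TokenDemo {

    // One token with its position (Nth token) and its character offsets.
    static final class Token {
        final String term;
        final int position;     // 0 for the first token, 1 for the second, ...
        final int startOffset;  // index of the token's first character
        final int endOffset;    // one past the token's last character

        Token(String term, int position, int startOffset, int endOffset) {
            this.term = term;
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }

        @Override
        public String toString() {
            return term + " position=" + position
                + " offsets=[" + startOffset + "," + endOffset + ")";
        }
    }

    // Splits on whitespace, recording position and offsets for each token.
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int position = 0;
        int i = 0;
        while (i < text.length()) {
            // Skip whitespace between tokens.
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i >= text.length()) {
                break;
            }
            int start = i;
            // Consume the token's characters.
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            tokens.add(new Token(text.substring(start, i), position++, start, i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("fast wifi network")) {
            System.out.println(t);
        }
    }
}
```

For "fast wifi network" the positions advance 0, 1, 2, while the start
offsets advance 0, 5, 10.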

Lucene can index both of these, and you can enable or disable them per
field (for example through the field's IndexOptions) if you want.

Finally, you might be interested in Lucene's highlighter module -- it
contains tooling for hit highlighting, which solves the "final inch" problem
of showing your users precisely which words/excerpts matched inside each
hit.  Here's an example
<https://jirasearch.mikemccandless.com/search.py?chg=new&text=python&a1=&a2=&page=0&searcher=24390&sort=recentlyUpdated&format=list&id=jvmz29ec86du&dd=project%3ALucene&newText=python>
(searching Lucene's issues for the word "python").
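
To show why character offsets matter for highlighting, here is a rough
sketch in plain Java.  It is not Lucene's highlighter and uses none of its
API; the class OffsetHighlightDemo and the method highlight are
hypothetical names, and the whitespace tokenization and case-insensitive
match are simplifying assumptions.

```java
// Toy illustration only: NOT Lucene's highlighter module.
final class OffsetHighlightDemo {

    // Wraps every whitespace-delimited token equal to queryTerm
    // (ignoring case) in <b>...</b>, using the token's start/end offsets
    // to cut the original text.
    static String highlight(String text, String queryTerm) {
        StringBuilder out = new StringBuilder();
        int copiedUpTo = 0; // everything before this offset is already in out
        int i = 0;
        while (i < text.length()) {
            // Skip whitespace between tokens.
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume the token's characters.
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int end = i;
            if (end > start && text.substring(start, end).equalsIgnoreCase(queryTerm)) {
                // Copy the untouched text before the match, then the marked-up match.
                out.append(text, copiedUpTo, start);
                out.append("<b>").append(text, start, end).append("</b>");
                copiedUpTo = end;
            }
        }
        // Copy whatever follows the last match.
        out.append(text, copiedUpTo, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("Use Python with Lucene", "python"));
    }
}
```

The offsets let the original text be sliced exactly where the match
occurred, so the surrounding characters are preserved as written.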

Mike McCandless

http://blog.mikemccandless.com


On Fri, Jul 22, 2022 at 12:51 AM Mikhail Khludnev <m...@apache.org> wrote:

> Hello, Kendall.
>
> You can read about Token Position Increments at
>
> https://lucene.apache.org/core/9_2_0/core/org/apache/lucene/analysis/package-summary.html#package.description
> Usually, position is the ordinal number of a word, and offset is the index
> of a character.  Modeling database entries via positions would be
> cumbersome, I suppose.  Nowadays we usually denormalize by copying values
> from child records into a single parent document.  Alternatively, here are
> more relational options:
>
> https://lucene.apache.org/core/9_2_0/join/org/apache/lucene/search/join/package-summary.html
>
>
> On Fri, Jul 22, 2022 at 7:02 AM Kendall Shaw <ks...@kendallshaw.com>
> wrote:
>
> > Hi,
> >
> > I'm trying to figure out if I should be learning to use Lucene. I
> > imagine wanting to provide a user with a way to search for something and
> > present that found thing, in some way. If what is ultimately searched is
> > text files, then position would be an offset into the text file, I
> > think. But, that seems like a pretty unlikely scenario.
> >
> > If I have stored structured data into a database of some sort, does
> > Lucene provide some way to associate a position with an entry in a
> > database? Or is that left to the programmer to implement, outside of
> > Lucene?
> >
> > Kendall
>
> --
> Sincerely yours
> Mikhail Khludnev
>
