Are you suggesting that DefaultIndexingChain.PerField.invert(boolean
firstValue) would, prior to calling reset(), call
setPositionIncrement(Integer.MAX_VALUE), but only when ‘firstValue’ is
false?  Hmmmm.  I guess that would work, although it seems a bit hacky and
it’s tying this to a specific attribute when ideally we notify the chain as
a whole what’s going on.  But it doesn’t require any new API, save for some
javadocs.  And it’s extremely unlikely there would be a
backwards-incompatible problem, so that’s good.  And since this use case is
related to positions, it’s not so bad to abuse the position increment for
it.  Nice idea, Steve; this works for me.
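
To make the idea concrete, here is a rough sketch in plain Java with no
Lucene dependency.  Everything in it is hypothetical: SketchStream only
stands in for a TokenStream with a position increment attribute, the
SUBSEQUENT_VALUE sentinel is the Integer.MAX_VALUE signal discussed above,
and startPositions() plays the role of the indexing chain walking the values
of a multi-valued field.  It is an illustration of the signalling, not the
actual DefaultIndexingChain code.

```java
import java.util.ArrayList;
import java.util.List;

public class FirstValueSketch {
    // Hypothetical sentinel the chain would set before reset() on non-first values.
    static final int SUBSEQUENT_VALUE = Integer.MAX_VALUE;

    /** Stand-in for a TokenStream that owns a position increment attribute. */
    static class SketchStream {
        int positionIncrement = 1;
        boolean firstValue = true;

        void setPositionIncrement(int inc) {
            positionIncrement = inc;
        }

        /** Reads the sentinel left by the caller, then clears it. */
        void reset() {
            firstValue = (positionIncrement != SUBSEQUENT_VALUE);
            positionIncrement = 1;
        }
    }

    /**
     * Simulates the chain inverting each value of a multi-valued field and
     * returns the position at which each value starts.  Values after the
     * first are aligned to the next multiple of gap.
     */
    static List<Integer> startPositions(int[] tokensPerValue, int gap) {
        SketchStream stream = new SketchStream();
        List<Integer> starts = new ArrayList<>();
        int nextPosition = 0;
        for (int i = 0; i < tokensPerValue.length; i++) {
            if (i > 0) {
                // The proposed signal: only set when this is not the first value.
                stream.setPositionIncrement(SUBSEQUENT_VALUE);
            }
            stream.reset();
            if (!stream.firstValue) {
                // Round up to the next multiple of gap.
                nextPosition = ((nextPosition + gap - 1) / gap) * gap;
            }
            starts.add(nextPosition);
            nextPosition += tokensPerValue[i];
        }
        return starts;
    }

    public static void main(String[] args) {
        // Three values with 3, 5 and 2 tokens, aligned to multiples of 1000.
        System.out.println(startPositions(new int[] {3, 5, 2}, 1000));
    }
}
```

The point of the sketch is that reset() keeps its no-arg signature, so the
stream learns about first-value-ness purely from the state left on the
attribute beforehand.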

Does anyone else have an opinion before I create an issue?

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Thu, Nov 6, 2014 at 2:13 PM, Steve Rowe <[email protected]> wrote:

> Maybe the position increment gap would be useful?  If set to a value
> larger than likely max position for any individual value, it could be used
> to infer (non-)first-value-ness.
>
> > On Nov 5, 2014, at 1:03 PM, [email protected] wrote:
> >
> > Several times now, I’ve had to come up with work-arounds for a
> > TokenStream not knowing whether it’s processing the first value or a
> > subsequent value of a multi-valued field.  Two of these times, the
> > use-case was ensuring the first position of each value started at a
> > multiple of 1000 (or some other configurable value), and the third was
> > encoding sentence paragraph counters (similar to a do-it-yourself
> > position increment).
> >
> > The work-arounds are awkward and hacky.  For example, if you’re in
> > control of your Tokenizer, you can prefix subsequent values with a
> > special flag, and then do the right thing in reset().  But then the
> > highlighter or value retrieval in general is impacted.  It’s also
> > possible to create the fields with the constructor that accepts a
> > TokenStream that you’ve told whether it’s the first or a subsequent
> > value, but it’s awkward going that route, and sometimes (e.g. Solr) it’s
> > hard to know all the values you have up-front to even do that.
> >
> > It would be nice if TokenStream.reset() took a boolean ‘first’
> > argument.  Such a change would obviously be backwards incompatible.
> > Simply overloading the method to call the no-arg version is problematic
> > because TokenStreams are a chain, and it would likely result in the chain
> > getting doubly-reset.
> >
> > Any ideas?
> >
> > ~ David Smiley
> > Freelance Apache Lucene/Solr Search Consultant/Developer
> > http://www.linkedin.com/in/davidwsmiley
