Are you suggesting that DefaultIndexingChain.PerField.invert(boolean firstValue) would, prior to calling reset(), call setPositionIncrement(Integer.MAX_VALUE), but only when ‘firstValue’ is false? Hmmmm. I guess that would work, although it seems a bit hacky, and it ties this to a specific attribute when ideally we would notify the chain as a whole about what’s going on. But it doesn’t require any new API, save for some javadocs. And it’s extremely unlikely there would be a backwards-incompatibility problem, so that’s good. And this use case is related to positions anyway, so it’s not so bad to abuse the position increment for this. Nice idea, Steve; this works for me.
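
To make the idea concrete, here is a minimal sketch in plain Java of the sentinel approach described above. It does not use Lucene: PosIncr, Stream, and the static invert() are hypothetical stand-ins for the position increment attribute, a TokenStream, and DefaultIndexingChain.PerField.invert(boolean firstValue). Restoring the increment to 1 inside reset() is also an assumption of the sketch, not something settled in this thread.

```java
// Hypothetical stand-ins only; none of these types are Lucene classes.
final class FirstValueSketch {

    /** Stand-in for a position increment attribute shared along the chain. */
    static final class PosIncr {
        private int value = 1;
        void set(int v) { value = v; }
        int get() { return value; }
    }

    /** Stand-in for a token stream that inspects the attribute in reset(). */
    static final class Stream {
        final PosIncr posIncr = new PosIncr();
        boolean sawFirstValue;

        void reset() {
            // The sentinel means "subsequent value"; anything else means "first value".
            sawFirstValue = posIncr.get() != Integer.MAX_VALUE;
            // Assumption: put the default back so the sentinel never reaches a real token.
            posIncr.set(1);
        }
    }

    /** Stand-in for what invert(boolean firstValue) would do before calling reset(). */
    static void invert(Stream stream, boolean firstValue) {
        if (!firstValue) {
            stream.posIncr.set(Integer.MAX_VALUE);
        }
        stream.reset();
    }

    public static void main(String[] args) {
        Stream s = new Stream();
        invert(s, true);
        System.out.println("first value seen as first: " + s.sawFirstValue);
        invert(s, false);
        System.out.println("second value seen as first: " + s.sawFirstValue);
    }
}
```

The point of the sketch is only that no new method is needed: the stream learns about first-value-ness from state that already exists before reset() runs.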
Does anyone else have an opinion before I create an issue?

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Thu, Nov 6, 2014 at 2:13 PM, Steve Rowe <[email protected]> wrote:

> Maybe the position increment gap would be useful? If set to a value
> larger than likely max position for any individual value, it could be
> used to infer (non-)first-value-ness.
>
> > On Nov 5, 2014, at 1:03 PM, [email protected] wrote:
> >
> > Several times now, I’ve had to come up with work-arounds for a
> > TokenStream not knowing it’s processing the first value or a
> > subsequent-value of a multi-valued field. Two of these times, the
> > use-case was ensuring the first position of each value started at a
> > multiple of 1000 (or some other configurable value), and the third was
> > encoding sentence paragraph counters (similar to a do-it-yourself
> > position increment).
> >
> > The work-arounds are awkward and hacky. For example if you’re in
> > control of your Tokenizer, you can prefix subsequent values with a
> > special flag, and then do the right thing in reset(). But then the
> > highlighter or value retrieval in general is impacted. It’s also
> > possible to create the fields with the constructor that accepts a
> > TokenStream that you’ve told it’s the first or subsequent value but
> > it’s awkward going that route, and sometimes (e.g. Solr) it’s hard to
> > know all the values you have up-front to even do that.
> >
> > It would be nice if TokenStream.reset() took a boolean ‘first’
> > argument. Such a change would obviously be backwards incompatible.
> > Simply overloading the method to call the no-arg version is
> > problematic because TokenStreams are a chain, and it would likely
> > result in the chain getting doubly-reset.
> >
> > Any ideas?
> >
> > ~ David Smiley
> > Freelance Apache Lucene/Solr Search Consultant/Developer
> > http://www.linkedin.com/in/davidwsmiley
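
For reference, the "first position of each value starts at a multiple of 1000" use case from the original message comes down to a small piece of arithmetic once a stream knows it is starting a new value. The following is a rough sketch in plain Java. The class and method names are hypothetical, and it assumes positions are tracked so that "nothing indexed yet" is -1 and an increment of 1 from there yields position 0.

```java
// Hypothetical helper; not part of Lucene.
final class PositionAlignSketch {

    /**
     * Returns the position increment that moves from lastPosition (the position
     * of the previous value's final token, or -1 if nothing has been indexed yet)
     * to the next multiple of 'multiple' that is strictly greater than lastPosition.
     * Integer overflow for very large positions is not handled in this sketch.
     */
    static int incrementToNextMultiple(int lastPosition, int multiple) {
        if (multiple <= 0) {
            throw new IllegalArgumentException("multiple must be positive");
        }
        // floorDiv keeps the -1 "nothing yet" case mapping to a target of 0.
        int next = (Math.floorDiv(lastPosition, multiple) + 1) * multiple;
        return next - lastPosition;
    }

    public static void main(String[] args) {
        System.out.println(incrementToNextMultiple(-1, 1000));
        System.out.println(incrementToNextMultiple(4, 1000));
        System.out.println(incrementToNextMultiple(1000, 1000));
    }
}
```

A filter would apply this increment to the first token of a subsequent value only, which is exactly the point at which it needs to know whether it is on the first value or a later one.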
