Maybe the position increment gap would be useful? If it were set to a value larger than the likely maximum position of any individual value, it could be used to infer whether a token belongs to the first value or a subsequent one.
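A rough sketch of that inference, in Java since Lucene is Java. It does not touch Lucene at all. It only simulates absolute positions for a multi-valued field, assuming each token advances the position by one and a fixed gap is added before every value after the first. GapInference, positions() and isFirstValue() are made-up names, and the gap of 10,000 is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simulated positions only; GapInference, positions() and isFirstValue() are
// hypothetical helpers, not Lucene API.
final class GapInference {
    // Hypothetical gap, assumed larger than the number of positions any single value uses.
    static final int GAP = 10_000;

    // Assigns absolute positions to each value's tokens (one position per token),
    // adding `gap` to the running position before every value after the first.
    static List<int[]> positions(List<Integer> tokenCounts, int gap) {
        List<int[]> result = new ArrayList<>();
        int next = 0; // position the next token will receive
        for (int v = 0; v < tokenCounts.size(); v++) {
            if (v > 0) {
                next += gap;
            }
            int[] pos = new int[tokenCounts.get(v)];
            for (int i = 0; i < pos.length; i++) {
                pos[i] = next++;
            }
            result.add(pos);
        }
        return result;
    }

    // If the first value uses at most `gap` positions, all of its tokens sit below
    // `gap`, while every later value starts at or beyond `gap`.
    static boolean isFirstValue(int position, int gap) {
        return position < gap;
    }

    public static void main(String[] args) {
        List<int[]> values = positions(Arrays.asList(3, 2), GAP);
        for (int v = 0; v < values.size(); v++) {
            for (int pos : values.get(v)) {
                System.out.println("value " + v + ", position " + pos
                        + ", first value? " + isFirstValue(pos, GAP));
            }
        }
    }
}
```

This assumes whoever does the check can see absolute positions; whether that is possible from inside a given TokenStream is a separate question the sketch does not answer.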
> On Nov 5, 2014, at 1:03 PM, [email protected] wrote:
>
> Several times now, I’ve had to come up with work-arounds for a TokenStream
> not knowing it’s processing the first value or a subsequent value of a
> multi-valued field. Two of these times, the use-case was ensuring the first
> position of each value started at a multiple of 1000 (or some other
> configurable value), and the third was encoding sentence paragraph counters
> (similar to a do-it-yourself position increment).
>
> The work-arounds are awkward and hacky. For example, if you’re in control of
> your Tokenizer, you can prefix subsequent values with a special flag, and
> then do the right thing in reset(). But then the highlighter or value
> retrieval in general is impacted. It’s also possible to create the fields
> with the constructor that accepts a TokenStream that you’ve told it’s the
> first or subsequent value, but it’s awkward going that route, and sometimes
> (e.g. Solr) it’s hard to know all the values you have up-front to even do
> that.
>
> It would be nice if TokenStream.reset() took a boolean ‘first’ argument.
> Such a change would obviously be backwards incompatible. Simply overloading
> the method to call the no-arg version is problematic because TokenStreams are
> a chain, and it would likely result in the chain getting doubly-reset.
>
> Any ideas?
>
> ~ David Smiley
> Freelance Apache Lucene/Solr Search Consultant/Developer
> http://www.linkedin.com/in/davidwsmiley
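The double-reset concern with a simple overload can be sketched with a toy chain. MiniStream, MiniSource and MiniFilter are hypothetical stand-ins, not Lucene classes, and reset(boolean first) is the proposed overload, which does not exist in Lucene. The assumption is that the overload's default delegates to reset(), and that a flag-aware filter overrides both methods and propagates each one down the chain.

```java
// Hypothetical, stripped-down stand-ins for a TokenStream chain; these are
// NOT Lucene classes, and reset(boolean) is the proposed (nonexistent) overload.
abstract class MiniStream {
    int resetCount = 0; // how many times this stage has been reset

    void reset() {
        resetCount++;
    }

    // Proposed overload whose default simply delegates to the no-arg version.
    void reset(boolean first) {
        reset();
    }
}

// Stands in for a Tokenizer at the head of the chain.
class MiniSource extends MiniStream {}

// Stands in for a TokenFilter wrapping another stage.
class MiniFilter extends MiniStream {
    final MiniStream input;
    boolean first;

    MiniFilter(MiniStream input) {
        this.input = input;
    }

    // Existing-style reset: reset own state, then propagate down the chain.
    @Override
    void reset() {
        super.reset();
        input.reset();
    }

    // A filter that wants the flag overrides the overload and propagates it too.
    @Override
    void reset(boolean first) {
        this.first = first;
        super.reset(first); // default overload calls this.reset(), which resets input
        input.reset(first); // ...and input gets reset a second time here
    }
}

class DoubleResetDemo {
    public static void main(String[] args) {
        MiniSource source = new MiniSource();
        MiniFilter filter = new MiniFilter(source);
        filter.reset(true);
        System.out.println("filter resets: " + filter.resetCount); // 1
        System.out.println("source resets: " + source.resetCount); // 2
    }
}
```

In this sketch, neither marked call can simply be dropped: removing super.reset(first) skips the filter's own no-arg reset logic, and removing input.reset(first) means the flag never reaches an inner flag-aware stage.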
