Re: Multi-valued fields and TokenStream

Steve Rowe Thu, 06 Nov 2014 11:14:33 -0800

Maybe the position increment gap would be useful?  If it is set to a value 
larger than the likely maximum position within any individual value, it could 
be used to infer whether a token belongs to the first value or a later one.
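The suggestion can be made concrete with a small model. This is a minimal Java sketch under a simplified model of position assignment, which is an assumption and not Lucene's exact bookkeeping: the first token of the first value sits at position 0, each later token adds 1, and `gap` extra positions are inserted before every subsequent value. If the first value has at most `gap` tokens, all of its positions fall below `gap` and every later value's positions fall at or above it. `GapModel`, `assignPositions`, and `isFirstValue` are hypothetical names, not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified model (an assumption, not Lucene's exact accounting) of how a
 * position increment gap separates the positions of a multi-valued field.
 */
public class GapModel {
    /** Returns the positions assigned to each token, grouped by value. */
    static List<int[]> assignPositions(int[] tokenCounts, int gap) {
        List<int[]> result = new ArrayList<>();
        int next = 0; // position the next token will receive
        for (int v = 0; v < tokenCounts.length; v++) {
            if (v > 0) {
                next += gap; // the gap is added only between values
            }
            int[] positions = new int[tokenCounts[v]];
            for (int t = 0; t < positions.length; t++) {
                positions[t] = next++;
            }
            result.add(positions);
        }
        return result;
    }

    /** Only valid if the first value has at most `gap` tokens. */
    static boolean isFirstValue(int position, int gap) {
        return position < gap;
    }

    public static void main(String[] args) {
        int gap = 1000;
        // Two values: 3 tokens, then 2 tokens.
        List<int[]> positions = assignPositions(new int[] {3, 2}, gap);
        for (int[] value : positions) {
            for (int p : value) {
                System.out.println(p + " first=" + isFirstValue(p, gap));
            }
        }
    }
}
```

With a gap of 1000, the first value gets positions 0, 1, 2 and the second gets 1003, 1004. Because the test relies on absolute positions, it suits code that sees the accumulated positions rather than a single value's token stream.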


> On Nov 5, 2014, at 1:03 PM, [email protected] wrote:
> 
> Several times now, I’ve had to come up with work-arounds for a TokenStream 
> not knowing whether it’s processing the first value or a subsequent value of 
> a multi-valued field.  Two of these times, the use-case was ensuring the first 
> position of each value started at a multiple of 1000 (or some other 
> configurable value), and the third was encoding sentence and paragraph 
> counters (similar to a do-it-yourself position increment).
> 
> The work-arounds are awkward and hacky.  For example, if you’re in control of 
> your Tokenizer, you can prefix subsequent values with a special flag and then 
> do the right thing in reset().  But then the highlighter, and value retrieval 
> in general, is affected.  It’s also possible to create the fields with the 
> constructor that accepts a TokenStream, one you’ve already told whether it’s 
> handling the first or a subsequent value, but that route is awkward, and 
> sometimes (e.g. in Solr) it’s hard to know all the values up front to even do 
> that.
> 
> It would be nice if TokenStream.reset() took a boolean ‘first’ argument.  
> Such a change would obviously be backwards incompatible.  Simply overloading 
> the method to call the no-arg version is problematic because TokenStreams are 
> a chain, and it would likely result in the chain getting doubly-reset.
> 
> Any ideas?
> 
> ~ David Smiley
> Freelance Apache Lucene/Solr Search Consultant/Developer
> http://www.linkedin.com/in/davidwsmiley
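The first use case in the quoted message, starting each value at a multiple of 1000, is simple arithmetic once the absolute position of the previous value's last token is known, which is exactly what a TokenStream is not told. A minimal Java sketch, assuming that position is available and that nothing else (such as a position increment gap) shifts positions between values; `ValueBoundaries`, `nextBoundary`, and `incrementToBoundary` are hypothetical helpers, not Lucene API.

```java
/**
 * Hypothetical helpers (not part of Lucene) for the "start each value at a
 * multiple of N" use case. They assume the caller knows the absolute position
 * of the previous value's last token.
 */
public class ValueBoundaries {
    /** Smallest multiple of step strictly greater than lastPosition. */
    static int nextBoundary(int lastPosition, int step) {
        if (lastPosition < 0 || step <= 0) {
            throw new IllegalArgumentException("need lastPosition >= 0 and step > 0");
        }
        return (lastPosition / step + 1) * step;
    }

    /**
     * Position increment the first token of the next value would need so
     * that it lands exactly on the next boundary.
     */
    static int incrementToBoundary(int lastPosition, int step) {
        return nextBoundary(lastPosition, step) - lastPosition;
    }

    public static void main(String[] args) {
        System.out.println(nextBoundary(1234, 1000));        // 2000
        System.out.println(incrementToBoundary(1234, 1000)); // 766
        System.out.println(nextBoundary(999, 1000));         // 1000
    }
}
```

The first value needs no adjustment, since it already starts at position 0, which is a multiple of any step.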

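The double-reset concern with a `reset(boolean first)` overload can be illustrated with hypothetical, stripped-down stand-ins for TokenStream and TokenFilter. These classes are not the Lucene ones, and this is only one way the problem can arise: the overload falls back to the no-arg `reset()`, while a filter that wants the flag forwards it down the chain and also runs the default, so the head of the chain is reset twice.

```java
/**
 * Hypothetical stand-ins for TokenStream/TokenFilter (NOT the Lucene classes)
 * showing how a reset(boolean) overload that delegates to reset() can reset
 * the head of a chain twice.
 */
public class DoubleResetDemo {
    static abstract class Stream {
        int resetCount = 0;

        void reset() {
            resetCount++;
        }

        // Proposed backwards-compatible overload: ignore the flag and
        // fall back to the existing no-arg reset().
        void reset(boolean first) {
            reset();
        }
    }

    /** Head of the chain, analogous to a Tokenizer. */
    static class Source extends Stream {
        boolean sawFirst;

        @Override
        void reset(boolean first) {
            sawFirst = first;
            super.reset(first);
        }
    }

    /** Wraps another stream, analogous to a TokenFilter. */
    static class Filter extends Stream {
        final Stream input;

        Filter(Stream input) {
            this.input = input;
        }

        @Override
        void reset() {
            super.reset();
            input.reset(); // filters already propagate the no-arg reset
        }

        @Override
        void reset(boolean first) {
            input.reset(first); // forward the flag down the chain...
            super.reset(first); // ...but the default calls this.reset(),
                                // which resets the input a second time
        }
    }

    public static void main(String[] args) {
        Source source = new Source();
        Filter filter = new Filter(source);
        filter.reset(true);
        System.out.println("source resets: " + source.resetCount); // 2
        System.out.println("filter resets: " + filter.resetCount); // 1
    }
}
```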
