Do the concatenation yourself with your own TokenStream. For expert cases you can index a field with a TokenStream directly (the individual stored values can be added separately).
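To make the position bookkeeping concrete, here is a minimal plain-Java sketch of what "doing the concatenation yourself" could look like for the use case quoted below, where each value starts at a multiple of 1000. It deliberately uses no Lucene classes: ValuePositions, Token, concatenate and the whitespace splitting are all hypothetical stand-ins, and a real implementation would be a custom TokenStream emitting the equivalent position increments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: it only simulates the position
// arithmetic a hand-written concatenating TokenStream would need.
final class ValuePositions {

    /** A term paired with the position it would be indexed at. */
    static final class Token {
        final String term;
        final int position;

        Token(String term, int position) {
            this.term = term;
            this.position = position;
        }

        @Override
        public String toString() {
            return term + "@" + position;
        }
    }

    /**
     * Concatenates all values of one multi-valued field into a single token
     * list. The first token of every value lands on a multiple of stride.
     * Assumption: values are split on whitespace, standing in for a real
     * Tokenizer. A value with no tokens does not reserve a block.
     */
    static List<Token> concatenate(List<String> values, int stride) {
        if (stride <= 0) {
            throw new IllegalArgumentException("stride must be positive");
        }
        List<Token> out = new ArrayList<>();
        int next = 0; // next free position after the previous value
        for (String value : values) {
            // Round up to the nearest multiple of stride at or after next.
            int pos = ((next + stride - 1) / stride) * stride;
            for (String term : value.trim().split("\\s+")) {
                if (term.isEmpty()) {
                    continue; // blank value: split yields one empty string
                }
                out.add(new Token(term, pos++));
            }
            next = pos;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("quick brown fox", "lazy dog");
        // prints [quick@0, brown@1, fox@2, lazy@1000, dog@1001]
        System.out.println(concatenate(values, 1000));
    }
}
```

Because the concatenating code sees every value in order, it always knows which one is first, so nothing has to be passed through reset(). Rounding up rather than using a fixed offset per value keeps the multiple-of-1000 guarantee even if a value happens to contain more than 1000 tokens.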
No need to make the TokenStream API more complicated: it's already very complicated.

On Thu, Nov 6, 2014 at 3:13 PM, [email protected] <[email protected]> wrote:
> Are you suggesting that DefaultIndexingChain.PerField.invert(boolean
> firstValue) would, prior to calling reset(), call
> setPositionIncrement(Integer.MAX_VALUE), but only when ‘firstValue’ is
> false? Hmmmm. I guess that would work, although it seems a bit hacky and
> it’s tying this to a specific attribute when ideally we notify the chain as
> a whole what’s going on. But it doesn’t require any new API, save for some
> javadocs. And it’s extremely unlikely there would be a
> backwards-incompatible problem, so that’s good. And I find this use is
> related to positions so it’s not so bad to abuse the position increment for
> this. Nice idea Steve; this works for me.
>
> Does anyone else have an opinion before I create an issue?
>
> ~ David Smiley
> Freelance Apache Lucene/Solr Search Consultant/Developer
> http://www.linkedin.com/in/davidwsmiley
>
> On Thu, Nov 6, 2014 at 2:13 PM, Steve Rowe <[email protected]> wrote:
>>
>> Maybe the position increment gap would be useful? If set to a value
>> larger than likely max position for any individual value, it could be used
>> to infer (non-)first-value-ness.
>>
>> > On Nov 5, 2014, at 1:03 PM, [email protected] wrote:
>> >
>> > Several times now, I’ve had to come up with work-arounds for a
>> > TokenStream not knowing it’s processing the first value or a
>> > subsequent-value of a multi-valued field. Two of these times, the
>> > use-case was ensuring the first position of each value started at a
>> > multiple of 1000 (or some other configurable value), and the third was
>> > encoding sentence paragraph counters (similar to a do-it-yourself
>> > position increment).
>> >
>> > The work-arounds are awkward and hacky. For example, if you’re in
>> > control of your Tokenizer, you can prefix subsequent values with a
>> > special flag, and then do the right thing in reset(). But then the
>> > highlighter or value retrieval in general is impacted. It’s also
>> > possible to create the fields with the constructor that accepts a
>> > TokenStream that you’ve told it’s the first or subsequent value but it’s
>> > awkward going that route, and sometimes (e.g. Solr) it’s hard to know
>> > all the values you have up-front to even do that.
>> >
>> > It would be nice if TokenStream.reset() took a boolean ‘first’ argument.
>> > Such a change would obviously be backwards incompatible. Simply
>> > overloading the method to call the no-arg version is problematic because
>> > TokenStreams are a chain, and it would likely result in the chain
>> > getting doubly-reset.
>> >
>> > Any ideas?
>> >
>> > ~ David Smiley
>> > Freelance Apache Lucene/Solr Search Consultant/Developer
>> > http://www.linkedin.com/in/davidwsmiley

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
