Hi, there is no easy way to do this with Lucene. The analysis part is tightly bound to IndexWriter. There are ways to decouple this, but you have to write your own Analyzer and some network protocol.
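To give a rough idea of what the "network protocol" half could look like, here is a minimal sketch using only the JDK. Everything in it is an assumption made for illustration: the class name AnalyzedTokens, the whitespace "analyzer" standing in for a real Lucene Analyzer, and the tab-separated wire format. It is not what Lucene or Solr actually use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical wire format for pre-analyzed tokens: one token per line,
 * tab-separated as positionIncrement, startOffset, endOffset, term.
 * Assumes term text never contains a newline.
 */
final class AnalyzedTokens {

    /** One analyzed token; mirrors the basic attributes a token stream carries. */
    static final class Token {
        final String term;
        final int positionIncrement;
        final int startOffset;
        final int endOffset;

        Token(String term, int positionIncrement, int startOffset, int endOffset) {
            this.term = term;
            this.positionIncrement = positionIncrement;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Stand-in for a real analyzer: splits on whitespace and lowercases. */
    static List<Token> analyze(String text) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            // Skip whitespace between tokens.
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            // Consume one run of non-whitespace characters.
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) {
                String term = text.substring(start, i).toLowerCase(Locale.ROOT);
                tokens.add(new Token(term, 1, start, i));
            }
        }
        return tokens;
    }

    /** Runs on the analysis machines: turns tokens into the wire format. */
    static String encode(List<Token> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Token t : tokens) {
            sb.append(t.positionIncrement).append('\t')
              .append(t.startOffset).append('\t')
              .append(t.endOffset).append('\t')
              .append(t.term).append('\n');
        }
        return sb.toString();
    }

    /** Runs on the indexing node: restores the tokens without re-analyzing. */
    static List<Token> decode(String wire) {
        List<Token> tokens = new ArrayList<>();
        for (String line : wire.split("\n")) {
            if (line.isEmpty()) continue;
            // Limit of 4 keeps any tabs inside the term text intact.
            String[] p = line.split("\t", 4);
            tokens.add(new Token(p[3], Integer.parseInt(p[0]),
                    Integer.parseInt(p[1]), Integer.parseInt(p[2])));
        }
        return tokens;
    }
}
```

On the indexing node you would still need a custom TokenStream that replays the decoded tokens into CharTermAttribute, PositionIncrementAttribute and OffsetAttribute. That part is left out here because it depends on the Lucene version you use.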
Solr has something like this, it's called PreAnalyzedField. This is a field type backed by a special analyzer that does not analyze text in the conventional way. Instead, it treats the indexed content as JSON, with all the tokens and their attributes represented as a JSON array. On the indexing node, the IndexWriter just uses this JSON analyzer and creates the tokens to index from it.

On the other side you have several machines that parse and analyze your documents, but instead of creating Lucene documents they just create JSON objects with all the analyzed tokens (those tokens contain token text, position and offset information, NLP stuff, keyword markers - all attributes a normal TokenStream in Lucene would have). Those JSON objects are transferred over the network and the IndexWriter parses them using the "special analyzer".

But that's hard to implement. I'd go for Solr instead of doing that on your own! :-)

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Denis Bazhenov [mailto:dot...@gmail.com]
> Sent: Thursday, March 30, 2017 11:02 AM
> To: java-user@lucene.apache.org
> Subject: Re: Document serializable representation
>
> We already have done this. Many years ago :)
>
> At the moment we have 7 shards. The problem with getting more shards is
> that search become less cost effective (in terms of cluster CPU time per
> request) as you split index in more shards. Considering response time is good
> enough and the fact search nodes are ~90% of all hardware budget of the
> cluster, it's much more cost effective to split analysis from IndexWriter than
> split index in more shards. It simply would require from us to put
> disproportionately more hardware in cluster.
>
> > On Mar 30, 2017, at 18:36, Uwe Schindler <u...@thetaphi.de> wrote:
> >
> > What you would better do is to just split your index into multiple shards
> > and have separate IndexWriter instances on different machines. Those can
> > act on their own. This is what Elasticsearch or Solr are doing: They accept
> > the document, decide which shard they should be located and transfer the plain
> > fieldname:value pairs over the network. Each node then creates Lucene
> > IndexableDocuments out of it and passes to their own IndexWriter.
>
> ---
> Denis Bazhenov <dot...@gmail.com>