I'm looking over TopDocs.merge. What is the difference between using multiple IndexSearchers (one per index) and merging their results with TopDocs.merge, versus searching through a single MultiReader?
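To make the comparison concrete, below is a minimal sketch of both approaches against the Lucene 6.x-era API discussed in the quoted thread. The shard paths, field name, query term, and class name are placeholder assumptions, not anything from the thread:

import java.nio.file.Paths;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class ShardSearchSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical shard locations -- adjust to your own layout.
    DirectoryReader shard1 = DirectoryReader.open(FSDirectory.open(Paths.get("/tmp/shard1")));
    DirectoryReader shard2 = DirectoryReader.open(FSDirectory.open(Paths.get("/tmp/shard2")));
    Query query = new TermQuery(new Term("body", "lucene"));

    // Approach 1: one IndexSearcher per shard, then TopDocs.merge.
    // Each merged hit keeps its shard-local int doc ID plus a shardIndex.
    TopDocs hits1 = new IndexSearcher(shard1).search(query, 10);
    TopDocs hits2 = new IndexSearcher(shard2).search(query, 10);
    TopDocs merged = TopDocs.merge(10, new TopDocs[] { hits1, hits2 });
    System.out.println("merged: " + merged.totalHits + " total hits");

    // Approach 2: MultiReader presents both shards as one logical index,
    // remapping every doc ID into a single int space -- which is why the
    // combined view is still capped below 2^31 documents.
    // (This constructor closes the sub-readers when the MultiReader is
    // closed, so it runs last here.)
    try (MultiReader multi = new MultiReader(shard1, shard2)) {
      TopDocs viaMulti = new IndexSearcher(multi).search(query, 10);
      System.out.println("multi: " + viaMulti.totalHits + " total hits");
    }
  }
}

The practical difference: MultiReader rebases every hit into one shared int doc-ID space, while the per-searcher route keeps shard-local doc IDs and relies on TopDocs.merge (and ScoreDoc.shardIndex) to combine the rankings.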
2016-08-21 2:28 GMT+02:00 Cristian Lorenzetto <cristian.lorenze...@gmail.com>:

> In my opinion this study doesn't tell us anything new. Obviously, if you
> try to retrieve the whole data store in a single query, performance will
> not be good. Lucene is fantastic, but it is not magic; the laws of
> physics still apply. A query is designed to retrieve a small part of a
> big store, not the whole store. I also think the time would be just as
> bad without sorting the documents; with a persisted sorted linked list I
> don't see significant delays. Frankly, I also don't understand the GC
> memory-limit argument against Lucene's algorithms: the amount of memory
> used is not proportional to the size of the datastore, otherwise Lucene
> would not be scalable. For me the real problem to analyze is a different
> one: considering how big data has grown in recent years, the typical
> maximum size of the databases we know, and whether or not Lucene can
> scale up by sharding into dynamically defined arrays, we can evaluate
> whether this refactoring makes sense or not.
>
> Sent from iPad
>
>
> On 19 Aug 2016, at 05:50, Erick Erickson <erickerick...@gmail.com> wrote:
>
> > OK, I'm a little out of my league here, but I'll plow on anyway....
> >
> > bq: There are use cases out there where >2^31 does make sense in a
> > single index
> >
> > OK, let's put some definition to this and define the use case
> > specifically rather than be vague. For instance, I've just run an
> > experiment where I had 200M docs (very small docs) in a single shard
> > and tried to sort all of them by a date: performance was on the order
> > of 5 seconds. 3B docs is 15x that, so what, 75 seconds? Does the
> > use case involve sorting? Faceting? If so, the performance will
> > probably be poor.
> >
> > This would be huge surgery, I believe, and there hasn't been a
> > compelling use case for it in the search world. Unless and until that
> > case is made, I suspect this idea will meet with a lot of resistance.
> >
> > That said, I do understand that this is somewhat akin to "nobody will
> > ever need more than 64K of RAM", meaning that some limits are assumed
> > and eventually become outmoded. But given Java's issues with memory
> > and GC, I suspect it'll be really hard to justify the work this would
> > take.
> >
> > FWIW,
> > Erick
> >
> >
> >> On Thu, Aug 18, 2016 at 6:31 PM, Trejkaz <trej...@trypticon.org> wrote:
> >>> On Thu, Aug 18, 2016 at 11:55 PM, Adrien Grand <jpou...@gmail.com> wrote:
> >>> No, IndexWriter enforces that the number of documents cannot go over
> >>> IndexWriter.MAX_DOCS (which is a bit less than 2^31), and
> >>> BaseCompositeReader computes the number of documents in a long
> >>> variable and ensures it is less than 2^31, so you cannot have
> >>> indexes that contain more than 2^31 documents.
> >>>
> >>> Larger collections should be written to multiple shards, and
> >>> TopDocs.merge should be used to merge their results.
> >>
> >> But hang on:
> >> * TopDocs#merge still returns a TopDocs.
> >> * TopDocs still uses an array of ScoreDoc.
> >> * ScoreDoc still uses an int doc ID.
> >>
> >> Looks like you're still screwed.
> >>
> >> I wish IndexReader used long doc IDs too, because one IndexReader can
> >> span multiple shards as well - it doesn't make much sense to me that
> >> this is restricted, although "it's hard to fix in a
> >> backwards-compatible way" is certainly a good reason.
> >> :D
> >>
> >> TX
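Following up on Trejkaz's point above: after TopDocs.merge, a hit is only globally identified by the (shardIndex, doc) pair, because ScoreDoc.doc stays an int. Here is a sketch of resolving merged hits back to their shards; 'merged' and 'shards' are assumed to come from a per-shard search like the one sketched at the top, and the packed 64-bit "global ID" is a hypothetical application-level convention, not a Lucene API:

import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

public class MergedHitResolver {
  // 'merged' must come from TopDocs.merge, which fills in each hit's
  // shardIndex; 'shards' holds the readers in the same order their
  // per-shard TopDocs were passed to merge.
  public static void resolve(TopDocs merged, DirectoryReader[] shards) throws IOException {
    for (ScoreDoc sd : merged.scoreDocs) {
      // The int doc ID alone is only meaningful inside its own shard.
      Document doc = shards[sd.shardIndex].document(sd.doc);

      // Hypothetical 64-bit global ID packing the (shardIndex, doc) pair;
      // an application convention, not something Lucene provides.
      long globalId = (((long) sd.shardIndex) << 32) | (sd.doc & 0xFFFFFFFFL);
      System.out.println(globalId + " -> " + doc);
    }
  }
}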