Hi Luís,

If the contents of the files don't change, one solution is to store the text parsed by Tika in compressed form, at roughly 7% of the extracted text size. When updating a document, just retrieve the old one, whose content is already there (compressed), and update only the other fields you need.
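A rough sketch of what I mean, using plain java.util.zip (the Base64 step is just one way to make the bytes fit a stored string field; the helper names are mine):

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.util.Base64;
    import java.util.zip.GZIPInputStream;
    import java.util.zip.GZIPOutputStream;

    public class CompressedText {

        // Gzip the Tika-extracted text and Base64-encode it so it can be
        // kept in a stored (not indexed) string field and reused on update.
        static String compress(String extracted) throws IOException {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
                gz.write(extracted.getBytes(StandardCharsets.UTF_8));
            }
            return Base64.getEncoder().encodeToString(bos.toByteArray());
        }

        // Reverse of compress(): recover the original text on a later
        // update without running Tika again.
        static String decompress(String stored) throws IOException {
            byte[] bytes = Base64.getDecoder().decode(stored);
            try (GZIPInputStream gz =
                     new GZIPInputStream(new ByteArrayInputStream(bytes))) {
                return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }

On the next update, decompress() gives you the text back, so the slow extraction step is paid only once.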
Best,
Marcio
http://www.neoco.com.br

On Thu, Feb 14, 2019 at 15:09, Luís Filipe Nassif <lfcnas...@gmail.com> wrote:

> Thank you, Erick.
>
> Unfortunately we need to index those fields.
>
> Currently we do not store the text because of storage requirements, and
> it is slow to extract it again.
>
> Thank you for the tips.
> Luis
>
> On Wed, Feb 13, 2019 at 18:13, Erick Erickson <erickerick...@gmail.com> wrote:
>
> > If (and only if) the fields you need to update are single-valued,
> > docValues=true, indexed=false, you can do an in-place update of the DV
> > field only.
> >
> > Otherwise, you'll probably have to split the docs up. The question is
> > whether you have evidence that reindexing is too expensive.
> >
> > If you do need to split the docs up, you might find some of the
> > streaming capabilities useful for join kinds of operations, if other
> > join options don't work out or you just prefer the streaming
> > alternative.
> >
> > Best,
> > Erick
> >
> > On Wed, Feb 13, 2019 at 11:43 AM Luís Filipe Nassif <lfcnas...@gmail.com> wrote:
> > >
> > > Hi all,
> > >
> > > As I understand it, Lucene 7 still deletes and re-adds docs when an
> > > update operation is done.
> > >
> > > When docs have dozens of fields, one of them a large text content
> > > (extracted by Tika), and I need to update some other small fields,
> > > what is the best approach to avoid reindexing that large text field?
> > >
> > > Is there any better way than splitting the index in two (metadata and
> > > text indexes) and using ParallelCompositeReader for searches?
> > >
> > > Thanks in advance,
> > > Luis
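P.S. For anyone finding this thread later: since Luís is on raw Lucene, a minimal sketch of the in-place docValues update Erick describes could use IndexWriter directly (the "id" and "rating" field names are just placeholders):

    import java.io.IOException;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;

    public class DocValuesUpdateSketch {
        // Rewrites only the "rating" NumericDocValues of the docs matching
        // the id term; the document itself, including the large indexed
        // text field, is not deleted and re-added.
        static void setRating(IndexWriter writer, String docId, long rating)
                throws IOException {
            writer.updateNumericDocValue(new Term("id", docId), "rating", rating);
        }
    }

This only touches a doc-values field that already exists for the matched docs; any change to an indexed field still needs the full delete and re-add, as Erick notes.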