Hi Luís,

If the contents of the files don't change, one option is to store the text
parsed by Tika in compressed form (roughly 7% of the extracted text size).
When updating a document, just fetch the old one with its contents already
prepared (compressed) and update only the other fields you need.
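
Something like this, as a minimal sketch with Python's standard zlib module
(the helper names and sample text are just illustrative, and the actual
compression ratio depends on your text):

```python
import zlib

def compress_text(text: str) -> bytes:
    # Compress the Tika-extracted text before storing it with the doc.
    return zlib.compress(text.encode("utf-8"), 9)

def decompress_text(blob: bytes) -> str:
    # Recover the original extracted text when the doc is rebuilt.
    return zlib.decompress(blob).decode("utf-8")

# Round-trip check with some repetitive sample text.
extracted = "some long extracted text " * 1000
blob = compress_text(extracted)
assert decompress_text(blob) == extracted
print(f"compressed to {len(blob) / len(extracted.encode('utf-8')):.1%} of original")
```

This avoids running Tika again on every update, at the cost of storing the
compressed blob alongside the document.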

Best,
Marcio

http://www.neoco.com.br


On Thu, Feb 14, 2019 at 15:09, Luís Filipe Nassif <lfcnas...@gmail.com>
wrote:

> Thank you, Erick.
>
> Unfortunately we need to index those fields.
>
> Currently we do not store text because of storage requirements and it is
> slow to extract it again.
>
> Thank you for the tips.
> Luis
>
> On Wed, Feb 13, 2019 at 18:13, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> > If (and only if) the fields you need to update are single-valued,
> > docValues=true, indexed=false, you can do in-place update of the DV
> > field only.
> >
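
For illustration, an in-place docValues update in Solr is just an atomic
"set" on an eligible field. A minimal sketch in Python that builds the
update payload (the field name `views`, the doc id, and the endpoint URL
are hypothetical; the field must be single-valued with docValues=true,
indexed=false, stored=false to qualify):

```python
import json

def in_place_update(doc_id: str, field: str, value) -> str:
    # An atomic "set" on a single-valued, docValues-only field
    # (docValues=true, indexed=false, stored=false) lets Solr update
    # the docValues in place instead of deleting and re-adding the doc.
    return json.dumps([{"id": doc_id, field: {"set": value}}])

payload = in_place_update("doc1", "views", 42)
# POST this body to http://localhost:8983/solr/<core>/update
# with Content-Type: application/json (URL and core name are assumptions).
print(payload)
```

If the field does not meet those conditions, the same "set" falls back to a
regular atomic update, which deletes and re-adds the whole document.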
> > Otherwise, you'll probably have to split the docs up. The question is
> > whether you have evidence that reindexing is too expensive.
> >
> > If you do need to split the docs up, you might find some of the
> > streaming capabilities useful for join-style operations, if other
> > join options don't work out or you just prefer the streaming
> > alternative.
> >
> > Best,
> > Erick
> >
> > On Wed, Feb 13, 2019 at 11:43 AM Luís Filipe Nassif <lfcnas...@gmail.com>
> > wrote:
> > >
> > > Hi all,
> > >
> > > As I understand it, Lucene 7 still deletes and re-adds docs when an
> > > update operation is done.
> > >
> > > When docs have dozens of fields and one of them is large text content
> > > (extracted by Tika), and I need to update some other small fields, what
> > > is the best approach to avoid reindexing that large text field?
> > >
> > > Any better way than splitting the index in two (metadata and text
> > > indexes) and using ParallelCompositeReader for searches?
> > >
> > > Thanks in advance,
> > > Luis
> >
> >
>
