Re: effectiveness of compression

Li Li Wed, 15 Feb 2012 01:55:01 -0800

For now, Lucene doesn't provide anything like this.
Maybe you can diff each version before adding it to the index, so that it
indexes and stores only the differences for each newer version.
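
A rough sketch of that idea, in plain Java with no Lucene calls. It uses a
simple line-set difference rather than a real diff algorithm, and the class
and method names (VersionDelta, addedLines) are made up for illustration.
The returned lines are what you would pass to your indexing code for the
newer version, in place of the full text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene.
class VersionDelta {

    // Returns the lines of newText that do not occur anywhere in oldText.
    // This is a set difference on lines, so moved or repeated lines are
    // treated as unchanged, and removed lines are not reported at all.
    static List<String> addedLines(String oldText, String newText) {
        Set<String> oldLines = new HashSet<>(Arrays.asList(oldText.split("\n")));
        List<String> added = new ArrayList<>();
        for (String line : newText.split("\n")) {
            if (!oldLines.contains(line)) {
                added.add(line);
            }
        }
        return added;
    }

    public static void main(String[] args) {
        String v1 = "Lucene is a search library.\nIt stores an inverted index.";
        String v2 = v1 + "\nVersion 2 adds this sentence.";
        // Only this delta would be indexed and stored for version 2.
        System.out.println(addedLines(v1, v2));
    }
}
```

The trade-off is that a search for a term found only in the unchanged text
will match the base version's document and not the later ones, so you would
probably need to keep a field linking each delta document back to its file.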


On Wed, Feb 15, 2012 at 4:25 PM, Jamie <ja...@stimulussoft.com> wrote:

> Greetings All.
>
> I'd like to index data corresponding to different versions of the same
> file. These files consist of PDF documents, Word documents, and the like.
> So as to ensure that no information is lost, I'd like to create a new
> Lucene document for every version (or change) in a file. Each version of a
> file will have text added and removed; however, there is likely to be a
> high degree of data duplication across the different versions. Assuming this
> indexed data is largely tokenized, to what extent will Lucene compress the
> data? Will it take into account that the data already exists in the index?
> I am worried about our index size growing too large when pursuing this
> strategy (i.e. one of creating a new Lucene document for every version of a
> file).
>
> Many thanks for your consideration.
>
> Jamie
