Re: Index Ratio

Chris Collins Wed, 24 Jun 2009 19:48:29 -0700

You mention documents of various file types. It really depends on what those types are. For example, the amount of text found in a PowerPoint file is slim pickins. Ratios with office-type apps tend to be pretty fluffy. I have seen considerably better (smaller) ratios than 20-30% when extracting text from such formats, some down to the ratio you're talking about.
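For reference, here is a minimal Python sketch of how the ratio discussed in this thread can be computed. It assumes the index and the source documents each live under a single directory; the helper names `dir_size_bytes` and `index_ratio` and the example paths are hypothetical, not part of Lucene.

```python
import os


def dir_size_bytes(path: str) -> int:
    """Recursively sum the sizes of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total


def index_ratio(index_size: float, source_size: float) -> float:
    """Return the index size as a percentage of the source documents' size."""
    if source_size <= 0:
        raise ValueError("source size must be positive")
    return 100.0 * index_size / source_size


# Figures from the quoted message: ~1.7MB of index for ~145MB of documents.
print(f"{index_ratio(1.7, 145):.2f}%")

# Hypothetical paths: point these at your own index and document folders.
# ratio = index_ratio(dir_size_bytes("/path/to/index"),
#                     dir_size_bytes("/path/to/docs"))
```

With the quoted figures this comes out at roughly 1.17%, far below the 20-30% rule of thumb. As noted above, that is plausible when much of the source size is binary content (images, compressed archives, office markup) that yields little extractable text.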


C
On Jun 24, 2009, at 5:47 PM, pof wrote:

Hi, I just completed a batch test index of ~1100 documents of various file types and I noticed that the original documents take up about 145MB, but my index is only 1.7MB?? I remember reading somewhere that the typical compression rate is about 20-30% or something, but mine is a little over 1%! I'm not complaining or anything, it just struck me as odd, especially as I have a lot of archive files and emails with attachments that I parse as well. Has anyone else experienced something like this? I'm just curious.

Cheers. Brett.
--
View this message in context: 
http://www.nabble.com/Index-Ratio-tp24195272p24195272.html
Sent from the Lucene - General mailing list archive at Nabble.com.