[Ferret-talk] indexing mostly-binary documents (.ppt)

John Bachir Sat, 31 Mar 2007 17:18:17 -0800

Here's an interesting problem: in my app, we are indexing various
types of documents, including Microsoft PowerPoint (.ppt) files.
PowerPoint documents are mostly binary, but they also contain a fair
amount of plain text (possibly all of the text in the document).


My thinking is that the binary data will never be searched for, and
the proper text will be indexed and queried as expected, so the
indexed binary will never affect results. Is this correct?
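To make "mostly binary with some text" concrete, here is a minimal
sketch of what pulling only the readable runs out of such a file might
look like, in the spirit of the Unix strings tool. The function name
printable_runs and the sample bytes are hypothetical, and the sketch
assumes the text is stored as plain ASCII; a real .ppt file may store
some of its text in a wider encoding that this would miss.

```python
import re


def printable_runs(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of at least min_len printable ASCII characters in data."""
    # Hypothetical helper: matches bytes 0x20..0x7e repeated min_len or more times.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Made-up sample standing in for a mostly-binary document.
sample = b"\x00\x01Quarterly results\xff\xfe\x00ab\x03Revenue grew\x00"
print(printable_runs(sample))
```

Short runs such as "ab" fall below min_len and are dropped, which is
roughly the kind of garbage token that would otherwise end up in the
index.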

Then my colleague mentioned that the indexed garbage might affect the
weighting of certain searches. I figure that weighting is computed
per search, so the situation is the same as above: only the proper
terms will enter the calculation.
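One way to probe the colleague's concern is a toy scorer. The sketch
below is not Ferret's code; it assumes a simplified Lucene-style
formula in which a document's score is scaled by 1/sqrt(number of
indexed terms). Whether Ferret applies exactly this normalization is
an assumption, and the function name and sample tokens are made up.

```python
import math


def score(term, doc_tokens, all_docs):
    """Toy TF-IDF score with length normalization (assumed, Lucene-style)."""
    tf = math.sqrt(doc_tokens.count(term))            # term frequency factor
    df = sum(1 for d in all_docs if term in d)        # documents containing term
    idf = 1.0 + math.log(len(all_docs) / (df + 1))    # inverse document frequency
    length_norm = 1.0 / math.sqrt(len(doc_tokens))    # penalizes long documents
    return tf * idf * length_norm


clean = ["revenue", "grew"]
# Same text plus 98 made-up garbage tokens standing in for indexed binary.
noisy = clean + [f"x{i:02x}" for i in range(98)]
docs = [clean, noisy]

print("clean:", score("revenue", clean, docs))
print("noisy:", score("revenue", noisy, docs))
```

Under that assumption the garbage tokens are never matched by the
query, yet they lengthen the document and so lower its score for the
proper terms relative to a clean copy of the same text.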

What do you folks think?

John

_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
