Here's an interesting problem. In my app, we index various types of documents, including Microsoft PowerPoint files. PowerPoint documents are mostly binary, but they also contain a good deal of plain text (possibly all of the text in the document?).
My thinking is that the binary content will never be searched for, while the proper text will be indexed and queried as expected, so the indexed binary will never affect results. Is this correct? A colleague then suggested that the indexed garbage might affect the weighting of certain searches. I figure that weighting is computed per search, so the situation is the same as above: only the proper terms enter the calculation. What do you folks think?

John
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
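
The weighting question above can be explored with a toy model. The sketch below is loosely modeled on classic Lucene-style TF-IDF scoring with length normalization; whether Ferret applies the same formula is an assumption, and the function names (`idf`, `score`) and sample documents are hypothetical. It suggests that garbage tokens leave the inverse document frequency of real terms untouched, but can still lower a document's score by inflating its length.

```python
import math

def idf(term, corpus):
    """Inverse document frequency; depends only on documents containing `term`."""
    df = sum(1 for doc in corpus if term in doc)
    return 1.0 + math.log(len(corpus) / (df + 1))

def score(query_terms, doc, corpus):
    """Toy TF-IDF score with length normalization (an assumed scoring model)."""
    norm = 1.0 / math.sqrt(len(doc))  # longer documents are penalized
    total = 0.0
    for term in query_terms:
        tf = math.sqrt(doc.count(term))  # dampened term frequency
        total += tf * idf(term, corpus) * norm
    return total

# The same real text, indexed with and without 97 unique garbage tokens.
clean = ["quarterly", "sales", "report"]
noisy = clean + ["x%04d" % i for i in range(97)]
other = ["meeting", "notes"]

clean_score = score(["sales"], clean, [clean, other])
noisy_score = score(["sales"], noisy, [noisy, other])

print(idf("sales", [clean, other]) == idf("sales", [noisy, other]))
print(noisy_score < clean_score)
```

Under these assumptions, only the query terms contribute to the sum, which matches the per-search intuition. The length normalization factor is where the extra tokens would show up, so stripping the binary before indexing may still be worthwhile if the scoring model normalizes by field length.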

