Re: [CentOS] OT: .doc,.xls,.pdf,.ppt (etc.) string parser/indexers

2009-08-31 Thread Les Mikesell
Rajagopal Swaminathan wrote: Greetings, On Fri, Aug 28, 2009 at 10:50 PM, Les Mikesell <lesmikes...@gmail.com> wrote: Does anyone have experience with Linux tools to parse the text from common non-text file formats for searching? I'm trying to use the KinoSearch add-on for TWiki which is fine

Re: [CentOS] OT: .doc,.xls,.pdf,.ppt (etc.) string parser/indexers

2009-08-31 Thread Rajagopal Swaminathan
Greetings. On Mon, Aug 31, 2009 at 10:38 PM, Les Mikesell <lesmikes...@gmail.com> wrote: Wouldn't that have to be run under Windows? Indeed. That was where that particular requirement was. One app wanted full-text search on a bunch of .doc, etc. files. But I demonstrated the POC using CentOS

Re: [CentOS] OT: .doc,.xls,.pdf,.ppt (etc.) string parser/indexers

2009-08-29 Thread Dave
On Fri, Aug 28, 2009 at 7:20 AM, Les Mikesell <lesmikes...@gmail.com> wrote: Does anyone have experience with Linux tools to parse the text from common non-text file formats for searching?

Re: [CentOS] OT: .doc,.xls,.pdf,.ppt (etc.) string parser/indexers

2009-08-29 Thread Rajagopal Swaminathan
Greetings, On Fri, Aug 28, 2009 at 10:50 PM, Les Mikesell <lesmikes...@gmail.com> wrote: Does anyone have experience with Linux tools to parse the text from common non-text file formats for searching? I'm trying to use the KinoSearch add-on for TWiki which is fine as far as the search goes, but

[CentOS] OT: .doc,.xls,.pdf,.ppt (etc.) string parser/indexers

2009-08-28 Thread Les Mikesell
Does anyone have experience with Linux tools to parse the text from common non-text file formats for searching? I'm trying to use the KinoSearch add-on for TWiki which is fine as far as the search goes, but it takes forever to generate the index. It uses xpdf to extract strings from PDFs,