Tom wrote:

Hi,

Does anybody know how to index chm-files? One workaround I know of is to
convert the chm-files to pdf-files (converters exist for this job) and then
use the usual tools (e.g. PDFBox) to index the resulting pdf files. Are there
any tools which can extract the textual content directly from the (binary)
chm-files?


I think chm-file indexing support is a big missing piece in the set of
currently supported indexable file types (XML, HTML, PDF, MSWord-DOC, RTF,
plain text).


I believe it's just a Microsoft .cab file with an index.html inside it... am I right?

Just uncompress it.

The problem is that the HTML within them isn't anywhere NEAR standard, and you can't really give them to the user in the UI...
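
To illustrate the "uncompress, then index the HTML" idea, here is a minimal sketch in Python. It assumes the chm-file has already been unpacked into a directory by some external tool (that step is not shown), and the names TextExtractor, html_to_text and extract_dir are made up for this example. It relies on the standard library's html.parser, which tolerates unclosed tags and other non-standard markup, so it should cope with sloppy HTML reasonably well, though I have not tried it on real chm content.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join chunks with spaces and collapse runs of whitespace.
        # This can split a word that is broken up by inline tags.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    """Return the plain text of one (possibly malformed) HTML document."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.text()


def extract_dir(root):
    """Yield (path, text) for every .htm/.html file under an unpacked directory."""
    for path in sorted(Path(root).rglob("*")):
        if path.suffix.lower() in {".htm", ".html"}:
            # Assumption: undecodable bytes are replaced rather than raising.
            raw = path.read_text(encoding="utf-8", errors="replace")
            yield path, html_to_text(raw)
```

The (path, text) pairs from extract_dir can then be handed to whatever indexer you already use for plain text.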

Kevin

--

Kevin A. Burton, Location - San Francisco, CA
AIM/YIM - sfburtonator, Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412


