Hi Erik,

Thanks for the pointers. I have modified Indexer.java to index all the files in the directory by removing the file extension check (".txt"). Now I do get an index from the files.

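For context, the change is roughly of the following shape. This is only a simplified sketch and not the actual Indexer.java: the class name ExtensionFilterSketch, the accepts and collect helpers, and the txtOnly flag are made up for illustration, and the Lucene indexing call itself is left out.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not the book's Indexer.java, and no Lucene calls.
class ExtensionFilterSketch {

    // Decide whether a file should be handed to the indexer.
    // txtOnly = true mimics the hard-coded ".txt" check;
    // txtOnly = false mimics the check being removed.
    static boolean accepts(File f, boolean txtOnly) {
        return !txtOnly || f.getName().endsWith(".txt");
    }

    // Walk the directory tree and collect every accepted file.
    static List<File> collect(File dir, boolean txtOnly) {
        List<File> found = new ArrayList<File>();
        File[] entries = dir.listFiles(); // null if dir is not a readable directory
        if (entries == null) {
            return found;
        }
        for (File entry : entries) {
            if (entry.isDirectory()) {
                found.addAll(collect(entry, txtOnly));
            } else if (accepts(entry, txtOnly)) {
                found.add(entry);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Directory to scan; defaults to the current directory.
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (File f : collect(dir, false)) {
            System.out.println("would index: " + f.getPath());
        }
    }
}
```

With txtOnly set to false, every regular file under the directory is passed on, whatever its extension.
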
The new situation is that when I run the file search:

java org.apache.lucene.demo.SearchFiles
Query: tty
Searching for: tty
3 total matching documents
0. No path nor URL for this document
1. No path nor URL for this document
2. No path nor URL for this document

I do not get the actual path from the index, and using Luke I get the three hits. The last two are from the index and not the real documents. Any idea what is happening and how I can fix it?

Thanks.
-H

Erik Hatcher wrote:

> On Jan 10, 2005, at 7:06 PM, Hetan Shah wrote:
>
>> Got the latest Ant and got the demo to work. I am however not sure
>> which part in the whole source code is the indexing for different file
>> types is done, say for example .html .txt and such?
>
> Your best bet is to dig around in the codebase. The Indexer.java code
> is hard-coded to only do .txt file extensions - this was on purpose as
> the first example in the book, figuring someone using this code on
> their C:\ drive would be relatively safe and fast to run.
>
> There is also an example easily run from the Ant launcher to show how
> various document types can be handled using an extensible framework.
> Run "ant ExtensionFileHandler". It doesn't actually index the document
> it creates, but displays it to the console. It would be pretty trivial
> to pair the Indexer.java code up with the file handler framework to
> crawl a directory tree and index any content it recognizes.
>
>> Appreciate your help. If you have any sample code would certainly
>> appreciate that also.
>
> You got all the code already. It should be fairly straightforward to
> navigate the src tree, especially with the Table of Contents handy:
>
> http://www.lucenebook.com/toc
>
> (incidentally, this dynamic TOC page is blending the blog content with
> the TOC using an IndexReader to find all blog entries that refer to
> each section - and you'll see the two, minor and cosmetic, errata
> listed there already).
>
> Erik
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]