Samuel, I'm using the software in much the same way you are. One thing to remember is that the documents you index need to live in a location that your web server publishes. What I did was use the Tomcat connectors and mount my document repository inside my Tomcat webapps directory. That way, the paths get indexed by running the demo IndexHTML command from a child of webapps. Then I created JkMounts for the children of the webapps directory.
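
For comparison, the other approach raised in this thread is to leave the index alone and rewrite the stored file path into a URL when the hits are displayed. A minimal sketch of that idea follows. The class and method names (PathToUrl, toUrl) are made up for illustration and are not part of Lucene, and the root directory and base URL are simply the example values from Samuel's message.

```java
// Hypothetical helper, not part of Lucene: turns a stored file path into a URL
// by swapping the indexed root directory for the site's base URL.
class PathToUrl {

    static String toUrl(String indexedPath, String fileRoot, String baseUrl) {
        // Normalise Windows separators so both paths compare the same way.
        String path = indexedPath.replace('\\', '/');
        String root = fileRoot.replace('\\', '/');
        if (!root.endsWith("/")) {
            root += "/";
        }
        String base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";

        // Leave paths outside the indexed root untouched.
        if (!path.startsWith(root)) {
            return indexedPath;
        }
        return base + path.substring(root.length());
    }

    public static void main(String[] args) {
        System.out.println(toUrl(
                "C:/filesToIndex/www/some_html/my_doc.html",
                "C:/filesToIndex/www/",
                "http://www.mysite.com/"));
        // prints http://www.mysite.com/some_html/my_doc.html
    }
}
```

In a results page this would be called on whatever doc.get("url") returns for each hit. It only helps when the directory layout maps one-to-one onto the site's URLs, which is the caveat Otis raises in the quoted reply, and it does not escape spaces or other special characters in file names.
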
I'm not a developer (to say the least), and this setup is probably a somewhat half-baked way around the problem, but my instance works, and all indexed documents are available via the links displayed on the results page.

Quoting Otis Gospodnetic <[EMAIL PROTECTED]>:

> Samuel,
>
> Some basic understanding of what Lucene is is what is missing here.
> Lucene does not index web pages.
> Lucene indexes text.
> Lucene is not automatically aware of your web site nor your domain.
> Lucene is aware only of what you 'feed it' at index time.
> If you index files, which IndexDemo does, the Lucene index will have only
> information about files (information such as file path). Lucene has no
> clue that you really want to index your web site.
> Even if you could replace C:\..... with http://.... it wouldn't be a
> good solution, as directory structures and file paths do not always map
> directly to URLs.
>
> In short, you have a bit more reading to do :)
> The information is all there, it just has to be read :(
> Good luck!
>
> Otis
>
>
> --- Samuel Alfonso Velázquez Díaz <[EMAIL PROTECTED]> wrote:
> >
> > Yes I have
> > 1.- The directory with the files to index:
> > C:/filesToIndex/www/
> >
> > 2.- A path where the index files from the search engine will be
> > created, let's say
> > C:/index/
> > 3.- I have an internet domain whose name is: www.mysite.com
> > 4.- A web application context that runs at
> > http://www.mysite.com/search
> >
> > Once I have set up all of the above, I want to be able to use the
> > search application:
> > http://www.mysite.com/search/search.jsp
> > And I don't want the results that I get from the index (step 2)
> > to look like
> > Your file is at
> > C:/filesToIndex/www/some_html/my_doc.html
> > The results should be:
> > Your file is at
> > http://www.mysite.com/some_html/my_doc.html
> > From the comments I have read (THANK YOU VERY MUCH) I conclude that
> > there is no way to generate the index with some custom prefix (such as
> > http://www.mysite.com/ for the documents at C:/filesToIndex/www/).
> > It seems that I have to modify my web application
> > (http://www.mysite.com/search/search.jsp) to include some logic to
> > replace "C:/filesToIndex/www/" with "http://www.mysite.com/".
> > If you could point me to the place in the Lucene source code where I
> > could include this logic and fix it once and for all, I would
> > appreciate it a lot.
> > The command I used to generate this index was:
> > java org.apache.lucene.demo.IndexHTML -create -index index C:\index
> > C:\filesToIndex\ www\
> > Now in the web application I have to modify
> > IndexSearcher searcher;
> > Query query;
> > Hits hits;
> >
> > // some code after...
> > hits = searcher.search(query);
> >
> > // walk through the hit list
> > for (int i = 0; i < hits.length(); i++) {
> >     Document doc = hits.doc(i);
> >     String doctitle = doc.get("title");
> >     String url = doc.get("url");
> > }
> >
> > I have to do something like url = "http://www.mysite.com/" +
> > url.substring("C:/filesToIndex/www/".length());
> >
> > Regards!!!
> > And thanks again
> >
> > Pinky Iyer <[EMAIL PROTECTED]> wrote:
> > I don't understand the explanation. When I index the documents as
> > mentioned in the examples, and then run the app and do a sample
> > search, it points to the directory structure, say
> > "c:/filesToIndex/www/", instead of "http://localhost:8080/www/". So
> > how can this be changed to reflect the website domain as mentioned by
> > you? Could you explain again? Say my docs are under a directory
> > c:/filesToIndex/www/ and the website is, as you said,
> > http://localhost:8080/ , then how do I proceed?
> > Thanks in advance!
> >
> > Samuel Alfonso Velázquez Díaz wrote:
> > Oh ok, I thought it was going to be something like the egothor
> > search engine (a Java based search engine). When you create the
> > index, you issue a command like:
> > java org.egothor.indexer.mirror.DoTanker /tmp/my_www Project/Egothor/var/www as http://localhost:8080
> > /tmp/my_www: is the path to the directory where the index is to be
> > created
> > Project/Egothor/var/www: is the path to the local file system files
> > to be indexed
> > and "as http://localhost:8080" is the prefix that the index will keep
> > on the hit list. This way the index will be relative to
> > http://localhost:8080, even if your production site may be another
> > site.
> > Thanks for your comments, anyway now I know that I have to modify
> > code to do this.
> > Regards!
> >
> > Jeff Linwood wrote:
> > Hi,
> >
> > I'm not a hundred percent sure I understand what you are asking, but
> > when you get the results back from Lucene (the hits) it's up to you
> > to format them to display on a web page - you can always do the
> > modification there when you display the links to the results.
> >
> > Jeff
> > ----- Original Message -----
> > From: "Samuel Alfonso Velázquez Díaz"
> > To: "Lucene Users List"
> > Sent: Tuesday, March 04, 2003 11:33 AM
> > Subject: Regarding Setup Lucine for my site
> >
> > > The documentation says:
> > >
> > > Once you've gotten this far you're probably itching to go. Let's start by
> > > creating the index you'll need for the web examples. Since you've already
> > > set your classpath in the previous examples, all you need to do is type
> > > "java org.apache.lucene.demo.IndexHTML -create -index {index-dir} ..".
> > > You'll need to do this from a (any) subdirectory of your {tomcat}/webapps
> > > directory (make sure you didn't leave off the ".." or you'll get a null
> > > pointer exception). {index-dir} should be a directory that Tomcat has
> > > permission to read and write, but is outside of a web accessible context. By
> > > default the webapp is configured to look in /opt/lucene/index for this
> > > index.
> > >
> > > A copy of my site is in:
> > >
> > > C:\CopiaSite20030228\
> > >
> > > My web application runs on
> > >
> > > http://mydomain.com/search/index.jsp
> > >
> > > how can I make the lucene index map the URLs of the indexed files to:
> > >
> > > http://mydomain.com/
> > >
> > > Please help!
> > >
> > > Samuel Alfonso Velázquez Díaz
> > > http://www.geocities.com/samuelvd
> > > [EMAIL PROTECTED]
> > >
> > > ---------------------------------
> > > Do you Yahoo!?
> > > Yahoo! Tax Center - forms, calculators, tips, and more
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]

LanRx Network Solutions, Inc.
Providing Enterprise Level Solutions...On A Small Business Budget
