Bernhard Huber wrote:
>
> Hi,
>
> Using the Avalon components might help to speed up the searching, as I
> changed the classes to Recyclable and corrected a bug in the
> IndexReaderCache that was giving me a TooManyOpenedFiles exception.
> As there will be a lot of clients doing searches, it is important to
> have a fast search, hence:
> The indexReader is like a JdbcConnection: pooling would speed it up.
> Only when the index changes is it necessary to recreate the
> indexReader.

Good point.

> >Why don't you throw in your skeleton ideas here and we discuss then in
> >the open?
> >
> Okay, perhaps I have misunderstood something.
>
> >>* I will implement some paging for the search result, if there are too
> >>many search results to display on a single page.
> >>
> >
> >Yep, this is a must do.
> >
> I have done this, but still using the old package names.
>
> I added a LuceneCocoonPager (I know the names...) class, doing the hits
> per page calculation, and wrapping the Hits class. You will find it in
> the attachment plus the modified searchindex.xsp.
>
> If searchindex.xsp stays I'd like to have some xsp-stylesheet for doing
> the rendering of the paging stuff.
> Is there some easy way to avoid declaring the logicsheet in the
> cocoon.xconf?

Not that I know of.

> For development I'd like
> to declare the logicsheet inside the xsp itself.

I don't think it's possible on the current system.

> This paging stuff should go into the
> org.apache.cocoon.generator.SearchGenerator, too.
> This way the generator is able to generate only the search results
> which will be displayed.

I agree.

> >>* I will study the Main class for the internal crawling..
> >>
> >
> >Great
> >
> Okay, I got an overview using the environment.commandline.* classes.
> Now I have a question about crawling & indexing:
>
> As it is now I have an xsp to trigger the crawling & indexing. It uses
> http URLs to access the xml-content for indexing.
> Now to speed this up I see the following possibilities:
>
> First, still staying in a servlet-context environment:
> * For Servlet 2.3 something like this might work:
>   RequestDispatcher rd = servletContext.getRequestDispatcher(
>     "/cocoon/documents/index.html?cocoon-view=content" );
>   rd.include( new_request_wrapper, new_response_wrapper );
>   new_response_wrapper should hold the xml-content.
>
> For Cocoon in Servlet 2.2, and higher:
> I want to access the Cocoon instance of the current servlet-context.
> I don't want to create another
> Cocoon instance, for the sake of performance and memory consumption.
>
> If I have to create a new Cocoon instance, I see the following choices:
>
> * create a Cocoon instance like org.apache.cocoon.Main does and try to
>   grab the right configs, etc. like the servlet-engine Cocoon instance.
>   How could I make sure I get the right configs?
> * create a Cocoon instance simulating a servlet environment.
> Can you give some hints about implementing the easiest solution?

Cocoon is an Avalon component. My best choice would be to retrieve Cocoon
as a component directly from the ComponentManager, then call the
process(Environment) method indicating what environment we want, just
like the Main class does.

> For the command-line-only crawling and indexing I see the following
> choices:
> * Implement something like org.apache.cocoon.Main for the crawling
>   and indexing. Same here: I will
>   grab the same config as the servlet-engine Cocoon instance.
> * Additionally, add an Ant wrapper:
>   <taskdef name="cocoon-index"
>     classname="org.apache.cocoon.optional.ant.CocoonIndexTask"/>
>   <cocoon-index
>     index-directory="/a/c/index"
>     create="yes"
>     analyzer="org.apache.lucene.analyzer.StandardAnalyzer"
>     uri="index.html"
>     contextDir="${build.context}"
>     destDir="${build.dir}/ant-test/docs"
>     workDir="${build.dir}/ant-test/work"
>     logLevel="INFO">
>   </cocoon-index>
> * Now should there be some Cocoon Ant datatype to make it easier
>   to create a Cocoon instance? Like:
>   <cocoon-index
>     index-directory="/a/c/index"
>     create="yes"
>     analyzer="org.apache.lucene.analyzer.StandardAnalyzer"
>     uri="index.html">
>     <cocoon
>       contextDir="${build.context}"
>       destDir="${build.dir}/ant-test/docs"
>       workDir="${build.dir}/ant-test/work"
>       logLevel="INFO"/>
>   </cocoon-index>

Hmmm, this might connect Ant to Cocoon too strongly, but I really don't
know. What do others think about this?
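
The hits-per-page calculation that the LuceneCocoonPager mentioned earlier
in this message is said to perform can be sketched roughly as follows.
This is only an illustration under stated assumptions, not the class from
the attachment: the name HitPagerSketch and all of its methods are
hypothetical, and it works on a plain hit count instead of wrapping
Lucene's Hits class.

```java
/**
 * Hypothetical paging helper (not the attached LuceneCocoonPager).
 * Pages are numbered from 0; hit indexes are 0-based.
 */
final class HitPagerSketch {
    private final int totalHits;
    private final int hitsPerPage;

    HitPagerSketch(int totalHits, int hitsPerPage) {
        if (totalHits < 0 || hitsPerPage <= 0) {
            throw new IllegalArgumentException(
                "totalHits must be >= 0 and hitsPerPage must be > 0");
        }
        this.totalHits = totalHits;
        this.hitsPerPage = hitsPerPage;
    }

    /** Number of pages needed to show all hits (0 when there are no hits). */
    int pageCount() {
        return totalHits / hitsPerPage + (totalHits % hitsPerPage == 0 ? 0 : 1);
    }

    /** Index of the first hit on the given page. */
    int firstIndex(int page) {
        if (page < 0 || page >= pageCount()) {
            throw new IndexOutOfBoundsException("no such page: " + page);
        }
        return page * hitsPerPage;
    }

    /** Index one past the last hit on the given page (the last page may be short). */
    int lastIndexExclusive(int page) {
        return Math.min(firstIndex(page) + hitsPerPage, totalHits);
    }

    boolean hasPrevious(int page) {
        return page > 0;
    }

    boolean hasNext(int page) {
        return page + 1 < pageCount();
    }
}
```

A generator built along these lines would only need to emit the hits from
firstIndex(page) up to lastIndexExclusive(page), which matches the idea of
generating only the results that will be displayed.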

> * Apropos Ant wrapper: I was implementing an Ant wrapper for the Main
>   class by extending the Ant class Java, and it works fine, calling
>   Main.main() from a forked java.
>   Thus creating the cocoon documents:
>   ...
>   <taskdef name="cocoon"
>     classname="org.apache.cocoon.optional.ant.CocoonJavaTask">
>     <classpath>
>       <path refid="classpath"/>
>     </classpath>
>   </taskdef>
>
>   <cocoon
>     contextDir="${build.context}"
>     destDir="${build.dir}/ant-test/docs"
>     workDir="${build.dir}/ant-test/work"
>     logLevel="INFO"
>     uri="index.html">
>     <classpath>
>       <path refid="classpath"/>
>     </classpath>
>   </cocoon>
>   ...
>   But I failed to call it with fork=false, getting some
>   ClassNotFoundException. Now I wonder how the ServletEngine has solved
>   this....

Sounds like a classloading containment problem. Ant is not as advanced
at classloading as Tomcat is.

> * Having a command-line or Ant-wrapped indexing and crawling, the last
>   open issue is to invoke it via some time service. Some application
>   servers like WLS offer that, and I think there is some
>   Cron-Service in the Avalon-System. Does it make sense to add the
>   Avalon-Cron service into a simple Servlet-Engine?

I think so.

> >searching for 'cocoon' would result in something like:
> >
> > <search:results>
> >  <search:hit rank="1" score="89%" uri="...">
> >   <xhtml:p>
> >    <search:highlight>Cocoon</search:highlight> now offers semantic
> >    <search:highlight>search</search:highlight>
> >   </xhtml:p>
> >  </search:hit>
> >  ...
> > </search:results>
> >
> >As you can see, this also includes part of the "context" where the
> >textual information is found. This follows the Google model and I think
> >it would be a *great* feature to have.
> >
> This is possible if you change the lucene API a bit.
> There was some posting on the lucene mailing list regarding
> highlighting. I don't know about the state of that improvement.
> Anyway, the highlighting
> needs some changes in the lucene API; I have modified "my"
> lucene to be able to do highlighting.

Hmmm, forking lucene is not exactly a good way of working with them. I'd
suggest you send the patches to them and see what comes up from there.
I would be against having an ad-hoc modified version of Lucene in our
CVS.

> Moreover, if you want to have something like highlighting, the question
> is whether the summary should be stored in the
> index, too, or should we ask for the cocoon-view again, at search-time,
> to get the summary?

Right, I was thinking the same thing. Performance-wise, the obvious
answer is to store the summary along with the index.

> I have implemented the LuceneIndexContentHandler to generate no-store
> fields: body and all the element and attribute fields are indexed
> only, not stored.
> Now adding a summary might make it worth adding the body field as
> stored. But what about the
> <s1 title="Introduction">? The "Introduction" is not stored in the body.
> How should we summarize this?

Attributes can appear only once, so what about wrapping them in square
brackets?

 [Introduction] This text is something that blah blah blah
 [How to blah blah] blah blah blah

but I'm wide open to suggestions here.

--
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<[EMAIL PROTECTED]>                             Friedrich Nietzsche
--------------------------------------------------------------------
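
The square-bracket idea for summaries can be tried out with a small SAX
handler. What follows is a rough sketch under stated assumptions, not the
LuceneIndexContentHandler discussed in this thread: the class name
BracketSummarySketch and its summarize() helper are hypothetical, it
brackets every attribute value it meets (not only title), and it uses
only the SAX classes shipped with the JDK.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: builds a flat summary, wrapping attribute values in [ ]. */
final class BracketSummarySketch extends DefaultHandler {
    private final StringBuilder summary = new StringBuilder();
    // Text may arrive in several characters() calls, so buffer it until
    // the next element boundary before normalizing whitespace.
    private final StringBuilder pendingText = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flushText();
        for (int i = 0; i < atts.getLength(); i++) {
            append("[" + atts.getValue(i).trim() + "]");
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        pendingText.append(ch, start, length);
    }

    private void flushText() {
        append(pendingText.toString());
        pendingText.setLength(0);
    }

    /** Appends non-blank text, collapsing runs of whitespace to one space. */
    private void append(String text) {
        String normalized = text.trim().replaceAll("\\s+", " ");
        if (normalized.isEmpty()) {
            return;
        }
        if (summary.length() > 0) {
            summary.append(' ');
        }
        summary.append(normalized);
    }

    /** Parses the given XML string and returns the bracketed summary. */
    static String summarize(String xml) {
        BracketSummarySketch handler = new BracketSummarySketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return handler.summary.toString();
    }
}
```

Storing such a string in one stored field would keep attribute text like
the s1 title available for the summary without asking for the cocoon-view
again at search time. Whether brackets are the right marker remains open,
as said above.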