Peter Becker wrote:
> Leo Galambos wrote:
>
> > Peter Becker wrote:
> >
> >> Hi Tod,
> >>
> >> as far as I know Lucene itself doesn't offer this (at least we failed
> >> to find it). The closest thing available seems to be the Ant tasks.
> >>
> >> We are currently working on introducing this notion for our program,
> >> which is open source. Besides the plugin mechanism there will be a
> >> file filter mapping and a thread mechanism to maintain an index, as
> >> well as implementations using POI and Multivalent. Give us another
> >> week or two.
> >
> > Unfortunately, I didn't get this. Could you explain the mechanism,
> > please? Thank you
>
> Not fully yet, since we are still working on it ;-) You can find the
> code in our CVS repository on SourceForge:
>
> http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/toscanaj/docco/source/org/tockit/docco/
>
> The idea is that you have to supply different parsers for different
> formats, then turn the results found into Lucene Document objects. At
> the moment we do this using a normal interface similar to the one used
> in the Java Ant tasks (see the "handlers" directory), but we want to
> turn it into a plugin interface. Our tool should in the end do TXT, HTML
> and XML out of the box and have at least three plugin implementations:
>
> - POI for .doc, .xls
> - PDFBox for .pdf
> - Multivalent for .pdf, .dvi and others
>
> The plugin API will be extremely simple and it should fit easily with
> the Ant tasks, so you should be able to wrap our code into an Ant task
> or whatever interface you need.

This sounds really cool. If I'm reading you correctly, it will be a fairly
straightforward exercise to port parsers written in Java for existing file
formats to use your plugin architecture. Accurate?

Tod
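
To make the idea in the thread concrete (one parser per format, a mapping from file type to parser, and parser output turned into indexable fields), here is a minimal sketch in Java. It is not taken from the Docco code: the names DocumentHandler, PlainTextHandler, HandlerRegistry and PluginSketch are hypothetical, and a plain map of field name to text stands in for a Lucene Document so the sketch runs without Lucene on the classpath.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical plugin contract: one handler per file format.
// The returned map (field name -> text) stands in for a Lucene Document.
interface DocumentHandler {
    Map<String, String> parse(Reader input) throws IOException;
}

// Handler for plain text: the whole input becomes the "body" field.
class PlainTextHandler implements DocumentHandler {
    @Override
    public Map<String, String> parse(Reader input) throws IOException {
        StringBuilder body = new StringBuilder();
        char[] buffer = new char[4096];
        int read;
        while ((read = input.read(buffer)) != -1) {
            body.append(buffer, 0, read);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("body", body.toString());
        return fields;
    }
}

// Assumed shape of the "file filter mapping": extension -> handler.
class HandlerRegistry {
    private final Map<String, DocumentHandler> handlers = new HashMap<>();

    void register(String extension, DocumentHandler handler) {
        handlers.put(extension.toLowerCase(Locale.ROOT), handler);
    }

    // Returns null when no handler is registered for the file's extension.
    DocumentHandler lookup(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return null;
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return handlers.get(extension);
    }
}

class PluginSketch {
    public static void main(String[] args) throws IOException {
        HandlerRegistry registry = new HandlerRegistry();
        registry.register("txt", new PlainTextHandler());

        // Extension matching is case-insensitive in this sketch.
        DocumentHandler handler = registry.lookup("notes.TXT");
        Map<String, String> fields = handler.parse(new StringReader("hello index"));
        System.out.println(fields);
    }
}
```

Under these assumptions, porting an existing Java parser would mean wrapping it in one DocumentHandler implementation and registering it for its extensions; a POI- or PDFBox-based handler would slot in the same way as PlainTextHandler. Whether the real plugin API ends up looking like this is for Peter to confirm.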
