Hello Wouter,

This looks excellent to me. We (at VARA) are very interested in using it. I'm looking forward to more news about it.

Ernst

> -----Original Message-----
> From: Wouter Heijke [mailto:[EMAIL PROTECTED]
> Sent: Thursday, 10 June 2004 16:42
> To: '[EMAIL PROTECTED]'
> Subject: MMBase Lucene module
>
> Hi All,
>
> After yesterday's presentation at the MMEvent I'd like to present the
> Lucene full-text search module for MMBase to all of you.
> What is it?
> This module is a real MMBase module, so you have to install
> 'lucenemodule.xml' in the modules directory to run it.
> What it does is make the content of your cloud searchable. This is
> done by indexing your cloud, but only the builders that you specify
> in a config file; the fields of those builders that need to be
> searchable have to be configured as well:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <lucenemodule>
>   <index name="MyNewsIndex">
>     <table name="news">
>       <field name="title" />
>       <field name="subtitle" />
>       <field name="intro">introduction</field>
>       <field name="body" />
>       <related name="attachments">
>         <field name="title">rel.title</field>
>         <field name="handle" type="binary">rel.body</field>
>       </related>
>     </table>
>     <table name="mags">
>       <field name="title" />
>       <field name="body" />
>     </table>
>   </index>
> </lucenemodule>
>
> The example from my slides abuses the MyNews example to show how you
> could configure the module.
> Now when I search 'MyNewsIndex' for a string in 'title', I'm searching
> through both news and mags, or, if you specifically want that, through
> mags or news only.
> So ideally all your searchable content should have the same kind of
> fields; if this isn't the case, you can rename them to get uniform
> naming. In the example I renamed the 'intro' field to 'introduction'
> in the search index.
> Each 'table' mentioned in the config file results in Lucene creating
> one 'document' per node in its index; each of these documents has the
> corresponding MMBase node number and (builder) name indexed
> automatically.
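To make the configuration concrete, here is a minimal Java sketch that reads the example lucenemodule.xml with the JDK's built-in DOM parser (javax.xml.parsers) and lists, per table, which MMBase field ends up under which index field name. It is not the module's code, which has not been released; the class LuceneConfigSketch and its method fieldMappings are hypothetical. It assumes, as the 'intro'/'introduction' example suggests, that a field element's text is its name in the index and that an empty element keeps the MMBase field name. Keying related fields as relation.field (e.g. attachments.title) is also an assumption made here for readability.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper that reads a lucenemodule.xml-style config; not the module's own code.
class LuceneConfigSketch {

    // The example configuration from the message, as one string.
    static final String SAMPLE_CONFIG =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<lucenemodule><index name=\"MyNewsIndex\">"
            + "<table name=\"news\">"
            + "<field name=\"title\" /><field name=\"subtitle\" />"
            + "<field name=\"intro\">introduction</field><field name=\"body\" />"
            + "<related name=\"attachments\">"
            + "<field name=\"title\">rel.title</field>"
            + "<field name=\"handle\" type=\"binary\">rel.body</field>"
            + "</related></table>"
            + "<table name=\"mags\"><field name=\"title\" /><field name=\"body\" /></table>"
            + "</index></lucenemodule>";

    // Returns table name -> (MMBase field -> index field name), in document order.
    static Map<String, Map<String, String>> fieldMappings(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse config", e);
        }
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        NodeList tables = doc.getElementsByTagName("table");
        for (int i = 0; i < tables.getLength(); i++) {
            Element table = (Element) tables.item(i);
            Map<String, String> fields = new LinkedHashMap<>();
            // Recursive lookup: this also finds the <field> elements nested in <related>.
            NodeList fieldNodes = table.getElementsByTagName("field");
            for (int j = 0; j < fieldNodes.getLength(); j++) {
                Element field = (Element) fieldNodes.item(j);
                String name = field.getAttribute("name");
                // Assumption: element text, if present, is the field's name in the index.
                String indexName = field.getTextContent().trim();
                if (indexName.isEmpty()) {
                    indexName = name;
                }
                Element parent = (Element) field.getParentNode();
                String key = parent.getTagName().equals("related")
                        ? parent.getAttribute("name") + "." + name
                        : name;
                // type="binary" fields get text extraction in the real module;
                // this sketch only records the name mapping.
                fields.put(key, indexName);
            }
            result.put(table.getAttribute("name"), fields);
        }
        return result;
    }

    public static void main(String[] args) {
        fieldMappings(SAMPLE_CONFIG).forEach((table, fields) ->
                System.out.println(table + " -> " + fields));
    }
}
```

Running main prints one line per table; the news line contains intro=introduction and attachments.handle=rel.body, and both tables expose a title entry, which is what lets a single search on 'title' cover news and mags at once.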
> When you search, the results will be a list of node numbers.
>
> Relations can be indexed as well, like the attachments in the example;
> the related objects can be of any kind of builder. If you set
> type="binary" on a field, that field is treated as a binary file: all
> text is extracted from it and indexed. Currently PDF and Word are
> supported. Related content is indexed in the Lucene document of the
> parent of the relation, so you won't get the node number of the
> related MMBase object in your results.
>
> Lucene creates its own database on the file system. This database is
> rebuilt each time the module runs, which is configurable in the
> lucenemodule.xml file. This database, or 'index', is named after the
> name attribute of the index element in the configuration file. The
> index is used only for searching by Lucene; the results of a search
> are only the node numbers.
>
> Right now the module is not available for download yet; it needs some
> work (the usual: cleaning up, documentation, etc.). But since my
> presentation came quite unexpectedly and there seemed to be some
> demand yesterday, I'm trying to see how big the demand is for making
> this available.
>
> Wouter
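The effect of indexing related content into the parent's document can be illustrated with a toy inverted index. This is plain Java and deliberately not Lucene's API: ToyNodeIndex, the "builder" field name, the node numbers and the sample text are all hypothetical. Each added document stands for one MMBase node and stores only that node's number, so text taken from a related attachment (here under the index field rel.title, following the example config) can only ever produce the parent's node number as a hit. Restricting a search to one builder is sketched as an intersection with the automatically indexed builder name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy in-memory inverted index (hypothetical, NOT Lucene's API).
class ToyNodeIndex {

    // Hypothetical name of the automatically indexed builder field.
    static final String BUILDER_FIELD = "builder";

    // index field -> term -> sorted node numbers of the documents containing that term
    private final Map<String, Map<String, Set<Integer>>> postings = new TreeMap<>();

    // Adds one document for one MMBase node. Text from related nodes is passed in
    // the same map, so only the parent's node number is ever stored.
    void addDocument(int nodeNumber, String builder, Map<String, String> fields) {
        addText(nodeNumber, BUILDER_FIELD, builder);
        fields.forEach((field, text) -> addText(nodeNumber, field, text));
    }

    private void addText(int nodeNumber, String field, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) {
                continue; // skip empty tokens, e.g. a leading "" when text starts with punctuation
            }
            postings.computeIfAbsent(field, f -> new TreeMap<>())
                    .computeIfAbsent(term, t -> new TreeSet<>())
                    .add(nodeNumber);
        }
    }

    // Node numbers whose document contains the term in the given index field.
    List<Integer> search(String field, String term) {
        Set<Integer> hits = postings.getOrDefault(field, Map.of())
                .getOrDefault(term.toLowerCase(Locale.ROOT), Set.of());
        return new ArrayList<>(hits);
    }

    // Same search, restricted to nodes of one builder (e.g. only "mags").
    List<Integer> search(String field, String term, String builder) {
        List<Integer> hits = search(field, term);
        hits.retainAll(search(BUILDER_FIELD, builder));
        return hits;
    }

    public static void main(String[] args) {
        ToyNodeIndex index = new ToyNodeIndex();
        // Hypothetical news node 101 with a related attachment: the attachment's
        // title goes into the news node's document under "rel.title".
        index.addDocument(101, "news", Map.of(
                "title", "Lucene module presented",
                "rel.title", "Slides from the MMEvent"));
        index.addDocument(202, "mags", Map.of("title", "Lucene in print"));

        System.out.println(index.search("title", "lucene"));         // [101, 202]
        System.out.println(index.search("title", "lucene", "mags")); // [202]
        System.out.println(index.search("rel.title", "slides"));     // [101]
    }
}
```

A real installation would leave tokenizing, on-disk storage and scoring to Lucene; the toy only keeps exact lowercase terms in memory, but the shape of the results (plain node numbers, related hits reported as the parent) matches what the message describes.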
