Hello Wouter,

This looks excellent to me. We (at VARA) are very interested in using it. I'm
looking forward to more news about it.

Ernst

> -----Original Message-----
> From: Wouter Heijke [mailto:[EMAIL PROTECTED]
> Sent: Thursday, 10 June 2004 16:42
> To: '[EMAIL PROTECTED]'
> Subject: MMBase Lucene module
> 
> 
> 
> Hi All,
> 
> After yesterday's presentation at the MMEvent, I'd like to present the
> Lucene full-text search module for MMBase to all of you.
> 
> What is it?
> This module is a real MMBase module, so you have to install
> 'lucenemodule.xml' in the modules directory to run it.
> What it does is make the content of your cloud searchable. It does this
> by indexing your cloud, but only those builders that you specify in a
> config file. The fields of these builders that should be searchable
> have to be configured as well:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <lucenemodule>
>    <index name="MyNewsIndex">
>       <table name="news">
>          <field name="title" />
>          <field name="subtitle" />
>          <field name="intro">introduction</field>
>          <field name="body" />
>          <related name="attachments">
>             <field name="title">rel.title</field>
>             <field name="handle" type="binary">rel.body</field>
>          </related>
>       </table>
>       <table name="mags">
>          <field name="title" />
>          <field name="body" />
>       </table>
>    </index>
> </lucenemodule>
> 
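To make the layout of this config concrete, here is a minimal sketch in Python that reads it and lists which source field ends up under which name in the index. It is only an illustration, not the module's code (which has not been released). It assumes that the text inside a `<field>` element, when present, is the name used in the index, and that an empty element keeps its own name. The function names `field_entry` and `index_layout` and the `attachments.` prefix on related fields are made up for this example.

```python
import xml.etree.ElementTree as ET

# The config from the message, without the XML declaration.
CONFIG = """
<lucenemodule>
   <index name="MyNewsIndex">
      <table name="news">
         <field name="title" />
         <field name="subtitle" />
         <field name="intro">introduction</field>
         <field name="body" />
         <related name="attachments">
            <field name="title">rel.title</field>
            <field name="handle" type="binary">rel.body</field>
         </related>
      </table>
      <table name="mags">
         <field name="title" />
         <field name="body" />
      </table>
   </index>
</lucenemodule>
"""


def field_entry(field, prefix=""):
    """Return (source field, name in the index, is_binary) for one <field>."""
    source = prefix + field.get("name")
    # Assumption: element text, if any, is the name used in the index.
    indexed = (field.text or "").strip() or field.get("name")
    return (source, indexed, field.get("type") == "binary")


def index_layout(config_xml):
    """Return {index name: {table name: [field entries]}}."""
    root = ET.fromstring(config_xml)
    layout = {}
    for index in root.findall("index"):
        tables = layout.setdefault(index.get("name"), {})
        for table in index.findall("table"):
            entries = [field_entry(f) for f in table.findall("field")]
            # Fields of related builders are listed with the parent table.
            for related in table.findall("related"):
                prefix = related.get("name") + "."
                entries += [field_entry(f, prefix) for f in related.findall("field")]
            tables[table.get("name")] = entries
    return layout


layout = index_layout(CONFIG)
for source, indexed, binary in layout["MyNewsIndex"]["news"]:
    print(source, "->", indexed, "(binary)" if binary else "")
```

For the 'news' table this lists 'intro' under the index name 'introduction', and the attachment 'handle' field under 'rel.body', flagged as binary.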
> The example from my slides abuses the MyNews example to show how you
> could configure the module.
> When I search 'MyNewsIndex' for a string in 'title', I'm searching
> through both news and mags; you can also search in mags or news only
> if you specifically want that.
> So ideally all your searchable content should have the same kind of
> fields; where this isn't the case, you can rename them to get uniform
> naming. In the example I renamed the 'intro' field to 'introduction'
> in the search index.
> For each 'table' mentioned in the config file, Lucene creates
> 'documents' in its index; each of these has the corresponding MMBase
> node number and (builder) name indexed automatically. When you search,
> the results are a list of node numbers.
> 
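The search behaviour described above can be sketched with a small in-memory stand-in for one named index. This is not Lucene's API and not the module's code; `ToyIndex`, the node numbers, and the sample texts are invented for illustration. It only shows the idea that every document carries its node number and builder name, that a field search spans all builders in the index, and that it can be narrowed to one builder.

```python
from collections import defaultdict


class ToyIndex:
    """In-memory stand-in for one named index; not Lucene's API."""

    def __init__(self):
        # (field, term) -> set of (node number, builder name)
        self.postings = defaultdict(set)

    def add(self, number, builder, fields):
        """Index one node as one document, keyed by node number and builder."""
        for field, text in fields.items():
            for term in text.lower().split():
                self.postings[(field, term)].add((number, builder))

    def search(self, field, term, builder=None):
        """Return sorted node numbers; optionally restrict to one builder."""
        hits = self.postings.get((field, term.lower()), set())
        return sorted(n for n, b in hits if builder is None or b == builder)


# Hypothetical sample nodes.
index = ToyIndex()
index.add(101, "news", {"title": "Lucene module released",
                        "introduction": "Full-text search"})
index.add(202, "mags", {"title": "Lucene in practice"})

print(index.search("title", "lucene"))          # [101, 202]
print(index.search("title", "lucene", "mags"))  # [202]
```

Because 'intro' was indexed under the uniform name 'introduction', a search on that name would keep working if another builder with a differently named source field were mapped to it as well.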
> Relations can be indexed as well, like attachments in the example; the
> related builder can be of any kind. If you specify type 'binary' on a
> field, that field is treated as a binary file: all text is extracted
> from it and indexed. Currently PDF and Word are supported. Related
> content is indexed in the Lucene document of the parent of the
> relation, so you won't get the node number of the related MMBase
> object in your results.
> 
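The way related content ends up in the parent's document can be sketched as follows. Again this is a hedged illustration in Python, not the module's implementation: `build_document`, `search`, the node numbers, and the `extracted_text` key are hypothetical, and the text extraction from PDF or Word is assumed to have happened already.

```python
def build_document(node, related_nodes):
    """Flatten a parent node and its related nodes into one document."""
    doc = {
        "number": node["number"],
        "builder": node["builder"],
        "title": node["title"],
    }
    # Related text is stored under the parent's 'rel.*' fields, so the
    # related node's own number never appears in the document.
    doc["rel.title"] = " ".join(r["title"] for r in related_nodes)
    doc["rel.body"] = " ".join(r["extracted_text"] for r in related_nodes)
    return doc


def search(docs, field, term):
    """Return node numbers of documents whose field contains the term."""
    return [d["number"] for d in docs
            if term.lower() in str(d.get(field, "")).lower().split()]


# Hypothetical sample data: one news node with one attachment.
news = {"number": 101, "builder": "news", "title": "Annual report published"}
attachment = {"number": 555, "builder": "attachments", "title": "report.pdf",
              "extracted_text": "revenue grew in 2003"}

docs = [build_document(news, [attachment])]
print(search(docs, "rel.body", "revenue"))  # [101]
```

A hit in the attachment's text returns 101, the news node, and never 555, the attachment itself.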
> Lucene creates its own database on the file system. This database is
> rebuilt each time the module runs, which is configurable in the
> lucenemodule.xml file. This database or 'index' takes its name from
> the name attribute of the index element in the configuration file. The
> index is used by Lucene only for searching; the results of a search
> are just the node numbers.
> 
> The module is not available for download yet; it needs some work (the
> usual: cleaning up, documentation, etc.). But since my presentation
> came quite unexpectedly and there seemed to be some demand yesterday,
> I'm trying to gauge how much demand there is for making it available.
> 
> Wouter
> 
