> From: icewind [mailto:[EMAIL PROTECTED]]
> 
>       I have created an index of some XML documents but I'm
> not thrilled with the way the index is built. Text
> appears to get indexed with the innermost XML tag it
> is found in. For example, if I had a fragment like the
> following:
> 
> <title>
>    <person>Alice's</person> guide to the great novels
>           of the <date>1800's</date>
> </title>
> 
> and I then used the following search term:
> "title:Alice" or "title:1800", I would not get a
> match. I would need to search for "person:Alice" or
> "date:1800" respectively.
>       Since all the tags within the <title> tag contain
> text that is clearly part of the title, I want a user
> who is searching through the collection to be able to
> do title-specific searches that match any word within
> the title tag, regardless of whether it has other XML
> tags wrapped around it.
>       Has anyone run into this issue? I'm not sure how to
> go about implementing what I want. Is this something I
> could do in Cocoon, or would I have to modify
> something in the LuceneXMLIndexer component?

Look at the characters() method of LuceneIndexContentHandler.

OK, I see that it appends text only to bodyText and to the field for the
current (innermost) tag. The simple solution is to append the text to
every field on the stack (in characters(), use a for(;;) loop instead of
the if()). A better solution is to replace the stack of StringBuffers
(see this.elementStack) with a stack of indexes into a single string
buffer (this.bodyText), which uses memory more efficiently.
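
To illustrate the second idea, here is a rough, standalone sketch in
modern Java. It is not the Cocoon code: the class OffsetStackHandler, its
fields map and the parse() helper are names made up for this example, and
it only collects text per element instead of building Lucene documents.
Each startElement() pushes the current length of the single buffer, and
each endElement() pops that offset and takes everything appended since,
so text inside nested tags also ends up in the enclosing element's field.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for illustration; NOT Cocoon's LuceneIndexContentHandler.
public class OffsetStackHandler extends DefaultHandler {
    // All character data is appended to this one buffer.
    private final StringBuilder bodyText = new StringBuilder();
    // For each open element, the buffer offset at which its content starts.
    private final Deque<Integer> startOffsets = new ArrayDeque<>();
    // Element name -> text of each occurrence, nested text included.
    private final Map<String, List<String>> fields = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        startOffsets.push(bodyText.length());
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // One append serves every open element; no per-element buffers.
        bodyText.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        int from = startOffsets.pop();
        String text = bodyText.substring(from).trim();
        fields.computeIfAbsent(qName, k -> new ArrayList<>()).add(text);
    }

    public Map<String, List<String>> getFields() {
        return fields;
    }

    // Assumed convenience helper: parse an XML string and return the collected fields.
    public static Map<String, List<String>> parse(String xml) throws Exception {
        OffsetStackHandler handler = new OffsetStackHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getFields();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<title><person>Alice's</person> guide to the great novels"
                + " of the <date>1800's</date></title>";
        System.out.println(parse(xml));
    }
}
```

With the fragment from the question, the "title" entry then contains
"Alice's" and "1800's" along with the rest of the title text, while
"person" and "date" still get their own entries. In the real handler the
endElement() step would add a Lucene field instead of filling a map.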


Vadim
 

>       Suggestions appreciated. I imagine someone has run
> into this and has already come up with a workable
> solution.


