Thanks for the fast reply and the clarification.
I would also like to ask about indexing without storing.
We have documents like this:
- title
- newspaper
- <url to pdf in /binaries>
Now we need to be able to search for documents whose PDF contains some text. I
want to do this by writing my own extractor that takes the PDF, extracts the
text, and puts it in a property. As you said before, this is the way it should
be done. But in this case it is completely unnecessary to keep the PDF's text.
Is there a way to avoid this duplication?
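
To make the question concrete, here is a minimal sketch in plain Java of what
I mean by indexing without storing. It does not use the Hippo or Lucene APIs;
the class IndexOnlySketch and its methods are made up for illustration only.
The extracted text is tokenized into an inverted index (term -> document ids)
and then discarded, so the only copy of the content is the PDF itself.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration: index terms for searching, never keep the text.
class IndexOnlySketch {
    // term -> ids of the documents that contain it
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Tokenize the extracted text and record only term -> docId entries.
    // The text itself is not retained after this method returns.
    public void index(String docId, String extractedText) {
        for (String token : extractedText.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Return the ids of documents containing the given single term.
    public Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) {
        IndexOnlySketch idx = new IndexOnlySketch();
        idx.index("/binaries/a.pdf", "Budget debate in parliament");
        idx.index("/binaries/b.pdf", "Parliament votes on new law");
        System.out.println(idx.search("parliament"));
        System.out.println(idx.search("budget"));
    }
}
```

A search hit here gives back only the document id (the PDF location), which is
all we need; the matching text can always be read from the PDF again.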

Darek


2007/12/20, Ard Schrijvers <[EMAIL PROTECTED]>:
>
>
> Hello Darek,
>
> > Hello,
> > I was looking for this information in the docs and lists and found
> > nothing. If I am repeating a known problem, then sorry :)
> >
> > We have a problem with searching over documents. Let's say we
> > have a document that consists of: title, date, abstract.
> > We need ability to search over these fields separately.
> > We did that by making extractors that rewrite these fields to
> > properties p_title, p_date, p_abstract. Now lucene can index
> > it and it works.
> > But ...
> > Now we have same content in 2 places.
> > Is there a better way to do this?
>
> In principle, this is the way to do it. For a title and a date, it is
> pretty normal and straightforward. For the abstract you might not want
> to duplicate the entire text. For the abstract you might also work with
> ConfigurableXMLContentExtractor [1]. Then in your search/dasl, you could
> say something like:
>
> <d:contains locale="abstract"> your query </>
>
> As 'locale' already indicates, it is actually implemented for different
> languages within one xml file, so you would misuse it a little.
>
> OTOH, you might just keep working with your current approach without
> real problems. Make sure that, for the abstract, you configure the
> property in dasl-indexer.xml to be of type="text" (and use
> property-contains in your dasl instead of propcontains, see [2]). For
> date and title you might choose not to do this.
>
> -Ard
>
> [1]
> http://www.hippocms.org/display/CMS/Hippo+Repository+ConfigurableXMLContentExtractor
> [2] http://www.hippocms.org/display/CMS/06.+Using+DASL+Queries
>
> >
> > Second question.
> > Is it possible to index (for searching) something without
> > storing its content? Just like in lucene:
> > Field.Index = true
> > Field.Store = false
> >
> > Regards,
> > Darek
> > ********************************************
> > Hippocms-dev: Hippo CMS development public mailinglist
> >
