Ard,
Thanks for your reply.
> > We encounter duplicate hits in a dasl search result. The
>
> Aren't these the live and the preview versions? Are you sure there are 2 hits,
> both with the exact same repository location, i.e., the same 'href' in the
> result? If you are just editing documents through the CMS, this is hardly
> possible AFAIK.
We only use the preview location, since the documents do not need to be
published. They are put there by another service through a WebDAV mount.
> > documents do get re-inserted multiple times. The behaviour
> > points towards the Lucene index since there really is just
> > one document.
>
> If the 'href' is exactly the same (case sensitive also), it shouldn't be
> possible.
As far as I could see in Cadaver, they pointed to the same location.
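Cadaver makes it easy to miss a difference in case or trailing whitespace, so the hrefs in the raw DASL response can also be compared programmatically. The sketch below assumes the search result is the usual WebDAV multistatus body (`DAV:response` elements, each starting with a `DAV:href`) and uses only the JDK's DOM parser; the class name and the sample paths are hypothetical. It compares hrefs exactly, case included, which is the condition Ard describes above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class DuplicateHrefCheck {

    /**
     * Counts the href of every DAV:response in a multistatus body and returns
     * only those that occur more than once. Comparison is exact and case-sensitive.
     */
    public static Map<String, Integer> duplicateHrefs(String multistatusXml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required to match elements in the "DAV:" namespace
            doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(multistatusXml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Not a parseable multistatus body", e);
        }

        Map<String, Integer> counts = new LinkedHashMap<>();
        NodeList responses = doc.getElementsByTagNameNS("DAV:", "response");
        for (int i = 0; i < responses.getLength(); i++) {
            Element response = (Element) responses.item(i);
            // The response's own href is its first child, before any propstat content.
            NodeList hrefs = response.getElementsByTagNameNS("DAV:", "href");
            if (hrefs.getLength() > 0) {
                counts.merge(hrefs.item(0).getTextContent().trim(), 1, Integer::sum);
            }
        }
        counts.values().removeIf(n -> n < 2); // drop hrefs seen only once
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical result: the same document twice, plus one differing only in case.
        String sample = "<D:multistatus xmlns:D=\"DAV:\">"
                + "<D:response><D:href>/files/default.preview/content/a.xml</D:href></D:response>"
                + "<D:response><D:href>/files/default.preview/content/a.xml</D:href></D:response>"
                + "<D:response><D:href>/files/default.preview/content/A.xml</D:href></D:response>"
                + "</D:multistatus>";
        System.out.println(duplicateHrefs(sample)); // prints {/files/default.preview/content/a.xml=2}
    }
}
```

An empty map means every hit has a distinct href, so the "duplicates" would be different resources rather than one document indexed twice.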
> > I reckon this is not the default strategy of the indexer.
> > Could it be that we overrode it by defining our own
> > dasl-indexer file?
>
> No, I don't think so, but you might enlighten me a little on what you have
> done so far...
We set up our own dasl-indexer.xml file; here is a snippet:
<indexer class="nl.hippo.slide.index.LuceneIndexerDASLImpl">
<indexpath>../work/slide_index/default</indexpath>
<analyzer class="nl.hippo.slide.index.analysis.SimpleStandardAnalyzer"/>
<cron>0/5 * * * * ? *</cron>
<default-property-analyzer
class="nl.hippo.slide.index.analysis.SimpleStandardAnalyzer"/>
<case-sensitive>false</case-sensitive>
<index-all/>
<properties>
<property namespace="DAV:" name="getcontenttype" type="string"
support-defined="true"/>
<property namespace="DAV:" name="getlastmodified" type="date"/>
<property namespace="DAV:" name="creationdate" type="date"/>
<property namespace="DAV:" name="getcontentlength" type="int"/>
<!-- properties from extractors.xml -->
<property name="some_text" namespace="http://hippo.nl/cms/1.0" type="text"/>
<property name="a_string" namespace="http://hippo.nl/cms/1.0"
type="string"/>
<property name="a_date" namespace="http://hippo.nl/cms/1.0" type="date"/>
....
</properties>
<!-- don't edit the stuff below unless you know what you are doing -->
<resource-types>
<resource-type name="collection" namespace="DAV:"/>
<resource-type name="principal" namespace="DAV:"/>
<resource-type name="version-history" namespace="DAV:"/>
</resource-types>
<!-- use buffered indexing -->
<buffered-docs>10</buffered-docs>
<merge-factor>10</merge-factor>
<optimize-docs>300</optimize-docs>
</indexer>
And here is a snippet of the extractors.xml file:
<?xml version="1.0"?>
<extractors>
<extractor classname="nl.hippo.slide.extractor.HippoSimpleXmlExtractor"
uri="/files/default.preview/content" content-type="text/xml">
<configuration>
<instruction property="some_text" namespace="http://hippo.nl/cms/1.0"
xpath="/root/body"/>
<instruction property="a_string" namespace="http://hippo.nl/cms/1.0"
xpath="/root/title"/>
</configuration>
</extractor>
<extractor classname="nl.hippo.slide.extractor.HippoXMLDatePropertyExtractor"
uri="/files/default.preview/content" content-type="text/xml">
<configuration>
<!-- fase1 -->
<instruction property="a_date" namespace="http://hippo.nl/cms/1.0"
xpath="/path/to/element/attr/text()"
inputFormat="dd-MM-yy' 'HH.mm" outputFormat="yyyy-MM-dd'T'HH:mm:ss"/>
.....
.....
</configuration>
</extractor>
<!-- XML content extractor -->
<extractor classname="nl.hippo.slide.extractor.XMLContentExtractor"
uri="/files" content-type="text/xml"/>
<extractor
classname="nl.hippo.slide.extractor.HippoMultiValueXMLPropertyExtractor"
uri="/files" content-type="text/xml | text/xml; charset=UTF-8 | application/xml">
<configuration>
<instruction property="references" namespace="http://hippo.nl/cms/1.0"
xpath="//@href|//@src"/>
</configuration>
</extractor>
<extractor
classname="nl.hippo.slide.extractor.HippoUrlListXMLPropertyExtractor"
uri="/files" content-type="text/xml | text/xml; charset=UTF-8 | application/xml">
<configuration>
<instruction property="links" namespace="http://hippo.nl/cms/1.0"
xpath="//@href|//@src"/>
</configuration>
</extractor>
</extractors>
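The a_date instruction depends on two date patterns. Assuming HippoXMLDatePropertyExtractor interprets inputFormat and outputFormat with java.text.SimpleDateFormat semantics (the pattern letters and quoted literals follow that syntax, but this is not confirmed here), the conversion can be checked in isolation before blaming the index. The class name and the sample value are hypothetical; UTC and Locale.ROOT are fixed only so the demo does not depend on the JVM defaults.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class DateFormatCheck {

    // Patterns copied verbatim from the a_date instruction in extractors.xml.
    static final String INPUT_FORMAT = "dd-MM-yy' 'HH.mm";
    static final String OUTPUT_FORMAT = "yyyy-MM-dd'T'HH:mm:ss";

    /** Converts a raw element value from INPUT_FORMAT to OUTPUT_FORMAT. */
    public static String convert(String raw) {
        // Assumption: the extractor applies SimpleDateFormat semantics to these patterns.
        TimeZone utc = TimeZone.getTimeZone("UTC"); // fixed zone keeps the demo independent of the JVM default
        SimpleDateFormat in = new SimpleDateFormat(INPUT_FORMAT, Locale.ROOT);
        in.setTimeZone(utc);
        in.setLenient(false); // reject e.g. "32-13-07 25.61" instead of rolling it over
        SimpleDateFormat out = new SimpleDateFormat(OUTPUT_FORMAT, Locale.ROOT);
        out.setTimeZone(utc);
        try {
            Date parsed = in.parse(raw.trim());
            return out.format(parsed);
        } catch (ParseException e) {
            throw new IllegalArgumentException("'" + raw + "' does not match " + INPUT_FORMAT, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical value as it might appear in a source document.
        System.out.println(convert("17-03-07 14.30")); // prints 2007-03-17T14:30:00
    }
}
```

Note that the two-digit `yy` year is resolved relative to the current date (within 80 years before and 20 years after), so a value such as `99` maps to 1999.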
> > Also I was wondering if it is possible to rebuild the entire
> > index periodically.
>
> Stopping the repository, deleting the index and restarting triggers a rebuild.
> With serious numbers of documents, I would not advise doing this. Also, it is
> not a solution to your issue. Anyway, I think more information might help us
> (e.g., is it reproducible every time or does it only happen now and then,
> what exactly did you do, etc.). The issue you are having is obviously not
> default behavior, and we do not normally experience it.
The repository only contains a handful of documents. I removed everything under
work/slide_index/default, but now my searches do not yield any results, not even
for the DAV: namespace properties, even though the documents are still in the
repository and the properties are present.
I do see two new files in the index directory (listener.ser and segments) but
no .cfs files.
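An index directory that contains only the `segments` file most likely holds no indexed documents yet. Assuming the index uses Lucene's compound file format (one `.cfs` file per segment, as expected above), a quick check of the directory after the cron-driven indexer has had a chance to run could look like the sketch below. The class name is hypothetical, and the default path is the `<indexpath>` value from dasl-indexer.xml, resolved here against the JVM working directory, which may not match how Slide resolves it.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class IndexDirCheck {

    /** Returns the names of the *.cfs files directly inside indexDir, sorted; empty if none. */
    public static List<String> compoundSegments(File indexDir) {
        List<String> names = new ArrayList<>();
        File[] files = indexDir.listFiles();
        if (files == null) {
            return names; // directory missing or not readable
        }
        for (File f : files) {
            if (f.isFile() && f.getName().endsWith(".cfs")) {
                names.add(f.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        // Default taken from <indexpath>; a relative path resolves against the working directory.
        File dir = new File(args.length > 0 ? args[0] : "../work/slide_index/default");
        List<String> segments = compoundSegments(dir);
        if (segments.isEmpty()) {
            System.out.println("No .cfs segments in " + dir + "; the index appears to be empty");
        } else {
            System.out.println("Compound segments: " + segments);
        }
    }
}
```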
Æde
Hippocms-dev: Hippo CMS development public mailinglist
Searchable archives can be found at:
MarkMail: http://hippocms-dev.markmail.org
Nabble: http://www.nabble.com/Hippo-CMS-f26633.html