Almost forgot: in the indexer, add:
<property namespace="http://hippo.nl/cms/1.0" name="links" type="text"
analyzer="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
There's a difference between reindexing existing content/properties and
adding a new property.
If you delete the Lucene index, the repository will reindex all existing
content and properties. This can be handy if you change the analyzers or the
way existing properties are indexed.
If you introduce new extractors, you need to "touch" all content so that the
extractors run on it. We've created a batch processor tool for that, which is
described at [1].
[1] http://www.hippocms.org/display/CMS/Hippo+Touch
Jasha Joachimsthal
[email protected] - [email protected]
www.onehippo.com
Amsterdam - Hippo B.V. Oosteinde 11 1017 WT Amsterdam +31(0)20-5224466
San Francisco - Hippo USA Inc. 185 H Street, suite B, Petaluma CA 94952 +1 (707) 7734646
2009/7/3 Marco Casavecchia Morganti <[email protected]>
> Thanks Jasha,
> I understand. So I don't need to change anything in the dasl-indexer.xml.
>
> I have another question: is there a way to force a "reindex" of the
> repository so that these properties are created for my existing documents?
>
>
> Jasha Joachimsthal wrote:
>
>> Hello Marco,
>> here is an example:
>>
>> <extractor
>>     classname="nl.hippo.slide.extractor.UrlListXMLPropertyExtractor"
>>     uri="/files/default.preview"
>>     content-type="text/xml | text/xml; charset=UTF-8 | application/xml">
>>   <configuration>
>>     <instruction property="links" namespace="http://hippo.nl/cms/1.0"
>>         xpath="//@href|//@src|//datasource/text()|//bannerUrl/text()|//logoUrl/text()"/>
>>   </configuration>
>> </extractor>
>>
>> As you can see, the XPath combines several parts of the XML with the union
>> operator (|). I'm not really sure whether the XPath engine also supports
>> //@href[starts-with(.,'/content/')] to filter for internal links only.
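To make the union expression concrete, here is a minimal Python sketch (standard library only) that approximates what `//@href|//@src|//datasource/text()|//bannerUrl/text()|//logoUrl/text()` selects, plus a Python-side equivalent of the `starts-with(.,'/content/')` filter. It is not Hippo's `UrlListXMLPropertyExtractor`, and it says nothing about whether that extractor's XPath engine accepts the predicate. The sample document and the names `union_links` and `internal_only` are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (not from the thread) containing each kind of
# node that the union expression in the extractor example selects.
SAMPLE = """<document>
  <content>
    <a href="/content/news/item.xml">internal</a>
    <img src="/binaries/logo.gif"/>
    <a href="http://www.google.com">external</a>
  </content>
  <datasource>/content/feeds/rss.xml</datasource>
  <bannerUrl>http://example.org/banner</bannerUrl>
</document>"""

# Elements whose text content the union picks up via .../text()
TEXT_ELEMENTS = ("datasource", "bannerUrl", "logoUrl")


def union_links(root):
    """Approximate //@href|//@src|//datasource/text()|//bannerUrl/text()|//logoUrl/text().

    Elements are visited in document order; within one element, href and src
    come before its text. Text is stripped, which real XPath would not do.
    """
    found = []
    for el in root.iter():  # iter() includes the root itself, like // does
        for attr in ("href", "src"):
            value = el.get(attr)
            if value is not None:
                found.append(value)
        if el.tag in TEXT_ELEMENTS and el.text and el.text.strip():
            found.append(el.text.strip())
    return found


def internal_only(links, prefix="/content/"):
    """Python equivalent of the predicate [starts-with(., '/content/')]."""
    return [link for link in links if link.startswith(prefix)]


links = union_links(ET.fromstring(SAMPLE))
print(links)
print(internal_only(links))
```

Here the internal-link filter runs in Python after extraction; if the extractor's XPath engine turns out to support the predicate, the same filtering could happen inside the `xpath` attribute instead.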
>> Hope this helps you,
>>
>> Jasha Joachimsthal
>>
>>
>>
>> 2009/7/3 Marco Casavecchia Morganti <[email protected]>
>>
>>> Hello all,
>>>
>>> I would like to set up the broken link checker for my CMS, but before
>>> starting, I need to know whether I have understood how it works.
>>> As far as I know, the checker creates an XML file in the repository that
>>> is the "database" of the inspected links.
>>> To build this document, it searches the repository for a WebDAV
>>> property called "links".
>>>
>>> So, if this is right, I need to configure an extractor in the
>>> repository.
>>>
>>> Now, I have a document like this:
>>> ------------------------------------
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <document>
>>>   <metaCurSection>multimedia</metaCurSection>
>>>   <taxonomies> </taxonomies>
>>>   <primaryData lang="it">
>>>     <content>
>>>       <html>
>>>         <body>
>>>           <a href="http://www.google.com" title="prova">testlink</a>
>>>         </body>
>>>       </html>
>>>     </content>
>>>     <shortDescription />
>>>     <title>Test di Impaginazione Template</title>
>>>   </primaryData>
>>>   <attachments lang="it">
>>>     <externalLinks>
>>>       <externalLink label="Prova" order="1" url="http://www.google.com/"/>
>>>     </externalLinks>
>>>     <assets>
>>>       <asset order="1" path="/binaries/sandbox/urb_part.gif"/>
>>>     </assets>
>>>     <images>
>>>       <image alt="Prova Formattazione" order="1"
>>>           path="/binaries/sandbox/nx03_wallpaper01.jpg"/>
>>>     </images>
>>>     <relatedDocs>
>>>       <relatedDoc order="1"
>>>           path="/content/taxonomies/ankonline/uffici/stampa/conferenze/2007/nuovodoc.xml"/>
>>>     </relatedDocs>
>>>   </attachments>
>>>   <multimedia lang="it">
>>>     <stream externalPath="/video/test.flv" repository="external"/>
>>>   </multimedia>
>>>   <secondaryData lang="it">
>>>     <tickets />
>>>     <other />
>>>   </secondaryData>
>>>   <contacts lang="it">
>>>     <info />
>>>     <timeTable />
>>>     <telephones>
>>>       <telephone number="112324345" order="1"/>
>>>     </telephones>
>>>     <faxes>
>>>       <fax number="12121341" order="2"/>
>>>     </faxes>
>>>     <emails>
>>>       <email address="[email protected]" order="3"/>
>>>     </emails>
>>>   </contacts>
>>> </document>
>>>
>>> I need to extract:
>>> - The links in HTML fields such as "/document/primaryData/content"
>>> - The external links in "/document/attachments/externalLinks/externalLink"
>>> - The internal links in "/document/attachments/relatedDocs/relatedDoc"
>>> - The images in "/document/attachments/images/image"
>>> - The assets in "/document/attachments/assets/asset"
>>>
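The five locations listed above can be checked against Marco's sample document with a small Python sketch (standard library only). It is not the Hippo extractor: `CATEGORIES`, `extract_links` and `LINKS_XPATH` are hypothetical names, the document is a trimmed copy of the one above, and the single XPath union is only an untested guess at an `xpath` value for this document, modelled on the form of the extractor example in this thread. Element names must match the document's case exactly (`primaryData`, `attachments`), because XPath is case-sensitive.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of Marco's document: only the elements that hold links.
DOC = """<document>
  <primaryData lang="it">
    <content>
      <html><body>
        <a href="http://www.google.com" title="prova">testlink</a>
      </body></html>
    </content>
  </primaryData>
  <attachments lang="it">
    <externalLinks>
      <externalLink label="Prova" order="1" url="http://www.google.com/"/>
    </externalLinks>
    <assets>
      <asset order="1" path="/binaries/sandbox/urb_part.gif"/>
    </assets>
    <images>
      <image alt="Prova Formattazione" order="1" path="/binaries/sandbox/nx03_wallpaper01.jpg"/>
    </images>
    <relatedDocs>
      <relatedDoc order="1" path="/content/taxonomies/ankonline/uffici/stampa/conferenze/2007/nuovodoc.xml"/>
    </relatedDocs>
  </attachments>
</document>"""

# category -> (ElementTree path relative to <document>, attribute holding the link)
CATEGORIES = {
    "html": ("primaryData/content//*[@href]", "href"),
    "external": ("attachments/externalLinks/externalLink", "url"),
    "internal": ("attachments/relatedDocs/relatedDoc", "path"),
    "images": ("attachments/images/image", "path"),
    "assets": ("attachments/assets/asset", "path"),
}

# The same selection written as one XPath 1.0 union (hypothetical, untested in
# Hippo; ElementTree cannot evaluate attribute-selecting XPath like this).
LINKS_XPATH = "|".join([
    "/document/primaryData/content//@href",
    "/document/attachments/externalLinks/externalLink/@url",
    "/document/attachments/relatedDocs/relatedDoc/@path",
    "/document/attachments/images/image/@path",
    "/document/attachments/assets/asset/@path",
])


def extract_links(xml_text):
    """Return {category: [link, ...]} for the five locations, skipping empty values."""
    root = ET.fromstring(xml_text)
    return {
        name: [el.get(attr) for el in root.findall(path) if el.get(attr)]
        for name, (path, attr) in CATEGORIES.items()
    }


for name, values in extract_links(DOC).items():
    print(name, values)
print(LINKS_XPATH)
```

Keeping one entry per category makes it easy to see which part of the document each link came from; the union string collapses them into a single list, which is the shape the "links" property in the extractor example takes.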
>>> Can someone show me an example of an extractor configuration?
>>> Thanks in advance.
>>>
>>> --
>>> By MCM.
>>>
>>> << Theory is when you know everything but nothing works.
>>> Practice is when everything works but you don't know why.
>>> In any case, you end up combining theory with practice: nothing
>>> works and you don't know why. >>
>>> (A. Einstein)
>>> ********************************************
>>> Hippocms-dev: Hippo CMS development public mailinglist
>>>
>>> Searchable archives can be found at:
>>> MarkMail: http://hippocms-dev.markmail.org
>>> Nabble: http://www.nabble.com/Hippo-CMS-f26633.html
>>>
>>>
>>> ********************************************
>
> --
> By MCM.