Thanks again!
Hmm, I see that the reference and links properties are already
present in my dasl-indexer.
So I'm planning to change the extraction by modifying the xpath
attribute in the existing links and reference extractors.
In this case I only need to reindex by deleting the Lucene index, right?
Jasha Joachimsthal wrote:
Almost forgot, in the indexer:
<property namespace="http://hippo.nl/cms/1.0" name="links" type="text"
analyzer="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
There's a difference between reindexing existing content/properties and
adding a new property.
If you delete the Lucene index, the repository will force indexing existing
content/properties. This may be handy if you change the analyzers or the way
existing properties are indexed.
If you introduce new extractors you need to "touch" all content. We've
created a batch processor tool for that which is described on [1].
[1] http://www.hippocms.org/display/CMS/Hippo+Touch
Jasha Joachimsthal
[email protected] - [email protected]
www.onehippo.com
Amsterdam - Hippo B.V. Oosteinde 11 1017 WT Amsterdam +31(0)20-5224466
San Francisco - Hippo USA Inc. 185 H Street, suite B, Petaluma CA 94952 +1
(707) 7734646
2009/7/3 Marco Casavecchia Morganti <[email protected]>
Thanks Jasha,
I understand. So I don't need to change anything in dasl-indexer.xml.
I have another question: is there a way to force a "reindex" of the
repository in order to create such properties for my existing documents?
Jasha Joachimsthal wrote:
Hello Marco,
an example is
<extractor
    classname="nl.hippo.slide.extractor.UrlListXMLPropertyExtractor"
    uri="/files/default.preview"
    content-type="text/xml | text/xml; charset=UTF-8 | application/xml">
  <configuration>
    <instruction property="links" namespace="http://hippo.nl/cms/1.0"
        xpath="//@href|//@src|//datasource/text()|//bannerUrl/text()|//logoUrl/text()"/>
  </configuration>
</extractor>
As you can see, the xpath is a union of several expressions that match
different parts of the XML. I'm not really sure if the xpath engine also
supports //@href[starts-with(.,'/content/')] to keep only internal links.
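For the sample document quoted further down in this thread, a minimal sketch of an adapted configuration might look like the following. The classname, uri and content-type values are copied from the example above; the extra xpath parts (externalLink/@url, relatedDoc/@path, image/@path, asset/@path) are assumptions derived from the element and attribute names in that sample document and are untested. The commented-out variant uses the starts-with() predicate mentioned above, which is valid XPath 1.0 but may not be supported by the extractor's xpath engine.

```xml
<extractor
    classname="nl.hippo.slide.extractor.UrlListXMLPropertyExtractor"
    uri="/files/default.preview"
    content-type="text/xml | text/xml; charset=UTF-8 | application/xml">
  <configuration>
    <!-- Assumed xpath for the sample document: HTML links and images plus
         the url/path attributes of the attachment elements. -->
    <instruction property="links" namespace="http://hippo.nl/cms/1.0"
        xpath="//@href|//@src|//externalLink/@url|//relatedDoc/@path|//image/@path|//asset/@path"/>
    <!-- Hypothetical internal-only variant; only usable if the xpath engine
         supports predicates:
    <instruction property="links" namespace="http://hippo.nl/cms/1.0"
        xpath="//@href[starts-with(.,'/content/')]|//relatedDoc/@path"/>
    -->
  </configuration>
</extractor>
```

As noted elsewhere in this thread, existing documents would probably still need to be touched before a newly introduced extractor produces the property for them.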
Hope this helps you,
Jasha Joachimsthal
2009/7/3 Marco Casavecchia Morganti <[email protected]>
Hello all,
I would like to set up the broken link checker for my CMS, but before I
start, I need to know whether I have understood how it works.
As far as I know, the checker creates an XML file in the repository that
serves as the "database" of the inspected links.
To create this document, it browses the repository looking for a
webdavProperty called "links".
So, if this is right, I need to configure an extractor in the
repository.
Now, I have a document like this:
------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<document>
  <metaCurSection>multimedia</metaCurSection>
  <taxonomies> </taxonomies>
  <primaryData lang="it">
    <content>
      <html>
        <body>
          <a href="http://www.google.com" title="prova">testlink</a>
        </body>
      </html>
    </content>
    <shortDescription />
    <title>Test di Impaginazione Template</title>
  </primaryData>
  <attachments lang="it">
    <externalLinks>
      <externalLink label="Prova" order="1" url="http://www.google.com/"/>
    </externalLinks>
    <assets>
      <asset order="1" path="/binaries/sandbox/urb_part.gif"/>
    </assets>
    <images>
      <image alt="Prova Formattazione" order="1"
             path="/binaries/sandbox/nx03_wallpaper01.jpg"/>
    </images>
    <relatedDocs>
      <relatedDoc order="1"
                  path="/content/taxonomies/ankonline/uffici/stampa/conferenze/2007/nuovodoc.xml"/>
    </relatedDocs>
  </attachments>
  <multimedia lang="it">
    <stream externalPath="/video/test.flv" repository="external"/>
  </multimedia>
  <secondaryData lang="it">
    <tickets />
    <other />
  </secondaryData>
  <contacts lang="it">
    <info />
    <timeTable />
    <telephones>
      <telephone number="112324345" order="1"/>
    </telephones>
    <faxes>
      <fax number="12121341" order="2"/>
    </faxes>
    <emails>
      <email address="[email protected]" order="3"/>
    </emails>
  </contacts>
</document>
I have to extract:
- The links in the HTML fields such as "/document/primaryData/content"
- The external links in "/document/attachments/externalLinks/externalLink"
- The internal links in "/document/attachments/relatedDocs/relatedDoc"
- The images in "/document/attachments/images/image"
- The assets in "/document/attachments/assets/asset"
Can someone show me an example of an extractor configuration?
Thanks in advance.
--
By MCM.
<< Theory is when you know everything but nothing works.
Practice is when everything works but nobody knows why.
In any case, one ends up combining theory with practice: nothing
works and nobody knows why. >>
(A. Einstein)
********************************************
Hippocms-dev: Hippo CMS development public mailinglist
Searchable archives can be found at:
MarkMail: http://hippocms-dev.markmail.org
Nabble: http://www.nabble.com/Hippo-CMS-f26633.html