Hello Marco,

an example is:

<extractor classname="nl.hippo.slide.extractor.UrlListXMLPropertyExtractor"
           uri="/files/default.preview"
           content-type="text/xml | text/xml; charset=UTF-8 | application/xml">
  <configuration>
    <instruction property="links"
                 namespace="http://hippo.nl/cms/1.0"
                 xpath="//@href|//@src|//datasource/text()|//bannerUrl/text()|//logoUrl/text()"/>
  </configuration>
</extractor>

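For the document quoted further down, the same pattern could be narrowed to the attributes that actually carry links there. This is only a sketch: the classname, uri, content-type and namespace are copied unchanged from the example above and are assumed to fit that repository as well, and the xpath has not been tested against the extractor.

```xml
<!-- Sketch only: all attribute values except xpath are copied from the example above. -->
<extractor classname="nl.hippo.slide.extractor.UrlListXMLPropertyExtractor"
           uri="/files/default.preview"
           content-type="text/xml | text/xml; charset=UTF-8 | application/xml">
  <configuration>
    <!-- One XPath branch per kind of link in the quoted document:
         hrefs inside the HTML content, external links, related documents,
         images and assets. -->
    <instruction property="links"
                 namespace="http://hippo.nl/cms/1.0"
                 xpath="//content//@href|//externalLink/@url|//relatedDoc/@path|//image/@path|//asset/@path"/>
  </configuration>
</extractor>
```

Selecting by element name keeps the expression to plain location paths joined with the union operator, so it does not rely on predicates or XPath functions.
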
As you can see, the xpath attribute combines several XPath expressions with the union operator (|), each selecting a different part of the XML. I'm not sure whether the XPath engine also supports a predicate such as //@href[starts-with(.,'/content/')] to keep only internal links.

Hope this helps,
Jasha Joachimsthal

[email protected] - [email protected]
www.onehippo.com
Amsterdam - Hippo B.V. Oosteinde 11 1017 WT Amsterdam +31(0)20-5224466
San Francisco - Hippo USA Inc. 185 H Street, suite B, Petaluma CA 94952 +1 (707) 7734646

2009/7/3 Marco Casavecchia Morganti <[email protected]>

> Hello all,
>
> I would like to set up the broken link checker for my CMS, but before I
> start, I need to know whether I have understood how it works.
> As far as I know, the checker creates an XML file in the repository that
> serves as the "database" of the inspected links.
> To create this document, it needs to browse the repository in search of a
> webdavProperty called "links".
>
> So, if this is right, I need to configure an extractor in the repository.
>
> Now, I have a document like this:
> ------------------------------------
> <?xml version="1.0" encoding="UTF-8"?>
> <document>
>   <metaCurSection>multimedia</metaCurSection>
>   <taxonomies> </taxonomies>
>   <primaryData lang="it">
>     <content>
>       <html>
>         <body>
>           <a href="http://www.google.com" title="prova">testlink</a>
>         </body>
>       </html>
>     </content>
>     <shortDescription />
>     <title>Test di Impaginazione Template</title>
>   </primaryData>
>   <attachments lang="it">
>     <externalLinks>
>       <externalLink label="Prova" order="1" url="http://www.google.com/"/>
>     </externalLinks>
>     <assets>
>       <asset order="1" path="/binaries/sandbox/urb_part.gif"/>
>     </assets>
>     <images>
>       <image alt="Prova Formattazione" order="1" path="/binaries/sandbox/nx03_wallpaper01.jpg"/>
>     </images>
>     <relatedDocs>
>       <relatedDoc order="1" path="/content/taxonomies/ankonline/uffici/stampa/conferenze/2007/nuovodoc.xml"/>
>     </relatedDocs>
>   </attachments>
>   <multimedia lang="it">
>     <stream externalPath="/video/test.flv" repository="external"/>
>   </multimedia>
>   <secondaryData lang="it">
>     <tickets />
>     <other />
>   </secondaryData>
>   <contacts lang="it">
>     <info />
>     <timeTable />
>     <telephones>
>       <telephone number="112324345" order="1"/>
>     </telephones>
>     <faxes>
>       <fax number="12121341" order="2"/>
>     </faxes>
>     <emails>
>       <email address="[email protected]" order="3"/>
>     </emails>
>   </contacts>
> </document>
>
> I have to extract:
> - The links in the HTML fields, such as "/document/primaryData/content"
> - The external links in "/document/attachments/externalLinks/externalLink"
> - The internal links in "/document/attachments/relatedDocs/relatedDoc"
> - The images in "/document/attachments/images/image"
> - The assets in "/document/attachments/assets/asset"
>
> Can someone show me an example of an extractor configuration?
> Thanks in advance.
>
> --
> By MCM.
>
> << Theory is when you know everything but nothing works.
> Practice is when everything works but nobody knows why.
> In any case, one ends up combining theory with practice: nothing
> works and nobody knows why. >>
> (A. Einstein)
> ********************************************
> Hippocms-dev: Hippo CMS development public mailinglist
>
> Searchable archives can be found at:
> MarkMail: http://hippocms-dev.markmail.org
> Nabble: http://www.nabble.com/Hippo-CMS-f26633.html
>
> ********************************************
