Let me rephrase:
- Change in the indexer configuration: you can either delete the whole Lucene index or touch the relevant part of the repository.
- Change in the extractor configuration: you need to touch the relevant part of the repository.

Jasha

2009/7/3 Marco Casavecchia Morganti <[email protected]>

> Thanks again!
>
> Hmm, I saw that the reference and the links properties are already present
> in my dasl-indexer.
> So I'm planning to change the extractor by modifying the xpath attribute
> in the existing links and reference extractors.
>
> In this case I only need a reindex by deleting the Lucene index, right?
>
> Jasha Joachimsthal wrote:
>
>> Almost forgot, in the indexer:
>> <property namespace="http://hippo.nl/cms/1.0" name="links" type="text"
>>     analyzer="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
>>
>> There's a difference between reindexing existing content/properties and
>> adding a new property.
>> If you delete the Lucene index, the repository is forced to reindex the
>> existing content/properties. This may be handy if you change the analyzers
>> or the way existing properties are indexed.
>> If you introduce new extractors you need to "touch" all content. We've
>> created a batch processor tool for that, which is described on [1].
>>
>> [1] http://www.hippocms.org/display/CMS/Hippo+Touch
>>
>> Jasha Joachimsthal
>>
>> [email protected] - [email protected]
>>
>> www.onehippo.com
>> Amsterdam - Hippo B.V. Oosteinde 11 1017 WT Amsterdam +31(0)20-5224466
>> San Francisco - Hippo USA Inc. 185 H Street, suite B, Petaluma CA 94952 +1 (707) 7734646
>>
>> 2009/7/3 Marco Casavecchia Morganti <[email protected]>
>>
>>> Thanks Jasha,
>>> I understood. So I don't need to change anything in the
>>> dasl-indexer.xml.
>>>
>>> I have another question: Is there a way to force a "reindex" of the
>>> repository in order to create such properties for my existing documents?
>>>
>>> Jasha Joachimsthal wrote:
>>>
>>>> Hello Marco,
>>>> an example is
>>>>
>>>> <extractor
>>>>     classname="nl.hippo.slide.extractor.UrlListXMLPropertyExtractor"
>>>>     uri="/files/default.preview"
>>>>     content-type="text/xml | text/xml; charset=UTF-8 | application/xml">
>>>>   <configuration>
>>>>     <instruction property="links" namespace="http://hippo.nl/cms/1.0"
>>>>         xpath="//@href|//@src|//datasource/text()|//bannerUrl/text()|//logoUrl/text()"/>
>>>>   </configuration>
>>>> </extractor>
>>>>
>>>> As you see, the xpath is a union of several parts of the XML.
>>>> I'm not really sure if the xpath engine also supports
>>>> //@href[starts-with(.,'/content/')] to filter internal links only.
>>>> Hope this helps you,
>>>>
>>>> Jasha Joachimsthal
>>>>
>>>> 2009/7/3 Marco Casavecchia Morganti <[email protected]>
>>>>
>>>>> Hello all,
>>>>> I would like to set up the broken link checker for my CMS, but before I
>>>>> start, I need to know whether I understood how it works.
>>>>> As far as I know, the checker creates an XML file in the repository that
>>>>> is the "database" of the inspected links.
>>>>> To create this document it needs to browse the repository in search of a
>>>>> webdavProperty called "links".
>>>>>
>>>>> So, if this is right, I need to configure an extractor in the
>>>>> repository.
>>>>>
>>>>> Now, I have a document like this:
>>>>> ------------------------------------
>>>>> <?xml version="1.0" encoding="UTF-8"?>
>>>>> <document>
>>>>>   <metaCurSection>multimedia</metaCurSection>
>>>>>   <taxonomies> </taxonomies>
>>>>>   <primaryData lang="it">
>>>>>     <content>
>>>>>       <html>
>>>>>         <body>
>>>>>           <a href="http://www.google.com" title="prova">testlink</a>
>>>>>         </body>
>>>>>       </html>
>>>>>     </content>
>>>>>     <shortDescription />
>>>>>     <title>Test di Impaginazione Template</title>
>>>>>   </primaryData>
>>>>>   <attachments lang="it">
>>>>>     <externalLinks>
>>>>>       <externalLink label="Prova" order="1" url="http://www.google.com/"/>
>>>>>     </externalLinks>
>>>>>     <assets>
>>>>>       <asset order="1" path="/binaries/sandbox/urb_part.gif"/>
>>>>>     </assets>
>>>>>     <images>
>>>>>       <image alt="Prova Formattazione" order="1"
>>>>>           path="/binaries/sandbox/nx03_wallpaper01.jpg"/>
>>>>>     </images>
>>>>>     <relatedDocs>
>>>>>       <relatedDoc order="1"
>>>>>           path="/content/taxonomies/ankonline/uffici/stampa/conferenze/2007/nuovodoc.xml"/>
>>>>>     </relatedDocs>
>>>>>   </attachments>
>>>>>   <multimedia lang="it">
>>>>>     <stream externalPath="/video/test.flv" repository="external"/>
>>>>>   </multimedia>
>>>>>   <secondaryData lang="it">
>>>>>     <tickets />
>>>>>     <other />
>>>>>   </secondaryData>
>>>>>   <contacts lang="it">
>>>>>     <info />
>>>>>     <timeTable />
>>>>>     <telephones>
>>>>>       <telephone number="112324345" order="1"/>
>>>>>     </telephones>
>>>>>     <faxes>
>>>>>       <fax number="12121341" order="2"/>
>>>>>     </faxes>
>>>>>     <emails>
>>>>>       <email address="[email protected]" order="3"/>
>>>>>     </emails>
>>>>>   </contacts>
>>>>> </document>
>>>>>
>>>>> I have to extract:
>>>>> - The links in the HTML fields like "/document/primaryData/content"
>>>>> - The external links in "/document/attachments/externalLinks/externalLink"
>>>>> - The internal links in "/document/attachments/relatedDocs/relatedDoc"
>>>>> - The images in "/document/attachments/images/image"
>>>>> - The assets in "/document/attachments/assets/asset"
>>>>>
>>>>> Can someone show me an example for an extractor configuration?
>>>>> Thanks in advance.
>>>>>
>>>>> --
>>>>> By MCM.
>>>>>
>>>>> << Theory is when you know everything but nothing works.
>>>>> Practice is when everything works but nobody knows why.
>>>>> In any case, one ends up combining theory with practice:
>>>>> nothing works and nobody knows why. >>
>>>>> (A. Einstein)
>>>>> ********************************************
>>>>> Hippocms-dev: Hippo CMS development public mailinglist
>>>>>
>>>>> Searchable archives can be found at:
>>>>> MarkMail: http://hippocms-dev.markmail.org
>>>>> Nabble: http://www.nabble.com/Hippo-CMS-f26633.html
>>>>> ********************************************
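
For the document structure posted at the start of this thread, the extractor example could probably be adapted along the following lines. This is only an untested sketch: the class name, uri, content-type, property name, and namespace are copied from the example given in the thread, while the XPath parts for externalLink, relatedDoc, image, and asset are assumptions derived from the sample document, not a confirmed configuration.

```xml
<!-- Sketch only: classname, uri and content-type are taken from the example in this thread. -->
<extractor
    classname="nl.hippo.slide.extractor.UrlListXMLPropertyExtractor"
    uri="/files/default.preview"
    content-type="text/xml | text/xml; charset=UTF-8 | application/xml">
  <configuration>
    <!-- Assumption: the union parts after //@src are derived from the sample
         document (externalLink/@url, relatedDoc/@path, image/@path, asset/@path)
         and have not been tested against a repository. -->
    <instruction property="links" namespace="http://hippo.nl/cms/1.0"
        xpath="//@href|//@src|//externalLink/@url|//relatedDoc/@path|//image/@path|//asset/@path"/>
  </configuration>
</extractor>
```

As discussed in the thread, a change like this is a change to the extractor configuration, so the affected documents would need to be touched (for example with the Hippo Touch tool mentioned above) before the property is populated; deleting the Lucene index alone would not be enough.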
