Re: [HippoCMS-dev] BrokenLinkChecker - Help Needed

Jasha Joachimsthal Fri, 03 Jul 2009 02:21:09 -0700

Let me rephrase:
- change in the indexer configuration: you can either delete the whole Lucene
  index or touch the relevant part of the repository
- change in the extractor configuration: you need to touch the relevant part
  of the repository


Jasha


2009/7/3 Marco Casavecchia Morganti <[email protected]>

> Thanks again!
>
> Hmm, I saw that the reference and links properties are already present
> in my dasl-indexer.
> So I'm planning to change the extractor by modifying the xpath attribute
> of the existing links and reference extractors.
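>
> A rough, untested sketch of such a change to the links extractor follows. The
> classname, uri and content-type are copied from Jasha's example quoted
> below; the added attribute paths (externalLink/@url, relatedDoc/@path,
> image/@path, asset/@path) are assumptions based on the sample document at
> the end of this thread:
>
> ```xml
> <extractor
>     classname="nl.hippo.slide.extractor.UrlListXMLPropertyExtractor"
>     uri="/files/default.preview"
>     content-type="text/xml | text/xml; charset=UTF-8 | application/xml">
>   <configuration>
>     <!-- Assumed xpath: html href/src attributes plus the url and path
>          attributes used by the attachments in the sample document -->
>     <instruction property="links" namespace="http://hippo.nl/cms/1.0"
>         xpath="//@href|//@src|//externalLink/@url|//relatedDoc/@path|//image/@path|//asset/@path"/>
>   </configuration>
> </extractor>
> ```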
>
> In this case I only need a reindex by deleting the Lucene index, right?
>
>
> Jasha Joachimsthal wrote:
>
>> Almost forgot, in the indexer:
>>    <property namespace="http://hippo.nl/cms/1.0" name="links" type="text"
>>        analyzer="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
>>
>> There's a difference between reindexing existing content/properties and
>> adding a new property.
>> If you delete the Lucene index, the repository is forced to reindex the
>> existing content/properties. This may be handy if you change the analyzers
>> or the way existing properties are indexed.
>> If you introduce new extractors you need to "touch" all content. We've
>> created a batch processor tool for that, which is described at [1].
>>
>> [1] http://www.hippocms.org/display/CMS/Hippo+Touch
>>
>> Jasha Joachimsthal
>>
>> [email protected] - [email protected]
>>
>> www.onehippo.com
>> Amsterdam - Hippo B.V. Oosteinde 11 1017 WT Amsterdam +31(0)20-5224466
>> San Francisco - Hippo USA Inc. 185 H Street, suite B, Petaluma CA 94952 +1
>> (707) 7734646
>>
>>
>>
>> 2009/7/3 Marco Casavecchia Morganti <[email protected]>
>>
>>> Thanks Jasha,
>>> I understood. So I don't need to change anything in the
>>> dasl-indexer.xml.
>>>
>>> I have another question: Is there a way to force a "reindex" of the
>>> repository in order to create such properties for my existing documents?
>>>
>>>
>>> Jasha Joachimsthal wrote:
>>>
>>>> Hello Marco,
>>>> an example is
>>>>
>>>> <extractor
>>>>     classname="nl.hippo.slide.extractor.UrlListXMLPropertyExtractor"
>>>>     uri="/files/default.preview"
>>>>     content-type="text/xml | text/xml; charset=UTF-8 | application/xml">
>>>>   <configuration>
>>>>     <instruction property="links" namespace="http://hippo.nl/cms/1.0"
>>>>         xpath="//@href|//@src|//datasource/text()|//bannerUrl/text()|//logoUrl/text()"/>
>>>>   </configuration>
>>>> </extractor>
>>>>
>>>> As you can see, the xpath is a union of several expressions that match
>>>> different parts of the XML. I'm not really sure whether the xpath engine
>>>> also supports //@href[starts-with(.,'/content/')] to filter internal-only
>>>> links.
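>>>>
>>>> If the engine does accept predicates, an internal-links-only variant might
>>>> look like the sketch below. It is untested: the predicate and the
>>>> '/content/' prefix are assumptions, everything else is taken from the
>>>> example above.
>>>>
>>>> ```xml
>>>> <extractor
>>>>     classname="nl.hippo.slide.extractor.UrlListXMLPropertyExtractor"
>>>>     uri="/files/default.preview"
>>>>     content-type="text/xml | text/xml; charset=UTF-8 | application/xml">
>>>>   <configuration>
>>>>     <!-- Assumption: the xpath engine supports starts-with() predicates -->
>>>>     <instruction property="links" namespace="http://hippo.nl/cms/1.0"
>>>>         xpath="//@href[starts-with(.,'/content/')]"/>
>>>>   </configuration>
>>>> </extractor>
>>>> ```
>>>>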
>>>> Hope this helps you,
>>>>
>>>> Jasha Joachimsthal
>>>>
>>>>
>>>>
>>>> 2009/7/3 Marco Casavecchia Morganti <[email protected]>
>>>>
>>>>> Hello all,
>>>>>
>>>>> I would like to set up the broken link checker for my CMS, but before I
>>>>> start, I need to know whether I have understood how it works.
>>>>> As far as I know, the checker creates an XML file in the repository that
>>>>> is the "database" of the inspected links.
>>>>> To create this document it needs to browse the repository in search of a
>>>>> webdavProperty called "links".
>>>>>
>>>>> So, if this is right, I need to configure an extractor in the
>>>>> repository.
>>>>>
>>>>> Now, I have a document like this:
>>>>> ------------------------------------
>>>>> <?xml version="1.0" encoding="UTF-8"?>
>>>>> <document>
>>>>>  <metaCurSection>multimedia</metaCurSection>
>>>>>  <taxonomies> </taxonomies>
>>>>>  <primaryData lang="it">
>>>>>  <content >
>>>>>    <html>
>>>>>      <body>
>>>>>        <a href="http://www.google.com" title="prova">testlink</a>
>>>>>      </body>
>>>>>    </html>
>>>>>  </content>
>>>>>  <shortDescription />
>>>>>  <title>Test di Impaginazione Template</title>
>>>>>  </primaryData>
>>>>>  <attachments lang="it">
>>>>>  <externalLinks>
>>>>>    <externalLink label="Prova" order="1" url="http://www.google.com/"/>
>>>>>  </externalLinks>
>>>>>  <assets>
>>>>>    <asset order="1" path="/binaries/sandbox/urb_part.gif"/>
>>>>>  </assets>
>>>>>  <images>
>>>>>    <image alt="Prova Formattazione" order="1"
>>>>> path="/binaries/sandbox/nx03_wallpaper01.jpg"/>
>>>>>  </images>
>>>>>  <relatedDocs>
>>>>>    <relatedDoc order="1"
>>>>>      path="/content/taxonomies/ankonline/uffici/stampa/conferenze/2007/nuovodoc.xml"/>
>>>>>  </relatedDocs>
>>>>>  </attachments>
>>>>>  <multimedia lang="it">
>>>>>  <stream externalPath="/video/test.flv" repository="external"/>
>>>>>  </multimedia>
>>>>>  <secondaryData lang="it">
>>>>>  <tickets />
>>>>>  <other />
>>>>>  </secondaryData>
>>>>>  <contacts lang="it">
>>>>>  <info />
>>>>>  <timeTable />
>>>>>  <telephones>
>>>>>    <telephone number="112324345" order="1"/>
>>>>>  </telephones>
>>>>>  <faxes>
>>>>>    <fax number="12121341" order="2"/>
>>>>>  </faxes>
>>>>>  <emails>
>>>>>    <email address="[email protected]" order="3"/>
>>>>>  </emails>
>>>>>  </contacts>
>>>>> </document>
>>>>>
>>>>> I have to extract:
>>>>> - The links in the html fields like "/document/primaryData/content"
>>>>> - The external links in
>>>>> "/document/attachments/externalLinks/externalLink"
>>>>> - The internal links in "/document/attachments/relatedDocs/relatedDoc"
>>>>> - The images in "/document/attachments/images/image"
>>>>> - The assets in "/document/attachments/assets/asset"
>>>>>
>>>>> Can someone show me an example for an extractor configuration?
>>>>> Thanks in advance.
>>>>>
>>>>> --
>>>>> By MCM.
>>>>>
>>>>> << Theory is when you know everything but nothing works.
>>>>> Practice is when everything works but nobody knows why.
>>>>> In any case one ends up combining theory with practice: nothing
>>>>> works and nobody knows why. >>
>>>>> (A. Einstein)
>>>>> ********************************************
>>>>> Hippocms-dev: Hippo CMS development public mailinglist
>>>>>
>>>>> Searchable archives can be found at:
>>>>> MarkMail: http://hippocms-dev.markmail.org
>>>>> Nabble: http://www.nabble.com/Hippo-CMS-f26633.html
>>>>>
>>>>>
>>> --
>>> By MCM.
>
> --
> By MCM.