The PDF extractor only indexes the text content of the PDF. Some other
extractors can also set a property on a document, either to a static value
or to a value selected by an XPath expression in your XML document.
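For illustration, the two kinds of property-setting extractors mentioned above might be configured roughly as follows, alongside the other extractor definitions. This is a sketch under assumptions: the ConstantExtractor entry is modelled on the configuration quoted later in this thread, `org.apache.slide.extractor.SimpleXmlExtractor` with an `xpath` attribute is assumed to be the XPath-based extractor available in this setup, and the `uri` values, the `documentType` property and the XPath are hypothetical.

```xml
<!-- Hypothetical: set the same fixed value on every matching document -->
<extractor classname="nl.hippo.slide.extractor.ConstantExtractor"
    uri="/files/default.preview/binaries" content-type="application/pdf">
  <configuration>
    <instruction property="documentType" namespace="http://hippo.nl/cms/1.0" value="pdf"/>
  </configuration>
</extractor>

<!-- Assumed: copy a value selected by an XPath expression from an XML document -->
<extractor classname="org.apache.slide.extractor.SimpleXmlExtractor"
    uri="/files/default.preview/content" content-type="text/xml">
  <configuration>
    <instruction property="publicationDate" namespace="http://hippo.nl/cms/1.0"
        xpath="/document/publicationDate/text()"/>
  </configuration>
</extractor>
```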
Since binaries like PDFs don't get published, they won't have a publicationDate.
However, you can set properties such as the caption by hand from the CMS. In the
properties.xml you use for assets, you can add:
  <Property>
    <Name>publicatieDatum</Name>
    <DisplayName>Publicatie datum</DisplayName>
    <Namespace>http://hippo.nl/cms/1.0</Namespace>
    <NamespacePrefix>cms</NamespacePrefix>
    <Datatype>date</Datatype>
  </Property>

You'll then get a date field when you click on the PDF. A sample properties.xml
can be found in src/cocoon/types/collection.

Jasha

-----Original Message-----
From: [EMAIL PROTECTED] on behalf of Wilson de Paula Pedro Junior
Sent: Wed 16-7-2008 11:00
To: Hippo CMS development public mailinglist
Subject: [HippoCMS-dev] PDF extractor
 
Hi guys,

I hope someone can help me with this one.
We have a DASL query which is used to search news articles, PDFs and Word
documents.
The result set must be sorted by date. News articles and Word documents already
have a publicationDate property I can sort on.
But the PDFs don't. Does anybody know how I can use the extractor to extract a
PDF's creation date and set it as the property publicationDate in the
http://hippo.nl/cms/1.0 namespace?
The property must have the same name on all three item types for the DASL query
to work.
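Once all three document types carry a date under the same qualified name, the sort itself is an ordinary DASL `orderby`. A minimal basicsearch sketch, assuming the shared property is `cms:publicationDate` and using a hypothetical search scope of `/files/default.preview`:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<D:searchrequest xmlns:D="DAV:" xmlns:cms="http://hippo.nl/cms/1.0">
  <D:basicsearch>
    <!-- Properties returned for each hit -->
    <D:select>
      <D:prop>
        <D:displayname/>
        <cms:publicationDate/>
      </D:prop>
    </D:select>
    <!-- Hypothetical scope: everything below this collection -->
    <D:from>
      <D:scope>
        <D:href>/files/default.preview</D:href>
        <D:depth>infinity</D:depth>
      </D:scope>
    </D:from>
    <!-- Newest first; the property must have the same name and namespace
         on news articles, Word documents and PDFs -->
    <D:orderby>
      <D:order>
        <D:prop><cms:publicationDate/></D:prop>
        <D:descending/>
      </D:order>
    </D:orderby>
  </D:basicsearch>
</D:searchrequest>
```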

I have tried:

 <extractor classname="org.apache.slide.extractor.PDFExtractor"
    uri="/files/default.preview/binaries" content-type="application/pdf">
  <configuration>
    <instruction property="publicatiedatum" namespace="http://hippo.nl/cms/1.0" summary-information="4"/>
  </configuration>
</extractor>

But I don't know whether summary-information can be used here.


I have also tried using the ConstantExtractor to set the property from a DAV
property:
 <extractor classname="nl.hippo.slide.extractor.ConstantExtractor"
    uri="/files/project.preview/binaries" content-type="application/pdf">
   <configuration>
     <instruction property="publicatiedatum" namespace="http://hippo.nl/cms/1.0" value="DAV:name"/>
   </configuration>
 </extractor>

Thanks in advance.

Wilson
********************************************
Hippocms-dev: Hippo CMS development public mailinglist

Searchable archives can be found at:
MarkMail: http://hippocms-dev.markmail.org
Nabble: http://www.nabble.com/Hippo-CMS-f26633.html

