What about using the HippoLastmodifiedExtractor [1]? I haven't tested it, but I
guess it would fit your needs.

[1] 
http://hippocms.org/display/CMS/4.+Hippo+Repository+Configure+Extractors#4.HippoRepositoryConfigureExtractors-l.hippo.slide.extractor.HippoLastmodifiedExtr...

<extractor classname="nl.hippo.slide.extractor.HippoLastmodifiedExtractor"
           uri="/files/default.preview/binaries"
           content-type="application/pdf">
  <configuration>
    <instruction property="publicatieDatum"
                 namespace="http://hippo.nl/cms/1.0"
                 outputFormat="yyyyMMddHHmm"/>
  </configuration>
</extractor>
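
To show why that outputFormat helps with sorting, here is a small Java sketch. It assumes the extractor reads outputFormat using the usual Java date-pattern letters, which I have not verified, and the class and method names are made up for the example. The point is that a timestamp written as yyyyMMddHHmm sorts chronologically even when it is compared as a plain string.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; not part of Hippo CMS or Slide.
public class SortableDateSketch {
    // Same pattern as the outputFormat attribute in the extractor config.
    private static final DateTimeFormatter OUTPUT_FORMAT =
            DateTimeFormatter.ofPattern("yyyyMMddHHmm");

    // Formats a timestamp the way the property value is assumed to be stored.
    static String toPropertyValue(LocalDateTime lastModified) {
        return lastModified.format(OUTPUT_FORMAT);
    }

    public static void main(String[] args) {
        List<String> values = new ArrayList<>();
        values.add(toPropertyValue(LocalDateTime.of(2008, 7, 16, 18, 12)));
        values.add(toPropertyValue(LocalDateTime.of(2008, 7, 16, 11, 0)));
        values.add(toPropertyValue(LocalDateTime.of(2007, 12, 31, 23, 59)));

        // Plain string sort; the fixed-width pattern keeps it chronological.
        Collections.sort(values);
        System.out.println(values); // [200712312359, 200807161100, 200807161812]
    }
}
```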

Jasha


-----Original message-----
From: [EMAIL PROTECTED] on behalf of Ard Schrijvers
Sent: Wed 16-7-2008 18:12
To: Hippo CMS development public mailinglist
Subject: RE: [HippoCMS-dev] PDF extractor
 
Hello,

Currently this is not supported. If you take a look at Apache Tika, you
might see how to extract a date from a PDF. If it is not done there, then
I think you cannot extract a date from a PDF, but I assume it should be
possible.
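
For what it is worth, once some library hands you the raw creation date, turning it into a sortable value is the easy part. A rough sketch, assuming you receive the date as the string the PDF format stores in its CreationDate entry (D:YYYYMMDDHHmmSS followed by an optional timezone offset). The class and method names are invented for the example, and the timezone offset is simply ignored here.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration; not part of Tika, Slide or Hippo CMS.
public class PdfDateSketch {
    // Date and time part of a PDF date string, without the "D:" prefix.
    private static final DateTimeFormatter PDF_LOCAL =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");
    // Sortable format, same as outputFormat="yyyyMMddHHmm" in the thread.
    private static final DateTimeFormatter SORTABLE =
            DateTimeFormatter.ofPattern("yyyyMMddHHmm");

    // Converts e.g. "D:20080716181200+02'00'" to "200807161812".
    // Assumption: the string carries at least the full 14 date/time digits.
    static String toSortable(String pdfDate) {
        String s = pdfDate.startsWith("D:") ? pdfDate.substring(2) : pdfDate;
        if (s.length() < 14) {
            throw new IllegalArgumentException("Unsupported PDF date: " + pdfDate);
        }
        // The timezone suffix, if any, is dropped in this sketch.
        LocalDateTime created = LocalDateTime.parse(s.substring(0, 14), PDF_LOCAL);
        return created.format(SORTABLE);
    }

    public static void main(String[] args) {
        System.out.println(toSortable("D:20080716181200+02'00'")); // 200807161812
    }
}
```

If the offset matters for your sorting, you would have to parse it as well and normalise everything to one timezone before formatting.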

ard

> Hi Jasha,
> Thanks for your reply.
> I need this date only so that the DASL query can sort on it;
> filling in the date manually is really not an option.
> Is there a way, when uploading the PDF, to get the current date
> and automatically set it on the PDF file as a specific property?
> 
> Thank you,
> 
> Wilson
> 
> 
> 2008/7/16 Jasha Joachimsthal <[EMAIL PROTECTED]>:
> 
> > The PDF extractor only indexes the text content of the PDF. Some other
> > extractors can also set a property on a document, which is either a
> > static value or a value based on some XPath in your XML document.
> > Since binaries like PDFs don't get published, you won't have a
> > publicationDate. It is possible to set properties by hand from the CMS,
> > like the caption. In the properties.xml you use for assets you can add
> > <Property>
> >    <Name>publicatieDatum</Name>
> >    <DisplayName>Publicatie datum</DisplayName>
> >    <Namespace>http://hippo.nl/cms/1.0</Namespace>
> >    <NamespacePrefix>cms</NamespacePrefix>
> >    <Datatype>date</Datatype>
> >  </Property>
> >
> > You'll get a date field when you click on the PDF. A sample 
> > properties.xml can be found in src/cocoon/types/collection
> >
> > Jasha
> >
> > -----Original message-----
> > From: [EMAIL PROTECTED] on behalf of Wilson de Paula Pedro Junior
> > Sent: Wed 16-7-2008 11:00
> > To: Hippo CMS development public mailinglist
> > Subject: [HippoCMS-dev] PDF extractor
> >
> > Hi guys,
> >
> > I hope someone can help me with this one.
> > We have a DASL query which is used to search news articles, PDFs and
> > Word documents.
> > The result set must be sorted by date. News articles and Word documents
> > already have a publicationDate property that I can sort on,
> > but the PDFs don't. Does anybody know how I can use the extractor to
> > extract the creation date and set it as the property publicationDate in
> > the http://hippo.nl/cms/1.0 namespace?
> > The property must have the same name on all three item types in order
> > for the DASL query to work.
> >
> > I have tried:
> >
> > <extractor classname="org.apache.slide.extractor.PDFExtractor"
> >            uri="/files/default.preview/binaries"
> >            content-type="application/pdf">
> >   <configuration>
> >     <instruction property="publicatiedatum"
> >                  namespace="http://hippo.nl/cms/1.0"
> >                  summary-information="4"/>
> >   </configuration>
> > </extractor>
> >
> > But I have no idea if I can use summary-information here.
> >
> >
> > I have also tried to use the ConstantExtractor to set the property
> > from a DAV property:
> >
> > <extractor classname="nl.hippo.slide.extractor.ConstantExtractor"
> >            uri="/files/project.preview/binaries"
> >            content-type="application/pdf">
> >   <configuration>
> >     <instruction property="publicatiedatum"
> >                  namespace="http://hippo.nl/cms/1.0"
> >                  value="DAV:name"/>
> >   </configuration>
> > </extractor>
> >
> > Thanks in advance.
> >
> > Wilson
> > ********************************************
> > Hippocms-dev: Hippo CMS development public mailinglist
> >
> > Searchable archives can be found at:
> > MarkMail: http://hippocms-dev.markmail.org
> > Nabble: http://www.nabble.com/Hippo-CMS-f26633.html