What about using the HippoLastmodifiedExtractor [1]? I haven't tested it, but I think it would fit your needs.
[1] http://hippocms.org/display/CMS/4.+Hippo+Repository+Configure+Extractors#4.HippoRepositoryConfigureExtractors-l.hippo.slide.extractor.HippoLastmodifiedExtr...

<extractor classname="nl.hippo.slide.extractor.HippoLastmodifiedExtractor"
           uri="/files/default.preview/binaries"
           content-type="application/pdf">
  <configuration>
    <instruction property="publicatieDatum"
                 namespace="http://hippo.nl/cms/1.0"
                 outputFormat="yyyyMMddHHmm"/>
  </configuration>
</extractor>

Jasha

-----Original message-----
From: [EMAIL PROTECTED] on behalf of Ard Schrijvers
Sent: Wed 16-7-2008 18:12
To: Hippo CMS development public mailinglist
Subject: RE: [HippoCMS-dev] PDF extractor

Hello,

Currently it is not supported, but if you take a look at Apache Tika, you might see how to extract a date from a PDF. If it is not done there, then I think you cannot extract a date from a PDF, but I assume it should be possible.

ard

> Hi Jasha,
>
> Thanks for your reply.
> I need this date just so that I can sort on it with the DASL query;
> filling in this date manually is really not an option.
> Is there a way, when uploading the PDF, to get the current date
> and automatically set it on the PDF file as a specific property?
>
> Thank you,
>
> Wilson
>
> 2008/7/16 Jasha Joachimsthal <[EMAIL PROTECTED]>:
>
> > The PDF extractor only indexes the text content of the PDF. Some other
> > extractors can also set a property on a document, which is either a
> > static value or a value based on some XPath in your XML document.
> > Since binaries like PDFs don't get published, you won't have a
> > publicationDate. It is possible to set properties by hand from the CMS,
> > like the caption. In the properties.xml you use for assets you can add
> >
> >   <Property>
> >     <Name>publicatieDatum</Name>
> >     <DisplayName>Publicatie datum</DisplayName>
> >     <Namespace>http://hippo.nl/cms/1.0</Namespace>
> >     <NamespacePrefix>cms</NamespacePrefix>
> >     <Datatype>date</Datatype>
> >   </Property>
> >
> > You'll get a date field when you click on the PDF. A sample
> > properties.xml can be found in src/cocoon/types/collection
> >
> > Jasha
> >
> > -----Original message-----
> > From: [EMAIL PROTECTED] on behalf of Wilson de Paula Pedro Junior
> > Sent: Wed 16-7-2008 11:00
> > To: Hippo CMS development public mailinglist
> > Subject: [HippoCMS-dev] PDF extractor
> >
> > Hi guys,
> >
> > I hope someone can help me with this one.
> > We have a DASL query which is used to search news articles, PDFs and
> > Word documents.
> > The result set must be sorted by date. News articles and Word documents
> > already have a publicationDate property that I can sort on,
> > but the PDFs don't. Does anybody know how I can use the extractor to
> > extract the PDF's creation date and set it as the property publicationDate
> > in the http://hippo.nl/cms/1.0 namespace?
> > The property must have the same name on all three item types for the
> > DASL query to work.
> >
> > I have tried:
> >
> >   <extractor classname="org.apache.slide.extractor.PDFExtractor"
> >              uri="/files/default.preview/binaries"
> >              content-type="application/pdf">
> >     <configuration>
> >       <instruction property="publicatiedatum"
> >                    namespace="http://hippo.nl/cms/1.0"
> >                    summary-information="4"/>
> >     </configuration>
> >   </extractor>
> >
> > But I have no idea if I can use summary-information here.
> >
> > And I have tried to use the ConstantExtractor to set the property from
> > a DAV property:
> >
> >   <extractor classname="nl.hippo.slide.extractor.ConstantExtractor"
> >              uri="/files/project.preview/binaries"
> >              content-type="application/pdf">
> >     <configuration>
> >       <instruction property="publicatiedatum"
> >                    namespace="http://hippo.nl/cms/1.0"
> >                    value="DAV:name"/>
> >     </configuration>
> >   </extractor>
> >
> > Thanks in advance.
> >
> > Wilson
> > ********************************************
> > Hippocms-dev: Hippo CMS development public mailinglist
> >
> > Searchable archives can be found at:
> > MarkMail: http://hippocms-dev.markmail.org
> > Nabble: http://www.nabble.com/Hippo-CMS-f26633.html
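
For the sorting part of the question, here is a rough, untested sketch of what a search ordered on the date property could look like. It follows the basicsearch grammar from the WebDAV SEARCH specification (RFC 5323). The scope href, the content-type filter and the choice of descending order are assumptions for illustration, and the request format your Hippo repository version expects may differ. The property name and namespace are taken from the extractor configuration above, and the name must match exactly (including case) on all document types.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: scope, filter and sort direction are assumed values. -->
<D:searchrequest xmlns:D="DAV:" xmlns:cms="http://hippo.nl/cms/1.0">
  <D:basicsearch>
    <D:select>
      <D:prop>
        <D:displayname/>
        <cms:publicatieDatum/>
      </D:prop>
    </D:select>
    <D:from>
      <D:scope>
        <!-- Assumed scope: the binaries folder the extractor is configured on -->
        <D:href>/files/default.preview/binaries</D:href>
        <D:depth>infinity</D:depth>
      </D:scope>
    </D:from>
    <D:where>
      <!-- Assumed filter: restrict the results to PDFs -->
      <D:eq>
        <D:prop><D:getcontenttype/></D:prop>
        <D:literal>application/pdf</D:literal>
      </D:eq>
    </D:where>
    <D:orderby>
      <!-- Newest first, sorted on the property set by the extractor -->
      <D:order>
        <D:prop><cms:publicatieDatum/></D:prop>
        <D:descending/>
      </D:order>
    </D:orderby>
  </D:basicsearch>
</D:searchrequest>
```

With outputFormat="yyyyMMddHHmm" the stored values are fixed-width digit strings, so sorting them as plain strings gives the same order as sorting by date.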
