Re: [HippoCMS-dev] Help with PDFExtractor

Jeroen Reijn Wed, 16 Dec 2009 02:02:12 -0800

Hi Maurizio,

As far as I know, the PDF extractor as you have configured it now extracts all content to the Lucene index only, and makes sure that the text can be found and mapped to the PDF document. I don't think Slide has a repository extractor that can extract the information and store it as a property.
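
To make the distinction concrete, here is a toy sketch in Java. It is not Slide or Lucene code: the class, the method names and the sample text are all made up for illustration, and only the path and file name come from the original question. It models an extractor whose output goes into a search index only, so the document can be found by its text while the property list a WebDAV client sees stays unchanged.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: hypothetical names, NOT the Slide or Lucene API.
class IndexOnlyExtraction {
    // Properties a WebDAV client would see for each resource URI.
    static final Map<String, Map<String, String>> properties = new HashMap<>();
    // Full-text index: lower-cased term -> URIs of resources containing it.
    static final Map<String, Set<String>> index = new HashMap<>();

    // Stores a resource; the extracted text is indexed but never kept as a property.
    static void store(String uri, String contentType, String extractedText) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("getcontenttype", contentType);
        properties.put(uri, props);
        for (String term : extractedText.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(uri);
            }
        }
    }

    // Returns the URIs whose extracted text contained the given term.
    static Set<String> search(String term) {
        return index.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) {
        String uri = "/files/default.preview/binaries/this_is_my_title.pdf";
        // The text below stands in for whatever the extractor pulls out of the PDF.
        store(uri, "application/pdf", "Hello from a Google Docs PDF");
        System.out.println("Found by text: " + search("google"));
        System.out.println("Visible properties: " + properties.get(uri).keySet());
    }
}
```

In this model a search for a word from the PDF returns the document's URI, but no extra property ever appears on the resource, which would match what DAVExplorer shows.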


Regards,

Jeroen

Maurizio Pillitu wrote:

Hi everyone,
I'm trying to use the PDFExtractor (using Hippo Repository 1.2.15); I've
added to my (default) extractors.xml the following:

....
<extractor classname="org.apache.slide.extractor.PDFExtractor"
uri="/files/default.preview/binaries" content-type="application/pdf"/>
.....

then I dropped a Google Docs generated PDF file (attached) in
/files/default.preview/binaries (via WebDAV); I see the repository logging
some interesting bits (attached) as if the extraction process went fine, but
I can't see the extracted data; I'd have expected a WebDAV property attached
to the file, but nothing shows up; this is the list of properties related
with the PDF file (using DAVExplorer)

getlastmodified DAV: Wed, 16 Dec 2009 09:38:35 GMT
displayname DAV: this_is_my_title.pdf
modificationdate DAV: 2009-12-16T09:38:35Z
UID DAV: 96da71317f000001004b0bbb796bcb32
supportedlock DAV:
getcontenttype DAV: application/pdf
getcontentlength DAV: 5078
resourcetype DAV:
getcontentlanguage DAV: en
getetag DAV: ada3fdca64b1fd70a3d7b2ed66b3e68b
lockdiscovery DAV:
source DAV:
creationdate DAV: 2009-12-16T09:38:35Z


I feel like I'm missing something on how the PDFExtractor works; I've looked
for some documentation or specific configurations, but I couldn't find
anything interesting.

Any hints?
TIA
  mau

Kind regards,


------------------------------------------------------------------------

********************************************
Hippocms-dev: Hippo CMS development public mailinglist

Searchable archives can be found at:
MarkMail: http://hippocms-dev.markmail.org
Nabble: http://www.nabble.com/Hippo-CMS-f26633.html
