2009/12/16 Maurizio Pillitu <[email protected]>:
> Hi everyone,
> I'm trying to use the PDFExtractor (using Hippo Repository 1.2.15); I've
> added to my (default) extractors.xml the following:
>
> ....
> <extractor classname="org.apache.slide.extractor.PDFExtractor"
> uri="/files/default.preview/binaries" content-type="application/pdf"/>
> .....
>
> then I dropped a Google Docs generated PDF file (attached) in
> /files/default.preview/binaries (via WebDAV); I see the repository logging
> some interesting bits (attached) as if the extraction process went fine, but
> I can't see the extracted data; I'd have expected a WebDAV property attached
> to the file, but nothing shows up; this is the list of properties related
> with the PDF file (using DAVExplorer)
>
> getlastmodified DAV: Wed, 16 Dec 2009 09:38:35 GMT
> displayname DAV: this_is_my_title.pdf
> modificationdate DAV: 2009-12-16T09:38:35Z
> UID DAV: 96da71317f000001004b0bbb796bcb32
> supportedlock DAV:
> getcontenttype DAV: application/pdf
> getcontentlength DAV: 5078
> resourcetype DAV:
> getcontentlanguage DAV: en
> getetag DAV: ada3fdca64b1fd70a3d7b2ed66b3e68b
> lockdiscovery DAV:
> source DAV:
> creationdate DAV: 2009-12-16T09:38:35Z
>
> I feel like I'm missing something on how the PDFExtractor works; I've looked
> for some documentation or specific configurations, but I couldn't find
> anything interesting.
>
> Any hints?
> TIA
> mau
>
> Kind regards,
> --
> Maurizio Pillitu - 0031 (0)615655668

Hey Mau,

the PDFExtractor doesn't set properties. It's just a full-text indexer for PDF files: the extracted text goes into the search index, so no extra WebDAV property will show up on the file.

Jasha Joachimsthal
www.onehippo.com
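
Since the extractor feeds the full-text index rather than a property, one way to check that extraction worked would be a full-text search instead of a property listing. The following is only a minimal sketch: it builds a standard WebDAV SEARCH request body (DAV:basicsearch with DAV:contains, as defined in RFC 5323). The function name build_contains_search and the search phrase are made up for illustration, and whether Hippo Repository 1.2.15 accepts this exact query form is an assumption that was not confirmed in this thread.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"


def build_contains_search(scope_href: str, phrase: str) -> bytes:
    """Build a DAV:basicsearch body that full-text matches `phrase`.

    Hypothetical helper: it only produces the XML, it does not contact
    any repository.
    """
    ET.register_namespace("D", DAV_NS)

    def q(name: str) -> str:
        # Qualify an element name with the DAV: namespace.
        return "{%s}%s" % (DAV_NS, name)

    root = ET.Element(q("searchrequest"))
    basic = ET.SubElement(root, q("basicsearch"))

    # Ask only for the display name of each matching resource.
    select = ET.SubElement(basic, q("select"))
    prop = ET.SubElement(select, q("prop"))
    ET.SubElement(prop, q("displayname"))

    # Restrict the search to the collection the extractor is configured for.
    from_el = ET.SubElement(basic, q("from"))
    scope = ET.SubElement(from_el, q("scope"))
    ET.SubElement(scope, q("href")).text = scope_href
    ET.SubElement(scope, q("depth")).text = "infinity"

    # DAV:contains is the full-text condition.
    where = ET.SubElement(basic, q("where"))
    ET.SubElement(where, q("contains")).text = phrase

    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    body = build_contains_search("/files/default.preview/binaries", "some words from the PDF")
    print(body.decode("utf-8"))
```

The body would be sent with the HTTP SEARCH method and a text/xml content type to the repository's WebDAV URL (host, port and credentials depend on your setup). If the PDF shows up in the results for a phrase that only occurs in its text, the extraction reached the index.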
