2009/12/16 Maurizio Pillitu <m.pill...@sourcesense.com>:
> Got it to work!
>
> There were some restrictions in the DASL query that were excluding the PDF
> result to come out.
>
> Thanks a lot!

You're welcome!

>
> mau
>
> On Wed, Dec 16, 2009 at 11:26 AM, Jasha Joachimsthal <j.joachimst...@onehippo.com> wrote:
>
>> 2009/12/16 Maurizio Pillitu <m.pill...@sourcesense.com>:
>> > Thanks guys,
>> > I know I was missing some bits of the big picture :)
>> >
>> > So here's the next question: when I perform a DASL query, I normally
>> > *select* some properties *from* some repository location (path) *where* a
>> > certain property matches one or more conditions; if I don't have a property
>> > to match, how can I define the *where* condition?
>> >
>> > sounds like a very stupid question .... sorry for that.
>>
>> There are no stupid questions!
>> For fulltext search, you can do <d:contains>mySearchWord</d:contains>
>> If you really need properties, you can let the user set them in the
>> assets perspective. See [1]
>>
>> [1] http://wiki.onehippo.com/display/CMS/WebDAV+properties+used+by+Hippo+CMS
>>
>> > Thx again
>> >
>> > mau
>> >
>> > On Wed, Dec 16, 2009 at 11:01 AM, Jeroen Reijn <j.re...@onehippo.com> wrote:
>> >
>> >> Hi Maurizio,
>> >>
>> >> as far as I know, the PDF extractor as you have configured it now extracts
>> >> all content to the Lucene index only and makes sure that the text can be
>> >> found and mapped to the PDF document. I don't think Slide has a repository
>> >> extractor that can extract the information and store it as a property.
>> >>
>> >> Regards,
>> >>
>> >> Jeroen
>> >>
>> >> Maurizio Pillitu wrote:
>> >>
>> >>> Hi everyone,
>> >>> I'm trying to use the PDFExtractor (using Hippo Repository 1.2.15); I've
>> >>> added to my (default) extractors.xml the following:
>> >>>
>> >>> ....
>> >>> <extractor classname="org.apache.slide.extractor.PDFExtractor"
>> >>> uri="/files/default.preview/binaries" content-type="application/pdf"/>
>> >>> .....
>> >>>
>> >>> then I dropped a Google Docs generated PDF file (attached) in
>> >>> /files/default.preview/binaries (via WebDAV); I see the repository logging
>> >>> some interesting bits (attached) as if the extraction process went fine, but
>> >>> I can't see the extracted data; I'd have expected a WebDAV property attached
>> >>> to the file, but nothing shows up; this is the list of properties related
>> >>> with the PDF file (using DAVExplorer)
>> >>>
>> >>> getlastmodified     DAV:  Wed, 16 Dec 2009 09:38:35 GMT
>> >>> displayname         DAV:  this_is_my_title.pdf
>> >>> modificationdate    DAV:  2009-12-16T09:38:35Z
>> >>> UID                 DAV:  96da71317f000001004b0bbb796bcb32
>> >>> supportedlock       DAV:
>> >>> getcontenttype      DAV:  application/pdf
>> >>> getcontentlength    DAV:  5078
>> >>> resourcetype        DAV:
>> >>> getcontentlanguage  DAV:  en
>> >>> getetag             DAV:  ada3fdca64b1fd70a3d7b2ed66b3e68b
>> >>> lockdiscovery       DAV:
>> >>> source              DAV:
>> >>> creationdate        DAV:  2009-12-16T09:38:35Z
>> >>>
>> >>> I feel like I'm missing something on how the PDFExtractor works; I've looked
>> >>> for some documentation or specific configurations, but I couldn't find
>> >>> anything interesting.
>> >>>
>> >>> Any hints?
>> >>> TIA
>> >>> mau
>> >>>
>> >>> Kind regards,
>> >>>
>> >>> ********************************************
>> >>> Hippocms-dev: Hippo CMS development public mailinglist
>> >>>
>> >>> Searchable archives can be found at:
>> >>> MarkMail: http://hippocms-dev.markmail.org
>> >>> Nabble: http://www.nabble.com/Hippo-CMS-f26633.html
>> >
>> > --
>> > Kind regards,
>> > --
>> > Maurizio Pillitu - 0031 (0)615655668
>> > Opensource Software Engineer
>> > Scrum Certified Master - http://www.scrumalliance.org
>> > Sourcesense - making sense of Open Source: http://www.sourcesense.com
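
For anyone landing on this thread from the archives: the full-text variant discussed above, where the *where* clause uses <d:contains> instead of a property comparison, could look roughly like the request body below. This is only a sketch of a standard DAV basicsearch body (sent with the SEARCH method), not the query that was actually used in this thread. The scope path is the one from the original message; the selected properties are picked from the DAVExplorer listing; the depth value and the choice of the `d` prefix for the `DAV:` namespace are assumptions. Whether the scope href has to be absolute or relative to the repository context is also not confirmed here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of a full-text DASL query; the values are illustrative assumptions. -->
<d:searchrequest xmlns:d="DAV:">
  <d:basicsearch>
    <!-- select: the properties to return for each hit -->
    <d:select>
      <d:prop>
        <d:displayname/>
        <d:getcontenttype/>
        <d:getlastmodified/>
      </d:prop>
    </d:select>
    <!-- from: the repository location to search (path taken from the thread) -->
    <d:from>
      <d:scope>
        <d:href>/files/default.preview/binaries</d:href>
        <d:depth>infinity</d:depth>
      </d:scope>
    </d:from>
    <!-- where: no property condition, only a full-text match on the index -->
    <d:where>
      <d:contains>mySearchWord</d:contains>
    </d:where>
  </d:basicsearch>
</d:searchrequest>
```

As the first message in the thread notes, any extra conditions combined with <d:contains> in the *where* clause can filter the PDF out of the result set again, so it is worth testing with the bare full-text condition first.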