Hi Jasha, Ard,

Yes, I have added exactly the following extractor:

<extractor classname="org.apache.slide.extractor.PDFExtractor"
uri="/files/default.preview/binaries" content-type="application/pdf" />


But maybe I don't quite understand what you mean by removing the
Lucene index. Where can I do that in the configuration (in which file)?

Abdeslam
-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Ard Schrijvers
Sent: Tuesday, July 15, 2008 12:58 PM
To: Hippo CMS development public mailinglist
Subject: RE: [HippoCMS-dev] search for some text in pdf or word document


> 
> Hi Abdeslam,
> 
> Did you replace /files/yourproject.preview/binaries with the 
> correct pathname for your project?
> 
> Ard: is removing the lucene index enough? Shouldn't the files 
> be touched?

Removing the index is enough. Touching the files is only needed when
extractors run that set a property.

Ard

> 
> Jasha Joachimsthal 
> 
> www.onehippo.com
> Amsterdam - Hippo B.V. Oosteinde 11 1017 WT Amsterdam +31(0)20-5224466
> San Francisco - Hippo USA Inc. 101 H Street, suite Q Petaluma CA
> 94952-3329 +1 (707) 773-4646
> 
> 
> 
> > -----Original Message-----
> > From: [EMAIL PROTECTED] 
> > [mailto:[EMAIL PROTECTED] On Behalf Of 
> > Irahhoten, Abdeslam
> > Sent: Tuesday, July 15, 2008 12:10
> > To: Hippo CMS development public mailinglist
> > Subject: RE: [HippoCMS-dev] search for some text in pdf or word document
> > 
> > Hello Ard,
> > 
> > I have tried the following, but I still can't search for text
> > inside the PDF documents.
> > 
> > Maybe I am still missing some configuration; can you tell me what
> > exactly the problem is?
> > 
> > I have added this extractor:
> > <extractor classname="org.apache.slide.extractor.PDFExtractor"
> > uri="/files/yourproject.preview/binaries"
> > content-type="application/pdf"/>
> > 
> > and then I have used the following in my DASL query:
> > <d:contains>${param.zoekwoorden}</d:contains>
> > 
> > When I search for ${param.zoekwoorden}, I get no results.
> > 
> > Thanks in advance
> > -----Original Message-----
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of Ard Schrijvers
> > Sent: Monday, July 14, 2008 4:55 PM
> > To: Hippo CMS development public mailinglist
> > Subject: RE: [HippoCMS-dev] search for some text in pdf or word document
> > 
> > Of course there is, otherwise I wouldn't dare call the system a CMS
> > :-)
> > 
> > If you search with the binaries (or the root) as target, you can just
> > put the search term in the <d:contains> element.
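> > 
> > A minimal sketch of such a query, assuming the standard WebDAV DASL
> > basicsearch grammar. The scope path reuses the placeholder from the
> > extractor examples, and "searchterm" and the selected property are
> > hypothetical; adjust all of them to your own project.
> > 
> > <d:searchrequest xmlns:d="DAV:">
> >   <d:basicsearch>
> >     <!-- assumption: return only the display name of each hit -->
> >     <d:select><d:prop><d:displayname/></d:prop></d:select>
> >     <d:from>
> >       <d:scope>
> >         <!-- assumption: placeholder path, same as the extractor uri -->
> >         <d:href>/files/yourproject.preview/binaries</d:href>
> >         <d:depth>infinity</d:depth>
> >       </d:scope>
> >     </d:from>
> >     <d:where>
> >       <!-- full-text match against the indexed (extracted) content -->
> >       <d:contains>searchterm</d:contains>
> >     </d:where>
> >   </d:basicsearch>
> > </d:searchrequest>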
> > 
> > In the repository you need to configure an extractor, if that has not
> > already been done. See [1] for the available extractors.
> > 
> > There you can see how to configure extractors for, for example, MS Word,
> > Excel, PowerPoint, PDF, etc. If you add them, reindexing has to be done.
> > This is done simply by removing the Lucene index, but be careful
> > with this in a production environment, obviously.
> > 
> > Examples:
> > 
> > <extractor classname="nl.hippo.slide.extractor.ImagePropertyExtractor"
> >   uri="/files/yourproject.preview/binaries"/>
> > 
> > <extractor classname="nl.hippo.slide.extractor.OfficeExtractor"
> >   uri="/files/yourproject.preview/binaries"
> >   content-type="application/vnd.ms-excel">
> >   <configuration>
> >     <instruction property="author"
> >       namespace="http://hippo.nl/cms/1.0" summary-information="4"/>
> >     <instruction property="application"
> >       namespace="http://hippo.nl/cms/1.0" summary-information="18"/>
> >     <instruction property="date"
> >       namespace="http://hippo.nl/cms/1.0"
> >       date-format="yyyyMMdd" summary-information="13"/>
> >     <instruction property="creationdate"
> >       namespace="http://hippo.nl/cms/1.0"
> >       date-format="yyyyMMdd" summary-information="12"/>
> >     <instruction property="caption"
> >       namespace="http://hippo.nl/cms/1.0" summary-information="2"/>
> >   </configuration>
> > </extractor>
> > 
> > <extractor classname="org.apache.slide.extractor.PDFExtractor"
> > uri="/files/yourproject.preview/binaries"
> > content-type="application/pdf"/>
> >   
> > -Ard
> > 
> > [1]
> > http://www.hippocms.org/display/CMS/4.+Hippo+Repository+Configure+Extractors
> > 
> > > 
> > > Hello,
> > > 
> > >  
> > > 
> > > Is it possible (using a DASL query) to search for text
> > > inside a PDF or Word document?
> > > 
> > >  
> > > 
> > > Thanks in advance
> > > 
> > > 
> > > Disclaimer
> > > 
> > > Dit bericht met eventuele bijlagen is vertrouwelijk en 
> uitsluitend 
> > > bestemd voor de geadresseerde. Indien u niet de bedoelde 
> ontvanger 
> > > bent, wordt u verzocht de afzender te waarschuwen en dit 
> > bericht met 
> > > eventuele bijlagen direct te verwijderen en/of te 
> > vernietigen. Het is 
> > > niet toegestaan dit bericht en eventuele bijlagen te 
> > vermenigvuldigen, 
> > > door te sturen, openbaar te maken, op te slaan of op andere 
> > wijze te 
> > > gebruiken. Ordina N.V. en/of haar groepsmaatschappijen 
> > accepteren geen 
> > > verantwoordelijkheid of aansprakelijkheid voor schade die 
> > voortvloeit 
> > > uit de inhoud en/of de verzending van dit bericht.
> > > 
> > > This e-mail and any attachments are confidential and are solely 
> > > intended for the addressee. If you are not the intended 
> recipient, 
> > > please notify the sender and delete and/or destroy this 
> message and 
> > > any attachments immediately.
> > > It is prohibited to copy, to distribute, to disclose or 
> to use this 
> > > e-mail and any attachments in any other way. Ordina N.V. 
> and/or its 
> > > group companies do not accept any responsibility nor 
> > liability for any 
> > > damage resulting from the content of and/or the 
> > transmission of this 
> > > message.
> > > ********************************************
> > > Hippocms-dev: Hippo CMS development public mailinglist
> > > 
> > > Searchable archives can be found at:
> > > MarkMail: http://hippocms-dev.markmail.org
> > > Nabble: http://www.nabble.com/Hippo-CMS-f26633.html
> > > 
> > > 