Yes. Tika extracts some metadata from them.

As you said, you should do the extraction before indexing, which means you 
only index what you really need.
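
A minimal sketch of that idea, assuming the text and metadata have already been extracted (for example with Tika). The field names, the KEEP_FIELDS whitelist, and build_document are hypothetical and only illustrate the filtering step; sending the resulting JSON to Elasticsearch is left out.

```python
import json

# Hypothetical whitelist: the metadata keys worth indexing for this use case.
KEEP_FIELDS = ("title", "author", "content_type")


def build_document(text, metadata, keep=KEEP_FIELDS):
    """Return a dict with the extracted text plus only the whitelisted metadata."""
    doc = {"content": text.strip()}
    for key in keep:
        value = metadata.get(key)
        if value:  # skip missing or empty values
            doc[key] = value
    return doc


# Stand-in values for what an extractor might return for one PDF (assumed).
extracted_text = "  Quarterly report body text  "
metadata = {
    "title": "Q4 report",
    "author": "",
    "producer": "SomePdfTool",
    "content_type": "application/pdf",
}

doc = build_document(extracted_text, metadata)
body = json.dumps(doc, sort_keys=True)  # JSON body that would be indexed
print(body)
```

Only the plain text and the chosen fields end up in the indexed document; the original binary never has to be sent to the cluster.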

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 16 Jan 2014, at 22:51, ZenMaster80 <[email protected]> wrote:

> Thanks for the reply. As I understand it, the attachment plugin encodes 
> content before indexing it, which sounds like an expensive operation if we 
> have lots of PDFs. I was thinking of extracting the text from the PDFs early 
> on and dealing with plain text instead.
> Does the plugin also work for binaries like images?
> 
> On Thursday, January 16, 2014 4:12:47 PM UTC-5, David Pilato wrote:
>> 
>> You can use Tika by yourself (recommended). See how I did it in the fsriver 
>> project.
>> Alternatively, you can use the mapper attachment plugin, which uses Tika 
>> behind the scenes but gives you less control IMHO.
>> 
>> About versions, elasticsearch does not keep old versions around. If you need 
>> that, you have to manage it yourself.
>> 
>> HTH
>> 
>> --
>> David ;-)
>> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>> 
>> On 16 Jan 2014, at 20:42, ZenMaster80 <[email protected]> wrote:
>> 
>>> - Is there any literature on how to index PDF documents and binary formats 
>>> like images?
>>> - Versioning question: If I update an already indexed document, I believe 
>>> ES will update the version number. Does it keep the previous document? 
>>> What if I need access to the previous document?
>>> -- 
>>> You received this message because you are subscribed to the Google Groups 
>>> "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>> email to [email protected].
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/elasticsearch/a9e8f331-c4bd-4a4c-be5a-b91e4f2f0e26%40googlegroups.com.
>>> For more options, visit https://groups.google.com/groups/opt_out.
