Yes. Tika extracts some metadata from those files. As you said, you should run that extraction before indexing, which means you only index what you really need.
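To illustrate "only index what you really need", here is a minimal sketch in Python. It assumes the text and metadata have already been extracted (by Tika or anything else) and only shows the filtering step and the bulk API line format. The field names, the whitelist, the index name and the helper names `build_doc` and `bulk_lines` are all hypothetical, not taken from fsriver or from the mapper attachment plugin.

```python
import json

# Hypothetical whitelist: the extracted metadata keys considered worth indexing.
KEEP_FIELDS = ("title", "author", "content_type")


def build_doc(text, metadata, keep=KEEP_FIELDS):
    """Build a small JSON-ready document from already-extracted text and metadata."""
    doc = {"content": text.strip()}
    for key in keep:
        value = metadata.get(key)
        if value:  # skip keys that are missing or empty
            doc[key] = value
    return doc


def bulk_lines(index, doc_id, doc):
    """Format one document as an action line plus a source line for the bulk API."""
    action = {"index": {"_index": index, "_id": doc_id}}
    return json.dumps(action) + "\n" + json.dumps(doc) + "\n"


if __name__ == "__main__":
    # Made-up stand-in for what an extractor might return for one PDF.
    text = "  Quarterly report text  "
    meta = {
        "title": "Q4 report",
        "author": "",
        "producer": "SomePDFLib",
        "content_type": "application/pdf",
    }
    doc = build_doc(text, meta)
    print(bulk_lines("docs", "report-q4", doc), end="")
```

The point of the design is that the binary never reaches Elasticsearch: only the plain text and the few whitelisted fields are sent, so nothing has to be encoded or parsed again at index time.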
--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 16 Jan 2014 at 22:51, ZenMaster80 <[email protected]> wrote:

> Thanks for the reply. the attachment plugin I understand encodes content
> before indexing it, this sounds like an expensive operation if we have lots
> of pdfs. I was thinking extracting text from pdf early on instead and deal
> with text instead.
> Does the plugin also work for binaries like images?
>
> On Thursday, January 16, 2014 4:12:47 PM UTC-5, David Pilato wrote:
>>
>> You can use Tika by yourself (recommended). See how I did it in fsriver
>> project.
>> You can use mapper attachment plugin which is using Tika behind the scene
>> but gives you less control IMHO.
>>
>> About versions, elasticsearch does not keep old versions around. If you need
>> that, you have to manage it yourself.
>>
>> HTH
>>
>> --
>> David ;-)
>> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>>
>> On 16 Jan 2014 at 20:42, ZenMaster80 <[email protected]> wrote:
>>
>>> - Is there any literature on how to index pdf documents and binary formats
>>> like images?
>>> - Versioning question: If I update an already indexed document, I believe
>>> ES will update the version number. I am wondering if it keeps the previous
>>> document, what if I needed access to the previous document?
>>> --
>>> You received this message because you are subscribed to the Google Groups
>>> "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an
>>> email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/a9e8f331-c4bd-4a4c-be5a-b91e4f2f0e26%40googlegroups.com.
>>> For more options, visit https://groups.google.com/groups/opt_out.
