I think you'd be better off doing this in your own process, outside the Elasticsearch node. You don't need the mapper attachment plugin: if you're a Java developer you can use Tika directly, or any other library, to extract the content and metadata from your documents.
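To make the idea concrete, here is a minimal sketch of extracting text in your own process and building only a small JSON document to send to Elasticsearch. It is an illustration under assumptions, not the FSRiver code: the class name `ClientSideExtraction`, the `TextExtractor` hook, and the field names `file` and `content` are all made up for this example. The stand-in extractor just decodes UTF-8 so the sketch runs without dependencies; with Apache Tika on the classpath you could presumably plug in the `Tika` facade's `parseToString(InputStream)` instead, as the comment notes.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class ClientSideExtraction {

    /** Hypothetical hook: turns a raw document stream into plain text. */
    @FunctionalInterface
    public interface TextExtractor {
        String extract(InputStream in) throws Exception;
    }

    /** Escapes quotes, backslashes and control characters for a JSON string. */
    public static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '"') {
                sb.append("\\\"");
            } else if (c == '\\') {
                sb.append("\\\\");
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Serializes a flat map of string fields as a JSON object. */
    public static String toJson(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append('"').append(jsonEscape(e.getKey())).append("\":\"")
              .append(jsonEscape(e.getValue())).append('"');
        }
        return sb.append('}').toString();
    }

    /** Extracts text locally and returns the small document to index. */
    public static String buildDocument(String fileName, byte[] raw,
                                       TextExtractor extractor) throws Exception {
        String text;
        try (InputStream in = new ByteArrayInputStream(raw)) {
            text = extractor.extract(in);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("file", fileName);        // assumed field name
        fields.put("content", text.trim());  // assumed field name
        return toJson(fields);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in extractor (UTF-8 decode only). Assumption: with Tika available,
        // this could be: in -> new org.apache.tika.Tika().parseToString(in)
        TextExtractor plainText =
                in -> new String(in.readAllBytes(), StandardCharsets.UTF_8);

        byte[] raw = "  Hello \"world\"\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(buildDocument("hello.txt", raw, plainText));
    }
}
```

Only the resulting JSON string would go over the wire to the cluster; the raw bytes never leave the extracting process.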
Actually, I moved the FSRiver from mapper attachment to using Tika directly. Now I have fine-grained control over my documents. Better still, I'm no longer forced to send a full PDF document (10 MB) over the wire when it contains mainly pictures and I only want to extract a small amount of data (metadata, for example). Makes sense?

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr

On 17 February 2014 at 10:33:52, [email protected] wrote:

Hi

Would it be possible to dedicate certain Elasticsearch client nodes to do only the analysis via the mapper-attachment plugin? The indexing would then be performed on the data nodes. The goal would be to offload the nodes holding the indexes, as analyzing many large documents consumes a lot of resources.

Any thoughts or experiences will be very much appreciated.

adTHANKSvance,
Jan
