If you can run Tika yourself, I'd definitely do that rather than use the mapper attachment plugin. You will have more control over exactly what gets extracted and indexed than the plugin gives you.
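To make that concrete, here is a minimal sketch of the "extract with Tika, then index like any other document" approach. It assumes a Tika Server running locally on its default port (9998) and an Elasticsearch node on localhost:9200. The field names (`url`, `content`, `linked_from`), the index name, and the `/_doc/` path are my assumptions for illustration; older Elasticsearch versions use `/{index}/{type}/{id}` instead. Treat it as a starting point, not a definitive implementation.

```python
import hashlib
import json
import urllib.request

TIKA_URL = "http://localhost:9998/tika"  # assumption: local Tika Server, default port
ES_URL = "http://localhost:9200"         # assumption: local Elasticsearch node


def extract_text(file_bytes, tika_url=TIKA_URL):
    """Send raw file bytes to Tika Server and return the extracted plain text."""
    req = urllib.request.Request(
        tika_url,
        data=file_bytes,
        method="PUT",
        headers={"Accept": "text/plain"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def build_file_document(file_url, text, linked_from):
    """Build one standalone document per file (hypothetical field names).

    The ID is derived from the file URL, so a file attached to several
    pages is indexed once and simply lists every page that links to it.
    """
    doc_id = hashlib.sha1(file_url.encode("utf-8")).hexdigest()
    body = {
        "url": file_url,
        "content": text.strip(),
        "linked_from": sorted(set(linked_from)),
    }
    return doc_id, body


def index_document(index, doc_id, body, es_url=ES_URL):
    """Index the document over the Elasticsearch REST API."""
    req = urllib.request.Request(
        f"{es_url}/{index}/_doc/{doc_id}",  # assumption: typeless endpoint
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage would look something like `doc_id, body = build_file_document(url, extract_text(data), pages)` followed by `index_document("website", doc_id, body)`, where `"website"` is a made-up index name. Because the file is its own document, it shows up in search results next to pages and other items, and you decide which Tika output to keep.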
My 2 cents

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet <https://twitter.com/dadoonet> | @elasticsearchfr <https://twitter.com/elasticsearchfr> | @scrutmydocs <https://twitter.com/scrutmydocs>

> On 10 Dec 2014, at 12:42, Peter Bowyer <pe...@mapledesign.co.uk> wrote:
>
> Hi list,
>
> I'm indexing a website which has a lot of files on it.
>
> I found the attachment plugin which handles all file types we have, but our
> files are not "attached" (associated) with a particular web page -- in many
> cases the same file is attached to multiple pages. So we want files to show
> in the search results alongside other items.
>
> I can extract data from the file myself using Apache Tika and index it as
> with any other document in the system; but given Tika runs inside the
> attachment plugin, is there any way to use the built-in system?
>
> Thanks,
> Peter

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/AD71C2AC-6257-4E39-8235-23A2C763B3B9%40pilato.fr.
For more options, visit https://groups.google.com/d/optout.