I'm a bit concerned about your "it does not work" statement. As of today we have only 4 open issues on it (1 bug and 3 feature requests): https://github.com/elastic/elasticsearch-mapper-attachments/issues
Could you explain a bit more what is not working? Maybe I missed something.

-- 
David Pilato - Developer | Evangelist Elasticsearch.com
@dadoonet <https://twitter.com/dadoonet> | @elasticsearchfr <https://twitter.com/elasticsearchfr> | @scrutmydocs <https://twitter.com/scrutmydocs>

> On 13 March 2015 at 10:49, Austin Harmon <[email protected]> wrote:
>
> There is a plugin called mapper attachments:
> https://github.com/elastic/elasticsearch-mapper-attachments
> This plugin is supposed to use Tika to index the content of documents but it
> doesn't seem to be working correctly. I base64 encode the documents but it
> comes back as null when I decode it.
>
> On Friday, March 13, 2015 at 11:38:38 AM UTC-5, Aaron Mefford wrote:
> Not certain what you are referring to so I expect not. I have used the
> elasticsearch mappings, but I can't see how those would directly integrate
> with Tika.
>
> On Fri, Mar 13, 2015 at 10:35 AM, Austin Harmon <[email protected]> wrote:
> Thank you for the information. This is going to be very difficult, I can tell.
> Do you have experience with the mapper attachment?
>
> On Friday, March 13, 2015 at 11:15:18 AM UTC-5, Aaron Mefford wrote:
> You're going to have the same issue with Solr, putting the contents into XML,
> which is even heavier than JSON.
>
> I wish that I had some more experience using Tika, but I do not. I am aware of
> its capabilities but have not had reason to use it myself.
>
> I see what you are saying about others not having the same issue, but what
> you must realize is that most users are not indexing that type of document.
> They are indexing events, database records, web pages and so on. It is a
> very small subset that index things like Word docs and PDFs.
>
> On Fri, Mar 13, 2015 at 9:42 AM, Austin Harmon <[email protected]> wrote:
> Thank you for the information. I've been trying to use the mapper attachment
> which has Apache Tika built into it.
> I am just surprised and confused that so many companies use elasticsearch
> and yet it is so difficult to index the contents of a document. If I need to
> index the contents of documents then would it be easier and more efficient to
> switch over to Apache Solr? As I said, I have 2 TB of data so it isn't
> efficient for me to manually input each document so it can be indexed with
> specific JSON. If you have any experience with Solr please let me know if it
> would be a good solution to my problem.
>
> thanks,
> Austin
>
> On Thursday, March 12, 2015 at 4:04:29 PM UTC-5, Aaron Mefford wrote:
> Take a look at Apache Tika: http://tika.apache.org/
> It will allow you to extract the contents of the documents for indexing;
> this is outside of the scope of the ElasticSearch indexing. A good tool to
> make these files downloadable is also out of scope, but I'll answer to what
> is in scope. You need to put the files somewhere that they can be accessed
> by a URL. Any webserver is capable of this. Of course your needs may vary,
> but this isn't the list for those questions. Once you have a URL that the
> document can be accessed by, include that in your indexing of the document so
> that you can point to that URL in your search results.
>
> I am sure there are other options out there for extracting the contents of
> Word documents; Apache Tika is one that is frequently used for this purpose
> though.
>
> On Thu, Mar 12, 2015 at 2:56 PM, Austin Harmon <[email protected]> wrote:
> Okay so I have a large amount of data, 2 TB, and it's all Microsoft Office
> documents and PDFs and emails. What is the best way to go about indexing the
> body of these documents, so making the contents of the document searchable.
> I tried to use the PHP client but that isn't helping, and I know there are
> ways to convert files in PHP, but is there nothing available that takes in
> these types of documents? I tried the file_get_contents function in PHP but
> it only takes in text documents. Also, would you know of a good tool or a
> method to make the files that are searched downloadable?
>
> Thanks,
> Austin
>
> On Thursday, March 12, 2015 at 12:26:13 PM UTC-5, [email protected] wrote:
> Yes, you need to include all the text you want indexed and searchable as part
> of the JSON.
>
> How else would you expect ElasticSearch to receive the data?
>
> Regarding large scale production environments, this is why ElasticSearch
> scales out.
>
> Aaron
>
> On Wednesday, March 11, 2015 at 12:50:25 PM UTC-6, Austin Harmon wrote:
> Hello,
>
> I'm trying to get an understanding of how to have full text search on the
> document and have the body of the document be considered during search. I
> understand how to do the mapping and use analyzers, but what I don't
> understand is how they get the body of the document. If your fields are file
> name, file size, file path and file type, how do the analyzers get the body of
> the document? Surely you wouldn't have to put the body of every document into
> the JSON. That is how I've seen it done in all the examples I've seen, but
> that doesn't make sense for large scale production environments. If someone
> could please give me some insight as to how this process works it would be
> greatly appreciated.
>
> Thank you,
> Austin Harmon
>
> --
> You received this message because you are subscribed to a topic in the Google
> Groups "elasticsearch" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/elasticsearch/mG2k23vbzXQ/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/41516b36-18e3-4ef8-8d8d-1e9da6b727a4%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
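
For reference, here is a minimal sketch of the base64 step discussed in this thread. It assumes the plugin's `attachment` field type as I recall it from the plugin README; the field name `file`, the `file_name` property and the sample bytes are hypothetical and only there for illustration. The sketch builds the mapping and document bodies but sends nothing to Elasticsearch. Checking that the encoded string decodes back to the original bytes before indexing may help tell an encoding problem apart from a plugin problem.

```python
import base64
import json

# Hypothetical field name, chosen for this example only.
ATTACHMENT_FIELD = "file"


def attachment_mapping(field=ATTACHMENT_FIELD):
    """Mapping body declaring `field` as an attachment.

    Assumption: the mapper attachments plugin is installed and exposes
    the "attachment" field type.
    """
    return {"properties": {field: {"type": "attachment"}}}


def attachment_document(raw_bytes, file_name, field=ATTACHMENT_FIELD):
    """Build the JSON-ready document; the file content is sent as base64 text."""
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return {"file_name": file_name, field: encoded}


def decodes_back(doc, raw_bytes, field=ATTACHMENT_FIELD):
    """Return True if the base64 string in `doc` decodes to the original bytes."""
    return base64.b64decode(doc[field]) == raw_bytes


# Sample bytes standing in for a real file read with open(path, "rb").read().
sample = b"sample bytes standing in for a PDF or Word file"
doc = attachment_document(sample, "report.pdf")
body = json.dumps(doc)  # this string is what would be sent as the document
```

If `decodes_back(doc, sample)` is false for a real file, the problem is on the encoding side (for example, the file was read in text mode) rather than in the plugin.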
