Okay, so I have a large amount of data (2 TB), all of it Microsoft Office 
documents, PDFs, and emails. What is the best way to index the body of 
these documents so that their contents are searchable? I tried the PHP 
client, but that isn't helping. I know there are ways to convert files in 
PHP, but is there nothing available that takes in these document types 
directly? I tried PHP's file_get_contents function, but it only gives 
usable text for plain-text documents. Also, do you know of a good tool or 
method for making the files returned by a search downloadable?
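
For anyone reading along, here is a minimal sketch in Python of what the 
quoted reply below describes: the extracted text travels inside the JSON 
alongside the file metadata. It assumes the text has already been pulled 
out of each file by a separate extraction step (not shown). The index name 
"documents", the field names, and the build_bulk_payload helper are all 
made up for illustration. It only builds the newline-delimited body that 
the Elasticsearch bulk API accepts; sending it is left out, and as far as I 
know older versions also expect a type in the action line or the URL.

```python
import json


def build_bulk_payload(docs, index_name="documents"):
    """Build a newline-delimited bulk body from already-extracted documents.

    Each item in `docs` is a dict with hypothetical keys:
    file_name, path, file_type, and body (the extracted plain text).
    """
    lines = []
    for doc in docs:
        # Action line: tells the bulk API which index and id to use.
        action = {"index": {"_index": index_name, "_id": doc["path"]}}
        # Source line: metadata plus the full text that should be searchable.
        source = {
            "file_name": doc["file_name"],
            "file_path": doc["path"],
            "file_type": doc["file_type"],
            "body": doc["body"],
        }
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [
        {
            "file_name": "report.pdf",
            "path": "/share/reports/report.pdf",
            "file_type": "pdf",
            "body": "Text extracted from the PDF goes here.",
        }
    ]
    print(build_bulk_payload(sample), end="")
```

Keeping file_path in each indexed document is one way a search front end 
could link a hit back to the original file for download.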

Thanks,
Austin

On Thursday, March 12, 2015 at 12:26:13 PM UTC-5, [email protected] wrote:
>
> Yes, you need to include all the text you want indexed and searchable as 
> part of the JSON.
>
> How else would you expect Elasticsearch to receive the data?
>
> Regarding large-scale production environments, this is why Elasticsearch 
> scales out.
>
> Aaron
>
> On Wednesday, March 11, 2015 at 12:50:25 PM UTC-6, Austin Harmon wrote:
>>
>> Hello,
>>
>> I'm trying to understand how to set up full-text search so that the 
>> body of each document is considered during search. I understand how to 
>> do the mapping and use analyzers, but what I don't understand is how 
>> they get the body of the document. If your fields are file name, file 
>> size, file path, and file type, how do the analyzers get the body of the 
>> document? Surely you wouldn't have to put the body of every document 
>> into the JSON. That is how it's done in all the examples I've seen, but 
>> it doesn't make sense for large-scale production environments. If 
>> someone could give me some insight into how this process works, it 
>> would be greatly appreciated.
>>
>> Thank you,
>> Austin Harmon
>>
>
