Binh, thanks. With your help I think I am closer to the answer. With the
sample mapping you provided, I should be able to put the base64
contents of the image file in the "content" field and the OCR text in the
"text" field. Then, when the OCR text is searched, I can return the "content",
which is the image. With the above mapping I believe the image is saved in
_source as well as in the field, for "highlighting" purposes. Can I
prevent it from being stored in _source with something like this?
startObject("_source").field("enabled","no").endObject()
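
To make the placement concrete, here is a rough sketch of the mapping JSON I have in mind, with _source disabled at the type level (next to "properties", not inside it). It builds the JSON as a plain Java string with no Elasticsearch dependency, so it only shows the shape of the mapping. The class and method names are made up for illustration, and marking "text" as stored is my assumption so that it can still be returned once _source is gone.

```java
// Sketch only: builds the mapping JSON by hand, without any Elasticsearch classes.
final class SourceMappingSketch {

    // Hypothetical helper: mapping for one document type with _source disabled.
    static String mappingJson(String documentType) {
        return "{\"" + documentType + "\":{"
             // _source sits at the type level, as a sibling of "properties".
             + "\"_source\":{\"enabled\":false},"
             + "\"properties\":{"
             // Assumption: store "text" explicitly, since _source is no longer kept.
             +   "\"text\":{\"type\":\"string\",\"store\":\"yes\"},"
             +   "\"file\":{\"type\":\"attachment\"}"
             + "}}}";
    }

    public static void main(String[] args) {
        // Print the mapping so it can be inspected or pasted into a request.
        System.out.println(mappingJson("document"));
    }
}
```

If that shape is right, I would expect the same structure to come out of the XContentBuilder chain, with the startObject("_source") call placed before the "properties" object.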
On Thursday, February 27, 2014 8:29:25 AM UTC-5, Binh Ly wrote:
>
> You certainly can add a new field, and then just put the OCR text into
> that new field. So for example:
>
> Mapping:
>
> PutMappingResponse putMappingResponse = new PutMappingRequestBuilder(
>         client.admin().indices())
>     .setIndices(INDEX_NAME).setType(DOCUMENT_TYPE).setSource(
>         XContentFactory.jsonBuilder().startObject()
>             .field(DOCUMENT_TYPE).startObject()
>                 .field("properties").startObject()
>                     .field("text").startObject()
>                         .field("type", "string")
>                     .endObject()
>                     .field("file").startObject()
>                         .field("store", "yes")
>                         .field("type", "attachment")
>                         .field("fields").startObject()
>                             .field("file").startObject()
>                                 .field("store", "yes")
>                             .endObject()
>                         .endObject()
>                     .endObject()
>                 .endObject()
>             .endObject()
>         .endObject()
>     ).execute().actionGet();
>
> Then put the OCR text into the "text" field:
>
> IndexResponse indexResponse = client.prepareIndex(INDEX_NAME, DOCUMENT_TYPE, "1")
>     .setSource(XContentFactory.jsonBuilder().startObject()
>         .field("text", ocrText)
>         .field("file").startObject()
>             .field("content", fileContents)
>             .field("_indexed_chars", -1)
>         .endObject()
>     .endObject()
>     ).execute().actionGet();
>
> You probably don't need to index the image binary information - not sure
> what you would need it for.
>