You can certainly add a new field and put the OCR text into it. For example:

Mapping:
PutMappingResponse putMappingResponse = new PutMappingRequestBuilder(client.admin().indices())
        .setIndices(INDEX_NAME)
        .setType(DOCUMENT_TYPE)
        .setSource(XContentFactory.jsonBuilder().startObject()
            .field(DOCUMENT_TYPE).startObject()
                .field("properties").startObject()
                    .field("text").startObject()
                        .field("type", "string")
                    .endObject()
                    .field("file").startObject()
                        .field("store", "yes")
                        .field("type", "attachment")
                        .field("fields").startObject()
                            .field("file").startObject()
                                .field("store", "yes")
                            .endObject()
                        .endObject()
                    .endObject()
                .endObject()
            .endObject()
        .endObject())
        .execute().actionGet();

Then put the OCR text into the "text" field:

IndexResponse indexResponse = client.prepareIndex(INDEX_NAME, DOCUMENT_TYPE, "1")
        .setSource(XContentFactory.jsonBuilder().startObject()
            .field("text", ocrText)
            .field("file").startObject()
                .field("content", fileContents)
                .field("_indexed_chars", -1)
            .endObject()
        .endObject())
        .execute().actionGet();

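If you do keep the "file" field, note that the attachment type expects the file content as Base64. Here is a minimal sketch of one way to produce fileContents as a Base64 String. It assumes Java 8 or later for java.util.Base64; the class and method names are hypothetical, made up for illustration, and not part of Elasticsearch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class AttachmentEncoding {

    // Hypothetical helper (not an Elasticsearch API): Base64-encodes the raw
    // file bytes so the result can be used as the attachment content value.
    static String encodeForAttachment(byte[] rawBytes) {
        return Base64.getEncoder().encodeToString(rawBytes);
    }

    public static void main(String[] args) {
        // Stand-in for the real file bytes; in practice these would be read
        // from the scanned image or document on disk.
        byte[] rawBytes = "scanned page".getBytes(StandardCharsets.UTF_8);
        String fileContents = encodeForAttachment(rawBytes);
        System.out.println(fileContents);
    }
}
```

The resulting String would be what gets passed as fileContents in the index request above.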
You probably don't need to index the image's binary data at all. I'm not sure
what you would need it for.