You can certainly add a new field and put the OCR text into it. For example:

Mapping:

        // Map two fields on the document type: a plain string field for the
        // OCR text and an attachment field for the file itself.
        PutMappingResponse putMappingResponse = new PutMappingRequestBuilder(
            client.admin().indices()).setIndices(INDEX_NAME).setType(DOCUMENT_TYPE).setSource(
                XContentFactory.jsonBuilder().startObject()
                    .field(DOCUMENT_TYPE).startObject()
                        .field("properties").startObject()
                            // New field that holds the OCR text
                            .field("text").startObject()
                                .field("type", "string")
                            .endObject()
                            // Attachment field for the original file
                            .field("file").startObject()
                                .field("store", "yes")
                                .field("type", "attachment")
                                .field("fields").startObject()
                                    .field("file").startObject()
                                        .field("store", "yes")
                                    .endObject()
                                .endObject()
                            .endObject()
                        .endObject()
                    .endObject()
                .endObject()
        ).execute().actionGet();

Then put the OCR text into the "text" field:

        IndexResponse indexResponse = client.prepareIndex(INDEX_NAME, DOCUMENT_TYPE, "1")
            .setSource(XContentFactory.jsonBuilder().startObject()
                // OCR output goes into the plain "text" field
                .field("text", ocrText)
                .field("file").startObject()
                    .field("content", fileContents)
                    // -1 removes the limit on the number of extracted characters
                    .field("_indexed_chars", -1)
                .endObject()
            .endObject()
        ).execute().actionGet();
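
For reference, the index request above sends a JSON document of roughly the shape built below. This is only a sketch using the Java standard library, not the Elasticsearch API: the class OcrDocumentSketch and its helper methods are hypothetical names, and it assumes the file contents are raw bytes that get sent Base64-encoded, which is what the attachment type expects.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, for illustration only: builds the JSON body by hand
// so the structure of the indexed document is visible.
class OcrDocumentSketch {

    // Escape characters that may not appear raw inside a JSON string.
    static String escapeJson(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // OCR text goes in "text"; the Base64-encoded file goes in "file.content".
    static String buildSource(String ocrText, byte[] fileBytes) {
        String encoded = Base64.getEncoder().encodeToString(fileBytes);
        return "{\"text\":\"" + escapeJson(ocrText) + "\","
             + "\"file\":{\"content\":\"" + encoded + "\",\"_indexed_chars\":-1}}";
    }

    public static void main(String[] args) {
        byte[] fileBytes = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(buildSource("scanned invoice text", fileBytes));
    }
}
```

In the real request the XContentBuilder produces this structure for you, so a hand-rolled builder like this is useful only for seeing what ends up in the index.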

You probably don't need to index the image's binary data; I'm not sure 
what you would need it for.
