Sorry for the confusion. I do want PDFs, but my concern is retrieving the
image file when its OCR text is searched. I must be missing something.

As shown below, I provide two fields, "text" and "content". In your second
post you say I don't need the "content" field for images? If so, how does
the search return the image to the requesting client (a web app, for
instance) when a text match occurs on the image's OCR text? If I only
include "text", then it will return only the text part of the image and not
the image itself, correct?

source(XContentFactory.jsonBuilder()
    .startObject()
        .field("text", ocrText) // OCR text extracted from the image
        .field("file").startObject()
            // base64-encoded contents of the image file; is this needed?
            .field("content", fileContents)
            .field("_indexed_chars", -1)
        .endObject()
    .endObject())

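To make the question concrete, here is a minimal sketch of the round trip I
have in mind, in plain Java with no Elasticsearch client involved. The class
name OcrImageDocument and the helpers build and imageFrom are made up for
illustration, and I am assuming Java 8 for java.util.Base64. It only shows the
image travelling inside the document as a base64 string under "file.content",
next to the OCR "text", and being decoded again from whatever source map comes
back with a hit.

```java
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of any Elasticsearch API.
class OcrImageDocument {

    // Builds the document body: the OCR text to search on, plus the image
    // itself as a base64 string. Field names follow the snippet above.
    static Map<String, Object> build(String ocrText, byte[] imageBytes) {
        Map<String, Object> file = new LinkedHashMap<>();
        file.put("content", Base64.getEncoder().encodeToString(imageBytes));

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("text", ocrText);
        doc.put("file", file);
        return doc;
    }

    // What the web app would do with the source of a matching document
    // to get the original image bytes back.
    @SuppressWarnings("unchecked")
    static byte[] imageFrom(Map<String, Object> source) {
        Map<String, Object> file = (Map<String, Object>) source.get("file");
        return Base64.getDecoder().decode((String) file.get("content"));
    }

    public static void main(String[] args) {
        // Stand-in bytes; a real caller would read the image file instead.
        byte[] fakeImage = {(byte) 0x89, 0x50, 0x4E, 0x47};
        Map<String, Object> doc = build("invoice 2014", fakeImage);
        System.out.println("text field: " + doc.get("text"));
        System.out.println("decoded image bytes: " + imageFrom(doc).length);
    }
}
```

If only "text" goes into the document, there is nothing for imageFrom to
decode, which is exactly the case I am asking about.
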
On Thursday, February 27, 2014 1:16:36 PM UTC-5, Binh Ly wrote:
>
> Oh, the attachment part is for your PDF. If you don't need to index PDFs
> then just remove that part:
>
> PutMappingResponse putMappingResponse = new
> PutMappingRequestBuilder(
> client.admin().indices()).
> setIndices(INDEX_NAME).setType(DOCUMENT_TYPE).setSource(
> XContentFactory.jsonBuilder().startObject()
> .field(DOCUMENT_TYPE).startObject()
> .field("properties").startObject()
> .field("text").startObject()
> .field("type", "string")
> .endObject()
> .endObject()
> .endObject()
> .endObject()
> ).execute().actionGet();
>
> Indexing:
>
> IndexResponse indexResponse = client.prepareIndex(INDEX_NAME, DOCUMENT_TYPE, "1")
> .setSource(XContentFactory.jsonBuilder().startObject()
> .field("text", ocrText)
> .endObject()
> ).execute().actionGet();
>
>