Indexing Binary vs text

IronMan2014 Thu, 27 Mar 2014 12:09:52 -0700

I have a couple of simple questions that I would like to clear up:

#1: For a TransportClient and a cluster of two hosts: do I have to add both 
hosts to the client, or is it enough to add just one of them and let the 
yml file(s) take care of the clustering?


.addTransportAddress(new InetSocketTransportAddress(host[0], port))
.addTransportAddress(new InetSocketTransportAddress(host[1], port));



#2: Assume I have the following document structure:

jdoc{
  "title":"my title",
  "uid":"ux1234",
  "tags":"ES",
  "date":"1/1/2011",
  "content":"Content of doc goes here"
}


// This is the mapping for my binary attachments (PDF)

putMappingResponse = new PutMappingRequestBuilder(client.admin().indices())
    .setIndices(INDEX_NAME)
    .setType(INDEX_TYPE)
    .setSource(
        XContentFactory.jsonBuilder().startObject()
            .startObject(INDEX_TYPE)
                .startObject("properties")
                    // pdf
                    .startObject("file")
                        .field("type", "attachment")
                        .startObject("fields")
                            .startObject("title")
                                .field("store", "yes")
                            .endObject()
                            .startObject("file")
                                .field("store", "yes")
                                .field("term_vector", "with_positions_offsets")
                            .endObject()
                        .endObject()
                    .endObject()
                .endObject()
            .endObject()
        .endObject()
    ).execute().actionGet();


void indexDocument(JSONObject jDoc) {
    bulkProcessor.add(Requests.indexRequest(INDEX_NAME).type(INDEX_TYPE)
        .id(jDoc.getString("uid")).source(jDoc.toString()));
}

void indexBinaryDocument(JSONObject jDoc) {
    XContentBuilder source = jsonBuilder().startObject()
        .field("file", jDoc.getString(CONTENT)) // Base64-encoded binary, from Tika
        .field("uid", jDoc.getString(UID))
        .field("date", jDoc.getString(DATE))
        ....
        .endObject();

    bulkProcessor.add(Requests.indexRequest(INDEX_NAME).type(INDEX_TYPE).source(source));
}


My Question:

Depending on the document, I call either indexDocument for normal text docs 
or indexBinaryDocument for binaries. This is confusing: I want to be able to 
call one index function, like "indexDocument" above, without having to build 
the source again for binary documents. In other words, if the document is 
binary, why do I have to tell it about the "file" field again? Couldn't I 
just replace the "content" field with the Base64-encoded text? Everything 
else in the document is the same; only the content field is different. 
Somehow I feel both should be one and the same.
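
To make the idea concrete, here is a rough sketch of the single code path I 
have in mind. It is plain Java with no Elasticsearch calls; the class name 
UnifiedSource, the buildSource helper and the isBinary flag are all 
hypothetical names I made up for illustration. The only thing it does is 
move the body from "content" to "file" when the document is binary and 
leave every other field untouched.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class UnifiedSource {
    // Field names from the post: text bodies live in "content", while the
    // attachment mapping expects the Base64 data under "file".
    static final String CONTENT = "content";
    static final String FILE = "file";

    // Hypothetical helper: returns a copy of the document in which the body
    // is stored under "file" for binary documents. The input is not modified.
    static Map<String, Object> buildSource(Map<String, Object> doc, boolean isBinary) {
        Map<String, Object> source = new LinkedHashMap<>(doc);
        if (isBinary && source.containsKey(CONTENT)) {
            source.put(FILE, source.remove(CONTENT));
        }
        return source;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("title", "my title");
        doc.put("uid", "ux1234");
        doc.put("content", "QmFzZTY0IGRhdGE="); // placeholder Base64 string
        System.out.println(buildSource(doc, true).keySet());
        System.out.println(buildSource(doc, false).keySet());
    }
}
```

With something like this, one indexDocument function could presumably hand 
the resulting map to the index request for both kinds of document, assuming 
the request accepts a Map as its source. I have not verified that this 
plays well with the attachment mapping.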
