If you are only searching in the text, you should index the images in
another field, with no analyzer ("index": "not_analyzed"), or even
better "index": "no" (not indexed). If you need to retrieve the image
data, it's still in the _source.

But to be honest, I wouldn't even store this kind of information in ES:
your index is going to be bigger, merges are going to be slower... I'd
keep the binary files stored elsewhere.

Cédric Hourcade
[email protected]

On Fri, Jun 20, 2014 at 11:25 AM, Tanguy Bernard <[email protected]> wrote:
> Yes, I am applying "reuters" on my document (composed of text and pictures).
> My goal is to search the text of the document with any word or
> part of a word.
>
> Yes, the problem is my nGram filter.
> How do I solve this problem? Decrease the nGram max? Replace the analyzer
> with another one that still satisfies my goal?
>
> On Friday, June 20, 2014 10:58:49 AM UTC+2, Cédric Hourcade wrote:
>>
>> Does it mean you're applying the "reuters" analyzer on your base64
>> encoded pictures?
>>
>> I guess it generates a really huge number of tokens for each entry
>> because of your nGram filter (with a max at 250).
>>
>> Cédric Hourcade
>> [email protected]
>>
>>
>> On Fri, Jun 20, 2014 at 9:09 AM, Tanguy Bernard
>> <[email protected]> wrote:
>> > Information
>> > My "note_source" contains pictures (.jpg, .png ...) in base64 and text.
>> >
>> > For my mapping I have used:
>> > "type" => "string"
>> > "analyzer" => "reuters" (the name of my analyzer)
>> >
>> >
>> > Any idea?
>> >
>> > On Thursday, June 19, 2014 5:57:46 PM UTC+2, Tanguy Bernard wrote:
>> >>
>> >> Hello
>> >> I have an issue when I index a particular field, "note_source" (SQL
>> >> longtext).
>> >> I use the same analyzer for each field (except date_source and
>> >> id_source), but for "note_source" I get a "warn monitor.jvm".
>> >> When I remove "note_source", everything is fine. If I don't use an
>> >> analyzer on "note_source", everything is fine, but if I use my
>> >> analyzer on "note_source", it crashes.
>> >>
>> >> I think I have enough memory; I have set ES_HEAP_SIZE.
>> >> Maybe my problem is with accents (ASCII, UTF-8).
>> >>
>> >> Can you help me with this?
>> >>
>> >>
>> >>
>> >> My Setting
>> >>
>> >> public function createSetting($pf){
>> >>     $params = array('index' => $pf, 'body' => array(
>> >>         'settings' => array(
>> >>             'number_of_shards' => 5,
>> >>             'number_of_replicas' => 0,
>> >>             'analysis' => array(
>> >>                 'filter' => array(
>> >>                     'nGram' => array(
>> >>                         // token_chars is an nGram *tokenizer* option;
>> >>                         // the nGram token filter ignores it
>> >>                         "token_chars" => array(),
>> >>                         "type" => "nGram",
>> >>                         "min_gram" => 3,
>> >>                         "max_gram" => 250
>> >>                     )
>> >>                 ),
>> >>                 'analyzer' => array(
>> >>                     'reuters' => array(
>> >>                         'type' => 'custom',
>> >>                         'tokenizer' => 'standard',
>> >>                         'filter' => array('lowercase', 'asciifolding', 'nGram')
>> >>                     )
>> >>                 )
>> >>             )
>> >>         )
>> >>     ));
>> >>     $this->elasticsearchClient->indices()->create($params);
>> >>     return;
>> >> }
>> >>
>> >>
>> >> My Indexing
>> >>
>> >> public function indexTable($pf, $typeElement){
>> >>
>> >>     $params = array(
>> >>         "index" => '_river',
>> >>         "type" => $typeElement,
>> >>         "id" => "_meta",
>> >>         "body" => array(
>> >>             "type" => "jdbc",
>> >>             "jdbc" => array(
>> >>                 "url" => "jdbc:mysql://ip/name",
>> >>                 "user" => 'root',
>> >>                 "password" => 'mdp',
>> >>                 "index" => $pf,
>> >>                 "type" => $typeElement,
>> >>                 // the SQL statement must be a quoted PHP string
>> >>                 "sql" => "select id_source as _id, id_sous_theme, titre_source, desc_source, note_source, adresse_source, type_source, date_source from source",
>> >>                 "max_bulk_requests" => 5,
>> >>             )
>> >>         )
>> >>     );
>> >>
>> >>     $this->elasticsearchClient->index($params);
>> >> }
>> >>
>> >> Thanks in advance.
>> >
>> > --
>> > You received this message because you are subscribed to the Google Groups
>> > "elasticsearch" group.
>> > To unsubscribe from this group and stop receiving emails from it, send an
>> > email to [email protected].
>> > To view this discussion on the web visit
>> > https://groups.google.com/d/msgid/elasticsearch/5d93217c-bded-40fa-8fd2-fdac576c57ee%40googlegroups.com.
>> > For more options, visit https://groups.google.com/d/optout.
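To put rough numbers on the nGram explanation in this thread, here is a minimal PHP sketch, assuming the filter emits every gram size from min_gram to max_gram at every start offset of each token. The helper `countNgrams` is hypothetical, and the token lengths below are illustrative, not measured from real base64 data.

```php
<?php
// Rough count of the grams an nGram token filter produces for ONE token of
// length $len, assuming every gram size from $minGram to $maxGram is emitted
// at every possible start offset. Tokens shorter than $minGram yield nothing.
function countNgrams(int $len, int $minGram, int $maxGram): int
{
    $total = 0;
    $upper = min($len, $maxGram); // grams cannot be longer than the token
    for ($k = $minGram; $k <= $upper; $k++) {
        $total += $len - $k + 1; // start offsets available for size $k
    }
    return $total;
}

// Illustrative token lengths only, not measured from real base64 data.
foreach (array(32, 250) as $len) {
    printf(
        "token length %d: %d grams with max_gram 250, %d with max_gram 10\n",
        $len,
        countNgrams($len, 3, 250),
        countNgrams($len, 3, 10)
    );
}
```

With min_gram 3, a 250-character token yields 30,876 grams at max_gram 250 but 1,956 at max_gram 10, and a 32-character token yields 465 versus 212. Lowering max_gram reduces the cost, but keeping the base64 data out of the analyzed field avoids it for the images altogether.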
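The separate, non-indexed image field suggested at the top of the thread could look like the following minimal sketch, assuming the 1.x-era elasticsearch-php client and the ES 1.x `string` mapping syntax used here. The field name `note_images` and the helper `buildNoteMapping` are hypothetical, and the text and images are assumed to have already been split into separate columns before indexing. The mapping has to be in place before documents are indexed, on an index created with the "reuters" analyzer settings.

```php
<?php
// Hypothetical mapping sketch (ES 1.x "string" syntax): the text keeps the
// "reuters" analyzer, while the base64 images go to a field that is not
// indexed at all, so they are only returned through _source.
function buildNoteMapping(string $pf, string $typeElement): array
{
    return array(
        'index' => $pf,
        'type'  => $typeElement,
        'body'  => array(
            $typeElement => array(
                'properties' => array(
                    // Text only: searchable with the nGram-based analyzer.
                    'note_source' => array(
                        'type'     => 'string',
                        'analyzer' => 'reuters',
                    ),
                    // Hypothetical field holding only the base64 images.
                    'note_images' => array(
                        'type'  => 'string',
                        'index' => 'no',
                    ),
                ),
            ),
        ),
    );
}

// With the client from the original code (not executed here):
// $this->elasticsearchClient->indices()->putMapping(buildNoteMapping($pf, $typeElement));
```

Using "index": "not_analyzed" instead of "no" would keep the images searchable as exact values, but as noted in the thread, "no" is the cheaper choice when you never search on them.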
