Re: problem indexing with my analyzer

Cédric Hourcade Fri, 20 Jun 2014 02:45:25 -0700

If you are only searching the text, you should index the images in a
separate field, with no analyzer ("index": "not_analyzed") or, even
better, "index": "no" (not indexed at all). If you need to retrieve the
image data, it is still available in _source.
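
A minimal sketch of that mapping, written in the same PHP array style as the settings quoted further down. It assumes Elasticsearch 1.x string-field syntax and that the note has already been split into two hypothetical fields, `note_text` and `note_images`. The index name `my_index` and type name `source` are placeholders.

```php
<?php
// Sketch only: an ES 1.x mapping that keeps the text and the images in separate fields.
// "note_text" and "note_images" are hypothetical field names, not from the original setup.
function buildNoteMapping(string $index, string $type): array
{
    return array(
        'index' => $index,
        'type'  => $type,
        'body'  => array(
            $type => array(
                'properties' => array(
                    // Searchable text: goes through the custom "reuters" nGram analyzer.
                    'note_text' => array(
                        'type'     => 'string',
                        'analyzer' => 'reuters',
                    ),
                    // Base64 images: not indexed at all, but still returned from _source.
                    'note_images' => array(
                        'type'  => 'string',
                        'index' => 'no',
                    ),
                ),
            ),
        ),
    );
}

$params = buildNoteMapping('my_index', 'source');
echo json_encode($params['body'], JSON_PRETTY_PRINT), "\n";
```

With elasticsearch-php, an array of this shape would typically be passed to `$client->indices()->putMapping($params)`. Newer Elasticsearch versions replace `string` with `text`/`keyword` and use `"index": false` instead of `"no"`.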


To be honest, though, I wouldn't store this kind of data in ES at all:
your index is going to be bigger and merges are going to be slower. I'd
keep the binary files stored elsewhere.
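
If the pictures are embedded in the note as data URIs (for example `data:image/png;base64,...` inside HTML; this is an assumption about how the data looks), a small preprocessing step can pull them out before indexing. Only the text then reaches the nGram analyzer, and the binaries can be stored elsewhere. A minimal sketch; the function name `splitNote` and the returned keys are hypothetical.

```php
<?php
// Hypothetical helper: separates base64 data-URI images from the text of a note.
// Assumes images appear as "data:image/<subtype>;base64,<payload>".
function splitNote(string $note): array
{
    $pattern = '~data:image/[A-Za-z0-9.+-]+;base64,[A-Za-z0-9+/=]+~';

    // Collect every embedded image (to be stored outside Elasticsearch).
    preg_match_all($pattern, $note, $matches);

    // Remove the images so that only the text is sent to the analyzer.
    $text = preg_replace($pattern, '', $note);

    return array(
        'note_text'   => $text,
        'note_images' => $matches[0],
    );
}

$note  = 'Intro <img src="data:image/png;base64,iVBORw0KGgo=" /> conclusion';
$parts = splitNote($note);
echo $parts['note_text'], "\n";          // Intro <img src="" /> conclusion
echo count($parts['note_images']), "\n"; // 1
```

The extracted images could then go to a file store, keeping only a reference in the indexed document.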

Cédric Hourcade
[email protected]


On Fri, Jun 20, 2014 at 11:25 AM, Tanguy Bernard
<[email protected]> wrote:
> Yes, I am applying "reuters" to my document (made up of text and pictures).
> My goal is to search the text of the document by any word or
> part of a word.
>
> Yes, the problem is my nGram filter.
> How do I solve it? Decrease the nGram max? Replace the analyzer with
> another one that still satisfies my goal?
>
> On Friday, June 20, 2014 10:58:49 UTC+2, Cédric Hourcade wrote:
>>
>> Does it mean you're applying the "reuters" analyzer to your base64
>> encoded pictures?
>>
>> I guess it generates a huge number of tokens for each entry
>> because of your nGram filter (with max_gram set to 250).
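
To see why max_gram matters so much, here is a rough count of how many grams an nGram token filter emits for a single token of length L (every substring of min_gram to max_gram characters). A minimal sketch that ignores tokenizer details, such as where the standard tokenizer splits base64 text; `ngramCount` is a hypothetical helper.

```php
<?php
// Number of n-grams an nGram token filter produces for one token of length $len,
// i.e. all substrings whose length is between $minGram and $maxGram.
function ngramCount(int $len, int $minGram, int $maxGram): int
{
    $total = 0;
    for ($k = $minGram; $k <= min($maxGram, $len); $k++) {
        $total += $len - $k + 1; // number of start positions for a gram of length $k
    }
    return $total;
}

// One 250-character token with the settings from this thread (min 3, max 250):
echo ngramCount(250, 3, 250), "\n"; // 30876
// The same token with max_gram lowered to 10:
echo ngramCount(250, 3, 10), "\n";  // 1956
// A short word such as "analyzer" (8 characters):
echo ngramCount(8, 3, 250), "\n";   // 21
```

Lowering max_gram shrinks the output, but the count still grows with the amount of analyzed text, so keeping the base64 data out of the analyzed field is likely what removes most of the tokens.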
>>
>> Cédric Hourcade
>> [email protected]
>>
>>
>> On Fri, Jun 20, 2014 at 9:09 AM, Tanguy Bernard
>> <[email protected]> wrote:
>> > Information:
>> > My "note_source" contains pictures (.jpg, .png, ...) encoded in base64, as well as text.
>> >
>> > For my mapping I used:
>> > "type" => "string"
>> > "analyzer" => "reuters" (the name of my analyzer)
>> >
>> >
>> > Any ideas?
>> >
>> > On Thursday, June 19, 2014 17:57:46 UTC+2, Tanguy Bernard wrote:
>> >>
>> >> Hello,
>> >> I have an issue when I index a particular field, "note_source" (SQL
>> >> LONGTEXT).
>> >> I use the same analyzer for every field (except date_source and
>> >> id_source), but for "note_source" I get a "warn monitor.jvm".
>> >> When I remove "note_source", everything is fine. If I don't use an
>> >> analyzer on "note_source", everything is fine, but if I use my analyzer
>> >> on "note_source", it crashes.
>> >>
>> >> I think I have enough memory; I have set ES_HEAP_SIZE.
>> >> Maybe my problem is with accents (ASCII vs. UTF-8)?
>> >>
>> >> Can you help me with this?
>> >>
>> >>
>> >>
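>> >> My settings
>> >>
>> >> public function createSetting($pf){
>> >>         // Index-level settings plus the custom analysis chain.
>> >>         $params = array('index' => $pf, 'body' => array(
>> >>         'settings' => array(
>> >>             'number_of_shards' => 5,
>> >>             'number_of_replicas' => 0,
>> >>             'analysis' => array(
>> >>                 'filter' => array(
>> >>                     // nGram token filter: expands each token into all of
>> >>                     // its substrings of 3 to 250 characters.
>> >>                     'nGram' => array(
>> >>                         // token_chars is an nGram *tokenizer* option;
>> >>                         // the token filter does not use it.
>> >>                         "token_chars" =>array(),
>> >>                         "type" => "nGram",
>> >>                         "min_gram" => 3,
>> >>                         "max_gram"  => 250
>> >>                     )
>> >>                 ),
>> >>                 'analyzer' => array(
>> >>                     // "reuters": standard tokenizer, then lowercase,
>> >>                     // accent folding and the nGram filter above.
>> >>                     'reuters' => array(
>> >>                         'type' => 'custom',
>> >>                         'tokenizer' => 'standard',
>> >>                         'filter' => array('lowercase', 'asciifolding', 'nGram')
>> >>                     )
>> >>                 )
>> >>             )
>> >>         )
>> >>         ));
>> >>         $this->elasticsearchClient->indices()->create($params);
>> >>         return;
>> >> }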
>> >>
>> >>
>> >> My Indexing
>> >>
>> >> public function indexTable($pf,$typeElement){
>> >>
>> >>         $params =array(
>> >>             "index" =>'_river',
>> >>             "type" => $typeElement,
>> >>             "id" => "_meta",
>> >>             "body" =>array(
>> >>
>> >>                 "type" => "jdbc",
>> >>                 "jdbc" => array(
>> >>                     "url" => "jdbc:mysql://ip/name",
>> >>                     "user" => 'root',
>> >>                     "password" => 'mdp',
>> >>                     "index" => $pf,
>> >>                     "type" => $typeElement,
>> >>                     // The SQL query must be a quoted PHP string.
>> >>                     "sql" => "select id_source as _id, id_sous_theme, "
>> >>                            . "titre_source, desc_source, note_source, adresse_source, "
>> >>                            . "type_source, date_source from source",
>> >>                     "max_bulk_requests" => 5,
>> >>                     )
>> >>             )
>> >>
>> >>         );
>> >>
>> >>
>> >>         $this->elasticsearchClient->index($params);
>> >> }
>> >>
>> >> Thanks in advance.
>> >
>> > --
>> > You received this message because you are subscribed to the Google
>> > Groups
>> > "elasticsearch" group.
>> > To unsubscribe from this group and stop receiving emails from it, send
>> > an
>> > email to [email protected].
>> > To view this discussion on the web visit
>> >
>> > https://groups.google.com/d/msgid/elasticsearch/5d93217c-bded-40fa-8fd2-fdac576c57ee%40googlegroups.com.
>> > For more options, visit https://groups.google.com/d/optout.
>

