Hi Cindy,

There isn't a hard limit on the number of fields Lucene supports; it's more
that each field adds some heap usage, plus CPU/IO cost for merging,
etc.  It's just not a well-tested usage of Lucene, not something the
developers focus on optimizing, etc.

Partitioning by _type won't change things (it's still a single Lucene
index).

How you design your schema really depends on how you want to search on
these fields.  E.g. if they are single-token text fields that you only need
to filter on, then you can index them all under a single field (say
allFilterFields), prepending the original field name onto each token, and
then doing the same at search time (searching for field:text as your text
token within allFilterFields).
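A quick sketch of that idea in Python, just to make the token scheme
concrete (allFilterFields, combine, etc. are illustrative names, not part of
any Lucene or Elasticsearch API):

```python
# Sketch of collapsing many filter-only metadata fields into one shared field.

def combine(field, token):
    """Prepend the original field name onto the token, e.g. color + red -> 'color:red'."""
    return f"{field}:{token}"

def index_doc(doc):
    # Index time: every field/value pair becomes one combined token in a
    # single shared field, instead of one Lucene field per metadata key.
    return {"allFilterFields": [combine(f, v) for f, v in doc.items()]}

def matches(indexed, field, token):
    # Search time: build the same combined token and filter on the shared field
    # (in Lucene this would be a TermQuery on allFilterFields).
    return combine(field, token) in indexed["allFilterFields"]

doc = {"color": "red", "size": "xl"}
indexed = index_doc(doc)
assert matches(indexed, "color", "red")
assert not matches(indexed, "size", "red")
```

The trade-off is that you give up per-field scoring and per-field index
statistics, but for pure filtering that usually doesn't matter.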


Mike McCandless

http://blog.mikemccandless.com


On Tue, Jun 24, 2014 at 12:12 AM, Cindy Hsin <[email protected]> wrote:

> Thanks!
>
> I have asked Maco to re-test ES with these two parameters disabled.
>
> One more question regarding Lucene's capability with large numbers of
> metadata fields. What is the largest number of metadata fields Lucene
> supports per index? What are the different strategies for handling a large
> number of metadata fields? Do you recommend using "type" to partition
> different sets of metadata fields within an index?
> I will clarify with our team regarding their usage of large metadata fields
> as well.
>
>
> Thanks!
> Cindy
>
> On Thursday, June 12, 2014 10:57:23 PM UTC-7, Maco Ma wrote:
>
>> I tried to measure the performance of ingesting documents that have lots
>> of fields.
>>
>>
>> The latest elasticsearch 1.2.1:
>> Total docs count: 10k (a small set definitely)
>> ES_HEAP_SIZE: 48G
>> settings:
>> {"doc": {"settings": {"index": {
>>   "uuid": "LiWHzE5uQrinYW1wW4E3nA",
>>   "number_of_replicas": "0",
>>   "translog": {"disable_flush": "true"},
>>   "number_of_shards": "5",
>>   "refresh_interval": "-1",
>>   "version": {"created": "1020199"}}}}}
>>
>> mappings:
>> {"doc": {"mappings": {"type": {
>>   "dynamic_templates": [
>>     {"t1": {"mapping": {"store": false, "norms": {"enabled": false}, "type": "string"}, "match": "*_ss"}},
>>     {"t2": {"mapping": {"store": false, "type": "date"}, "match": "*_dt"}},
>>     {"t3": {"mapping": {"store": false, "type": "integer"}, "match": "*_i"}}],
>>   "_source": {"enabled": false},
>>   "properties": {}}}}}
>>
>> All fields in the documents match the templates in the mappings.
>>
>> Since I disabled flush & refresh, I submitted a flush command (followed
>> by an optimize command) from the client program every 10 seconds. (I also
>> tried a 10-minute interval and got similar results.)
>>
>> Scenario 0 - 10k docs have 1000 different fields:
>> Ingestion took 12 secs.  Only 1.08G of heap is used (this counts only
>> used heap memory).
>>
>>
>> Scenario 1 - 10k docs have 10k different fields (10x the fields of
>> scenario 0):
>> This time ingestion took 29 secs.   Only 5.74G of heap is used.
>>
>> Not sure why the performance degrades sharply.
>>
>> If I try to ingest docs having 100k different fields, it takes 17
>> mins 44 secs.  We only have 10k docs in total and are not sure why ES
>> performs so badly.
>>
>> Can anyone give suggestions to improve the performance?
>>
>>
>>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/8c5874cd-a1ff-432b-9bdf-e8a54a505fcb%40googlegroups.com
> <https://groups.google.com/d/msgid/elasticsearch/8c5874cd-a1ff-432b-9bdf-e8a54a505fcb%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAD7smRectTyYXUBJPW7Li6pK7WT9mOguODLwY2X%3DDK6Js_cMsg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.