Hi Cindy,

There isn't a hard limit on the number of fields Lucene supports; it's more that each field carries some heap usage, added CPU/IO cost for merging, etc. It's just not a well-tested usage of Lucene, and not something the developers focus on optimizing.
Partitioning by _type won't change things (it's still a single Lucene index). How you design your schema really depends on how you want to search on these fields. E.g., if they are single-token text fields that you need to filter on, then you can index them all under a single field (say allFilterFields), pre-pending the original field name onto each token, and then doing the same at search time (searching for field:text as your text token within allFilterFields).

Mike McCandless

http://blog.mikemccandless.com

On Tue, Jun 24, 2014 at 12:12 AM, Cindy Hsin <[email protected]> wrote:

> Thanks!
>
> I have asked Maco to re-test ES with these two parameters disabled.
>
> One more question regarding Lucene's capability with a large number of
> metadata fields. What is the largest number of metadata fields Lucene
> supports per index? What are the different strategies for handling a
> large number of metadata fields? Do you recommend using "type" to
> partition different sets of metadata fields within an index?
> I will clarify with our team regarding their usage of large numbers of
> metadata fields as well.
>
> Thanks!
> Cindy
>
> On Thursday, June 12, 2014 10:57:23 PM UTC-7, Maco Ma wrote:
>
>> I am trying to measure the performance of ingesting documents that have
>> lots of fields.
>>
>> The latest elasticsearch 1.2.1:
>> Total docs count: 10k (a small set, definitely)
>> ES_HEAP_SIZE: 48G
>>
>> settings:
>> {"doc":{"settings":{"index":{"uuid":"LiWHzE5uQrinYW1wW4E3nA",
>> "number_of_replicas":"0","translog":{"disable_flush":"true"},
>> "number_of_shards":"5","refresh_interval":"-1",
>> "version":{"created":"1020199"}}}}}
>>
>> mappings:
>> {"doc":{"mappings":{"type":{"dynamic_templates":[
>> {"t1":{"mapping":{"store":false,"norms":{"enabled":false},"type":"string"},"match":"*_ss"}},
>> {"t2":{"mapping":{"store":false,"type":"date"},"match":"*_dt"}},
>> {"t3":{"mapping":{"store":false,"type":"integer"},"match":"*_i"}}],
>> "_source":{"enabled":false},"properties":{}}}}}
>>
>> All fields in the documents match the templates in the mappings.
>>
>> Since I disabled flush & refresh, I submitted the flush command (along
>> with an optimize command after it) in the client program every 10
>> seconds. (I tried another interval, 10 mins, and got similar results.)
>>
>> Scenario 0 - 10k docs have 1,000 different fields:
>> Ingestion took 12 secs. Only 1.08G of heap is used (this counts only
>> the used heap memory).
>>
>> Scenario 1 - 10k docs have 10k different fields (10x the fields
>> compared with scenario 0):
>> This time ingestion took 29 secs. Only 5.74G of heap is used.
>>
>> Not sure why the performance degrades so sharply.
>>
>> If I try to ingest docs having 100k different fields, it takes 17 mins
>> 44 secs. We only have 10k docs in total, and I am not sure why ES
>> performs so badly.
>>
>> Can anyone give suggestions to improve the performance?
>>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/8c5874cd-a1ff-432b-9bdf-e8a54a505fcb%40googlegroups.com
> .
> For more options, visit https://groups.google.com/d/optout.
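A minimal sketch of the combined-field trick Mike describes above, assuming single-token filter values. The `allFilterFields` name comes from his message; the helper functions, the `field:value` token format, and the sample field names are hypothetical. A real implementation would index the produced tokens into one non-analyzed field and run term queries against it, instead of creating thousands of distinct Lucene fields:

```python
# Sketch of the "one combined filter field" technique: every field/value
# pair becomes a single prefixed token in one shared field, and a filter
# on field=value becomes a term lookup for the same prefixed token.

def to_all_filter_fields(doc):
    """Collapse a flat doc of single-token values into prefixed tokens
    destined for the shared allFilterFields field."""
    return [f"{field}:{value}" for field, value in sorted(doc.items())]

def filter_token(field, value):
    """Build the search-time token for a field=value filter; it must use
    the exact same prefixing as index time."""
    return f"{field}:{value}"

# Hypothetical document using the *_ss / *_i naming from Maco's templates.
doc = {"color_ss": "red", "size_i": "42"}
tokens = to_all_filter_fields(doc)

# Filtering on color_ss=red is now a term match inside allFilterFields:
assert filter_token("color_ss", "red") in tokens
```

The trade-off is that per-field features (norms, per-field statistics, range queries on numerics) are lost for anything folded into the shared field, so this fits exact-match filtering best.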
