Hi, I'm using Elasticsearch with a very simple schema. Only one date field is indexed. Some documents also contain a couple of single-term string fields, which are also indexed; the index contains 10 unique string fields in total.
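For concreteness, the indexed part of the mapping looks roughly like this (field names are illustrative, and I'm using the pre-5.x "string" mapping syntax; only one string field is shown):

  {
    "mappings": {
      "doc": {
        "properties": {
          "timestamp": { "type": "date" },
          "tag_1":     { "type": "string", "index": "not_analyzed" }
        }
      }
    }
  }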
Moreover, I have about 500 different numeric fields. I don't index these numeric fields, but I do store doc values for all of them. An average document contains 5-7 of these numeric fields.

When I ingest data into this index on a machine with 4 CPU cores, I end up with about 4,000 document adds per second. There are no document updates; the index is append-only. I changed the merge policy to use 30 segments per tier, and I also reduced the maximum segment size to 500 MB (exact settings in the P.S. below). Neither of these changes improved the ingestion rate.

I realized that the ingestion process is CPU-bound, so I used the SPM on-demand profiler (https://sematext.com/blog/2016/03/17/on-demand-java-profiling/) to find the hot methods. Most of the CPU time is spent in doc-values-related methods (SingletonSortedNumericDocValues#setDocument, DocValuesConsumer$10$1#next, DocValuesConsumer#isSingleValued, DocValuesConsumer$4$1#setNext, ...). More than 50% of the CPU time is spent merging doc values, all the time.

Is it possible to improve the performance of the doc values building process? Why is storing doc values so expensive?

-- Paweł
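P.S. For reference: the numeric fields are mapped roughly like this (again, the field name and the exact numeric type are illustrative; the point is index disabled, doc values enabled):

  "metric_1": { "type": "double", "index": "no", "doc_values": true }

and the merge policy changes I mentioned were these two index settings (assuming I have the standard index.merge.policy keys right):

  index.merge.policy.segments_per_tier: 30
  index.merge.policy.max_merged_segment: 500mb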