I don't know how many shards you need.  Try 10, 15, 20, and 25, and see how
the numbers coming from YCSB and your system stats change.
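
As a rough way to reason about those candidates, here is a minimal sketch
of the arithmetic for a 10-node cluster.  It assumes primaries are spread
evenly across nodes and that there are no replicas (as in the original
test); the class and method names are made up for illustration.

```java
// Sketch only: assumes evenly allocated primary shards and zero replicas.
final class ShardSpread {

    /** Number of nodes that end up holding at least one primary shard. */
    static int nodesUsed(int shards, int nodes) {
        return Math.min(shards, nodes);
    }

    /** Largest number of primary shards any single node has to host. */
    static int maxShardsPerNode(int shards, int nodes) {
        return (shards + nodes - 1) / nodes; // ceiling division
    }

    public static void main(String[] args) {
        int nodes = 10; // cluster size from the original post
        for (int shards : new int[] {5, 10, 15, 20, 25}) {
            System.out.printf("shards=%d nodesUsed=%d maxPerNode=%d%n",
                    shards, nodesUsed(shards, nodes), maxShardsPerNode(shards, nodes));
        }
    }
}
```

Counts that are not a multiple of the node count (15, 25) leave some nodes
with one more shard than others, which is worth keeping in mind when
comparing the runs.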

Wow, that GitHub repo hasn't been touched in 3 years.  Elasticsearch and
the Java client for Elasticsearch have probably changed a bit since then,
so be careful about what you read into the results of this test.

I'd spend some time understanding mappings in Elasticsearch index
settings.  If you aren't going to query by the 10 secondary fields, and
just want to be able to do key/value retrievals in your YCSB test, turning
off indexing for those fields would probably make your load test more
realistic.  Full-text indexing of purely random values isn't a realistic
workload compared to real data.  Here's a quick code sample for Sense that
shows an example of turning off a few things to prevent indexing stuff you
don't need for YCSB:

https://gist.github.com/derickson/61d66ed7d6a911db9cf0
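
For readers who cannot open the gist, the following is a minimal sketch of
the general idea, not a copy of the gist.  It builds an Elasticsearch
1.x-style index-creation body that disables the _all field and marks each
string field as not indexed.  The type name "usertable" and the field
names field0..field9 are assumptions based on YCSB defaults, and the
class name is hypothetical.

```java
// Sketch only: builds a 1.x-style mapping body as a JSON string.
// The type and field names passed in are assumptions, not taken from the thread.
final class YcsbMapping {

    /** Returns an index-creation body with fieldCount unindexed string fields. */
    static String build(String type, String fieldPrefix, int fieldCount) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"mappings\":{\"").append(type).append("\":{");
        // Skip the catch-all field that would otherwise index every value again.
        sb.append("\"_all\":{\"enabled\":false},");
        sb.append("\"properties\":{");
        for (int i = 0; i < fieldCount; i++) {
            if (i > 0) {
                sb.append(',');
            }
            // "index": "no" keeps the value in _source but builds no inverted index for it.
            sb.append('"').append(fieldPrefix).append(i)
              .append("\":{\"type\":\"string\",\"index\":\"no\"}");
        }
        sb.append("}}}}");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("usertable", "field", 10));
    }
}
```

The printed JSON could be used as the body of an index-creation request.
Documents remain retrievable by key because _source is left enabled.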

If you really want to know if ES is a good fit, I'd recommend doing bulk
loads of 10 GB of real data.
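
A minimal sketch of the batching side of a bulk load follows, assuming the
documents are already in memory as a list.  The helper and the batch size
are hypothetical; in the 1.x Java client each batch would typically be
sent through a bulk request (client.prepareBulk()) rather than one
prepareIndex(...) call per document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: splits documents into fixed-size batches for bulk requests.
final class BulkBatcher {

    /** Splits docs into consecutive batches of at most batchSize items. */
    static <T> List<List<T>> batches(List<T> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            int end = Math.min(i + batchSize, docs.size());
            out.add(new ArrayList<>(docs.subList(i, end)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("d1", "d2", "d3", "d4", "d5");
        for (List<String> batch : batches(docs, 2)) {
            // Each batch would become one bulk request instead of several single inserts.
            System.out.println("would send " + batch.size() + " docs in one request");
        }
    }
}
```

The batch size is something to experiment with, as suggested earlier in
the thread.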




Dave Erickson / Elastic / Solutions Architect / Washington, DC
mobile: +1 401 965 4130

On Thu, Apr 23, 2015 at 10:52 AM, Milind Shah <milinds...@gmail.com> wrote:

> Thank you all for your inputs. I am working with Brian on this exercise as
> well. Let me try answering some of the questions.
>
> CPU usage:
> There are only 2 cores of CPU in use. When I monitor the disk usage, I
> notice that every 2 minutes or so, the disk usage goes close to 100%
> for about 10 seconds, and for the rest of the time it remains below 10%.
> We have set ES_HEAP_SIZE to 30gb and the machines have 128gb RAM available.
>
> YCSB:
> YCSB is generating 23-byte keys with a long integer and 'user' as prefix
> (ex. user9348485929), and for data it is generating random bytes to fill up
> the 10 fields. All the keys that we're inserting are unique keys. YCSB does
> single index operations.
>
> Here is the insert code of YCSB (
> https://github.com/brianfrankcooper/YCSB/blob/master/elasticsearch/src/main/java/com/yahoo/ycsb/db/ElasticSearchClient.java
> ):
>
> public int insert(String table, String key, HashMap<String, ByteIterator> values) {
>   try {
>     final XContentBuilder doc = jsonBuilder().startObject();
>     for (Entry<String, String> entry : StringByteIterator.getStringMap(values).entrySet()) {
>       doc.field(entry.getKey(), entry.getValue());
>     }
>     doc.endObject();
>     client.prepareIndex(indexKey, table, key).setSource(doc).execute().actionGet();
>     return 0;
>   } catch (Exception e) {
>     e.printStackTrace();
>   }
>   return 1;
> }
>
>
> Our goal is to write more data (1-2 TB) later on to the same index. The
> 10GB insert is just to have ES tuned for the workload. For this use case,
> what would be a recommended # of shards? Is there a ratio of data size to
> number of shards that we should keep in mind going forward?
>
>
> Thanks,
> Milind
>
> On Thursday, April 23, 2015 at 5:00:56 AM UTC-7, da...@elastic.co wrote:
>>
>> Just some thoughts.  Yeah, with 16 cores per machine and 10 machines,
>> having 5 shards per index is probably too low.
>>
>> What are your system metrics telling you?  Are the CPUs idle?  What does
>> the CPU I/O wait look like?
>> Are you doing single index operations or batch index operations with YCSB?
>>
>> Another thing to think about.  YCSB was built to test the key/value
>> performance properties of a database.  If I remember correctly the values
>> put into the strings are randomly generated.  Pure random is about the
>> worst case possible for cardinality when it comes to full text indexing
>> data structures, so you might want to adjust for that when creating your
>> mapping for the index.  If the values are pure random rather than randomly
>> pulled from a dictionary of fixed length (English only has 200k or so
>> words) then the data you are putting in may be penalizing ES for having
>> indexing features turned on by default.
>>
>>
>> On Thursday, April 23, 2015 at 5:25:30 AM UTC-4, Michael McCandless wrote:
>>>
>>> You can try the ideas here too:
>>> https://www.elastic.co/blog/performance-considerations-elasticsearch-indexing
>>>
>>> Mike McCandless
>>>
>>> On Wed, Apr 22, 2015 at 8:00 PM, Kimbro Staken <kst...@kstaken.com>
>>> wrote:
>>>
>>>> Hello Brian,
>>>>
>>>> Many things will affect the rate of ingest; the biggest one is making
>>>> sure the load gets spread around. But are you sure ES is what's
>>>> bottlenecking here? With only 5 shards you're only using half your cluster,
>>>> but I'm willing to bet your 20 threads on the importer aren't maxing that
>>>> out. Also, you need to make sure the import process is spreading connections
>>>> across the nodes; otherwise you may be limited in other ways by the node
>>>> you're connecting to. Also make sure the client is using bulk requests and
>>>> experiment with the bulk sizes.
>>>>
>>>> FYI, I've been testing a new system configuration using an 8 core
>>>> Avoton CPU with 6 x SSDs in a RAID 0. On this system (single node) ingest
>>>> can sustain around 3,500 docs/sec of similar size to your load before it
>>>> becomes CPU bound. You have much more CPU capacity, so I would expect your
>>>> hardware to be able to exceed this by a fair margin, but your current
>>>> numbers don't show that.
>>>>
>>>>
>>>> Kimbro Staken
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Apr 22, 2015 at 4:16 PM, <bpar...@maprtech.com> wrote:
>>>>
>>>>> We are running a 10-node Elasticsearch 1.4.2 cluster, and getting
>>>>> cluster wide throughput of 18161 docs/sec, or about 18MB/sec.  We'd like
>>>>> to improve this as much as we can, without impacting query times too much.
>>>>>
>>>>> Our hardware:
>>>>>
>>>>> RAM: 128GB
>>>>> Disks: 8 disks, 7200 RPM, 1TB in a RAID 0 array
>>>>> CPU: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz - 16 physical cores, 32
>>>>> HT cores
>>>>> Network: 1x10gbe
>>>>>
>>>>> They are running CentOS 6.5, and java version 1.7.0_67.  We're setting
>>>>> the Elasticsearch heap size to 30GB.
>>>>>
>>>>> We are testing ingest by inserting 10GB of data with YCSB.  Document
>>>>> sizes are 1KB, with 10 string fields, each 100 bytes.  There is 1 YCSB
>>>>> client with 20 threads, writing to a single index with 5 shards and 0
>>>>> replicas.  YCSB connects using the Java Node Client.
>>>>>
>>>>> What is the expected ingest rate in this type of environment?  What
>>>>> parameters are recommended to increase the ingest rate?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Brian
>>>>>
>>>>>  --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "elasticsearch" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to elasticsearc...@googlegroups.com.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/elasticsearch/5733e9d0-b877-4dc3-b5c1-d341365ec6b2%40googlegroups.com
>>>>> <https://groups.google.com/d/msgid/elasticsearch/5733e9d0-b877-4dc3-b5c1-d341365ec6b2%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>

