Re: Elasticsearch on ZFS best practice

Patrick Proniewski Wed, 21 May 2014 05:26:25 -0700

Hi Jörg,

On 21 May 2014, at 13:49, [email protected] wrote:


> - estimating ES process memory really depends on individual requirements
> (bulk indexing, field cache, filter/facet, concurrent queries) - just take
> a portion of your data, measure memory/CPU/disk I/O, and extrapolate - best
> is to add nodes if resources get tight. Rule of thumb is 50% of RAM to ES
> heap


I'm not sure I understand what you mean by "just take a portion of your 
data". Am I supposed to run a query in Kibana that returns a known amount 
of data, measure memory/CPU/I/O during the request, then extrapolate to get the 
resources needed to return all my data?
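
To make the question concrete, here is a minimal sketch of how I read "measure 
and extrapolate", assuming resource use grows linearly with data size (which 
may well not hold for field cache or facets). The extrapolate helper and all 
the numbers are made up for illustration, not measurements from my server:

```shell
#!/bin/sh
# Linear extrapolation of a resource measured on a data sample.
# All figures are hypothetical placeholders, not real measurements.

# extrapolate SAMPLE_SIZE_GB MEASURED_MB FULL_SIZE_GB
# Prints the projected figure for the full data set, in whole MB.
extrapolate() {
    awk -v s="$1" -v m="$2" -v f="$3" 'BEGIN { printf "%d\n", m * f / s }'
}

# Example: a 2 GB sample needed 600 MB of heap; the full set is 10 GB.
extrapolate 2 600 10    # prints 3000
```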


> - you are correct, primarycache=all may buffer more data than required
> (useful for maximum ZFS performance). You have already limited the ARC
> size. Use mmapfs for ES store, this should work best with
> primarycache=metadata

OK, I was mistaken about mmapfs. I've read some documentation and it is a bit 
clearer to me now.


> - ZFS recordsize for JVM apps like ES should be default which is 4k. Also
> with ES, important is to match ZFS recordsize with kernel page size and
> sector size of the drive so there is no skew in the number of I/O
> operations. Check for yourself if higher values like 8k /16k / 64k / 256k
> gets better throughput on ES data folder. On certain striped HW RAID
> devices it may be the case, but I doubt it (ZFS internal buffering is
> compensating for this effect, write throughput will suffer if recordsize is
> too high)


My FS is (should be?) properly aligned on the physical 4K-sector HDD, so it 
should be quite efficient to move to a 4k recordsize ZFS dataset if that's best 
for ES.
I'll measure I/O to make sure performance doesn't drop.
Every page size is 4k (FreeBSD 9.x): 

$ sysctl -a | egrep 'page_?size:'
vm.stats.vm.v_page_size: 4096
hw.pagesize: 4096
p1003_1b.pagesize: 4096
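
For the record, a small sketch of the check I have in mind: compare the dataset 
recordsize with the page size, both in bytes. The check_alignment helper is 
hypothetical; the zfs and sysctl invocations in the comments are how I would 
expect to feed it on this box.

```shell
#!/bin/sh
# Compare a dataset's recordsize with the kernel page size (both in bytes).

# check_alignment RECORDSIZE PAGESIZE
check_alignment() {
    if [ "$1" -eq "$2" ]; then
        echo "match"
    else
        echo "mismatch: recordsize=$1 pagesize=$2"
    fi
}

# With the values shown in this thread: 128K recordsize, 4096-byte pages.
check_alignment 131072 4096

# On the live system the two inputs could be read like this:
#   check_alignment "$(zfs get -Hp -o value recordsize zdata/elasticsearch)" \
#                   "$(sysctl -n hw.pagesize)"
# and the recordsize changed with:
#   zfs set recordsize=4K zdata/elasticsearch
```

As far as I know, a new recordsize only applies to files written after the 
change, so existing index files would keep their 128K records until they are 
rewritten.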

> - and you should switch off atime on ES data folder

I can do that too.
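
Turning atime off is a one-liner per dataset. A minimal sketch, with a 
hypothetical apply wrapper that only prints the commands unless DRY_RUN=0 is 
set, so they can be reviewed before touching the pool:

```shell
#!/bin/sh
# Print (or run) the zfs commands for the ES dataset.
# DRY_RUN defaults to 1 here, which only prints the commands.
DATASET="zdata/elasticsearch"

apply() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

apply zfs set atime=off "$DATASET"
apply zfs get atime "$DATASET"
```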

Thank you for your reply.



> On Tue, May 13, 2014 at 7:39 AM, Patrick Proniewski <
> [email protected]> wrote:
> 
>> Hello,
>> 
>> I'm running an Elasticsearch node on a FreeBSD server, on top of ZFS
>> storage. For now I've considered that ES is smart and manages its own
>> cache, so I've disabled primary cache for data, leaving only metadata being
>> cacheable. Last thing I want is to have data cached twice, one time is ZFS
>> ARC and a second time in application's own cache. I've also disabled
>> compression:
>> 
>> $ zfs get compression,primarycache,recordsize  zdata/elasticsearch
>> NAME                 PROPERTY      VALUE         SOURCE
>> zdata/elasticsearch  compression   off           local
>> zdata/elasticsearch  primarycache  metadata      local
>> zdata/elasticsearch  recordsize    128K          default
>> 
>> It's a general purpose server (web, mysql, mail, ELK, etc.). I'm not
>> looking for absolute best ES performance, I'm looking for best use of my
>> resources.
>> I have 16 GB RAM, and I plan to put a limit to ARC size (currently
>> consuming 8.2 GB RAM) so I can mlockall ES memory. But I don't think I'll
>> go the RAM-only storage route (<
>> http://jprante.github.io/applications/2012/07/26/Mmap-with-Lucene.html>)
>> as I'm running only one node.
>> 
>> How can I estimate the amount of memory I must allocate to ES process?
>> 
>> Should I switch primarycache=all back on despite ES already caching data?
>> 
>> What is the best ZFS record/block size to accommodate Elasticsearch/Lucene
>> IOs?
>> 
>> Thanks,
>> Patrick
>> 
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/FBBA84AE-D610-4060-AFBC-FC7D5BA0803F%40patpro.net
>> .
>> For more options, visit https://groups.google.com/d/optout.
>> 
