Hi all,
  I'm looking at using elasticsearch for a use case that I'd love some 
feedback on regarding best practices. 

A little background: I've been digging into various approaches to 
interactive drill-down and slicing and dicing of activity stream data 
(actor / verb / target) for realtime end-user analytics. This is 
high-dimensional data with too many potential views to effectively 
precompute rollups. Other systems I have played around with that tackle a 
similar problem are Druid, OpenTSDB, Blueflood, and InfluxDB. At the end of 
the day, they all either use an inverted index or have, or are planning to 
have, elasticsearch integrations, so I figure why not stick with ES.

There are three areas I am trying to optimize:
- Minimize the index footprint on disk.
- Minimize the RAM footprint.
- Maximize the speed.

I believe the key tradeoff I need to make with my dataset is going to be 
doc_values, i.e. whether I try to utilize the heap or the page cache.

All of my fields (~15 of them) are straight exact-match, not_analyzed 
fields. "not_analyzed" appears to have all the extras that can cause bloat 
disabled (norms, frequencies, etc). I am not indexing source. Here is my 
index template:
https://gist.github.com/ppearcy/fc5202a1664dbc90cbc2
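
A minimal sketch of what a template along these lines might look like, 
assuming ES 1.x mapping syntax. This is not the gist verbatim: the field 
names and the "activity-*" index pattern are hypothetical, only three of 
the ~15 fields are shown, and the exact spelling of the doc_values setting 
may differ between releases.

```python
import json

# Hypothetical field names for illustration; the real template is in the gist.
FIELDS = ["actor", "verb", "target"]


def build_template(use_doc_values):
    """Build an ES 1.x-style index template body for exact-match string fields."""
    properties = {
        name: {
            "type": "string",
            "index": "not_analyzed",       # exact match, no analysis
            "doc_values": use_doc_values,  # on-disk columnar values vs. heap fielddata
        }
        for name in FIELDS
    }
    return {
        "template": "activity-*",  # hypothetical index name pattern
        "mappings": {
            "_default_": {
                "_source": {"enabled": False},  # do not keep the original JSON
                "properties": properties,
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_template(use_doc_values=True), indent=2))
```

Flipping use_doc_values gives the two variants compared in the numbers 
below.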

With some test data, I'm getting pretty solid results. The average message 
is ~360 bytes and I am getting:
- 60 bytes per doc without doc_values
- 80 bytes per doc with doc_values

On a test index with ~160 million docs and no doc_values, I'm at 9.6GB of 
data, with the file breakdown like so:
3.8G Jul 23 09:40 _mwf.fdt
3.9G Jul 23 10:32 _mwf_es090_0.tim
1.8G Jul 23 10:32 _mwf_es090_0.doc
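
To see where the bytes go per document, here is a rough back-of-the-envelope 
calculation. It assumes the sizes in the listing are binary units (GiB) and 
that the doc count is exactly 160 million, so the results are approximate. 
In Lucene, .fdt holds stored field data, .tim the term dictionary, and .doc 
the postings.

```python
GIB = 1024 ** 3

# Sizes copied from the listing above, assumed to be in GiB.
FILE_SIZES_GIB = {
    "_mwf.fdt": 3.8,          # stored field data
    "_mwf_es090_0.tim": 3.9,  # term dictionary
    "_mwf_es090_0.doc": 1.8,  # postings
}
DOC_COUNT = 160_000_000       # "~160 million", treated as exact here
AVG_MESSAGE_BYTES = 360       # average raw message size from the post


def bytes_per_doc(size_gib, doc_count):
    """Convert a file size in GiB to average bytes per document."""
    return size_gib * GIB / doc_count


def summarize(file_sizes_gib, doc_count):
    """Return per-file bytes/doc plus a 'total' entry."""
    per_file = {name: bytes_per_doc(size, doc_count)
                for name, size in file_sizes_gib.items()}
    per_file["total"] = sum(per_file.values())
    return per_file


if __name__ == "__main__":
    summary = summarize(FILE_SIZES_GIB, DOC_COUNT)
    for name, value in summary.items():
        print(f"{name:>18}: {value:6.1f} bytes/doc")
    ratio = AVG_MESSAGE_BYTES / summary["total"]
    print(f"raw message / index size: roughly {ratio:.1f}x")
```

The three files add up to 9.5G, close to the 9.6GB total, so stored fields 
and the term dictionary each account for roughly 40% of the index.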

Does anybody know how I can slim things down any further, or have general 
advice for dealing with large numbers of small documents?

Thanks!
Paul
