Hi list,

We are indexing about 100M documents (~3 KB each) per day into Elasticsearch 1.4.1, and we index all fields (there are a few dozen of them). On the read side we run dynamic queries that may return many results, all of which may be relevant; we don't use scoring. Because we return the documents themselves, we want to keep the _source field.
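For context, a typical read looks roughly like this with the Python client (just a sketch; the index, field names and values are made up, but the shape is right: a constant_score filter, no scoring, and we rely on _source for every hit):

    from elasticsearch import Elasticsearch

    es = Elasticsearch()

    # Non-scoring filtered query; we pull _source out of every hit.
    resp = es.search(index="docs-2015.01.20", body={
        "query": {"constant_score": {"filter": {"bool": {"must": [
            {"term": {"customer_id": 1234}},
            {"range": {"timestamp": {"gte": "now-1d"}}},
        ]}}}},
        "size": 1000,
    })
    for hit in resp["hits"]["hits"]:
        doc = hit["_source"]   # this is the part whose disk footprint we want to shrink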
Obviously this is taking a toll on our disk usage, and we'd like to reduce it. Questions:

1. Is it possible to index JSON but set the _source field myself? On insert I would store a protobuf (or something similar) instead of the JSON, and on query I would convert it back to JSON (rough sketch of what I mean in the P.S. below).

2. I understand that _source is compressed, but I assume each document is compressed separately, and our small documents don't benefit much from that. Is there a way to somehow compress "across" documents, to take advantage of the fact that our documents are extremely similar to one another? (Second sketch in the P.S. below.)

3. Any other ideas?

Thanks,
Eran
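P.S. To make question 1 concrete, here is roughly what I have in mind, using the Python client. I'm assuming we can disable _source in the mapping and keep our own packed representation in a separate stored binary field; all index/type/field names are invented, and json.dumps stands in for the real protobuf encoding:

    import base64, json
    from elasticsearch import Elasticsearch

    es = Elasticsearch()

    # Disable _source; "packed" is stored (not indexed) and holds our own
    # compact serialization of the full document.
    es.indices.create(index="docs-packed", body={"mappings": {"doc": {
        "_source": {"enabled": False},
        "properties": {
            "customer_id": {"type": "long"},
            "timestamp":   {"type": "date"},
            "packed":      {"type": "binary", "store": True},
        },
    }}})

    original = {"customer_id": 1234, "timestamp": "2015-01-20T12:00:00", "payload": "..."}
    packed = json.dumps(original).encode("utf-8")   # stand-in for a protobuf encoding

    es.index(index="docs-packed", doc_type="doc", body={
        "customer_id": original["customer_id"],
        "timestamp": original["timestamp"],
        "packed": base64.b64encode(packed).decode("ascii"),
    })

    # At query time, fetch the stored field instead of _source and rebuild
    # the JSON ourselves before handing it to the caller.
    resp = es.search(index="docs-packed", body={
        "query": {"constant_score": {"filter": {"term": {"customer_id": 1234}}}},
        "fields": ["packed"],
    })
    for hit in resp["hits"]["hits"]:
        doc = json.loads(base64.b64decode(hit["fields"]["packed"][0]))

That's the general idea; I don't know whether a stored binary field is the right mechanism for this, hence the question.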
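And to illustrate what I mean in question 2, here is the kind of win I'd hope for from compressing across documents. This is just stdlib zlib with a preset dictionary taken from a sibling document; I'm not suggesting Elasticsearch exposes this, only showing why per-document compression doesn't help us much (document contents invented):

    import json, zlib

    template = {
        "customer_id": 0,
        "event_type": "page_view",
        "user_agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36",
        "url": "https://example.com/some/long/path?with=query&and=params",
        "country": "US",
        "status": "ok",
    }
    doc_a = json.dumps(dict(template, customer_id=1234)).encode("utf-8")
    doc_b = json.dumps(dict(template, customer_id=1235, country="DE")).encode("utf-8")

    # Each small document compressed on its own barely shrinks.
    alone = zlib.compress(doc_b)

    # Compressed against a dictionary built from a similar sibling document,
    # i.e. "across" documents, it shrinks dramatically.
    comp = zlib.compressobj(zdict=doc_a)
    shared = comp.compress(doc_b) + comp.flush()

    print(len(doc_b), len(alone), len(shared))

    # Decompression needs the same dictionary.
    decomp = zlib.decompressobj(zdict=doc_a)
    assert decomp.decompress(shared) == doc_b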
