Hi list,

We are indexing about 100M documents (~3 KB each) per day into Elasticsearch 1.4.1, and we index all fields (there are a few dozen of them). On the read side we run dynamic queries that may return many results, all of which may be relevant; we don't use scoring. Because we return the documents themselves, we want to keep the _source field.
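For context, a typical read looks roughly like this with the Python client (just a sketch; the index, field names and values are made up, but the shape is right: a constant_score filter, no scoring, and we rely on _source for every hit):

    from elasticsearch import Elasticsearch

    es = Elasticsearch()

    # Non-scoring filtered query; we pull _source out of every hit.
    resp = es.search(index="docs-2015.01.20", body={
        "query": {"constant_score": {"filter": {"bool": {"must": [
            {"term": {"customer_id": 1234}},
            {"range": {"timestamp": {"gte": "now-1d"}}},
        ]}}}},
        "size": 1000,
    })
    for hit in resp["hits"]["hits"]:
        doc = hit["_source"]   # this is the part whose disk footprint we want to shrink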
Obviously this is taking a toll on our disk usage, and we'd like to reduce it. Questions:

1. Is it possible to index JSON but set the _source field myself? On insert I would store a protobuf (or something similar) instead of the JSON, and on query I would convert it back to JSON (rough sketch of what I mean in the P.S. below).

2. I understand that _source is compressed, but I assume each document is compressed separately, and our small documents don't benefit much from that. Is there a way to somehow compress "across" documents, to take advantage of the fact that our documents are extremely similar to one another? (Second sketch in the P.S. below.)

3. Any other ideas?

Thanks,
Eran
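P.S. To make question 1 concrete, here is roughly what I have in mind, using the Python client. I'm assuming we can disable _source in the mapping and keep our own packed representation in a separate stored binary field; all index/type/field names are invented, and json.dumps stands in for the real protobuf encoding:

    import base64, json
    from elasticsearch import Elasticsearch

    es = Elasticsearch()

    # Disable _source; "packed" is stored (not indexed) and holds our own
    # compact serialization of the full document.
    es.indices.create(index="docs-packed", body={"mappings": {"doc": {
        "_source": {"enabled": False},
        "properties": {
            "customer_id": {"type": "long"},
            "timestamp":   {"type": "date"},
            "packed":      {"type": "binary", "store": True},
        },
    }}})

    original = {"customer_id": 1234, "timestamp": "2015-01-20T12:00:00", "payload": "..."}
    packed = json.dumps(original).encode("utf-8")   # stand-in for a protobuf encoding

    es.index(index="docs-packed", doc_type="doc", body={
        "customer_id": original["customer_id"],
        "timestamp": original["timestamp"],
        "packed": base64.b64encode(packed).decode("ascii"),
    })

    # At query time, fetch the stored field instead of _source and rebuild
    # the JSON ourselves before handing it to the caller.
    resp = es.search(index="docs-packed", body={
        "query": {"constant_score": {"filter": {"term": {"customer_id": 1234}}}},
        "fields": ["packed"],
    })
    for hit in resp["hits"]["hits"]:
        doc = json.loads(base64.b64decode(hit["fields"]["packed"][0]))

That's the general idea; I don't know whether a stored binary field is the right mechanism for this, hence the question.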
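And to illustrate what I mean in question 2, here is the kind of win I'd hope for from compressing across documents. This is just stdlib zlib with a preset dictionary taken from a sibling document; I'm not suggesting Elasticsearch exposes this, only showing why per-document compression doesn't help us much (document contents invented):

    import json, zlib

    template = {
        "customer_id": 0,
        "event_type": "page_view",
        "user_agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36",
        "url": "https://example.com/some/long/path?with=query&and=params",
        "country": "US",
        "status": "ok",
    }
    doc_a = json.dumps(dict(template, customer_id=1234)).encode("utf-8")
    doc_b = json.dumps(dict(template, customer_id=1235, country="DE")).encode("utf-8")

    # Each small document compressed on its own barely shrinks.
    alone = zlib.compress(doc_b)

    # Compressed against a dictionary built from a similar sibling document,
    # i.e. "across" documents, it shrinks dramatically.
    comp = zlib.compressobj(zdict=doc_a)
    shared = comp.compress(doc_b) + comp.flush()

    print(len(doc_b), len(alone), len(shared))

    # Decompression needs the same dictionary.
    decomp = zlib.decompressobj(zdict=doc_a)
    assert decomp.decompress(shared) == doc_b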
