Hey,

I am designing an indexing solution using Hadoop.
I am thinking of using the same approach as Logstash and creating one index 
per time period (10 days or a month) for my records. This avoids working 
with very large indices (from experience, merging huge segments in Lucene 
slows the whole index down), and it also means I am not limited to a fixed 
number of shards: I can change the period dynamically and move indices 
between nodes in the cluster...
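To make the idea concrete, here is a minimal sketch of how the time-bucketed index name could be derived from a record's timestamp, Logstash-style. The class and method names are hypothetical, not part of elasticsearch-hadoop:

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Hypothetical helper: derive a time-bucketed index name from a record's
// date, in the Logstash style. Two bucketing schemes: one index per
// calendar month, or one index per fixed N-day period.
public class IndexNamer {

    // Monthly bucket: one index per calendar month, e.g. "records-2014-03".
    static String monthlyIndex(String prefix, LocalDate date) {
        return String.format("%s-%04d-%02d",
                prefix, date.getYear(), date.getMonthValue());
    }

    // Fixed-width bucket: one index per N days, counted from the epoch.
    static String fixedPeriodIndex(String prefix, LocalDate date, int periodDays) {
        long bucket = ChronoUnit.DAYS.between(LocalDate.ofEpochDay(0), date)
                / periodDays;
        return prefix + "-" + bucket;
    }

    public static void main(String[] args) {
        LocalDate d = LocalDate.of(2014, 3, 15);
        System.out.println(monthlyIndex("records", d));        // records-2014-03
        System.out.println(fixedPeriodIndex("records", d, 10));
    }
}
```

Since the name is a pure function of the timestamp and the period, every mapper/reducer computes the same target index for the same record without any coordination.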

So I thought of adding an option to elasticsearch-hadoop that extracts the 
index name from the value object (or even uses the key as the index name), 
and then holding one RestRepository object per index name, which would 
buffer bulk requests per index and send them when a bulk is full or the 
Hadoop job ends.
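Here is a rough sketch of that per-index buffering, with all names hypothetical (this is not the elasticsearch-hadoop API; the flusher callback stands in for whatever one RestRepository per index would do):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Sketch of per-index bulk buffering: documents are grouped by target
// index name, and each buffer is flushed when it reaches the bulk size
// or when the task ends (close()). All names are illustrative only.
public class PerIndexBulkWriter {
    private final int bulkSize;
    // Flush callback, e.g. backed by one RestRepository per index.
    private final BiConsumer<String, List<String>> flusher;
    private final Map<String, List<String>> buffers = new HashMap<>();

    PerIndexBulkWriter(int bulkSize, BiConsumer<String, List<String>> flusher) {
        this.bulkSize = bulkSize;
        this.flusher = flusher;
    }

    // Buffer one JSON document for its index; flush that index's buffer
    // once it reaches the configured bulk size.
    void write(String indexName, String jsonDoc) {
        List<String> buf = buffers.computeIfAbsent(indexName, k -> new ArrayList<>());
        buf.add(jsonDoc);
        if (buf.size() >= bulkSize) {
            flusher.accept(indexName, new ArrayList<>(buf));
            buf.clear();
        }
    }

    // Called when the Hadoop task ends: drain whatever remains per index.
    void close() {
        buffers.forEach((index, buf) -> {
            if (!buf.isEmpty()) flusher.accept(index, new ArrayList<>(buf));
        });
        buffers.clear();
    }
}
```

The map stays small because only a handful of periods are active at once, so memory use is roughly (number of open indices) × (bulk size).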

Another option is to simply write the index name and type into the bulk 
request and send the whole bulk to a master ES node (rather than taking the 
shard list of a specific index and choosing a shard based on the Hadoop 
task instance).
(But in that scenario I think the master ES node would work too hard, 
because many mappers/reducers would write to the same node and it would 
need to route those records one by one...)

To anyone who has worked with the elasticsearch-hadoop code: I would 
appreciate your input. What do you think? Which approach is better?

Thanks,
Igor


-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/696de734-e97e-4cb5-ae80-5fa8717b6190%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.