Typically one would use es-hadoop if they are already Hadoop users. As for your
questions:
1. Yes and no. To search data one has to index it, but not necessarily store it. For convenience, so that the data is
returned along with the results, let's assume the worst-case scenario where the data is stored as well. However, one would
have to duplicate data even with a pure Hadoop implementation. Leaving aside the fact that one would have to write the
search algorithms using Map/Reduce, which is not at all easy (think geolocation), all the intermediate steps and keys (think
shuffling, key/output values) between input and output would be saved to disk, which results in data being duplicated on
_each_ job.
Elasticsearch aside, for data to be usable, searchable, indexed, etc., there needs to be some metadata, which is either
packed with the data or created along the way. Since you mentioned HBase and Pig, take a look at their requirements.
2. Yes, es-hadoop is bidirectional, so one can stream data into ES from HDFS, for example, or stream data from ES to HDFS.
However, while ES can be used as a store, it's much more valuable if you use it for its search/insight capabilities, which is
why typically one would read search results from ES, not just raw data.
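
To make the two directions a bit more concrete, here is a minimal sketch of the job settings involved. The setting names (es.nodes, es.port, es.resource, es.query) and the two format classes come from es-hadoop; the host, the "logs/event" resource, the query string and the helper functions themselves are made-up placeholders for illustration, not something from your setup.

```python
# Sketch only: the setting names are es-hadoop configuration keys;
# host, index/type and query below are hypothetical placeholders.

ES_INPUT_FORMAT = "org.elasticsearch.hadoop.mr.EsInputFormat"    # ES -> Hadoop
ES_OUTPUT_FORMAT = "org.elasticsearch.hadoop.mr.EsOutputFormat"  # Hadoop -> ES


def base_conf(nodes="localhost", port=9200):
    """Connection settings shared by both directions."""
    return {"es.nodes": nodes, "es.port": str(port)}


def write_conf(resource, **kwargs):
    """Job settings for indexing Hadoop data into ES (HDFS -> ES)."""
    conf = base_conf(**kwargs)
    conf["es.resource"] = resource  # target index/type
    return conf


def read_conf(resource, query=None, **kwargs):
    """Job settings for reading from ES into Hadoop (ES -> HDFS)."""
    conf = base_conf(**kwargs)
    conf["es.resource"] = resource  # source index/type
    if query is not None:
        # With a query set, only the matching documents are streamed out,
        # i.e. the job reads search results rather than the raw index.
        conf["es.query"] = query
    return conf


to_es = write_conf("logs/event")
from_es = read_conf("logs/event", query="?q=status:error")
print(to_es)
print(from_es)
```

If you happen to use PySpark, dictionaries like these could be passed as the `conf` argument of `newAPIHadoopRDD` (together with ES_INPUT_FORMAT) or `saveAsNewAPIHadoopFile` (together with ES_OUTPUT_FORMAT); a plain Map/Reduce job would set the same keys on its job configuration.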
If you haven't seen it yet, I recommend the latest webinar [1], which features es-hadoop and provides a complete
picture of what es-hadoop is.
Cheers,
[1] http://www.elasticsearch.org/webinars/elasticsearch-and-apache-hadoop/
On 9/22/14 4:05 AM, Nelson Jeppesen wrote:
I'm trying to understand where `Elasticsearch for Hadoop` fits in the big data
landscape and why someone would use it.
1) If you want all the data in Hadoop searchable, doesn't that mean all the
data needs to be duplicated in
Elasticsearch (via `es-hadoop`)?
2) Can you push all data from Elasticsearch into Hadoop with es-hadoop, instead
of the reverse?
Here's my idea:
Short-term (1 week) real-time searchable (Kibana) data is kept in
Elasticsearch
Long-term (1 year+) high-latency searchable (HBase, Pig et al.) data is kept
in Hadoop
At a high level, does this make sense?
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to
[email protected]
<mailto:[email protected]>.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/73c55fe9-08ae-4c74-93be-98107aff4954%40googlegroups.com
<https://groups.google.com/d/msgid/elasticsearch/73c55fe9-08ae-4c74-93be-98107aff4954%40googlegroups.com?utm_medium=email&utm_source=footer>.
For more options, visit https://groups.google.com/d/optout.
--
Costin