There is no duplication per se in HDFS. Hive tables are just 'views' of data: one sits unindexed, in raw format, in HDFS;
the other is indexed and analyzed in Elasticsearch.
You can't combine the two since they are completely different things: one is a file system, the other a search
and analytics engine.
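To make the two 'views' concrete, here is a rough sketch of the statements involved, wrapped in a small Python helper so the pieces are easy to see. The table names (logs_raw, logs_es), the columns, and the index/type logs/entry are made up for illustration. The storage handler class name is spelled as in recent elasticsearch-hadoop releases and may differ in the version you are running, so check the documentation for your release.

```python
# Sketch only: table, column, and index names below are hypothetical.
# The handler class name may be spelled differently in older releases.
ES_HANDLER = "org.elasticsearch.hadoop.hive.EsStorageHandler"


def es_table_ddl(table, columns, es_resource):
    """Build DDL for an external Hive table whose rows live in Elasticsearch."""
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} ({cols})\n"
        f"STORED BY '{ES_HANDLER}'\n"
        f"TBLPROPERTIES('es.resource' = '{es_resource}')"
    )


def index_statement(es_table, source_table, columns):
    """Build the INSERT that reads the HDFS-backed table and writes to Elasticsearch."""
    names = ", ".join(name for name, _ in columns)
    return f"INSERT OVERWRITE TABLE {es_table}\nSELECT {names} FROM {source_table}"


columns = [("id", "BIGINT"), ("name", "STRING")]
ddl = es_table_ddl("logs_es", columns, "logs/entry")
insert = index_statement("logs_es", "logs_raw", columns)
print(ddl + ";\n" + insert + ";")
```

The external table declared this way has no files of its own in HDFS. Running the INSERT sends the selected rows to Elasticsearch, while the raw data stays where it was under the internal table.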
On 09/01/2014 9:49 AM, Badal Mohapatra wrote:
Hi,
As I understand it, to index Hadoop data into Elasticsearch
we create an external table with the ES storage handler and then copy the data from
another internal Hive table. Doesn't that
duplicate the data in HDFS?
Is there any way to use the Hive internal tables directly for indexing instead of
having two tables with the same data?
Kind Regards,
Badal
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to
[email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/ed08fd38-05e4-437a-a8e2-3295f2195e2a%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
--
Costin