There is no duplication per se in HDFS. Hive tables are just 'views' of the data: one copy sits unindexed, in raw format, in HDFS; the other is indexed and analyzed in Elasticsearch.

You can't combine the two since they are completely different things - one is a file-system, the other one is a search and analytics engine.
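
To make the two-table setup concrete, here is a minimal sketch that only builds the two HiveQL statements involved; it does not talk to Hive or Elasticsearch. The table names (`source_logs`, `es_logs`), the columns, and the `logs/entry` index are hypothetical. The storage handler class and the `es.resource` property are the names used by es-hadoop releases I am aware of; older milestone builds may spell the class differently, so check the version you run.

```python
# Sketch: build the HiveQL for an Elasticsearch-backed external table and the
# statement that copies rows into it. All table, column, and index names below
# are hypothetical and chosen only for illustration.

# Assumption: class name as used by recent es-hadoop releases.
ES_STORAGE_HANDLER = "org.elasticsearch.hadoop.hive.EsStorageHandler"


def es_external_table_ddl(table, columns, es_resource):
    """Return a CREATE EXTERNAL TABLE statement whose storage is Elasticsearch.

    The table holds no files in HDFS; rows written to it are sent to the
    index/type named by es_resource.
    """
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} ({cols})\n"
        f"STORED BY '{ES_STORAGE_HANDLER}'\n"
        f"TBLPROPERTIES('es.resource' = '{es_resource}')"
    )


def copy_statement(source_table, es_table, columns):
    """Return the INSERT that reads the HDFS-backed table and indexes its rows."""
    names = ", ".join(name for name, _ in columns)
    return (
        f"INSERT OVERWRITE TABLE {es_table}\n"
        f"SELECT {names} FROM {source_table}"
    )


columns = [("id", "BIGINT"), ("message", "STRING")]
ddl = es_external_table_ddl("es_logs", columns, "logs/entry")
copy = copy_statement("source_logs", "es_logs", columns)

print(ddl + ";")
print(copy + ";")
```

Running the generated statements in Hive would leave the raw rows where they already are in HDFS (behind `source_logs`) and put an indexed copy in Elasticsearch (behind `es_logs`); no second set of files is written to HDFS.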

On 09/01/2014 9:49 AM, Badal Mohapatra wrote:
Hi,

As I understand it, to index Hadoop data into Elasticsearch we create an
external table with the ES storage handler and then copy the data into it from
another, internal Hive table. Doesn't that duplicate the data in HDFS?
Is there any way to index directly from the internal Hive tables instead of
having two tables with the same data?

Kind Regards,
Badal

--
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to
[email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/ed08fd38-05e4-437a-a8e2-3295f2195e2a%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

--
Costin

