Hi,

HadoopIO can be used to read from Hive, but it does not provide a way to write to Hive. This proposal is for a Hive connector that includes both a source and a sink, built on the Hive native API.
Apache HCatalog provides a way to read from and write to Hive without using MapReduce. HCatReader reads data from the cluster using the basic storage abstraction of tables and rows; HCatWriter writes to the cluster, and a batching process will be used to write in bulk. Please refer to the Apache documentation on HCatalog ReaderWriter: https://cwiki.apache.org/confluence/display/Hive/HCatalog+ReaderWriter (a rough sketch of those underlying calls follows the proposed API below).

Solution: it will work like this:

pipeline.apply(HiveIO.read()
    .withMetastoreUri("uri")       //mandatory
    .withTable("myTable")          //mandatory
    .withDatabase("myDb")          //optional, assumes default if none specified
    .withPartition("partition"));  //optional, should be specified if the table is partitioned

pipeline.apply(HiveIO.write()
    .withMetastoreUri("uri")       //mandatory
    .withTable("myTable")          //mandatory
    .withDatabase("myDb")          //optional, assumes default if none specified
    .withPartition("partition")    //optional
    .withBatchSize(size));         //optional
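For context, here is a minimal sketch of the master-side HCatReader/HCatWriter calls from the HCatalog ReaderWriter page linked above, which is the API the source and sink would build on. This is not the connector implementation; the database/table names and the metastore URI are placeholders, and the per-split slave-side read step is only summarized in comments.

import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import org.apache.hive.hcatalog.data.HCatRecord;
import org.apache.hive.hcatalog.data.transfer.DataTransferFactory;
import org.apache.hive.hcatalog.data.transfer.HCatReader;
import org.apache.hive.hcatalog.data.transfer.HCatWriter;
import org.apache.hive.hcatalog.data.transfer.ReadEntity;
import org.apache.hive.hcatalog.data.transfer.ReaderContext;
import org.apache.hive.hcatalog.data.transfer.WriteEntity;
import org.apache.hive.hcatalog.data.transfer.WriterContext;

public class HCatalogTransferSketch {

  public static void main(String[] args) throws Exception {
    // Connection properties handed to HCatalog; the URI below is a placeholder.
    Map<String, String> config = new HashMap<>();
    config.put("hive.metastore.uris", "thrift://metastore-host:9083");

    // Read side, master: describe what to read and prepare a ReaderContext.
    ReadEntity readEntity =
        new ReadEntity.Builder().withDatabase("myDb").withTable("myTable").build();
    HCatReader masterReader = DataTransferFactory.getHCatReader(readEntity, config);
    ReaderContext readerContext = masterReader.prepareRead();
    // Workers would each rebuild an HCatReader from this ReaderContext (one per split)
    // and iterate the resulting Iterator<HCatRecord>; that per-split step is what the
    // proposed HiveIO source would wrap.

    // Write side, master: describe the target table and prepare a WriterContext.
    WriteEntity writeEntity =
        new WriteEntity.Builder().withDatabase("myDb").withTable("myTable").build();
    HCatWriter masterWriter = DataTransferFactory.getHCatWriter(writeEntity, config);
    WriterContext writerContext = masterWriter.prepareWrite();

    // Write side, worker: rebuild a writer from the context and push a batch of records.
    // An empty batch is used here only to keep the sketch runnable.
    HCatWriter workerWriter = DataTransferFactory.getHCatWriter(writerContext);
    Iterator<HCatRecord> batch = Collections.<HCatRecord>emptyList().iterator();
    workerWriter.write(batch);

    // Write side, master: commit once the workers are done (abort(writerContext) on failure).
    masterWriter.commit(writerContext);
  }
}

The contexts returned by prepareRead()/prepareWrite() are intended to be serialized and shipped to workers, which is what makes it natural to map the prepare calls to pipeline construction and the per-record iteration (and batching via withBatchSize) to the source and sink bundles.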
Please, let us know your comments and suggestions.

Madhu Borkar