Hi,

HadoopIO can be used to read from Hive, but it does not provide a way to write to Hive. This proposal is for a Hive connector that includes both a source and a sink, built on the Hive native API.
Apache HCatalog provides a way to read from and write to Hive without using MapReduce. HCatReader reads data from the cluster using the basic storage abstraction of tables and rows; HCatWriter writes to the cluster, and a batching process will be used to write in bulk. Please refer to the Apache documentation on HCatalog ReaderWriter: https://cwiki.apache.org/confluence/display/Hive/HCatalog+ReaderWriter (a rough sketch of those underlying calls follows the proposed API below).

Solution: it will work like this:

pipeline.apply(HiveIO.read()
    .withMetastoreUri("uri")       //mandatory
    .withTable("myTable")          //mandatory
    .withDatabase("myDb")          //optional, assumes default if none specified
    .withPartition("partition"));  //optional, should be specified if the table is partitioned

pipeline.apply(HiveIO.write()
    .withMetastoreUri("uri")       //mandatory
    .withTable("myTable")          //mandatory
    .withDatabase("myDb")          //optional, assumes default if none specified
    .withPartition("partition")    //optional
    .withBatchSize(size));         //optional
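For context, here is a minimal sketch of the master-side HCatReader/HCatWriter calls from the HCatalog ReaderWriter page linked above, which is the API the source and sink would build on. This is not the connector implementation; the database/table names and the metastore URI are placeholders, and the per-split slave-side read step is only summarized in comments.

import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import org.apache.hive.hcatalog.data.HCatRecord;
import org.apache.hive.hcatalog.data.transfer.DataTransferFactory;
import org.apache.hive.hcatalog.data.transfer.HCatReader;
import org.apache.hive.hcatalog.data.transfer.HCatWriter;
import org.apache.hive.hcatalog.data.transfer.ReadEntity;
import org.apache.hive.hcatalog.data.transfer.ReaderContext;
import org.apache.hive.hcatalog.data.transfer.WriteEntity;
import org.apache.hive.hcatalog.data.transfer.WriterContext;

public class HCatalogTransferSketch {

  public static void main(String[] args) throws Exception {
    // Connection properties handed to HCatalog; the URI below is a placeholder.
    Map<String, String> config = new HashMap<>();
    config.put("hive.metastore.uris", "thrift://metastore-host:9083");

    // Read side, master: describe what to read and prepare a ReaderContext.
    ReadEntity readEntity =
        new ReadEntity.Builder().withDatabase("myDb").withTable("myTable").build();
    HCatReader masterReader = DataTransferFactory.getHCatReader(readEntity, config);
    ReaderContext readerContext = masterReader.prepareRead();
    // Workers would each rebuild an HCatReader from this ReaderContext (one per split)
    // and iterate the resulting Iterator<HCatRecord>; that per-split step is what the
    // proposed HiveIO source would wrap.

    // Write side, master: describe the target table and prepare a WriterContext.
    WriteEntity writeEntity =
        new WriteEntity.Builder().withDatabase("myDb").withTable("myTable").build();
    HCatWriter masterWriter = DataTransferFactory.getHCatWriter(writeEntity, config);
    WriterContext writerContext = masterWriter.prepareWrite();

    // Write side, worker: rebuild a writer from the context and push a batch of records.
    // An empty batch is used here only to keep the sketch runnable.
    HCatWriter workerWriter = DataTransferFactory.getHCatWriter(writerContext);
    Iterator<HCatRecord> batch = Collections.<HCatRecord>emptyList().iterator();
    workerWriter.write(batch);

    // Write side, master: commit once the workers are done (abort(writerContext) on failure).
    masterWriter.commit(writerContext);
  }
}

The contexts returned by prepareRead()/prepareWrite() are intended to be serialized and shipped to workers, which is what makes it natural to map the prepare calls to pipeline construction and the per-record iteration (and batching via withBatchSize) to the source and sink bundles.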
Please, let us know your comments and suggestions.

Madhu Borkar