Repository: beam-site
Updated Branches:
  refs/heads/asf-site 369b331db -> 07a32b382


Add HadoopInputFormatIO example to read from Hive's HCatalog


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/8ff65fe7
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/8ff65fe7
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/8ff65fe7

Branch: refs/heads/asf-site
Commit: 8ff65fe7198f43d3653c2bb539f22192459b83ec
Parents: 369b331
Author: Seshadri Chakkravarthy <[email protected]>
Authored: Tue May 23 16:15:26 2017 -0700
Committer: Ismaël Mejía <[email protected]>
Committed: Tue Jun 6 09:32:12 2017 +0200

----------------------------------------------------------------------
 src/documentation/io/built-in-hadoop.md | 31 ++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/beam-site/blob/8ff65fe7/src/documentation/io/built-in-hadoop.md
----------------------------------------------------------------------
diff --git a/src/documentation/io/built-in-hadoop.md b/src/documentation/io/built-in-hadoop.md
index 5c07717..722facb 100644
--- a/src/documentation/io/built-in-hadoop.md
+++ b/src/documentation/io/built-in-hadoop.md
@@ -195,3 +195,34 @@ PCollection<KV<Text, LinkedMapWritable>> elasticData = p.apply("read",
 ```
 
The `org.elasticsearch.hadoop.mr.EsInputFormat` key class is `org.apache.hadoop.io.Text`, and its value class is `org.elasticsearch.hadoop.mr.LinkedMapWritable`. Both key and value classes have Beam Coders.
+
+### HCatalog - HCatInputFormat
+
+To read data using HCatalog, use `org.apache.hive.hcatalog.mapreduce.HCatInputFormat`, which needs the following properties to be set:
+
+```java
+Configuration hcatConf = new Configuration();
+hcatConf.setClass("mapreduce.job.inputformat.class", HCatInputFormat.class, InputFormat.class);
+hcatConf.setClass("key.class", LongWritable.class, Object.class);
+hcatConf.setClass("value.class", HCatRecord.class, Object.class);
+hcatConf.set("hive.metastore.uris", "thrift://metastore-host:port");
+
+org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(hcatConf, "my_database", "my_table", "my_filter");
+```
+
+```py
+  # The Beam SDK for Python does not support Hadoop InputFormat IO.
+```
+
+Call the Read transform as follows:
+
+```java
+PCollection<KV<Long, HCatRecord>> hcatData =
+  p.apply("read",
+      HadoopInputFormatIO.<Long, HCatRecord>read()
+          .withConfiguration(hcatConf));
+```
+
+```py
+  # The Beam SDK for Python does not support Hadoop InputFormat IO.
+```
\ No newline at end of file
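
For context, the two added snippets combine into a complete pipeline along these lines. This is a sketch only: the metastore URI, database, table, and filter strings are placeholders, and it assumes the Beam `hadoop-input-format` IO module and the Hive HCatalog client libraries are on the classpath.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.hadoop.inputformat.HadoopInputFormatIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hive.hcatalog.data.HCatRecord;
import org.apache.hive.hcatalog.mapreduce.HCatInputFormat;

public class HCatalogReadExample {
  public static void main(String[] args) throws Exception {
    // Configure HCatInputFormat as in the diff above.
    Configuration hcatConf = new Configuration();
    hcatConf.setClass("mapreduce.job.inputformat.class",
        HCatInputFormat.class, InputFormat.class);
    hcatConf.setClass("key.class", LongWritable.class, Object.class);
    hcatConf.setClass("value.class", HCatRecord.class, Object.class);
    // Placeholder metastore URI; replace host and port with your own.
    hcatConf.set("hive.metastore.uris", "thrift://metastore-host:9083");

    // Placeholder database, table, and partition filter.
    HCatInputFormat.setInput(hcatConf, "my_database", "my_table", "my_filter");

    // Build and run the pipeline, reading KV<Long, HCatRecord> pairs.
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    PCollection<KV<Long, HCatRecord>> hcatData =
        p.apply("read",
            HadoopInputFormatIO.<Long, HCatRecord>read()
                .withConfiguration(hcatConf));
    p.run().waitUntilFinish();
  }
}
```

This requires a reachable Hive metastore at run time; it is not runnable stand-alone.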
