Repository: beam-site Updated Branches: refs/heads/asf-site 369b331db -> 07a32b382
Add HadoopInputFormatIO example to read from Hive's HCatalog

Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/8ff65fe7
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/8ff65fe7
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/8ff65fe7

Branch: refs/heads/asf-site
Commit: 8ff65fe7198f43d3653c2bb539f22192459b83ec
Parents: 369b331
Author: Seshadri Chakkravarthy <[email protected]>
Authored: Tue May 23 16:15:26 2017 -0700
Committer: Ismaël Mejía <[email protected]>
Committed: Tue Jun 6 09:32:12 2017 +0200

----------------------------------------------------------------------
 src/documentation/io/built-in-hadoop.md | 31 ++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/beam-site/blob/8ff65fe7/src/documentation/io/built-in-hadoop.md
----------------------------------------------------------------------
diff --git a/src/documentation/io/built-in-hadoop.md b/src/documentation/io/built-in-hadoop.md
index 5c07717..722facb 100644
--- a/src/documentation/io/built-in-hadoop.md
+++ b/src/documentation/io/built-in-hadoop.md
@@ -195,3 +195,34 @@ PCollection<KV<Text, LinkedMapWritable>> elasticData = p.apply("read",
 ```
 The `EsInputFormat` key class is `org.apache.hadoop.io.Text`, and its value class is `org.elasticsearch.hadoop.mr.LinkedMapWritable`. Both key and value classes have Beam Coders.
+
+### HCatalog - HCatInputFormat
+
+To read data using HCatalog, use `org.apache.hive.hcatalog.mapreduce.HCatInputFormat`, which needs the following properties to be set:
+
+```java
+Configuration hcatConf = new Configuration();
+hcatConf.setClass("mapreduce.job.inputformat.class", HCatInputFormat.class, InputFormat.class);
+hcatConf.setClass("key.class", LongWritable.class, Object.class);
+hcatConf.setClass("value.class", HCatRecord.class, Object.class);
+hcatConf.set("hive.metastore.uris", "thrift://metastore-host:port");
+
+org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(hcatConf, "my_database", "my_table", "my_filter");
+```
+
+```py
+  # The Beam SDK for Python does not support Hadoop InputFormat IO.
+```
+
+Call the Read transform as follows:
+
+```java
+PCollection<KV<Long, HCatRecord>> hcatData =
+  p.apply("read",
+      HadoopInputFormatIO.<Long, HCatRecord>read()
+          .withConfiguration(hcatConf));
+```
+
+```py
+  # The Beam SDK for Python does not support Hadoop InputFormat IO.
+```
\ No newline at end of file
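
For context, the two snippets this patch adds can be assembled into a single pipeline. The following is a sketch only, not part of the patch: it assumes the Beam Java SDK's `HadoopInputFormatIO` as documented at the time of this commit, and `metastore-host:port`, `my_database`, `my_table`, and the downstream `MapElements` step are placeholders/hypothetical illustration. Running it requires a reachable Hive metastore and the Beam, Hadoop, and HCatalog dependencies on the classpath.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.hadoop.inputformat.HadoopInputFormatIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hive.hcatalog.data.HCatRecord;
import org.apache.hive.hcatalog.mapreduce.HCatInputFormat;

public class HCatalogReadExample {
  public static void main(String[] args) throws Exception {
    // Configure HCatInputFormat exactly as in the snippet added by this patch.
    Configuration hcatConf = new Configuration();
    hcatConf.setClass("mapreduce.job.inputformat.class",
        HCatInputFormat.class, InputFormat.class);
    hcatConf.setClass("key.class", LongWritable.class, Object.class);
    hcatConf.setClass("value.class", HCatRecord.class, Object.class);
    hcatConf.set("hive.metastore.uris", "thrift://metastore-host:port"); // placeholder

    // Point the input format at a database/table (names are placeholders).
    HCatInputFormat.setInput(hcatConf, "my_database", "my_table", "my_filter");

    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // The Read transform from the patch, with the configuration built above.
    PCollection<KV<Long, HCatRecord>> hcatData =
        p.apply("read",
            HadoopInputFormatIO.<Long, HCatRecord>read()
                .withConfiguration(hcatConf));

    // Hypothetical downstream step: stringify each record for further processing.
    hcatData.apply(MapElements
        .into(TypeDescriptors.strings())
        .via(kv -> kv.getValue().toString()));

    p.run().waitUntilFinish();
  }
}
```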
