Re: Beam connector development for Hive as a data source

Madhusudan Borkar Thu, 23 Mar 2017 13:32:56 -0700

Hi Davor,
Thanks for your response. I am working with my team, and we have some
questions where we need a little help.
We are creating a pipeline whose source is HDFS, but when the pipeline
runs it cannot find the Hadoop host.
Do we need to configure anything before we run this pipeline? I could not
find any documentation on HDFS beyond the hdfs URI.
This is our code:
HDFSFileSource<KV<LongWritable, Text>, LongWritable, Text> source =
    HDFSFileSource.from(
        "hdfs://hadoop-clust-0118-m:8020/tmp/puru/outputAllCols2039/part-m-00000",
        TextInputFormat.class, LongWritable.class, Text.class);

source.validate();
p.apply(Read.from(source));
p.run().waitUntilFinish();
The error we get is that the host is not found.
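
One way to narrow this down is to check whether the host name in the URI
resolves on the machine that launches the pipeline. The sketch below assumes
the error is a name resolution failure (for example a
java.net.UnknownHostException), which is not confirmed. HdfsHostCheck and its
methods are hypothetical helpers written for this check; they are not part of
Beam or Hadoop and use only the JDK.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;

// Hypothetical helper, not a Beam or Hadoop class.
class HdfsHostCheck {
    // Extracts the host part of an hdfs:// URI (null if the URI has none).
    public static String hostOf(String hdfsUri) {
        return URI.create(hdfsUri).getHost();
    }

    // Returns true if the local resolver (DNS or the hosts file) can
    // resolve the given host name from this machine.
    public static boolean resolves(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String uri =
            "hdfs://hadoop-clust-0118-m:8020/tmp/puru/outputAllCols2039/part-m-00000";
        String host = hostOf(uri);
        System.out.println(host + " resolves: " + resolves(host));
    }
}
```

If this reports that the name does not resolve, the problem would be in name
resolution on the launching machine (or on the workers) rather than in the
pipeline code itself.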
I would appreciate your help.
I also sent a request to join the forum and am waiting for a response.

regards,
Madhu Borkar
(c) (408) 390-9518

On Mon, Feb 6, 2017 at 5:41 PM, Davor Bonaci <[email protected]> wrote:

> Hi Madhu,
> Welcome! I suggest subscribing to the dev@ mailing list and using the
> same email address when sending to the list, to avoid your email being
> caught in moderation.
>
> It would be great to have a connector for Apache Hive. Keep in mind that
> several folks have expressed interest in using and contributing this
> connector. As far as I know, nobody is *actively* working on it, so you
> should be good to go. Please use BEAM-1158 [1] to coordinate this work with
> any other interested contributor.
>
> Note that there are several different ways of connecting Beam and Hive.
> The simplest one is to write a HiveIO that would run a Hive query and
> process Hive's results in Beam. Another would be to use Beam within Hive to
> compute the results of a Hive query. Finally, one could possibly write a
> Hive-based DSL on top of a Beam SDK.
>
> All of these approaches are valid and somewhat orthogonal to one another.
> I'm assuming you are after the first one. If so, and if you plan to follow
> already established patterns in other connectors, you don't necessarily
> need a design document. Otherwise, please start with a design document. We
> have linked a template in the Contribution Guide [2, 3].
>
> Once again, welcome and let us know if we can help in any way!
>
> Davor
>
> [1] https://issues.apache.org/jira/browse/BEAM-1158
> [2] https://beam.apache.org/contribute/contribution-guide/
> [3] https://docs.google.com/document/d/1qYQPGtabN5-E4MjHsecqqC7PXvJtXvZukPfLXQ8rHJs
>
> On Mon, Feb 6, 2017 at 4:27 PM, Madhusudan Borkar <[email protected]>
> wrote:
>
>> Hello,
>>
>> I am a Big Data Architect working at eTouch Systems. We are GCP partners.
>> We are planning to contribute to Beam by developing a connector for Apache
>> Hive as a data source.
>> I understand that before any development work begins, we need to submit
>> our design to the Beam community. Could you please share a "design
>> template" document for this? We will submit our design document using
>> the template.
>>
>>
>> Thank you.
>>
>> best regards
>> Madhu Borkar
>>
>
>
