Re: Python SDK: BigTableIO

Stephen Sisk Tue, 30 May 2017 11:29:12 -0700

Hey Matthias,

To add to what Chamikara mentioned, there is a lot of information in the
generic IO authoring guide [1], the Python IO authoring guide [2], and the
PTransform style guide [3]. The PTransform style guide may not sound
relevant, but it contains many specific tips drawn from lessons we've
learned on past I/O work.

If you plan on contributing it back to the community, I'd also suggest
opening a JIRA issue and updating the Beam website to note that you're
working on this (e.g. [4]). Both steps are pretty trivial.

We've recently been trying out feature branches when we add new I/Os, since
the changes tend to get bigger than we'd like for a single PR.

Please feel free to email the dev mailing list if you have questions! We
are excited and happy to help with thinking through the design, etc. (e.g.,
as Cham hinted, should you use a Source or regular ParDo transforms?)
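Whichever way you go, Cham's suggestion (quoted below) of a BoundedSource with static splitting and the Source-vs-ParDo question both come down to how the table's key space is cut into independent ranges. Here is a minimal pure-Python sketch of just that step. It assumes the sampled row keys have already been fetched (as far as I know, the Java connector bases its splits on Bigtable's row-key sampling; fetching is not shown here), and it assumes the Bigtable convention that an empty key means "unbounded" at that end of a range. `split_key_range` is a hypothetical helper, not an existing Beam or client-library API.

```python
from typing import Iterable, List, Tuple

# A key range is [start, end). An empty bytes value means "unbounded" on that
# side (assumed Bigtable convention: b"" = start/end of the table).
KeyRange = Tuple[bytes, bytes]


def split_key_range(sample_keys: Iterable[bytes],
                    start: bytes = b"",
                    end: bytes = b"") -> List[KeyRange]:
    """Hypothetical helper: cut [start, end) into contiguous ranges at sampled keys.

    Empty keys, keys outside the open interval (start, end), and duplicates are
    dropped, so the returned ranges cover [start, end) with no gaps or overlaps.
    """
    def strictly_inside(key: bytes) -> bool:
        if not key:
            return False  # an empty sample key marks the table end, not a cut point
        after_start = key > start  # any non-empty key is > b"" (unbounded start)
        before_end = (not end) or key < end
        return after_start and before_end

    cut_points = sorted({k for k in sample_keys if strictly_inside(k)})
    boundaries = [start] + cut_points + [end]
    # Pair each boundary with the next one to form adjacent half-open ranges.
    return list(zip(boundaries[:-1], boundaries[1:]))


if __name__ == "__main__":
    print(split_key_range([b"m", b"d", b"m", b"z", b""], end=b"x"))
```

Each resulting range could become one `SourceBundle` returned from `BoundedSource.split()`, or one element of a `beam.Create` feeding a reading ParDo; which of those fits better is exactly the design question above.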

S

[1] https://beam.apache.org/documentation/io/authoring-overview/
[2] https://beam.apache.org/documentation/sdks/python-custom-io/
[3] https://beam.apache.org/contribute/ptransform-style-guide/
[4] https://github.com/apache/beam-site/pull/250

On Sun, May 28, 2017 at 5:32 PM Chamikara Jayalath <[email protected]>
wrote:

> Thanks for offering to help. I would suggest looking into the existing Java
> BigtableIO connector and the currently available Python client library for
> Cloud Bigtable to see how feasible it is to develop an efficient Bigtable
> connector at this point. From the Python SDK's perspective, you can use the
> iobase.BoundedSource API (wrapped by a PTransform) to develop a read
> PTransform with support for dynamic/static splitting. Sinks are usually
> developed as PTransforms (the iobase.Sink interface is deprecated, so I
> suggest not using it). I would be happy to review any PRs related to this.
>
> Thanks,
> Cham
>
> On Sun, May 28, 2017 at 2:30 AM Matthias Baetens <
> [email protected]> wrote:
>
> > Hey guys,
> >
> > We have been using Beam for quite a few months now, so we (my colleague
> > Robert & I) thought it might be cool to contribute a bit as well.
> >
> > The challenge we want to take up is writing the BigTableIO for the Python
> > SDK, which is not yet in the works according to the website
> > (https://github.com/apache/beam-site/blob/asf-site/src/documentation/io/built-in.md).
> > I have searched JIRA for a BigTableIO issue and did not find one, so I
> > suppose filing it is the first step we take.
> >
> > Any pointers or feedback are more than welcome!
> >
> > Best,
> >
> > Matthias
> >
>
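
On the write side, Cham's advice is to build the sink as a plain PTransform rather than on the deprecated iobase.Sink. In practice that is usually a ParDo whose DoFn buffers mutations and writes them in batches. Below is a minimal pure-Python sketch of only the buffering logic, under stated assumptions: `MutationBatcher` and `flush_fn` are hypothetical names, the actual Bigtable write call is whatever the caller passes in as `flush_fn` (not shown), and in a real DoFn `add` would be called from `process()` and `flush` from `finish_bundle()`.

```python
from typing import Callable, List, Sequence


class MutationBatcher:
    """Hypothetical helper: buffer mutations, hand them to flush_fn in batches.

    In a Beam write DoFn, add() would be called from process() and flush()
    from finish_bundle(), so nothing stays buffered when a bundle completes.
    """

    def __init__(self, flush_fn: Callable[[Sequence[object]], None],
                 max_batch_size: int = 100) -> None:
        if max_batch_size < 1:
            raise ValueError("max_batch_size must be >= 1")
        self._flush_fn = flush_fn
        self._max_batch_size = max_batch_size
        self._buffer: List[object] = []

    def add(self, mutation: object) -> None:
        # Buffer one mutation and write out a full batch as soon as it fills.
        self._buffer.append(mutation)
        if len(self._buffer) >= self._max_batch_size:
            self.flush()

    def flush(self) -> None:
        # Hand the current batch to the writer, then start a fresh buffer;
        # flushing an empty buffer is a no-op.
        if self._buffer:
            self._flush_fn(self._buffer)
            self._buffer = []


if __name__ == "__main__":
    written = []
    batcher = MutationBatcher(written.append, max_batch_size=2)
    for i in range(5):
        batcher.add(i)
    batcher.flush()  # what finish_bundle() would do
    print(written)
```

Keeping the batching separate from the DoFn makes it easy to unit-test without a pipeline; the write PTransform itself could then be little more than a ParDo over a DoFn that owns one of these per bundle.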
