(JIRA role added; reassigned.)

On Thu, Jun 1, 2017 at 10:05 AM, Chamikara Jayalath <chamik...@apache.org> wrote:
> Thanks. I added some comments to the doc.
>
> Davor should be able to assign this JIRA to you. Also, Solomon, who
> implemented the Java BigTable connector, might have more input here.
>
> - Cham
>
>
> On Thu, Jun 1, 2017 at 2:19 AM Matthias Baetens <
> matthias.baet...@datatonic.com> wrote:
>
>> Hi Cham, Stephen,
>>
>> Thanks a lot for the input, really useful to get started.
>>
>> We'll probably start with implementing the Source (looks the most
>> straightforward).
>> I made a working document
>> <https://docs.google.com/document/d/1iXeQvIAsGjp9orleDy0o5ExU-eMqWesgvtt231UoaPg/edit?usp=sharing>
>> to organise and track our progress a bit, happy to discuss or receive
>> feedback there as well. We made a JIRA issue
>> <https://issues.apache.org/jira/browse/BEAM-2395> as well; should we get
>> assigned to it?
>>
>> About writing the Sink: are there any examples of how this was done
>> previously where we can get some inspiration from? I think it would be
>> good to discuss this in more detail once we finish writing the Source.
>>
>> Matthias
>>
>> On Tue, May 30, 2017 at 7:28 PM, Stephen Sisk <s...@google.com.invalid>
>> wrote:
>>
>> > Hey Matthias,
>> >
>> > to add on to what Chamikara mentioned, we have lots of info in the
>> > generic IO authoring guide [1], the Python IO authoring guide [2] and
>> > the PTransform Style Guide [3]. The PTransform style guide doesn't
>> > sound like it applies, but it has a lot of specific tips from lessons
>> > we've learned in the past from I/O work.
>> >
>> > If you plan on contributing it back to the community, I'd also suggest
>> > opening up a JIRA issue & updating the beam website (eg [4]) that you're
>> > working on this (those steps are pretty trivial.)
>> >
>> > We've recently been trying out using branches when we add new I/Os since
>> > the PRs tend to get bigger than we like for a single PR.
>> >
>> > Please feel free to email the dev mailing list if you have questions!
>> > We are excited and happy to help out with thinking about design/etc...
>> > (eg, as cham hinted at, should you use a Source vs. use regular ParDo
>> > transforms?)
>> >
>> > S
>> >
>> > [1] https://beam.apache.org/documentation/io/authoring-overview/
>> > [2] https://beam.apache.org/documentation/sdks/python-custom-io/
>> > [3] https://beam.apache.org/contribute/ptransform-style-guide/
>> > [4] https://github.com/apache/beam-site/pull/250
>> >
>> > On Sun, May 28, 2017 at 5:32 PM Chamikara Jayalath <chamik...@apache.org>
>> > wrote:
>> >
>> > > Thanks for offering to help. I would suggest looking into the existing
>> > > Java BigTableIO connector and the currently available Python client
>> > > library for Cloud BigTable to see how feasible it is to develop an
>> > > efficient BigTable connector at this point. From the Python SDK's
>> > > perspective, you can use the iobase.BoundedSource API (wrapped by a
>> > > PTransform) to develop a read PTransform with support for
>> > > dynamic/static splitting. Sinks are usually developed as PTransforms
>> > > (the iobase.Sink interface is deprecated, so I suggest not using it).
>> > > I would be happy to review any PRs related to this.
>> > >
>> > > Thanks,
>> > > Cham
>> > >
>> > > On Sun, May 28, 2017 at 2:30 AM Matthias Baetens <
>> > > matthias.baet...@datatonic.com> wrote:
>> > >
>> > > > Hey guys,
>> > > >
>> > > > We have been using Beam for quite a few months now, so we (my
>> > > > colleague Robert & I) thought it might be cool to contribute a bit
>> > > > as well.
>> > > >
>> > > > The challenge we want to take up is writing the BigTableIO for the
>> > > > Python SDK (which is not yet in the works according to the website
>> > > > <https://github.com/apache/beam-site/blob/asf-site/src/documentation/io/built-in.md>).
>> > > > I have searched JIRA for the BigTableIO issue and did not find it,
>> > > > so I suppose this is the first step we take.
>> > > >
>> > > > Any pointers or feedback more than welcome!
>> > > >
>> > > > Best,
>> > > >
>> > > > Matthias
>>
>> --
>>
>> *Matthias Baetens*
>>
>> *datatonic | data power unleashed*
>>
>> office +44 203 668 3680 | mobile +44 74 918 20646
>>
>> Level24 | 1 Canada Square | Canary Wharf | E14 5AB London
>>
>> We've been announced
>> <https://blog.google/topics/google-cloud/investing-vibrant-google-cloud-ecosystem-new-programs-and-partnerships/>
>> as one of the top global Google Cloud Machine Learning partners.
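
For reference, the static-splitting idea behind Cham's iobase.BoundedSource suggestion can be sketched without Beam installed. This is only a dependency-free illustration, not the connector itself. InMemoryRowSource and its row data are hypothetical, and the class merely mirrors the method names of iobase.BoundedSource (estimate_size, split, read). In Beam, read() receives a range tracker rather than explicit offsets; that detail is simplified away here.

```python
from collections import namedtuple

# Stand-in for Beam's SourceBundle; declared locally so the sketch has no
# Beam dependency. The field names follow the Beam tuple.
SourceBundle = namedtuple(
    "SourceBundle", ["weight", "source", "start_position", "stop_position"]
)


class InMemoryRowSource:
    """Hypothetical bounded source over sorted (row_key, value) pairs.

    A real connector would subclass iobase.BoundedSource and fetch rows
    from the table; here the "table" is an in-memory list.
    """

    def __init__(self, rows):
        self._rows = sorted(rows)

    def estimate_size(self):
        # Size is measured in rows here; Beam expects an estimate in bytes.
        return len(self._rows)

    def split(self, desired_bundle_size, start_position=None, stop_position=None):
        # Static splitting: cut the offset range [start, stop) into
        # consecutive, non-overlapping bundles of at most
        # desired_bundle_size rows each.
        start = 0 if start_position is None else start_position
        stop = len(self._rows) if stop_position is None else stop_position
        while start < stop:
            end = min(start + desired_bundle_size, stop)
            yield SourceBundle(
                weight=end - start,
                source=self,
                start_position=start,
                stop_position=end,
            )
            start = end

    def read(self, start_position, stop_position):
        # Simplification: explicit offsets instead of a range tracker.
        for index in range(start_position, stop_position):
            yield self._rows[index]
```

Each bundle can be read independently, and reading all bundles in order returns every row exactly once, which is the property a runner relies on when it distributes bundles across workers.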