Thanks. I added some comments to the doc. Davor should be able to assign this JIRA to you. Also, Solomon, who implemented the Java BigTable connector, might have more input here.
- Cham

On Thu, Jun 1, 2017 at 2:19 AM Matthias Baetens <[email protected]> wrote:

> Hi Cham, Stephen,
>
> Thanks a lot for the input, really useful to get started.
>
> We'll probably start with implementing the Source (looks the most
> straightforward).
> I made a working document
> <https://docs.google.com/document/d/1iXeQvIAsGjp9orleDy0o5ExU-eMqWesgvtt231UoaPg/edit?usp=sharing>
> to organise and track our progress a bit, happy to discuss or receive
> feedback there as well. We made a JIRA issue
> <https://issues.apache.org/jira/browse/BEAM-2395> as well; should we get
> assigned to it?
>
> About writing the Sink: are there any examples of how this was done
> previously where we can get some inspiration from? I think it would be good
> to discuss this in more detail once we finish writing the Source.
>
> Matthias
>
> On Tue, May 30, 2017 at 7:28 PM, Stephen Sisk <[email protected]> wrote:
>
> > Hey Matthias,
> >
> > to add on to what Chamikara mentioned, we have lots of info in the generic
> > IO authoring guide [1], the Python IO authoring guide [2] and the
> > PTransform Style Guide [3]. The PTransform style guide doesn't sound like
> > it applies, but it has a lot of specific tips from lessons we've learned
> > in the past from I/O work.
> >
> > If you plan on contributing it back to the community, I'd also suggest
> > opening up a JIRA issue & updating the beam website (eg [4]) that you're
> > working on this (those steps are pretty trivial.)
> >
> > We've recently been trying out using branches when we add new I/Os since
> > the PRs tend to get bigger than we like for a single PR.
> >
> > Please feel free to email the dev mailing list if you have questions! We
> > are excited and happy to help out with thinking about design/etc... (eg,
> > as cham hinted at, should you use a Source vs. use regular ParDo
> > transforms?)
> >
> > S
> >
> > [1] https://beam.apache.org/documentation/io/authoring-overview/
> > [2] https://beam.apache.org/documentation/sdks/python-custom-io/
> > [3] https://beam.apache.org/contribute/ptransform-style-guide/
> > [4] https://github.com/apache/beam-site/pull/250
> >
> > On Sun, May 28, 2017 at 5:32 PM Chamikara Jayalath <[email protected]> wrote:
> >
> > > Thanks for offering to help. I would suggest looking into the existing
> > > Java BigTableIO connector and the currently available Python client
> > > library for Cloud BigTable to see how feasible it is to develop an
> > > efficient BigTable connector at this point. From the Python SDK's
> > > perspective, you can use the iobase.BoundedSource API (wrapped by a
> > > PTransform) to develop a read PTransform with support for
> > > dynamic/static splitting. Sinks are usually developed as PTransforms
> > > (the iobase.Sink interface is deprecated, so I suggest not using it).
> > > I would be happy to review any PRs related to this.
> > >
> > > Thanks,
> > > Cham
> > >
> > > On Sun, May 28, 2017 at 2:30 AM Matthias Baetens <[email protected]> wrote:
> > >
> > > > Hey guys,
> > > >
> > > > We have been using Beam for quite a few months now, so we (my colleague
> > > > Robert & I) thought it might be cool to contribute a bit as well.
> > > >
> > > > The challenge we want to take up is writing the BigTableIO for the
> > > > Python SDK (which is not yet in the works according to the website
> > > > <https://github.com/apache/beam-site/blob/asf-site/src/documentation/io/built-in.md>).
> > > > I have searched JIRA for the BigTableIO issue and did not find it, so I
> > > > suppose this is the first step we take.
> > > >
> > > > Any pointers or feedback more than welcome!
> > > >
> > > > Best,
> > > >
> > > > Matthias
>
> --
> *Matthias Baetens*
> *datatonic | data power unleashed*
> office +44 203 668 3680 | mobile +44 74 918 20646
> Level24 | 1 Canada Square | Canary Wharf | E14 5AB London
>
> We've been announced
> <https://blog.google/topics/google-cloud/investing-vibrant-google-cloud-ecosystem-new-programs-and-partnerships/>
> as one of the top global Google Cloud Machine Learning partners.
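
For reference, the idea in Cham's reply (a bounded source that can be split statically into independently readable pieces) can be sketched roughly as below. This is only an illustration under stated assumptions: `InMemoryTableSource` and `Bundle` are hypothetical names, the table is a plain in-memory list rather than Cloud BigTable, and the class does not inherit from Beam. The method names mirror `iobase.BoundedSource` (`estimate_size`, `split`, `read`), but the signatures are simplified; in Beam, `read` takes a range tracker and `split` yields `iobase.SourceBundle` objects.

```python
from bisect import bisect_left
from typing import Iterator, List, NamedTuple, Optional, Tuple


class Bundle(NamedTuple):
    """One independently readable key range: [start_key, stop_key)."""
    weight: int               # approximate size of the range in bytes
    start_key: str
    stop_key: Optional[str]   # None means "through the last row"


class InMemoryTableSource:
    """Hypothetical stand-in for a BigTable-backed bounded source.

    Not a Beam class: it only imitates the shape of the
    estimate_size / split / read contract on a sorted list of rows.
    """

    def __init__(self, rows: List[Tuple[str, str]]):
        # Rows are (row_key, value) pairs, kept sorted by key.
        self._rows = sorted(rows)
        self._keys = [key for key, _ in self._rows]

    def estimate_size(self) -> int:
        # Rough total size: sum of key and value lengths.
        return sum(len(key) + len(value) for key, value in self._rows)

    def split(self, desired_bundle_size: int) -> Iterator[Bundle]:
        # Static splitting: close a bundle once it has accumulated
        # at least desired_bundle_size bytes and more rows remain.
        start, weight = 0, 0
        for i, (key, value) in enumerate(self._rows):
            weight += len(key) + len(value)
            if weight >= desired_bundle_size and i + 1 < len(self._rows):
                yield Bundle(weight, self._keys[start], self._keys[i + 1])
                start, weight = i + 1, 0
        if start < len(self._rows):
            yield Bundle(weight, self._keys[start], None)

    def read(self, bundle: Bundle) -> Iterator[Tuple[str, str]]:
        # Emit only the rows whose keys fall inside the bundle's range.
        lo = bisect_left(self._keys, bundle.start_key)
        hi = (len(self._keys) if bundle.stop_key is None
              else bisect_left(self._keys, bundle.stop_key))
        yield from self._rows[lo:hi]
```

Reading every bundle in order returns each row exactly once, which is the property a runner relies on when it hands bundles to different workers. Dynamic splitting (re-dividing a bundle while it is being read) is not covered by this sketch.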
