Thanks Cham! Could you provide some more detail on your preference for developing a Python wrapper rather than implementing a source purely in Python?
If I look at the instructions for using the x-language Spanner connector, then using this - from the user's perspective - would involve installing a Java runtime. That's not terrible, but I fear that getting this to work with Bazel might end up being more trouble than expected. (That has often happened here, and we have enough trouble with getting Python 3.9 and 3.10 to co-exist.)

There are a few of us at our small start-up who have written MapReduces and similar in the past and are completely convinced by the Beam/Dataflow model. But many others have no previous experience and are skeptical; they see this new tool we're introducing as more trouble than it's worth and something they'd rather avoid - even when we can see how lots of their use cases could be made much easier using Beam. I'm worried that every extra hoop to jump through will make it less likely to be widely used here.

Because of that, my bias would be towards having a Python connector rather than x-language, and I would find it really helpful to learn why you both favor the x-language option.

Thanks!
-Lina

On Tue, Jul 26, 2022 at 6:11 PM Chamikara Jayalath <chamik...@google.com> wrote:
>
> On Mon, Jul 25, 2022 at 12:53 PM Lina Mårtensson via dev
> <dev@beam.apache.org> wrote:
>>
>> Hi dev,
>>
>> We're starting to incorporate BigTable in our stack and I've delighted
>> my co-workers with how easy it was to create some BigTables with
>> Beam... but there doesn't appear to be a reader for BigTable in
>> Python.
>>
>> First off, is there a good reason why not/any reason why it would be
>> difficult?
>
> There was a previous effort to implement a Python BT source but that was
> not completed:
> https://github.com/apache/beam/pull/11295#issuecomment-646378304
>
>> I could write one, but before I start, I'd love some input to make it easier.
>>
>> It appears that there would be two options: either write one in
>> Python, or try to set one up with x-language from Java which I see is
>> done e.g. with the Spanner IO Connector.
>> Any recommendation on which one to pick or potential pitfalls in either
>> choice?
>>
>> If I write one in Python, what should I think about?
>> It is not obvious to me how to achieve parallelization, so any tips
>> here would be welcome.
>
> I would strongly prefer developing a Python wrapper for the existing Java BT
> source using Beam's Multi-language Pipelines framework over developing a new
> Python source.
> https://beam.apache.org/documentation/programming-guide/#multi-language-pipelines
>
> Thanks,
> Cham
>
>>
>> Thanks!
>> -Lina
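
On the parallelization question in the quoted message: one common approach for a key-ordered store is to cut the row-key space into contiguous ranges that workers can read independently. The following is only a minimal sketch of that idea, not how any existing Beam connector does it. The function name `split_key_range` and the example split points are hypothetical; in practice the split points would have to come from somewhere, for example sampled row keys.

```python
from typing import List, Optional, Tuple

# A half-open key range [start, end); end=None means "to the end of the table".
KeyRange = Tuple[bytes, Optional[bytes]]


def split_key_range(
    start: bytes, end: Optional[bytes], split_points: List[bytes]
) -> List[KeyRange]:
    """Cut [start, end) into contiguous sub-ranges at the given split points.

    Hypothetical helper for illustration only; it is not part of Beam or of
    the Bigtable client library.
    """
    # Keep only split points strictly inside the range, sorted and de-duplicated.
    inside = sorted(
        {p for p in split_points if p > start and (end is None or p < end)}
    )
    bounds = [start] + inside
    ranges: List[KeyRange] = []
    for i, lo in enumerate(bounds):
        # The last sub-range inherits the original end bound.
        hi = bounds[i + 1] if i + 1 < len(bounds) else end
        ranges.append((lo, hi))
    return ranges


# Example: split the whole table at three made-up keys.
ranges = split_key_range(b"", None, [b"m", b"d", b"t"])
```

In a pipeline, each resulting range could then become one element that a downstream step reads on its own, which is where the parallelism would come from. Whether that is simpler than wrapping the Java source is exactly the trade-off discussed above.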