Thanks for taking the time to communicate about the updated plan. Really appreciate it.
Kenn On Tue, Mar 19, 2019 at 10:25 AM Udi Meiri <[email protected]> wrote: > Update: I'm back to working on this. > To allow a smoother migration, I'm planning on having apache-beam depend > on both googledatastore and google-cloud-datastore and having 2 Beam > modules. > The newer client is a bit more limited in expressing queries (only ANDs > for composite filtering). > OTOH it supports transactions so we could add inserts of incomplete > entities. > > Updated plan here: > https://docs.google.com/document/d/1sL9p7NE5Z0p-5SB5uwpxWrddj_UCESKSrsvDTWNKqb4/edit > > On Wed, Oct 17, 2018 at 12:49 PM Ahmet Altay <[email protected]> wrote: > >> >> >> On Wed, Oct 17, 2018 at 11:49 AM, Chamikara Jayalath < >> [email protected]> wrote: >> >>> Thanks Udi. Added some comments. >>> >>> On Wed, Oct 17, 2018 at 10:50 AM Ahmet Altay <[email protected]> wrote: >>> >>>> Udi thank you for the proposal and thank you for sharing it in plain >>>> email. My comments are below. >>>> >>>> Overall, this is a good plan to get us out of a tough situation with an >>>> old dependency. >>>> >>>> On Tue, Oct 16, 2018 at 6:59 PM, Udi Meiri <[email protected]> wrote: >>>> >>>>> Hi, >>>>> Sadly upgrading googledatastore -> google-cloud-datastore is >>>>> non-trivial (https://issues.apache.org/jira/browse/BEAM-4543). I >>>>> wrote a doc to summarize the plan: >>>>> >>>>> https://docs.google.com/document/d/1sL9p7NE5Z0p-5SB5uwpxWrddj_UCESKSrsvDTWNKqb4/edit?usp=sharing >>>>> >>>>> Contents pasted below: >>>>> Beam Python SDK: Datastore Client Upgrade >>>>> >>>>> [email protected] >>>>> >>>>> public, draft, 2018-10-16 >>>>> Objective >>>>> >>>>> Upgrade Beam's Python SDK dependency to use google-cloud-datastore >>>>> v1.70 (or later), replacing googledatastore v7.0.1, providing Beam users a >>>>> migration path to a new Datastore PTransform API. >>>>> Background >>>>> >>>>> Beam currently uses the googledatastore package to provide access to >>>>> Google Cloud Datastore, however that package doesn't seem to be getting >>>>> regular releases (last release in 2017-04 >>>>> <https://pypi.org/project/googledatastore/>) and it doesn't >>>>> officially support Python 3 >>>>> <https://issues.apache.org/jira/browse/BEAM-4543>. >>>>> >>>>> The current Beam API for Datastore queries exposes googledatastore >>>>> types, such as using a protobuf to define a query (wordcount example >>>>> <https://github.com/apache/beam/blob/79049b02949affe5aa2390dec9b890a04e1fde89/sdks/python/apache_beam/examples/cookbook/datastore_wordcount.py#L159>). >>>>> Conversely, google-cloud-datastore hides this implementation detail (query >>>>> API >>>>> <https://googleapis.github.io/google-cloud-python/latest/datastore/queries.html>). >>>>> Since Beam API has to change the data types it accepts, it forces users to >>>>> change their code. This makes the migration to google-cloud-datastore >>>>> non-trivial. >>>>> Proposal >>>>> >>>>> This proposal includes a period in which two Beam APIs are available >>>>> to access Datastore. >>>>> >>>>> >>>>> - >>>>> >>>>> Add a new PTransforms that use google-cloud-datastore and mark as >>>>> deprecated the existing API (ReadFromDatastore, WriteToDatastore, >>>>> DeleteFromDatastore). >>>>> - >>>>> >>>>> Implement apache_beam/io/datastore.py using >>>>> google-cloud-datastore, taking care to not expose Datastore client >>>>> internals. >>>>> - >>>>> >>>>> (optional) Remove googledatastore from GCP_REQUIREMENTS >>>>> >>>>> <https://github.com/apache/beam/blob/79049b02949affe5aa2390dec9b890a04e1fde89/sdks/python/setup.py#L139> >>>>> package list, and add it to a separate list, e.g., pip install >>>>> apache-beam[gcp,googledatastore]. >>>>> >>>>> >>>> I would like to avoid defining new sets of extra packages. Assuming >>>> that these two packages are not incompatible together, we could keep them >>>> both in [gcp]. >>>> >>> >>> I think we might need this since googleclouddatastore package (1) does >>> not seems to be getting upgraded (2) depends on older versions of packages >>> (for example, httplib2). >>> >>> This conflicts with more recent releases of other tools (for example, >>> gsutil). >>> >> >> This is fine, if it is the only viable option. But note that it is also a >> breaking change in the way people install beam in order to use old >> datastore APIs. >> >> >>> >>> >>>> >>>> >>>>> >>>>> - >>>>> >>>>> Remove googledatastore-based API from Beam after 2 releases. >>>>> >>>>> >>>> The removal needs to wait until next major version by default. Unless, >>>> we have a way of asking our users and ensuring that nobody is really using >>>> the existing API. Removing a current API in 2 releases (~3 months period) >>>> will hurt some users. >>>> >>> +1 >>> >>>> >>>> >>>> >>
