Update: I'm back to working on this. To allow a smoother migration, I'm planning on having apache-beam depend on both googledatastore and google-cloud-datastore and having 2 Beam modules. The newer client is a bit more limited in expressing queries (only ANDs for composite filtering). OTOH it supports transactions so we could add inserts of incomplete entities.
Updated plan here: https://docs.google.com/document/d/1sL9p7NE5Z0p-5SB5uwpxWrddj_UCESKSrsvDTWNKqb4/edit On Wed, Oct 17, 2018 at 12:49 PM Ahmet Altay <al...@google.com> wrote: > > > On Wed, Oct 17, 2018 at 11:49 AM, Chamikara Jayalath <chamik...@google.com > > wrote: > >> Thanks Udi. Added some comments. >> >> On Wed, Oct 17, 2018 at 10:50 AM Ahmet Altay <al...@google.com> wrote: >> >>> Udi thank you for the proposal and thank you for sharing it in plain >>> email. My comments are below. >>> >>> Overall, this is a good plan to get us out of a tough situation with an >>> old dependency. >>> >>> On Tue, Oct 16, 2018 at 6:59 PM, Udi Meiri <eh...@google.com> wrote: >>> >>>> Hi, >>>> Sadly upgrading googledatastore -> google-cloud-datastore is >>>> non-trivial (https://issues.apache.org/jira/browse/BEAM-4543). I wrote >>>> a doc to summarize the plan: >>>> >>>> https://docs.google.com/document/d/1sL9p7NE5Z0p-5SB5uwpxWrddj_UCESKSrsvDTWNKqb4/edit?usp=sharing >>>> >>>> Contents pasted below: >>>> Beam Python SDK: Datastore Client Upgrade >>>> >>>> eh...@google.com >>>> >>>> public, draft, 2018-10-16 >>>> Objective >>>> >>>> Upgrade Beam's Python SDK dependency to use google-cloud-datastore >>>> v1.70 (or later), replacing googledatastore v7.0.1, providing Beam users a >>>> migration path to a new Datastore PTransform API. >>>> Background >>>> >>>> Beam currently uses the googledatastore package to provide access to >>>> Google Cloud Datastore, however that package doesn't seem to be getting >>>> regular releases (last release in 2017-04 >>>> <https://pypi.org/project/googledatastore/>) and it doesn't officially >>>> support Python 3 <https://issues.apache.org/jira/browse/BEAM-4543>. >>>> >>>> The current Beam API for Datastore queries exposes googledatastore >>>> types, such as using a protobuf to define a query (wordcount example >>>> <https://github.com/apache/beam/blob/79049b02949affe5aa2390dec9b890a04e1fde89/sdks/python/apache_beam/examples/cookbook/datastore_wordcount.py#L159>). >>>> Conversely, google-cloud-datastore hides this implementation detail (query >>>> API >>>> <https://googleapis.github.io/google-cloud-python/latest/datastore/queries.html>). >>>> Since Beam API has to change the data types it accepts, it forces users to >>>> change their code. This makes the migration to google-cloud-datastore >>>> non-trivial. >>>> Proposal >>>> >>>> This proposal includes a period in which two Beam APIs are available to >>>> access Datastore. >>>> >>>> >>>> - >>>> >>>> Add a new PTransforms that use google-cloud-datastore and mark as >>>> deprecated the existing API (ReadFromDatastore, WriteToDatastore, >>>> DeleteFromDatastore). >>>> - >>>> >>>> Implement apache_beam/io/datastore.py using google-cloud-datastore, >>>> taking care to not expose Datastore client internals. >>>> - >>>> >>>> (optional) Remove googledatastore from GCP_REQUIREMENTS >>>> >>>> <https://github.com/apache/beam/blob/79049b02949affe5aa2390dec9b890a04e1fde89/sdks/python/setup.py#L139> >>>> package list, and add it to a separate list, e.g., pip install >>>> apache-beam[gcp,googledatastore]. >>>> >>>> >>> I would like to avoid defining new sets of extra packages. Assuming that >>> these two packages are not incompatible together, we could keep them both >>> in [gcp]. >>> >> >> I think we might need this since googleclouddatastore package (1) does >> not seems to be getting upgraded (2) depends on older versions of packages >> (for example, httplib2). >> >> This conflicts with more recent releases of other tools (for example, >> gsutil). >> > > This is fine, if it is the only viable option. But note that it is also a > breaking change in the way people install beam in order to use old > datastore APIs. > > >> >> >>> >>> >>>> >>>> - >>>> >>>> Remove googledatastore-based API from Beam after 2 releases. >>>> >>>> >>> The removal needs to wait until next major version by default. Unless, >>> we have a way of asking our users and ensuring that nobody is really using >>> the existing API. Removing a current API in 2 releases (~3 months period) >>> will hurt some users. >>> >> +1 >> >>> >>> >>> >
smime.p7s
Description: S/MIME Cryptographic Signature