Update: I'm back to working on this.
To allow a smoother migration, I'm planning to have apache-beam depend on
both googledatastore and google-cloud-datastore, with two Beam modules (one
per client).
The newer client is a bit more limited in expressing queries (composite
filters can only AND their sub-filters).
On the other hand, it supports transactions, so we could add inserts of
incomplete entities.
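
To make the AND-only restriction concrete, here is a minimal sketch. The
Query class, its add_filter/matches methods and the sample entities are
hypothetical stand-ins written for this email, not the google-cloud-datastore
API; they only model the idea that every added filter must hold.

```python
import operator
from dataclasses import dataclass, field

# Hypothetical stand-in types for illustration only; this is NOT the
# google-cloud-datastore API.
_OPS = {
    '=': operator.eq,
    '<': operator.lt,
    '<=': operator.le,
    '>': operator.gt,
    '>=': operator.ge,
}


@dataclass
class Query:
    kind: str
    # (property, operator, value) tuples, implicitly ANDed together.
    filters: list = field(default_factory=list)

    def add_filter(self, prop, op, value):
        if op not in _OPS:
            raise ValueError('unsupported operator: %r' % (op,))
        self.filters.append((prop, op, value))
        return self

    def matches(self, entity):
        # Every filter must hold; there is no way to express OR here.
        return all(
            prop in entity and _OPS[op](entity[prop], value)
            for prop, op, value in self.filters
        )


query = (Query(kind='Task')
         .add_filter('done', '=', False)
         .add_filter('priority', '>=', 4))

entities = [
    {'name': 'a', 'done': False, 'priority': 5},
    {'name': 'b', 'done': True, 'priority': 5},
    {'name': 'c', 'done': False, 'priority': 1},
]
matching = [e['name'] for e in entities if query.matches(e)]
```

With this model, something like "done = False OR priority >= 4" cannot be
written as a single query; it would presumably need two queries whose
results are merged by the caller.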

Updated plan here:
https://docs.google.com/document/d/1sL9p7NE5Z0p-5SB5uwpxWrddj_UCESKSrsvDTWNKqb4/edit
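
For reference, depending on both clients could look roughly like the sketch
below. It is written in the style of sdks/python/setup.py but is not copied
from it; the EXTRAS_REQUIRE name and the exact pins are assumptions.

```python
# Hypothetical excerpt in the style of sdks/python/setup.py.
# The googledatastore pin is the version named in the plan; the
# google-cloud-datastore entry is left unpinned here because the exact
# version bound is an assumption still to be settled.
GCP_REQUIREMENTS = [
    'googledatastore==7.0.1',   # old client, backs the existing transforms
    'google-cloud-datastore',   # new client, backs the new module
]

# Both clients stay under the existing [gcp] extra, so
# `pip install apache-beam[gcp]` would keep working for either API.
EXTRAS_REQUIRE = {'gcp': GCP_REQUIREMENTS}
```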

On Wed, Oct 17, 2018 at 12:49 PM Ahmet Altay <al...@google.com> wrote:

>
>
> On Wed, Oct 17, 2018 at 11:49 AM, Chamikara Jayalath <chamik...@google.com> wrote:
>
>> Thanks Udi. Added some comments.
>>
>> On Wed, Oct 17, 2018 at 10:50 AM Ahmet Altay <al...@google.com> wrote:
>>
>>> Udi, thank you for the proposal, and thank you for sharing it in plain
>>> email. My comments are below.
>>>
>>> Overall, this is a good plan to get us out of a tough situation with an
>>> old dependency.
>>>
>>> On Tue, Oct 16, 2018 at 6:59 PM, Udi Meiri <eh...@google.com> wrote:
>>>
>>>> Hi,
>>>> Sadly upgrading googledatastore -> google-cloud-datastore is
>>>> non-trivial (https://issues.apache.org/jira/browse/BEAM-4543). I wrote
>>>> a doc to summarize the plan:
>>>>
>>>> https://docs.google.com/document/d/1sL9p7NE5Z0p-5SB5uwpxWrddj_UCESKSrsvDTWNKqb4/edit?usp=sharing
>>>>
>>>> Contents pasted below:
>>>> Beam Python SDK: Datastore Client Upgrade
>>>>
>>>> eh...@google.com
>>>>
>>>> public, draft, 2018-10-16
>>>> Objective
>>>>
>>>> Upgrade Beam's Python SDK dependency to use google-cloud-datastore
>>>> v1.7.0 (or later), replacing googledatastore v7.0.1, and provide Beam
>>>> users a migration path to a new Datastore PTransform API.
>>>> Background
>>>>
>>>> Beam currently uses the googledatastore package to provide access to
>>>> Google Cloud Datastore. However, that package doesn't seem to be getting
>>>> regular releases (last release in 2017-04
>>>> <https://pypi.org/project/googledatastore/>) and it doesn't officially
>>>> support Python 3 <https://issues.apache.org/jira/browse/BEAM-4543>.
>>>>
>>>> The current Beam API for Datastore queries exposes googledatastore
>>>> types, such as using a protobuf to define a query (wordcount example
>>>> <https://github.com/apache/beam/blob/79049b02949affe5aa2390dec9b890a04e1fde89/sdks/python/apache_beam/examples/cookbook/datastore_wordcount.py#L159>).
>>>> Conversely, google-cloud-datastore hides this implementation detail (query
>>>> API
>>>> <https://googleapis.github.io/google-cloud-python/latest/datastore/queries.html>).
>>>> Since the Beam API has to change the data types it accepts, users are
>>>> forced to change their code. This makes the migration to
>>>> google-cloud-datastore non-trivial.
>>>> Proposal
>>>>
>>>> This proposal includes a period in which two Beam APIs are available to
>>>> access Datastore.
>>>>
>>>>
>>>>    - Add new PTransforms that use google-cloud-datastore and mark the
>>>>      existing API (ReadFromDatastore, WriteToDatastore,
>>>>      DeleteFromDatastore) as deprecated.
>>>>    - Implement apache_beam/io/datastore.py using google-cloud-datastore,
>>>>      taking care not to expose Datastore client internals.
>>>>    - (optional) Remove googledatastore from the GCP_REQUIREMENTS
>>>>      <https://github.com/apache/beam/blob/79049b02949affe5aa2390dec9b890a04e1fde89/sdks/python/setup.py#L139>
>>>>      package list, and add it to a separate list, e.g., pip install
>>>>      apache-beam[gcp,googledatastore].
>>>>
>>>>
>>> I would like to avoid defining new sets of extra packages. Assuming that
>>> these two packages are not incompatible together, we could keep them both
>>> in [gcp].
>>>
>>
>> I think we might need this, since the googledatastore package (1) does
>> not seem to be getting upgraded and (2) depends on older versions of
>> packages (for example, httplib2).
>>
>> This conflicts with more recent releases of other tools (for example,
>> gsutil).
>>
>
> This is fine if it is the only viable option. But note that it is also a
> breaking change in how people install Beam in order to use the old
> Datastore APIs.
>
>
>>
>>
>>>
>>>
>>>>
>>>>    - Remove the googledatastore-based API from Beam after 2 releases.
>>>>
>>>>
>>> The removal needs to wait until the next major version by default, unless
>>> we have a way of asking our users and ensuring that nobody is really using
>>> the existing API. Removing a current API in 2 releases (a ~3 month period)
>>> will hurt some users.
>>>
>> +1
>>
>>>
>>>
>>>
>
