[ 
https://issues.apache.org/jira/browse/BEAM-7860?focusedWorklogId=290639&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290639
 ]

ASF GitHub Bot logged work on BEAM-7860:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 07/Aug/19 18:00
            Start Date: 07/Aug/19 18:00
    Worklog Time Spent: 10m 
      Work Description: udim commented on issue #9240: [BEAM-7860] Python 
Datastore: fix key sort order
URL: https://github.com/apache/beam/pull/9240#issuecomment-519205882
 
 
   run python 2 postcommit
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 290639)
    Time Spent: 1.5h  (was: 1h 20m)

> v1new ReadFromDatastore returns duplicates if keys are of mixed types
> ---------------------------------------------------------------------
>
>                 Key: BEAM-7860
>                 URL: https://issues.apache.org/jira/browse/BEAM-7860
>             Project: Beam
>          Issue Type: Bug
>          Components: io-python-gcp
>    Affects Versions: 2.13.0
>         Environment: Python 2.7
> Python 3.7
>            Reporter: Niels Stender
>            Assignee: Udi Meiri
>            Priority: Blocker
>             Fix For: 2.15.0
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When a kind contains keys of mixed types (integer IDs and string names), 
> v1new ReadFromDatastore may return duplicate items. The attached example 
> writes 3 entities but reads back 4 records.
>  
> {code:python}
> from __future__ import unicode_literals
> import apache_beam as beam
> from apache_beam.io.gcp.datastore.v1new.types import Key, Entity, Query
> from apache_beam.io.gcp.datastore.v1new import datastoreio
> config = dict(project='your-google-project', namespace='test')
> def test_mixed():
>     keys = [
>         Key(['mixed', '10038260-iperm_eservice'], **config),
>         Key(['mixed', 4812224868188160], **config),
>         Key(['mixed', '99152975-pointshop'], **config)
>     ]
>     entities = map(lambda key: Entity(key=key), keys)
>     with beam.Pipeline() as p:
>         (p
>             | beam.Create(entities)
>             | datastoreio.WriteToDatastore(project=config['project'])
>         )
>     query = Query(kind='mixed', **config)
>     with beam.Pipeline() as p:
>         (p
>             | datastoreio.ReadFromDatastore(query=query, num_splits=4)
>             | beam.io.WriteToText('tmp.txt', num_shards=1,
>                                   shard_name_template='')
>         )
>     items = open('tmp.txt').read().strip().split('\n')
>     assert len(items) == 3, 'incorrect number of items'
> {code}
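
A note on one plausible mechanism, consistent with the PR title ("fix key
sort order") but not confirmed by this message: Datastore orders numeric IDs
before string names within a kind, so if a client sorts its split boundaries
by a different rule, the resulting key ranges can overlap and an entity can
be read twice. The sketch below is a pure-Python toy, not Beam's
implementation. naive_sort_key, split_ranges, read_range and read_all are
hypothetical helpers, and the boundary keys are picked by hand.

```python
def datastore_sort_key(id_or_name):
    """Datastore-style order: numeric IDs sort before string names."""
    if isinstance(id_or_name, int):
        return (0, id_or_name, '')
    return (1, 0, id_or_name)


def naive_sort_key(id_or_name):
    """Hypothetical faulty order: compare every identifier as text."""
    return str(id_or_name)


def split_ranges(boundaries, sort_key):
    """Turn boundary keys into half-open [start, end) ranges.

    None stands for an open end of the key space.
    """
    ordered = sorted(boundaries, key=sort_key)
    return list(zip([None] + ordered, ordered + [None]))


def read_range(stored, start, end):
    """Toy server-side range scan; always evaluated in Datastore order."""
    result = []
    for key in stored:
        k = datastore_sort_key(key)
        if start is not None and k < datastore_sort_key(start):
            continue
        if end is not None and k >= datastore_sort_key(end):
            continue
        result.append(key)
    return result


def read_all(stored, boundaries, sort_key):
    """Read every split and concatenate the results."""
    items = []
    for start, end in split_ranges(boundaries, sort_key):
        items.extend(read_range(stored, start, end))
    return items


# Identifiers taken from the report; boundaries are an assumption.
STORED = ['10038260-iperm_eservice', 4812224868188160, '99152975-pointshop']
BOUNDARIES = ['10038260-iperm_eservice', 4812224868188160]

if __name__ == '__main__':
    print(len(read_all(STORED, BOUNDARIES, naive_sort_key)))
    print(len(read_all(STORED, BOUNDARIES, datastore_sort_key)))
```

With these hand-picked boundaries, the text-based ordering yields 4 records
(the integer key appears in two ranges), while ordering the boundaries the
Datastore way yields 3.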



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
