[ 
https://issues.apache.org/jira/browse/BEAM-7860?focusedWorklogId=290638&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290638
 ]

ASF GitHub Bot logged work on BEAM-7860:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 07/Aug/19 17:59
            Start Date: 07/Aug/19 17:59
    Worklog Time Spent: 10m 
      Work Description: udim commented on pull request #9240: [BEAM-7860] Python Datastore: fix key sort order
URL: https://github.com/apache/beam/pull/9240#discussion_r311687900
 
 

 ##########
 File path: sdks/python/apache_beam/io/gcp/datastore/v1new/query_splitter.py
 ##########
 @@ -128,10 +131,57 @@ def _create_scatter_query(query, num_splits):
   return scatter_query
 
 
+class IdOrName(object):
+  """Represents an ID or name of a Datastore key.
+
+   Implements sort ordering: by ID, then by name, keys with IDs before those
+   with names.
+   """
+  def __init__(self, id_or_name):
+    self.id_or_name = id_or_name
+    if isinstance(id_or_name, (str, unicode)):
+      self.id = None
+      self.name = id_or_name
+    elif isinstance(id_or_name, (int, long)):
+      self.id = id_or_name
+      self.name = None
+    else:
+      raise TypeError('Unexpected type of id_or_name: %s' % id_or_name)
+
+  def __lt__(self, other):
+    if not isinstance(other, IdOrName):
+      return super(IdOrName, self).__lt__(other)
+
+    if self.id is not None:
+      if other.id is None:
+        return True
+      else:
+        return self.id < other.id
+
+    if other.id is not None:
+      return False
+
+    return self.name < other.name
+
+  def __eq__(self, other):
+    if not isinstance(other, IdOrName):
+      return super(IdOrName, self).__eq__(other)
+    return self.id == other.id and self.name == other.name
+
+  def __hash__(self):
+    return hash((self.id, self.name))
+
+
 def client_key_sort_key(client_key):
   """Key function for sorting lists of ``google.cloud.datastore.key.Key``."""
-  return [client_key.project, client_key.namespace or ''] + [
-      str(element) for element in client_key.flat_path]
+  sort_key = [client_key.project, client_key.namespace or '']
+  flat_path = list(client_key.flat_path)
 
 Review comment:
   Done
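
For readers following the ordering described in the IdOrName docstring (IDs before names, then by value within each kind), here is a minimal Python 3 sketch of the same idea. It is not the code from the pull request: the names IdOrNameSketch and sort_ids_or_names are hypothetical, and it assumes plain int and str values rather than the Python 2 long and unicode types used in the diff.

```python
import functools


@functools.total_ordering
class IdOrNameSketch(object):
  """Hypothetical Python 3 stand-in for the IdOrName class in the diff."""

  def __init__(self, id_or_name):
    if isinstance(id_or_name, str):
      self.id, self.name = None, id_or_name
    elif isinstance(id_or_name, int):
      self.id, self.name = id_or_name, None
    else:
      raise TypeError('Unexpected type of id_or_name: %r' % (id_or_name,))

  def _rank(self):
    # (0, id) for IDs and (1, name) for names, so every ID sorts before
    # every name and an int is never compared against a str.
    if self.id is not None:
      return (0, self.id)
    return (1, self.name)

  def __lt__(self, other):
    if not isinstance(other, IdOrNameSketch):
      return NotImplemented
    return self._rank() < other._rank()

  def __eq__(self, other):
    if not isinstance(other, IdOrNameSketch):
      return NotImplemented
    return self._rank() == other._rank()

  def __hash__(self):
    # Hash the same fields that __eq__ compares.
    return hash(self._rank())


def sort_ids_or_names(values):
  """Returns the raw values ordered by ID first, then by name."""
  return sorted(values, key=IdOrNameSketch)
```

Returning NotImplemented for foreign types lets Python fall back to its default handling, and hashing the same tuple that equality compares keeps __eq__ and __hash__ consistent.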
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 290638)
    Time Spent: 1h 20m  (was: 1h 10m)

> v1new ReadFromDatastore returns duplicates if keys are of mixed types
> ---------------------------------------------------------------------
>
>                 Key: BEAM-7860
>                 URL: https://issues.apache.org/jira/browse/BEAM-7860
>             Project: Beam
>          Issue Type: Bug
>          Components: io-python-gcp
>    Affects Versions: 2.13.0
>         Environment: Python 2.7
> Python 3.7
>            Reporter: Niels Stender
>            Assignee: Udi Meiri
>            Priority: Blocker
>             Fix For: 2.15.0
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> In the presence of mixed type keys, v1new ReadFromDatastore may return 
> duplicate items. The attached example returns 4 records, not the expected 3.
>  
> {code:java}
> from __future__ import unicode_literals
> import apache_beam as beam
> from apache_beam.io.gcp.datastore.v1new.types import Key, Entity, Query
> from apache_beam.io.gcp.datastore.v1new import datastoreio
> config = dict(project='your-google-project', namespace='test')
> def test_mixed():
>     keys = [
>         Key(['mixed', '10038260-iperm_eservice'], **config),
>         Key(['mixed', 4812224868188160], **config),
>         Key(['mixed', '99152975-pointshop'], **config)
>     ]
>     entities = map(lambda key: Entity(key=key), keys)
>     with beam.Pipeline() as p:
>         (p
>             | beam.Create(entities)
>             | datastoreio.WriteToDatastore(project=config['project'])
>         )
>     query = Query(kind='mixed', **config)
>     with beam.Pipeline() as p:
>         (p
>             | datastoreio.ReadFromDatastore(query=query, num_splits=4)
>             | beam.io.WriteToText('tmp.txt', num_shards=1, shard_name_template='')
>         )
>     items = open('tmp.txt').read().strip().split('\n')
>     assert len(items) == 3, 'incorrect number of items'
> {code}
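
The lines removed in the diff above built the sort key with str(element), so the three keys in this report were compared as strings. The following sketch, which uses only the final path elements from the report and two hypothetical helper functions, shows how that ordering differs from an IDs-before-names ordering. The report does not state the mechanism behind the duplicates, so treat this only as an illustration of the ordering mismatch.

```python
# Final path elements of the three keys in the report.
ids_or_names = [
    '10038260-iperm_eservice',
    4812224868188160,
    '99152975-pointshop',
]


def string_order(values):
  """Orders values by their string form, like the removed str(element) code."""
  return sorted(values, key=str)


def ids_first_order(values):
  """Orders integer IDs before string names (hypothetical helper)."""
  # False sorts before True, so ints come first and are never compared
  # directly against strs.
  return sorted(values, key=lambda v: (isinstance(v, str), v))
```

With string comparison the numeric ID lands between the two names; with the IDs-first key it comes first.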



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
