Brian Hulette created BEAM-8884:
-----------------------------------

             Summary: Python MongoDBIO TypeError when splitting
                 Key: BEAM-8884
                 URL: https://issues.apache.org/jira/browse/BEAM-8884
             Project: Beam
          Issue Type: Bug
          Components: sdk-py-core
            Reporter: Brian Hulette


From [slack|https://the-asf.slack.com/archives/CBDNLQZM1/p1575350991134000]:

I am trying to run a pipeline (defined with the Python SDK) on Dataflow that 
uses beam.io.ReadFromMongoDB. With very small datasets (<10 MB) it runs fine, 
but with slightly larger datasets (70 MB) I always get this error:


{code:}
TypeError: '<' not supported between instances of 'dict' and 'ObjectId'
{code}


The full stack trace is below. Running it on a local machine works just fine. 
I would highly appreciate any pointers on what could cause this.
I hope this is the right channel to address this.

{code:}
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 649, in do_work
    work_executor.execute()
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 218, in execute
    self._split_task)
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 226, in _perform_source_split_considering_api_limits
    desired_bundle_size)
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 263, in _perform_source_split
    for split in source.split(desired_bundle_size):
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/mongodbio.py", line 174, in split
    bundle_end = min(stop_position, split_key_id)
TypeError: '<' not supported between instances of 'dict' and 'ObjectId'
{code}
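A minimal sketch of a plausible cause, assuming that the split keys reach this loop as documents of the form {'_id': ObjectId(...)} (the shape MongoDB's splitVector command returns) rather than as bare ObjectIds. min() then compares a dict against an ObjectId, which raises exactly this TypeError. FakeObjectId, split_ranges_buggy and split_ranges_fixed are hypothetical stand-ins so the example runs without pymongo; the real split method in mongodbio.py yields SourceBundle objects rather than tuples.

{code:python}
from dataclasses import dataclass
from typing import Any, Dict, Iterator, List, Tuple


@dataclass(frozen=True, order=True)
class FakeObjectId:
    """Hypothetical stand-in for bson.ObjectId; orders only against itself."""
    value: int


def split_ranges_buggy(start: Any, stop: Any,
                       split_keys: List[Dict[str, Any]]) -> Iterator[Tuple[Any, Any]]:
    """Mirrors the failing loop: each split key is still a {'_id': ...} dict."""
    bundle_start = start
    for split_key_id in split_keys:
        if bundle_start >= stop:
            break
        # min() evaluates split_key_id < stop, i.e. dict < ObjectId -> TypeError.
        bundle_end = min(stop, split_key_id)
        yield bundle_start, bundle_end
        bundle_start = bundle_end
    # Cover the remaining range after the last split key.
    if bundle_start < stop:
        yield bundle_start, stop


def split_ranges_fixed(start: Any, stop: Any,
                       split_keys: List[Dict[str, Any]]) -> Iterator[Tuple[Any, Any]]:
    """Same loop, but unwraps the '_id' value before comparing."""
    bundle_start = start
    for split_key in split_keys:
        if bundle_start >= stop:
            break
        bundle_end = min(stop, split_key["_id"])
        yield bundle_start, bundle_end
        bundle_start = bundle_end
    # Cover the remaining range after the last split key.
    if bundle_start < stop:
        yield bundle_start, stop
{code}

With an empty split-key list, which is what one would expect when the collection is smaller than the requested bundle size, split_ranges_buggy never reaches min() and returns a single range. That would be consistent with the small dataset succeeding. With one split key {'_id': FakeObjectId(5)}, the buggy version fails with "'<' not supported between instances of 'dict' and 'FakeObjectId'", while split_ranges_fixed returns the two ranges 1 to 5 and 5 to 10.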



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
