[
https://issues.apache.org/jira/browse/BEAM-8884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987422#comment-16987422
]
Corvin Deboeser commented on BEAM-8884:
---------------------------------------
Hi [~yichi], I have tried pymongo 3.8.0 and 3.9.0 so far; both showed the
same structure. I did not check any other versions. Thanks for the help and the
module in the first place - greatly appreciated :)
> Python MongoDBIO TypeError when splitting
> -----------------------------------------
>
> Key: BEAM-8884
> URL: https://issues.apache.org/jira/browse/BEAM-8884
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Brian Hulette
> Assignee: Yichi Zhang
> Priority: Major
>
> From [slack|https://the-asf.slack.com/archives/CBDNLQZM1/p1575350991134000]:
> I am trying to run a pipeline (defined with the Python SDK) on Dataflow that
> uses beam.io.ReadFromMongoDB. With very small datasets (<10 MB) it runs fine,
> but with slightly larger datasets (70 MB) I always get this error:
> {code:}
> TypeError: '<' not supported between instances of 'dict' and 'ObjectId'
> {code}
> The stack trace is below. Running it on a local machine works just fine. I
> would highly appreciate any pointers as to what this could be.
> I hope this is the right channel to address this.
> {code:}
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 649, in do_work
>     work_executor.execute()
>   File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 218, in execute
>     self._split_task)
>   File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 226, in _perform_source_split_considering_api_limits
>     desired_bundle_size)
>   File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 263, in _perform_source_split
>     for split in source.split(desired_bundle_size):
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/io/mongodbio.py", line 174, in split
>     bundle_end = min(stop_position, split_key_id)
> TypeError: '<' not supported between instances of 'dict' and 'ObjectId'
> {code}
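
The traceback above points at `min(stop_position, split_key_id)` receiving a dict on one side and an ObjectId on the other. The following is a minimal sketch of how that comparison can fail and one possible way to guard against it. It assumes the split keys arrive wrapped as documents shaped like `{'_id': <ObjectId>}`, which this thread does not confirm. `FakeObjectId` is a hypothetical stand-in for `bson.ObjectId` so the snippet runs without pymongo, and `unwrap_split_key` is a hypothetical helper, not part of Beam.

```python
from functools import total_ordering


@total_ordering
class FakeObjectId:
    """Hypothetical stand-in for bson.ObjectId: orders only against its own type."""

    def __init__(self, hex_value):
        self.hex_value = hex_value

    def __eq__(self, other):
        if not isinstance(other, FakeObjectId):
            return NotImplemented
        return self.hex_value == other.hex_value

    def __lt__(self, other):
        if not isinstance(other, FakeObjectId):
            return NotImplemented
        return self.hex_value < other.hex_value

    def __hash__(self):
        return hash(self.hex_value)


def unwrap_split_key(split_key):
    """Return the bare _id whether or not the key is wrapped in a document.

    Assumption: a wrapped key looks like {'_id': <ObjectId>}.
    """
    if isinstance(split_key, dict):
        return split_key["_id"]
    return split_key


stop_position = FakeObjectId("5de5f0000000000000000009")
split_key = {"_id": FakeObjectId("5de5f0000000000000000004")}

# Comparing the wrapped key directly fails the same way as in the traceback.
try:
    min(stop_position, split_key)
    raised = False
    message = ""
except TypeError as exc:
    raised = True
    message = str(exc)

# Unwrapping first lets min() compare two values of the same type.
bundle_end = min(stop_position, unwrap_split_key(split_key))
```

Whether unwrapping is the right fix depends on what the split-key query actually returns for a given pymongo and MongoDB version, so treat this as an illustration of the type mismatch rather than a patch.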