[ 
https://issues.apache.org/jira/browse/BEAM-11266?focusedWorklogId=515304&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-515304
 ]

ASF GitHub Bot logged work on BEAM-11266:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 22/Nov/20 19:51
            Start Date: 22/Nov/20 19:51
    Worklog Time Spent: 10m 
      Work Description: nikie edited a comment on pull request #13350:
URL: https://github.com/apache/beam/pull/13350#issuecomment-731834985


   @y1chi 
   I have implemented your suggested changes and more (see the last commit 
message for more details):
   - auto-bucketing now respects not only the `_id` range but also the custom 
filter, for both document counting and the aggregation (this may add some 
overhead, but it should produce more precise splits);
   - improved unit and integration tests.
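A minimal sketch of what respecting the custom filter can look like in the aggregation, assuming the bucket count is derived from a filtered document count times an average document size. The helper `build_bucket_auto_pipeline` and its parameters are hypothetical and are not the Beam connector's actual code:

```python
import math
from typing import Any, Dict, List, Optional


def build_bucket_auto_pipeline(
    start_id: Any,
    end_id: Any,
    custom_filter: Optional[Dict[str, Any]],
    matching_doc_count: int,
    avg_doc_size_bytes: int,
    desired_bundle_size_bytes: int,
) -> List[Dict[str, Any]]:
    """Hypothetical helper: a $bucketAuto pipeline over [start_id, end_id).

    The $match stage applies both the _id range and the custom filter, so
    buckets are computed only over documents the read will actually return.
    """
    id_range = {"_id": {"$gte": start_id, "$lt": end_id}}
    # $and keeps the range intact even if the custom filter also mentions _id.
    match = {"$and": [id_range, custom_filter]} if custom_filter else id_range
    # Assumed sizing rule: estimated bytes of matching docs / desired bundle size.
    estimated_bytes = matching_doc_count * avg_doc_size_bytes
    buckets = max(1, math.ceil(estimated_bytes / desired_bundle_size_bytes))
    return [
        {"$match": match},
        {"$bucketAuto": {"groupBy": "$_id", "buckets": buckets}},
    ]
```

With pymongo, the returned list could be passed to `collection.aggregate(...)`, and `matching_doc_count` could come from `collection.count_documents(match)`, which would correspond to the filtered document counting mentioned above.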
   
   Java's `MongoDbIO` works differently:
   - there is a `numSplit` option that controls the number of auto buckets (10 
by default) and, if set, the number of splitVector buckets;
   - it does not estimate the desired bundle size for auto bucketing; it does so 
only in splitVector mode when `numSplit` is not provided, and recalculates the 
bundle size from `numSplit` when it is;
   - it does not use the custom filter for auto bucketing, only to filter the 
actual reads within each split bucket;
   - it does not have start/stop logic for dynamic rebalancing.
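For contrast, the Java behaviour described in the list above can be paraphrased as a few hypothetical Python helpers. This is a sketch of the comment's description, not a port of the Java source; the helper names and the zero-size guard are assumptions:

```python
from typing import Any, Dict, List

# Default number of auto buckets, as stated in the comment.
DEFAULT_AUTO_BUCKETS = 10


def java_style_bucket_count(num_splits: int) -> int:
    """Auto bucketing: a fixed bucket count, not estimated from bundle size."""
    return num_splits if num_splits > 0 else DEFAULT_AUTO_BUCKETS


def java_style_split_vector_bundle_size(
    estimated_size_bytes: int, num_splits: int, desired_bundle_size_bytes: int
) -> int:
    """splitVector mode: bundle size is recalculated from numSplit when set."""
    if num_splits > 0:
        # Assumed guard so a tiny collection never yields a zero bundle size.
        return max(1, estimated_size_bytes // num_splits)
    # numSplit not provided: fall back to the estimated desired bundle size.
    return desired_bundle_size_bytes


def java_style_bucket_pipeline(num_splits: int) -> List[Dict[str, Any]]:
    # No $match stage: the custom filter is not used when computing buckets;
    # it is only applied later to the reads within each bucket.
    return [
        {"$bucketAuto": {"groupBy": "$_id",
                         "buckets": java_style_bucket_count(num_splits)}}
    ]
```

The missing `$match` stage is the key difference from the Python approach in the pull request: bucket boundaries are computed over the whole collection, so buckets may contain very different numbers of matching documents once the filter is applied.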


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.



Issue Time Tracking
-------------------

    Worklog Id:     (was: 515304)
    Time Spent: 2h 10m  (was: 2h)

> Cannot use Python MongoDB connector with Atlas MongoDB
> ------------------------------------------------------
>
>                 Key: BEAM-11266
>                 URL: https://issues.apache.org/jira/browse/BEAM-11266
>             Project: Beam
>          Issue Type: Bug
>          Components: io-py-mongodb
>    Affects Versions: 2.25.0
>         Environment: Google Cloud Dataflow
>            Reporter: Eugene Nikolaiev
>            Assignee: Yichi Zhang
>            Priority: P2
>              Labels: mongodb, python
>             Fix For: 2.27.0
>
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Cannot use the Python MongoDB connector with a managed Atlas instance. The 
> current implementation makes use of splitVector, which is a high-privilege 
> command that cannot be assigned to any user in Atlas. Getting the following error:
> {noformat}
> pymongo.errors.OperationFailure: not authorized on properties to execute 
> command
>  { splitVector: "properties.properties", keyPattern: { _id: 1 },
> ...{noformat}
> BEAM-4567 addressed the same issue in the Java connector.
> The proposed solution for Python is to add a {{bucket_auto}} option to the 
> connector, which would configure it to use the {{$bucketAuto}} MongoDB 
> aggregation stage instead of the {{splitVector}} command:
> {code:python}
> pipeline | ReadFromMongoDB(uri='mongodb+srv://user:[email protected]',
>                            db='testdb',
>                            coll='input',
>                            bucket_auto=True)
> {code}
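Unlike splitVector, which returns a list of split keys, `$bucketAuto` returns one document per bucket with `_id.min`/`_id.max` bounds and a `count`. Per the MongoDB documentation, `max` is exclusive for every bucket except the last, where it is inclusive. A hypothetical helper (not part of `ReadFromMongoDB`) that turns such output into per-bundle `_id` filters might look like this:

```python
from typing import Any, Dict, List


def buckets_to_id_filters(buckets: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Converts $bucketAuto output into one _id range filter per bucket."""
    filters = []
    for i, bucket in enumerate(buckets):
        bounds = bucket["_id"]
        # Only the last bucket's upper bound is inclusive.
        upper_op = "$lte" if i == len(buckets) - 1 else "$lt"
        filters.append({"_id": {"$gte": bounds["min"], upper_op: bounds["max"]}})
    return filters
```

Each filter could then be combined with any user filter and passed to pymongo's `collection.find(...)` to read a single bundle.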



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
