[ 
https://issues.apache.org/jira/browse/BEAM-6241?focusedWorklogId=182827&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182827
 ]

ASF GitHub Bot logged work on BEAM-6241:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/Jan/19 03:23
            Start Date: 09/Jan/19 03:23
    Worklog Time Spent: 10m 
      Work Description: sandboxws commented on issue #7293: [BEAM-6241] Added 
limit and aggregates support to MongoDbIO
URL: https://github.com/apache/beam/pull/7293#issuecomment-452558297
 
 
   @iemejia No need to apologize; I hope you had a great break. I spent almost 
my entire break playing RDR2 (Red Dead Redemption 2) and BOTW (The Legend of 
Zelda: Breath of the Wild) 😄 
   
   I like your suggestion of using a `SerializableFunction`. If I understood 
it correctly, though, nothing will change from a user's perspective, aside 
from interacting more with the `QueryBuilder`.
   
   Based on the snippet you added above, we will end up with something similar 
to the following:
   
   ```java
   MongoDbIO.read() // server, etc
         .withQuery(
           QueryBuilder.create()
             .withLimit(10)
             .withProjection("foo", "bar")
             .build()
         );
   ```
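   
   As a rough illustration of the idea (not the actual `MongoDbIO` API), the 
builder could return a serializable function that carries the query options. 
Everything below is an assumption made for the sketch: `SerializableFn` stands 
in for Beam's `SerializableFunction`, the `QueryBuilder` class and its methods 
are hypothetical, and an in-memory list of maps stands in for a MongoDB 
collection.
   
   ```java
   import java.io.Serializable;
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.LinkedHashMap;
   import java.util.List;
   import java.util.Map;
   import java.util.function.Function;

   class QueryBuilderSketch {

     /** Stand-in for Beam's SerializableFunction (assumption for this sketch). */
     interface SerializableFn<I, O> extends Function<I, O>, Serializable {}

     /** Hypothetical builder; the real QueryBuilder may look different. */
     static final class QueryBuilder {
       private int limit = Integer.MAX_VALUE;
       private final List<String> projection = new ArrayList<>();

       static QueryBuilder create() {
         return new QueryBuilder();
       }

       QueryBuilder withLimit(int limit) {
         this.limit = limit;
         return this;
       }

       QueryBuilder withProjection(String... fields) {
         projection.addAll(Arrays.asList(fields));
         return this;
       }

       /** Returns a function that applies the limit and projection to a list of documents. */
       SerializableFn<List<Map<String, Object>>, List<Map<String, Object>>> build() {
         // Copy the state so the returned function does not depend on the builder.
         final int max = limit;
         final List<String> fields = new ArrayList<>(projection);
         return docs -> {
           List<Map<String, Object>> out = new ArrayList<>();
           for (Map<String, Object> doc : docs) {
             if (out.size() >= max) {
               break;
             }
             if (fields.isEmpty()) {
               out.add(doc); // no projection requested: keep the whole document
               continue;
             }
             Map<String, Object> projected = new LinkedHashMap<>();
             for (String field : fields) {
               if (doc.containsKey(field)) {
                 projected.put(field, doc.get(field));
               }
             }
             out.add(projected);
           }
           return out;
         };
       }
     }

     public static void main(String[] args) {
       Map<String, Object> doc = new LinkedHashMap<>();
       doc.put("foo", 1);
       doc.put("bar", 2);
       doc.put("baz", 3);
       List<Map<String, Object>> docs = new ArrayList<>();
       docs.add(doc);
       docs.add(doc);

       // Keep at most one document and only the "foo" and "bar" fields.
       System.out.println(
           QueryBuilder.create().withLimit(1).withProjection("foo", "bar").build().apply(docs));
     }
   }
   ```
   
   In the real IO the function would presumably receive a MongoDB collection 
and return a cursor rather than operate on a list, but the shape of the user 
code stays the same as in the snippet above.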
   
   I honestly think that `MongoDbIO.Read` can be made more developer friendly, 
but the challenge will remain the same: MongoDB offers two distinct methods of 
querying a database. There is the commonly used 
[find](https://docs.mongodb.com/manual/reference/method/db.collection.find/), 
and there is the more advanced 
[aggregation](https://docs.mongodb.com/manual/aggregation/).
   
   For most developers, especially the ones using streaming pipelines, the 
following will suffice:
   
   ```java
   MongoDbIO.read()
     .withUri(mongodbUri)
     .withDatabase(database)
     .withCollection(collection)
     .withDocumentIdStr("52cc8f6254c4327843000007")
   ```
   
   However, when using batch processing to backfill historical data into, say, 
a data warehouse, MongoDB aggregation pipelines are crucial.
   
   Let me know what you think. In the meantime, I'll address the remaining 
comments from you and @kennknowles.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 182827)
    Time Spent: 1h  (was: 50m)

> MongoDbIO - Add Limit and Aggregates Support
> --------------------------------------------
>
>                 Key: BEAM-6241
>                 URL: https://issues.apache.org/jira/browse/BEAM-6241
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-mongodb
>    Affects Versions: 2.9.0
>            Reporter: Ahmed El.Hussaini
>            Assignee: Ahmed El.Hussaini
>            Priority: Major
>              Labels: easyfix
>             Fix For: 2.10.0
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> h2. Adds Support to Limit Results
>  
> {code:java}
> MongoDbIO.read()
>   .withUri("mongodb://localhost:" + port)
>   .withDatabase(DATABASE)
>   .withCollection(COLLECTION)
>   .withFilter("{\"scientist\":\"Einstein\"}")
>   .withLimit(5);{code}
> h2. Adds Support to Use Aggregates
>  
> {code:java}
> List<BsonDocument> aggregates = new ArrayList<BsonDocument>();
> aggregates.add(
>   new BsonDocument(
>     "$match",
>     new BsonDocument("country", new BsonDocument("$eq", new BsonString("England")))));
> PCollection<Document> output =
>   pipeline.apply(
>     MongoDbIO.read()
>       .withUri("mongodb://localhost:" + port)
>       .withDatabase(DATABASE)
>       .withCollection(COLLECTION)
>       .withAggregate(aggregates));
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
