[
https://issues.apache.org/jira/browse/BEAM-6241?focusedWorklogId=182827&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182827
]
ASF GitHub Bot logged work on BEAM-6241:
----------------------------------------
Author: ASF GitHub Bot
Created on: 09/Jan/19 03:23
Start Date: 09/Jan/19 03:23
Worklog Time Spent: 10m
Work Description: sandboxws commented on issue #7293: [BEAM-6241] Added
limit and aggregates support to MongoDbIO
URL: https://github.com/apache/beam/pull/7293#issuecomment-452558297
@iemejia No need to apologize, I hope you had a great break. I spent almost
my entire break playing RDR2 (Red Dead Redemption 2) and BOTW (Legend of
Zelda: Breath of the Wild) 😄
I like your suggestion of using a `SerializableFunction`, although if I
understood your suggestion correctly, I don't think from a user perspective
anything will change, aside from interacting more with the QueryBuilder.
Based on the snippet you added above, we will end up with something similar
to the following:
```java
MongoDbIO.read() // server, etc
    .withQuery(
        QueryBuilder.create()
            .withLimit(10)
            .withProjection("foo", "bar")
            .build());
```
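To make the idea concrete, here is a rough, self-contained sketch of a builder that hands back a serializable function. Every name in it is an assumption for illustration: `SerializableFn` is a stand-in for Beam's `SerializableFunction`, the rows are plain in-memory maps rather than a MongoDB collection, and this `QueryBuilder` only mirrors the snippet above, not any API that exists in MongoDbIO.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Beam's SerializableFunction<InputT, OutputT>.
interface SerializableFn<I, O> extends Serializable {
  O apply(I input);
}

// Hypothetical builder; it works on in-memory rows, not on a real collection.
final class QueryBuilder {
  private int limit = Integer.MAX_VALUE;
  private List<String> projection = new ArrayList<>();

  static QueryBuilder create() {
    return new QueryBuilder();
  }

  QueryBuilder withLimit(int limit) {
    this.limit = limit;
    return this;
  }

  QueryBuilder withProjection(String... fields) {
    this.projection = new ArrayList<>(Arrays.asList(fields));
    return this;
  }

  // Captures only serializable values so the returned function can be shipped to workers.
  SerializableFn<List<Map<String, Object>>, List<Map<String, Object>>> build() {
    final int max = limit;
    final List<String> fields = new ArrayList<>(projection);
    return rows -> {
      List<Map<String, Object>> out = new ArrayList<>();
      for (Map<String, Object> row : rows) {
        if (out.size() >= max) {
          break;
        }
        if (fields.isEmpty()) {
          out.add(row); // no projection requested: keep the row as-is
          continue;
        }
        Map<String, Object> projected = new LinkedHashMap<>();
        for (String field : fields) {
          if (row.containsKey(field)) {
            projected.put(field, row.get(field));
          }
        }
        out.add(projected);
      }
      return out;
    };
  }
}
```

From the caller's side this still reads as `QueryBuilder.create().withLimit(10).withProjection("foo", "bar").build()`, which is the point made above: the user-facing shape barely changes.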
I honestly think that `MongoDbIO.Read` can be made more developer friendly,
but the underlying challenge will remain: MongoDB offers two distinct ways of
querying a database. There is the commonly used
[find](https://docs.mongodb.com/manual/reference/method/db.collection.find/),
and there is the more advanced
[aggregation](https://docs.mongodb.com/manual/aggregation/).
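The difference in shape can be sketched roughly as follows. This is an in-memory model only, not the MongoDB driver API: `QueryModes`, `find`, `aggregate`, `match`, and `limit` are hypothetical names chosen to show that a find-style query is one filter with options, while an aggregation is an ordered list of stages where each stage consumes the previous stage's output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Hypothetical in-memory model of the two query styles; not the MongoDB driver.
final class QueryModes {
  // find-style: a single filter plus a limit option.
  static List<Map<String, Object>> find(
      List<Map<String, Object>> docs, Predicate<Map<String, Object>> filter, int limit) {
    return docs.stream().filter(filter).limit(limit).collect(Collectors.toList());
  }

  // aggregation-style: stages run in order, each on the previous stage's output.
  static List<Map<String, Object>> aggregate(
      List<Map<String, Object>> docs,
      List<UnaryOperator<List<Map<String, Object>>>> stages) {
    List<Map<String, Object>> current = new ArrayList<>(docs);
    for (UnaryOperator<List<Map<String, Object>>> stage : stages) {
      current = stage.apply(current);
    }
    return current;
  }

  // Stage loosely analogous to $match.
  static UnaryOperator<List<Map<String, Object>>> match(Predicate<Map<String, Object>> p) {
    return in -> in.stream().filter(p).collect(Collectors.toList());
  }

  // Stage loosely analogous to $limit.
  static UnaryOperator<List<Map<String, Object>>> limit(int n) {
    return in -> in.stream().limit(n).collect(Collectors.toList());
  }
}
```

Because the two styles take differently shaped inputs, a single read API has to either expose both or pick one representation and translate.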
For most developers, especially the ones using streaming pipelines, the
following will suffice:
```java
MongoDbIO.read()
    .withUri(mongodbUri)
    .withDatabase(database)
    .withCollection(collection)
    .withDocumentIdStr("52cc8f6254c4327843000007");
```
However, when using batch processing to backfill historical data into, say,
a data warehouse, MongoDB aggregation pipelines are crucial.
Let me know what you think, in the meantime I'll address the remaining
comments mentioned by you and @kennknowles.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 182827)
Time Spent: 1h (was: 50m)
> MongoDbIO - Add Limit and Aggregates Support
> --------------------------------------------
>
> Key: BEAM-6241
> URL: https://issues.apache.org/jira/browse/BEAM-6241
> Project: Beam
> Issue Type: Improvement
> Components: io-java-mongodb
> Affects Versions: 2.9.0
> Reporter: Ahmed El.Hussaini
> Assignee: Ahmed El.Hussaini
> Priority: Major
> Labels: easyfix
> Fix For: 2.10.0
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> h2. Adds Support to Limit Results
>
> {code:java}
> MongoDbIO.read()
>     .withUri("mongodb://localhost:" + port)
>     .withDatabase(DATABASE)
>     .withCollection(COLLECTION)
>     .withFilter("{\"scientist\":\"Einstein\"}")
>     .withLimit(5);
> {code}
> h2. Adds Support to Use Aggregates
>
> {code:java}
> List<BsonDocument> aggregates = new ArrayList<BsonDocument>();
> aggregates.add(
>     new BsonDocument(
>         "$match",
>         new BsonDocument("country", new BsonDocument("$eq", new BsonString("England")))));
> PCollection<Document> output =
>     pipeline.apply(
>         MongoDbIO.read()
>             .withUri("mongodb://localhost:" + port)
>             .withDatabase(DATABASE)
>             .withCollection(COLLECTION)
>             .withAggregate(aggregates));
> {code}