[ https://issues.apache.org/jira/browse/BEAM-7256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310996#comment-17310996 ]

Geoff Hutton edited comment on BEAM-7256 at 3/29/21, 9:24 PM:
--------------------------------------------------------------

Thanks for the reply [~sandboxws].  My task is simply to read the _entire_ 
collection into the pipeline.  Are you suggesting that I start multiple 
pipelines, each one reading a portion of the collection?

There is no timestamp in my documents, and in fact no key that would 
guarantee that the resulting subset stays below the size limit.


> Add support for allowDiskUse (AggregationOptions) in MongoDbIO 
> ---------------------------------------------------------------
>
>                 Key: BEAM-7256
>                 URL: https://issues.apache.org/jira/browse/BEAM-7256
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-mongodb
>    Affects Versions: 2.12.0
>            Reporter: Javier Cornejo
>            Priority: P3
>         Attachments: Screen Shot 2019-05-09 at 12.30.51.png
>
>
> When a read is executed over a collection that exceeds the in-memory sort 
> limit of 104857600 bytes (100 MB), an exception occurs. MongoDB documents 
> this limit, and the error can be avoided by passing AggregationOptions with 
> allowDiskUse set to true, so that mongo can sort using disk. 
> This should happen only when aggregations are added to the read, but it is 
> currently happening even without any aggregation at all. 
> Please let me know how I can help with this improvement / bug.
>  
> !Screen Shot 2019-05-09 at 12.30.51.png!  
> {code:java}
> // MongoDbIO.read() yields a PCollection<Document>, so the variable is
> // declared accordingly.
> PCollection<Document> updateColls = p.apply("Reading Ops Collection: " + key,
>     MongoDbIO.read()
>         .withUri(options.getMongoDBUri())
>         .withDatabase("local")
>         .withCollection("oplog.rs")
>         .withBucketAuto(true)
>         // .withQueryFn(
>         //     FindQuery.create().withFilters(
>         //         Filters.and(
>         //             Filters.gt("ts", ts.format(dtf)),
>         //             Filters.eq("ns", options.getMongoDBDBName() + "" + key),
>         //             Filters.eq("op", "u"))))
>         // .withQueryFn(
>         //     AggregationQuery.create().withMongoDbPipeline(updatedDocsOplogAggregation))
> );
> {code}
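> For reference, a minimal sketch of the behaviour this ticket asks MongoDbIO 
> to expose, written against the plain MongoDB Java driver (the connection 
> string, database, collection, and pipeline below are placeholders for 
> illustration): setting allowDiskUse(true) on the aggregation lets the server 
> spill large sort stages to disk instead of throwing once the 100 MB 
> in-memory limit is hit.
> {code:java}
> import com.mongodb.client.MongoClient;
> import com.mongodb.client.MongoClients;
> import com.mongodb.client.MongoCollection;
> import org.bson.Document;
> 
> import java.util.Arrays;
> import java.util.List;
> 
> public class AllowDiskUseSketch {
>   public static void main(String[] args) {
>     // Placeholder connection string; point this at a real mongod.
>     try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
>       MongoCollection<Document> coll =
>           client.getDatabase("local").getCollection("oplog.rs");
> 
>       // A pipeline with a sort stage, which is what hits the 100 MB limit.
>       List<Document> pipeline = Arrays.asList(
>           new Document("$match", new Document("op", "u")),
>           new Document("$sort", new Document("ts", 1)));
> 
>       // allowDiskUse(true) is the driver-level equivalent of
>       // AggregationOptions.allowDiskUse: the server may write temporary
>       // files during the sort instead of failing with the memory error.
>       coll.aggregate(pipeline)
>           .allowDiskUse(true)
>           .forEach(doc -> System.out.println(doc.toJson()));
>     }
>   }
> }
> {code}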



--
This message was sent by Atlassian Jira
(v8.3.4#803005)