[ https://issues.apache.org/jira/browse/BEAM-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15533267#comment-15533267 ]

ASF GitHub Bot commented on BEAM-674:
-------------------------------------

GitHub user dkulp opened a pull request:

    https://github.com/apache/incubator-beam/pull/1025

    [BEAM-674] Gridfs Source refactoring

    Refactor of the GridFS-based Source, based on feedback from @jkff.
    
    BoundedSource is now a source of ObjectIds, and a separate DoFn is used to
    convert/parse each GridFSDBFile into usable chunks.
    
    Test case for splitting added.
    
    Variables not needed by the Source are pulled out and moved onto the
    transform instead.
    
    Optimized the non-split case a bit by not querying all the ObjectIds up
    front.
    
    Optimized unit tests by setting up test data per class instead of per test.
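
The refactoring above makes the BoundedSource emit only ObjectIds and leaves parsing to a DoFn, so splitting reduces to partitioning a set of ids. A minimal, dependency-free Java sketch of one way to do that follows. It assumes, hypothetically, that each file is known by its ObjectId and byte length and that files are grouped greedily until a desired bundle size is reached; `GridFsSplitSketch` and `FileInfo` are illustrative names, not the code from this pull request or Beam's API.

```java
import java.util.ArrayList;
import java.util.List;

final class GridFsSplitSketch {
    /** Hypothetical stand-in for a GridFS file's metadata: its ObjectId (hex) and length in bytes. */
    static final class FileInfo {
        final String objectId;
        final long length;

        FileInfo(String objectId, long length) {
            this.objectId = objectId;
            this.length = length;
        }
    }

    /**
     * Greedily groups file ids into bundles whose summed length stays at or below
     * desiredBundleSizeBytes (a single oversized file still gets its own bundle).
     * Each inner list could back one sub-source that emits only those ObjectIds.
     */
    static List<List<String>> split(List<FileInfo> files, long desiredBundleSizeBytes) {
        List<List<String>> bundles = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentSize = 0;
        for (FileInfo f : files) {
            // Close the current bundle if adding this file would push it past the target.
            if (!current.isEmpty() && currentSize + f.length > desiredBundleSizeBytes) {
                bundles.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(f.objectId);
            currentSize += f.length;
        }
        if (!current.isEmpty()) {
            bundles.add(current);
        }
        return bundles;
    }
}
```

Because each sub-source carries only ids, the expensive work of reading and parsing file contents happens in the downstream DoFn, where it can run in parallel across bundles.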


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dkulp/incubator-beam gridfs-t2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-beam/pull/1025.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1025
    
----
commit 5aad971bcd1d32ba06cec9d4870e7aa9e9dc17f5
Author: Daniel Kulp <dk...@apache.org>
Date:   2016-09-29T02:44:37Z

    Split BoundedSource into a BoundedSource<ObjectID> and a DoFn<...>

commit 2fc219cdd33e89d65d457dd3767bd378ff1111c0
Author: Daniel Kulp <dk...@apache.org>
Date:   2016-09-29T13:03:31Z

    Optimize reading for non-split case

commit e58fc61868988cc40c325d913fca37b26e3db99c
Author: Daniel Kulp <dk...@apache.org>
Date:   2016-09-29T13:18:17Z

    Use objectId timestamp

commit ed73d77b21651d6ef1d8cf2892dc267794d52d10
Author: Daniel Kulp <dk...@apache.org>
Date:   2016-09-29T13:57:44Z

    Pull parser out of BoundedSource, add maxSkew

commit 277667527cf0a23704b3ae3d05b2c8e2c2bcea3c
Author: Daniel Kulp <dk...@apache.org>
Date:   2016-09-29T14:48:42Z

    Add test case for the split

commit db30aabac4629ae167e4ede73de79257b4a93336
Author: Daniel Kulp <dk...@apache.org>
Date:   2016-09-29T15:00:44Z

    Don't need the generic on the Source and Reader

commit 1cdb2ce716b7e020c5306494b414b5bb136abb24
Author: Daniel Kulp <dk...@apache.org>
Date:   2016-09-29T16:29:51Z

    Rename maxSkew to allowedTimestampSkew to match other DoFn's

----
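
Two of the commits above use the ObjectId timestamp and introduce an `allowedTimestampSkew` setting. As background, a MongoDB ObjectId is 12 bytes whose first 4 bytes are a big-endian count of seconds since the Unix epoch, so a creation time can be recovered from the id alone. The dependency-free Java sketch below shows that decoding, plus a hypothetical helper expressing the kind of check Beam applies when a DoFn outputs an element with a timestamp earlier than its input's. `ObjectIdTimestampSketch` and its methods are illustrative names, not Beam or MongoDB driver APIs.

```java
import java.time.Duration;
import java.time.Instant;

final class ObjectIdTimestampSketch {
    /**
     * Decodes the creation time embedded in a 24-character hex ObjectId string.
     * The first 8 hex characters (4 bytes) hold big-endian seconds since the Unix epoch;
     * this sketch parses them as an unsigned value.
     */
    static Instant creationTime(String hexObjectId) {
        if (hexObjectId == null || hexObjectId.length() != 24) {
            throw new IllegalArgumentException("expected a 24-character hex ObjectId");
        }
        long seconds = Long.parseLong(hexObjectId.substring(0, 8), 16);
        return Instant.ofEpochSecond(seconds);
    }

    /**
     * Hypothetical helper: true if an output timestamp is no earlier than the input
     * timestamp minus the allowed skew (later timestamps are always accepted).
     */
    static boolean withinAllowedSkew(Instant input, Instant output, Duration allowedSkew) {
        return !output.isBefore(input.minus(allowedSkew));
    }
}
```

For example, `creationTime("507f1f77bcf86cd799439011")` yields `2012-10-17T21:13:27Z`.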


> Add GridFS support to MongoDB IO
> --------------------------------
>
>                 Key: BEAM-674
>                 URL: https://issues.apache.org/jira/browse/BEAM-674
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-extensions
>            Reporter: Daniel Kulp
>            Assignee: Daniel Kulp
>             Fix For: 0.3.0-incubating
>
>
> MongoDB has an "extension" called GridFS that allows storing very large
> "files" in the MongoDB database in a relatively efficient way. It would
> be good to add a GridFS API-based IO to allow retrieving the data for
> processing.
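
GridFS stores each file as one metadata document (in `fs.files` by default) plus a series of fixed-size chunk documents (in `fs.chunks`), with a documented default chunk size of 255 kB (261,120 bytes). The chunk arithmetic is sketched below in dependency-free Java; `GridFsChunkSketch` and its methods are illustrative names, not part of the MongoDB driver or Beam.

```java
final class GridFsChunkSketch {
    /** MongoDB's documented default GridFS chunk size (255 * 1024 bytes). */
    static final int DEFAULT_CHUNK_SIZE = 255 * 1024;

    /** Number of chunks needed to hold fileLength bytes (ceiling division). */
    static long chunkCount(long fileLength, int chunkSize) {
        if (fileLength < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("length must be >= 0 and chunkSize > 0");
        }
        return (fileLength + chunkSize - 1) / chunkSize;
    }

    /** Byte offsets [start, end) covered by chunk n (0-based) of a file. */
    static long[] chunkRange(long fileLength, int chunkSize, long n) {
        if (n < 0 || n >= chunkCount(fileLength, chunkSize)) {
            throw new IndexOutOfBoundsException("no chunk " + n);
        }
        long start = n * chunkSize;
        return new long[] {start, Math.min(start + chunkSize, fileLength)};
    }
}
```

Because a reader only needs the file's ObjectId to locate its chunks, a source that emits ids can defer all chunk reads to later processing.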



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
