[ https://issues.apache.org/jira/browse/BEAM-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15533267#comment-15533267 ]
ASF GitHub Bot commented on BEAM-674:
-------------------------------------

GitHub user dkulp opened a pull request:

    https://github.com/apache/incubator-beam/pull/1025

    [BEAM-674] Gridfs Source refactoring

    Refactor of the GridFS based Source based on feedback from @jkff

    BoundedSource is now a source of ObjectID's and a separate DoFn is used to convert/parse the GridFSDBFile into usable chunks.
    Testcase for splitting added.
    Variables not needed by the Source are pulled out and stuck on the transform instead.
    Optimized the non-split case a bit by not querying all the ObjectIds up front.
    Optimize unit tests by setting up test data per class instead of per test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dkulp/incubator-beam gridfs-t2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-beam/pull/1025.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1025

----
commit 5aad971bcd1d32ba06cec9d4870e7aa9e9dc17f5
Author: Daniel Kulp <dk...@apache.org>
Date:   2016-09-29T02:44:37Z

    Split BoundedSource into a BoundedSource<ObjectID> and a DoFn<...>

commit 2fc219cdd33e89d65d457dd3767bd378ff1111c0
Author: Daniel Kulp <dk...@apache.org>
Date:   2016-09-29T13:03:31Z

    Optimize reading for non-split case

commit e58fc61868988cc40c325d913fca37b26e3db99c
Author: Daniel Kulp <dk...@apache.org>
Date:   2016-09-29T13:18:17Z

    Use objectId timestamp

commit ed73d77b21651d6ef1d8cf2892dc267794d52d10
Author: Daniel Kulp <dk...@apache.org>
Date:   2016-09-29T13:57:44Z

    Pull parser out of BoundedSource, add maxSkew

commit 277667527cf0a23704b3ae3d05b2c8e2c2bcea3c
Author: Daniel Kulp <dk...@apache.org>
Date:   2016-09-29T14:48:42Z

    Add test case for the split

commit db30aabac4629ae167e4ede73de79257b4a93336
Author: Daniel Kulp <dk...@apache.org>
Date:   2016-09-29T15:00:44Z

    Don't need the generic on the Source and Reader

commit 1cdb2ce716b7e020c5306494b414b5bb136abb24
Author: Daniel Kulp <dk...@apache.org>
Date:   2016-09-29T16:29:51Z

    Rename maxSkew to allowedTimestampSkew to match other DoFn's

----

> Add GridFS support to MongoDB IO
> --------------------------------
>
>                 Key: BEAM-674
>                 URL: https://issues.apache.org/jira/browse/BEAM-674
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-extensions
>            Reporter: Daniel Kulp
>            Assignee: Daniel Kulp
>             Fix For: 0.3.0-incubating
>
>
> MongoDB has an "extension" called GridFS that allows storing of very large
> "files" into the MongoDB database in a relatively efficient way. It would
> be good to add a GridFS API based IO to allow retrieving the data for
> processing.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)