GitHub user dkulp opened a pull request:
https://github.com/apache/incubator-beam/pull/1025
[BEAM-674] Gridfs Source refactoring
Refactor of the GridFS-based Source, based on feedback from @jkff.
The BoundedSource is now a source of ObjectIds, and a separate DoFn is used to
convert/parse each GridFSDBFile into usable chunks.
Test case for splitting added.
Variables not needed by the Source are pulled out and moved to the
transform instead.
Optimized the non-split case a bit by not querying all the ObjectIds up
front.
Optimized the unit tests by setting up test data per class instead of per test.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/dkulp/incubator-beam gridfs-t2
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-beam/pull/1025.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1025
----
commit 5aad971bcd1d32ba06cec9d4870e7aa9e9dc17f5
Author: Daniel Kulp <[email protected]>
Date: 2016-09-29T02:44:37Z
Split BoundedSource into a BoundedSource<ObjectID> and a DoFn<...>
commit 2fc219cdd33e89d65d457dd3767bd378ff1111c0
Author: Daniel Kulp <[email protected]>
Date: 2016-09-29T13:03:31Z
Optimize reading for non-split case
commit e58fc61868988cc40c325d913fca37b26e3db99c
Author: Daniel Kulp <[email protected]>
Date: 2016-09-29T13:18:17Z
Use objectId timestamp
commit ed73d77b21651d6ef1d8cf2892dc267794d52d10
Author: Daniel Kulp <[email protected]>
Date: 2016-09-29T13:57:44Z
Pull parser out of BoundedSource, add maxSkew
commit 277667527cf0a23704b3ae3d05b2c8e2c2bcea3c
Author: Daniel Kulp <[email protected]>
Date: 2016-09-29T14:48:42Z
Add test case for the split
commit db30aabac4629ae167e4ede73de79257b4a93336
Author: Daniel Kulp <[email protected]>
Date: 2016-09-29T15:00:44Z
Don't need the generic on the Source and Reader
commit 1cdb2ce716b7e020c5306494b414b5bb136abb24
Author: Daniel Kulp <[email protected]>
Date: 2016-09-29T16:29:51Z
Rename maxSkew to allowedTimestampSkew to match other DoFn's
----
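As background for the "Use objectId timestamp" commit listed above: a MongoDB
ObjectId stores its creation time in its first four bytes, as big-endian seconds
since the Unix epoch, so a timestamp can be derived from the id alone. The
following is a minimal, driver-free sketch of that decoding. The class and
method names are hypothetical and are not taken from the patch, and how the
patch itself consumes the timestamp is not shown here.

```java
import java.time.Instant;

public class ObjectIdTimestamp {

    /**
     * Decodes the creation time embedded in an ObjectId given as a
     * 24-character hex string. Hypothetical helper, for illustration only.
     */
    public static Instant timestampOf(String hexObjectId) {
        if (hexObjectId == null || hexObjectId.length() != 24) {
            throw new IllegalArgumentException("expected 24 hex characters");
        }
        // The first 4 bytes (8 hex characters) hold seconds since the Unix epoch.
        // Parsing into a long keeps the value unsigned.
        long seconds = Long.parseLong(hexObjectId.substring(0, 8), 16);
        return Instant.ofEpochSecond(seconds);
    }

    public static void main(String[] args) {
        // Example id whose leading bytes are 0x00000001 (one second after the epoch).
        System.out.println(timestampOf("000000010000000000000000"));
    }
}
```

When the MongoDB Java driver is on the classpath, its ObjectId class exposes the
same information directly, so a helper like this is only needed for illustration.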