GitHub user zapletal-martin opened a pull request:
https://github.com/apache/incubator-gearpump/pull/66
Cassandra integration
Cassandra database integration
- [X] CassandraSource
- [X] CassandraSink
- [X] CassandraStore
Reuses some Spark-Cassandra connector files and follows the way that connector
works. The intent is to allow the connector to be reused once a version for
other processing systems is available. The Source looks up the token ranges of
the desired table, splits them into independent sets of partitions and assigns
those sets to the available source tasks, so the table can be read in parallel.
All fetches of data except the first one are asynchronous. The Sink can be
trivially parallelised by the user by assigning different writes to different
tasks.
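To make the partitioning idea concrete, here is a minimal, language-neutral
sketch in Python. It is not the code in this pull request. It assumes the
default Murmur3Partitioner token space and cuts it into equal slices, whereas
the Source looks the token ranges up from the table. The names `split_ring`,
`assign_to_tasks` and `range_query` are hypothetical and exist only for this
illustration.

```python
from typing import List, Tuple

# Token space of Cassandra's Murmur3Partitioner (assumed partitioner).
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1

# A range is read as (start, end]: start exclusive, end inclusive.
TokenRange = Tuple[int, int]


def split_ring(num_splits: int) -> List[TokenRange]:
    """Cut the whole token space into contiguous, non-overlapping ranges.

    Assumption: equal-width slices. A real connector would start from the
    ranges reported by the cluster instead.
    """
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    total = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + (total * i) // num_splits
              for i in range(num_splits + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def assign_to_tasks(ranges: List[TokenRange],
                    num_tasks: int) -> List[List[TokenRange]]:
    """Distribute ranges over source tasks round-robin."""
    if num_tasks < 1:
        raise ValueError("num_tasks must be at least 1")
    assignments: List[List[TokenRange]] = [[] for _ in range(num_tasks)]
    for i, token_range in enumerate(ranges):
        assignments[i % num_tasks].append(token_range)
    return assignments


def range_query(keyspace: str, table: str, partition_key: str) -> str:
    """Build the CQL a task would run once per assigned range."""
    return (f"SELECT * FROM {keyspace}.{table} "
            f"WHERE token({partition_key}) > ? "
            f"AND token({partition_key}) <= ?")


if __name__ == "__main__":
    ranges = split_ring(8)
    for task_id, owned in enumerate(assign_to_tasks(ranges, 3)):
        print(task_id, len(owned))
    print(range_query("ks", "events", "id"))
```

Because the ranges do not overlap, each task can scan its own share without
coordinating with the others. This is what lets the number of source tasks
scale independently of the table layout.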
The Source scans a snapshot of the table and does not currently honour updates
(so it is not a continuous stream). The Source is not time replayable. There
are options for handling both of these, but they must be properly thought
through. The test coverage is poor at the moment, but this first attempt will
allow iteration, continuous improvement of the code and the addition of
features.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zapletal-martin/incubator-gearpump cassandra-integration
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-gearpump/pull/66.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #66
----
commit 253ea9e233499c14796f50f9a0f072cf20062488
Author: martinzapletal <[email protected]>
Date: 2016-07-06T13:35:48Z
Gearpump Cassandra Integration
commit 1719f7bd3453be99d4bee608ea1e7ba42e68f681
Author: martinzapletal <[email protected]>
Date: 2016-07-11T03:53:19Z
Script
commit 0f3db5eeb9a83e9fd9c202387e55067ed9842570
Author: martinzapletal <[email protected]>
Date: 2016-07-16T21:06:06Z
Partitioner
commit 676afcc88fde4195f029a71970d8371a31bab134
Author: martinzapletal <[email protected]>
Date: 2016-07-17T01:10:40Z
Styling and working partitioning
commit 9e90868e6022c1d45f237756d77627dca1345b4e
Author: martinzapletal <[email protected]>
Date: 2016-07-17T02:34:55Z
Refactoring
commit fd8f3b4ae23069cd940a65bb715f1640b65fe708
Author: martinzapletal <[email protected]>
Date: 2016-07-17T02:43:49Z
Cleanup pass
commit 93142201793854c6e267885d272bb419fb6f8f96
Author: martinzapletal <[email protected]>
Date: 2016-07-17T02:59:00Z
Removing unnecessary code
commit e9c7084cae94c8bc08bc80331cac5b59a4382cec
Author: martinzapletal <[email protected]>
Date: 2016-07-17T14:15:15Z
Refactoring
commit 10370f318c52cc3ed5320d51b30c596f124e9a53
Author: martinzapletal <[email protected]>
Date: 2016-07-17T15:42:30Z
Apache 2.0 licence modification notices
commit b5d96bae09b5555c3ad59a0129ce9a768388f205
Author: martinzapletal <[email protected]>
Date: 2016-07-17T20:53:28Z
Refactoring
commit d855bde9426c8fe09aa794bec4ad5080b01ea828
Author: martinzapletal <[email protected]>
Date: 2016-07-24T14:53:15Z
Removing a TimeReplayableSource before we figure out an efficient way to
track progress in all partitions
----