Hi Chris,
Well, that seemed like a good idea at first; I would like to read from
Cassandra and post to Kafka.
But the Kafka Connect Cassandra Source requires that the table has a
time-series order, and not all my tables do.
So thanx for the tip, but it did not work ☹
-Tobias
From: Chris
Maybe
https://www.confluent.io/blog/kafka-connect-cassandra-sink-the-perfect-match/
On Wed, Apr 26, 2017 at 2:49 PM, Tobias Eriksson <
tobias.eriks...@qvantel.com> wrote:
> Hi
>
> I would like to make a dump of the database, in JSON format, to KAFKA
>
> The database contains lots of data,
You can run multiple applications in parallel in Standalone mode - you just
need to configure Spark to allocate resources between your jobs the way you
want (by default it assigns all resources to the first application you run,
so they won't be freed up until it has finished).
Well, I have been working some with Spark, and the biggest hurdle is that Spark
does not allow me to run multiple jobs in parallel,
i.e. at the point of starting the job to take the table of “Individuals” I
will have to wait until all that processing is done before I can start an
additional one.
You could probably save yourself a lot of hassle by just writing a Spark
job that scans through the entire table, converts each row to JSON and
dumps the output into a Kafka topic. It should be fairly straightforward to
implement.
Spark will manage the partitioning of "Producer" processes for you