vmarquez commented on a change in pull request #10546:
URL: https://github.com/apache/beam/pull/10546#discussion_r434318532
##########
File path:
sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/CassandraIO.java
##########
@@ -326,7 +371,78 @@ private CassandraIO() {}
checkArgument(entity() != null, "withEntity() is required");
checkArgument(coder() != null, "withCoder() is required");
- return input.apply(org.apache.beam.sdk.io.Read.from(new CassandraSource<>(this, null)));
+ ReadAll<T> readAll = CassandraIO.<T>readAll().withCoder(this.coder());
+
+ return input
+ .apply(Create.of(this))
+ .apply(ParDo.of(new SplitFn()))
+ .setCoder(SerializableCoder.of(new TypeDescriptor<Read<T>>() {}))
+ // .apply(Reshuffle.viaRandomKey())
+ .apply(readAll);
+ }
+
+ private class SplitFn extends DoFn<Read<T>, Read<T>> {
+
+ @ProcessElement
+ public void process(
+ @Element CassandraIO.Read<T> read, OutputReceiver<Read<T>> outputReceiver) {
+
+ try (Cluster cluster =
+ getCluster(
+ read.hosts(),
+ read.port(),
+ read.username(),
+ read.password(),
+ read.localDc(),
+ read.consistencyLevel())) {
+ if (isMurmur3Partitioner(cluster)) {
+ LOG.info("Murmur3Partitioner detected, splitting");
+
+ List<BigInteger> tokens =
+ cluster.getMetadata().getTokenRanges().stream()
+ .map(tokenRange -> new BigInteger(tokenRange.getEnd().getValue().toString()))
+ .collect(Collectors.toList());
+ Integer splitCount = cluster.getMetadata().getAllHosts().size();
+ if (read.minNumberOfSplits() != null && read.minNumberOfSplits().get() != null) {
+ splitCount = read.minNumberOfSplits().get();
+ }
+
+ SplitGenerator splitGenerator =
+ new SplitGenerator(cluster.getMetadata().getPartitioner());
+ splitGenerator
+ .generateSplits(splitCount, tokens)
+ .forEach(
+ rr ->
+ outputReceiver.output(
+ CassandraIO.<T>read()
Review comment:
I think we still need to create a new `Read<T>` here, because the
`SplitGenerator` returns a `List<List<RingRange>>`: each element of the outer
list becomes a separate `Read`, and the inner list supplies that `Read`'s
`RingRange`s.
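
To make the mapping concrete, here is a minimal sketch of the idea, not the actual PR code. `RingRange` and `ReadSpec` below are hypothetical stand-ins for the real `RingRange` and `Read<T>` classes, and `withRingRanges` is an assumed setter used only for illustration. It shows one new read being produced per outer-list element, with the inner list attached to it.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Sketch only: the nested types are hypothetical stand-ins, not the Beam classes. */
final class SplitSketch {

  /** Hypothetical stand-in for a token range on the Cassandra ring. */
  static final class RingRange {
    final BigInteger start;
    final BigInteger end;

    RingRange(long start, long end) {
      this.start = BigInteger.valueOf(start);
      this.end = BigInteger.valueOf(end);
    }
  }

  /** Hypothetical stand-in for Read<T>: shared settings plus the ranges this read scans. */
  static final class ReadSpec {
    final String keyspace;
    final String table;
    final List<RingRange> ringRanges;

    ReadSpec(String keyspace, String table, List<RingRange> ringRanges) {
      this.keyspace = keyspace;
      this.table = table;
      // Defensive copy so each read owns an immutable view of its ranges.
      this.ringRanges = Collections.unmodifiableList(new ArrayList<>(ringRanges));
    }

    /** Assumed setter: returns a copy of this read restricted to the given ranges. */
    ReadSpec withRingRanges(List<RingRange> ranges) {
      return new ReadSpec(keyspace, table, ranges);
    }
  }

  /**
   * Turns the split generator's nested result into reads: one new read per
   * outer-list element, carrying that element's inner list of ranges.
   */
  static List<ReadSpec> toReads(ReadSpec template, List<List<RingRange>> splits) {
    List<ReadSpec> reads = new ArrayList<>();
    for (List<RingRange> split : splits) {
      reads.add(template.withRingRanges(split));
    }
    return reads;
  }
}
```

In the `DoFn` above, each element returned by something like `toReads` would be passed to `outputReceiver.output(...)`, so the downstream `ReadAll` sees one `Read` per split rather than a single `Read` covering the whole ring.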