[ 
https://issues.apache.org/jira/browse/BEAM-3485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437089#comment-16437089
 ] 

Alexander Dejanovski commented on BEAM-3485:
--------------------------------------------

I've created a PR to fix the split generation:
[https://github.com/apache/beam/pull/5124]
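
For readers without the PR open, here is a minimal sketch of what valid split 
queries look like once the real partition key name is substituted into 
token(...), as the issue description suggests. This is not the code from the 
PR. The class name TokenRangeQueries, the build method and the splitPoints 
parameter are hypothetical, and the sketch assumes the split points are 
already sorted in ascending order.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Beam or CassandraIO. */
final class TokenRangeQueries {

  /**
   * Builds one CQL query per token range so that the ranges together cover
   * the whole ring. Assumes splitPoints is sorted in ascending order.
   */
  static List<String> build(
      String keyspace, String table, List<String> partitionKeys, long[] splitPoints) {
    // Use the actual column names, e.g. token(id), never a literal placeholder.
    String token = "token(" + String.join(", ", partitionKeys) + ")";
    String select = "SELECT * FROM " + keyspace + "." + table;

    List<String> queries = new ArrayList<>();
    if (splitPoints.length == 0) {
      // No split requested: a single unrestricted query reads everything.
      queries.add(select);
      return queries;
    }

    // First range: everything up to and including the first split point.
    queries.add(select + " WHERE " + token + " <= " + splitPoints[0]);

    // Middle ranges: (previous split point, current split point].
    for (int i = 1; i < splitPoints.length; i++) {
      queries.add(
          select + " WHERE " + token + " > " + splitPoints[i - 1]
              + " AND " + token + " <= " + splitPoints[i]);
    }

    // Last range: everything strictly above the last split point.
    queries.add(select + " WHERE " + token + " > " + splitPoints[splitPoints.length - 1]);
    return queries;
  }
}
```

Each range is open on the left and closed on the right, so no token is read 
twice and none is skipped.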

There are other issues with how connections are established, and more precisely 
with how many are created: there should be a single Cluster object per physical 
cluster and JVM, whereas we currently create a new Cluster object each time one 
is needed.
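
One possible shape for this, as a rough sketch rather than a proposed patch: 
keep a registry keyed by the physical cluster and build each entry at most 
once. SharedClusterRegistry, the string key and the factory function are all 
hypothetical names; the type parameter C stands in for the driver's Cluster 
class so that the sketch compiles without the driver on the classpath.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/**
 * Hypothetical helper for illustration only; not part of Beam or the driver.
 * C stands in for the driver's Cluster type.
 */
final class SharedClusterRegistry<C> {

  // One entry per physical cluster, keyed for example by its contact points.
  private final ConcurrentMap<String, C> clusters = new ConcurrentHashMap<>();

  // Builds a new instance for a key; invoked at most once per key.
  private final Function<String, C> factory;

  SharedClusterRegistry(Function<String, C> factory) {
    this.factory = factory;
  }

  /** Returns the shared instance for this key, creating it on first use. */
  C get(String clusterKey) {
    // computeIfAbsent is atomic on ConcurrentHashMap, so concurrent callers
    // asking for the same key all receive the same instance.
    return clusters.computeIfAbsent(clusterKey, factory);
  }

  /** Number of distinct clusters created so far. */
  int size() {
    return clusters.size();
  }
}
```

To get one instance per JVM, the registry itself would have to live in a 
static field, and the key would need to identify the physical cluster 
reliably (contact points alone may not be enough if different hosts of the 
same cluster are listed).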

I'll create follow-up tickets to handle this and to expand the capabilities of 
both the reader (ability to add a custom WHERE clause) and the writer (option to 
use PreparedStatements instead of relying on the mapper).

> CassandraIO.read() splitting produces invalid queries
> -----------------------------------------------------
>
>                 Key: BEAM-3485
>                 URL: https://issues.apache.org/jira/browse/BEAM-3485
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-cassandra
>            Reporter: Eugene Kirpichov
>            Assignee: Alexey Romanenko
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> See 
> [https://stackoverflow.com/questions/48090668/how-to-increase-dataflow-read-parallelism-from-cassandra/48131264?noredirect=1#comment83548442_48131264]
> As the question author points out, the error is likely that token($pk) should 
> be token(pk). This was likely masked by BEAM-3424 and BEAM-3425, and the 
> splitting code path effectively was never invoked, and was broken from the 
> first PR - so there are likely other bugs.
> When testing this issue, we must ensure good code coverage in an IT against a 
> real Cassandra instance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
