[ 
https://issues.apache.org/jira/browse/BEAM-3485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437513#comment-16437513
 ] 

Alexey Romanenko commented on BEAM-3485:
----------------------------------------

[~adejanovski] 
 1. BEAM-3424: I agree with the split strategy you suggested. My only concern, 
which was also the original cause of the StackOverflow question, is that if a 
user runs a pipeline from a local machine and the Cassandra instance is located 
in a different network, then we can't estimate the number of splits reliably. 
For this case, we could perhaps provide an option to set the number of splits 
manually, although Beam discourages additional tuning knobs that are not 
strictly necessary. Do you think there could be another solution for this? 
Default options?
 2. BEAM-3425: I'm just curious whether _Long_ was not enough for that?
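
To make the "set the number of splits manually" idea concrete, here is a rough 
sketch, not the actual CassandraIO code. It assumes the Murmur3Partitioner 
token range of [-2^63, 2^63 - 1]; the class name, the method splitQueries and 
the numSplits parameter are hypothetical, as are the table and column names in 
the usage note. It also uses token(pk) rather than token($pk), as the issue 
description suggests. One possible reason _Long_ could be too narrow (not 
confirmed in this thread) is that the width of the full ring is 2^64 - 1, which 
does not fit in a signed 64-bit value, so the sketch does its arithmetic in 
BigInteger.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

class TokenRangeSplits {
    // Assumption: Murmur3Partitioner, whose tokens span [-2^63, 2^63 - 1].
    static final BigInteger MIN_TOKEN = BigInteger.valueOf(Long.MIN_VALUE);
    static final BigInteger MAX_TOKEN = BigInteger.valueOf(Long.MAX_VALUE);

    /**
     * Builds one CQL query per split for a user-supplied split count.
     * Hypothetical helper; numSplits stands in for the proposed manual option.
     */
    static List<String> splitQueries(String table, String partitionKey, int numSplits) {
        if (numSplits < 1) {
            throw new IllegalArgumentException("numSplits must be >= 1");
        }
        // Ring width is 2^64 - 1, which overflows a signed long.
        BigInteger width = MAX_TOKEN.subtract(MIN_TOKEN);
        BigInteger n = BigInteger.valueOf(numSplits);
        List<String> queries = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) {
            BigInteger start =
                MIN_TOKEN.add(width.multiply(BigInteger.valueOf(i)).divide(n));
            BigInteger end =
                MIN_TOKEN.add(width.multiply(BigInteger.valueOf(i + 1L)).divide(n));
            // The first range is closed on the left so MIN_TOKEN itself is read;
            // later ranges are open on the left to avoid reading a token twice.
            String lowerOp = (i == 0) ? ">=" : ">";
            queries.add(String.format(
                "SELECT * FROM %s WHERE token(%s) %s %s AND token(%s) <= %s",
                table, partitionKey, lowerOp, start, partitionKey, end));
        }
        return queries;
    }
}
```

For example, splitQueries("ks.tbl", "id", 2) yields two queries that meet at 
token -1 and together cover the whole ring. A real implementation would 
presumably still prefer the size estimates when they are reachable and fall 
back to a manual count only when they are not.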

> CassandraIO.read() splitting produces invalid queries
> -----------------------------------------------------
>
>                 Key: BEAM-3485
>                 URL: https://issues.apache.org/jira/browse/BEAM-3485
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-cassandra
>            Reporter: Eugene Kirpichov
>            Assignee: Alexey Romanenko
>            Priority: Major
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> See 
> [https://stackoverflow.com/questions/48090668/how-to-increase-dataflow-read-parallelism-from-cassandra/48131264?noredirect=1#comment83548442_48131264]
> As the question author points out, the error is likely that token($pk) should 
> be token(pk). This was likely masked by BEAM-3424 and BEAM-3425, and the 
> splitting code path effectively was never invoked, and was broken from the 
> first PR - so there are likely other bugs.
> When testing this issue, we must ensure good code coverage in an IT against a 
> real Cassandra instance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
