[ https://issues.apache.org/jira/browse/BEAM-3485?focusedWorklogId=90757&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90757 ]

ASF GitHub Bot logged work on BEAM-3485:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 13/Apr/18 09:49
            Start Date: 13/Apr/18 09:49
    Worklog Time Spent: 10m 
      Work Description: adejanovski opened a new pull request #5124: 
[BEAM-3485] Fix split generation for Cassandra clusters
URL: https://github.com/apache/beam/pull/5124
 
 
   The existing code for generating splits was broken in several ways and 
could never generate any splits.
   This commit uses the token range splitter that has been used for years in 
Reaper, which safely generates subranges as needed.
   It respects data locality and avoids generating splits that span several 
token ranges, since such splits could involve many nodes in the cluster.
   The default load balancing policy has been switched to DCAware in all cases 
to avoid cross-DC queries.
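A minimal sketch of splitting one token range into subranges, assuming the Murmur3Partitioner token ring [-2^63, 2^63 - 1] and Cassandra's (start, end] range convention. The class `TokenRangeSplitSketch` and its methods are hypothetical; this is not the Reaper or CassandraIO code. It only subdivides a single range at a time, so every subrange stays inside one token range and therefore on the replicas that own it.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not the Reaper or CassandraIO implementation: split a
 * single token range (start, end] into roughly equal subranges, assuming the
 * Murmur3Partitioner ring of tokens [-2^63, 2^63 - 1].
 */
public class TokenRangeSplitSketch {
  static final BigInteger MIN_TOKEN = BigInteger.valueOf(Long.MIN_VALUE);
  static final BigInteger MAX_TOKEN = BigInteger.valueOf(Long.MAX_VALUE);
  static final BigInteger RING_SIZE = MAX_TOKEN.subtract(MIN_TOKEN).add(BigInteger.ONE);

  /** A token subrange (start, end]: start is exclusive, end is inclusive. */
  static final class Range {
    final BigInteger start;
    final BigInteger end;

    Range(BigInteger start, BigInteger end) {
      this.start = start;
      this.end = end;
    }

    @Override
    public String toString() {
      return "(" + start + ", " + end + "]";
    }
  }

  /** Splits (start, end] into at most n contiguous subranges. */
  static List<Range> split(BigInteger start, BigInteger end, int n) {
    if (n < 1) {
      throw new IllegalArgumentException("n must be >= 1");
    }
    // Number of tokens in (start, end]; end <= start means the range wraps
    // around the ring (start == end covers the whole ring).
    BigInteger size = end.subtract(start);
    if (size.signum() <= 0) {
      size = size.add(RING_SIZE);
    }
    // Never create more subranges than there are tokens.
    int parts = size.min(BigInteger.valueOf(n)).intValue();

    List<Range> result = new ArrayList<>();
    BigInteger prev = start;
    for (int i = 1; i <= parts; i++) {
      BigInteger next;
      if (i == parts) {
        next = end; // the last subrange ends exactly at the original end
      } else {
        BigInteger offset =
            size.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(parts));
        next = wrap(start.add(offset));
      }
      result.add(new Range(prev, next));
      prev = next;
    }
    return result;
  }

  /** Maps a token that ran past MAX_TOKEN back onto the ring. */
  static BigInteger wrap(BigInteger token) {
    return token.compareTo(MAX_TOKEN) > 0 ? token.subtract(RING_SIZE) : token;
  }

  public static void main(String[] args) {
    // A 20-token range that wraps past the maximum token, split into 4 parts.
    BigInteger start = BigInteger.valueOf(Long.MAX_VALUE - 10);
    BigInteger end = BigInteger.valueOf(Long.MIN_VALUE + 9);
    for (Range r : split(start, end, 4)) {
      System.out.println(r);
    }
  }
}
```

In the example, the third subrange runs from the maximum token past the ring boundary. A wrapping subrange like that cannot be expressed as a single `token(pk) > ? AND token(pk) <= ?` condition, so a reader built on this sketch would need to query it as two pieces.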
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 90757)
            Time Spent: 10m
    Remaining Estimate: 0h

> CassandraIO.read() splitting produces invalid queries
> -----------------------------------------------------
>
>                 Key: BEAM-3485
>                 URL: https://issues.apache.org/jira/browse/BEAM-3485
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-cassandra
>            Reporter: Eugene Kirpichov
>            Assignee: Alexey Romanenko
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> See 
> [https://stackoverflow.com/questions/48090668/how-to-increase-dataflow-read-parallelism-from-cassandra/48131264?noredirect=1#comment83548442_48131264]
> As the question author points out, the error is likely that token($pk) should 
> be token(pk). This was likely masked by BEAM-3424 and BEAM-3425: the splitting 
> code path was effectively never invoked and has been broken since the first 
> PR, so there are likely other bugs.
> When testing this issue, we must ensure good code coverage in an integration 
> test (IT) against a real Cassandra instance.
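To illustrate the likely diagnosis above, here is a minimal sketch of building the query for one split with the partition key column names passed to `token()` directly. The class `TokenQuerySketch` and its `splitQuery` helper are hypothetical and are not the CassandraIO code; they only show the shape of a valid token-restricted CQL query.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch, not the CassandraIO code: build the CQL query that reads
 * one split (start, end] of the token ring.
 */
public class TokenQuerySketch {
  static String splitQuery(String keyspace, String table, List<String> partitionKeys) {
    // token() takes the partition key column names themselves, e.g. token(id),
    // or token(a, b) for a composite partition key. "$pk" is not a valid
    // column name, so token($pk) is not valid CQL.
    String pk = String.join(", ", partitionKeys);
    // The bind markers (?) receive the split's start (exclusive) and end (inclusive).
    return String.format(
        "SELECT * FROM %s.%s WHERE token(%s) > ? AND token(%s) <= ?",
        keyspace, table, pk, pk);
  }

  public static void main(String[] args) {
    System.out.println(splitQuery("ks", "users", Arrays.asList("id")));
  }
}
```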



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
