[
https://issues.apache.org/jira/browse/BEAM-6324?focusedWorklogId=190068&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-190068
]
ASF GitHub Bot logged work on BEAM-6324:
----------------------------------------
Author: ASF GitHub Bot
Created on: 25/Jan/19 15:56
Start Date: 25/Jan/19 15:56
Worklog Time Spent: 10m
Work Description: echauchot commented on issue #7340: [BEAM-6324] -
Cassandra reader with query implemented
URL: https://github.com/apache/beam/pull/7340#issuecomment-457619621
I'm surprised that withQuery() does not already exist on CassandraIO. I
guess it has to do with dataset size estimation for splitting (if you don't
know the size, you cannot split). And I agree that avoiding a full table scan is
very important. That being said, if you add a `withWhereClause()` method, why
not add a complete withQuery()? We would then need to solve this problem: how can
we get the size of the input dataset without reading it all in advance? (Take
a look at JdbcIO; I don't remember how it was solved there.) Once the size is
known, splitting is done with the range query. Regarding QueryBuilder versus a
String query, the former is safer, but we can leave the responsibility for
writing correct queries to the user, as is done in other IOs. I would
vote for the latter because it works out of the box without
serialization issues.
Regarding the use of non-primary-key/non-clustering columns, we can accept
surfacing the error to the user, but we need a test confirming that an error is
actually raised in that case.
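
To make the String-based option and the range-query splitting discussed above more concrete, here is a minimal sketch, not the code from the PR. It assumes Cassandra's Murmur3Partitioner (tokens from -2^63 to 2^63 - 1) and evenly sized token ranges. The class `TokenRangeQueries`, the method `build`, and all parameter names are hypothetical and not part of CassandraIO. The user-supplied filter is appended verbatim, so a filter on a column Cassandra cannot restrict is expected to fail on the server side rather than in this helper. Size estimation is not addressed here.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of CassandraIO. */
final class TokenRangeQueries {
  // Assumed token bounds of Cassandra's Murmur3Partitioner: [-2^63, 2^63 - 1].
  private static final BigInteger MIN_TOKEN = BigInteger.valueOf(Long.MIN_VALUE);
  private static final BigInteger MAX_TOKEN = BigInteger.valueOf(Long.MAX_VALUE);

  private TokenRangeQueries() {}

  /**
   * Builds one CQL string per split. The whereClause may be null; when present it
   * is appended verbatim, so writing a valid filter is left to the user.
   */
  static List<String> build(
      String table, String partitionKey, String whereClause, int numSplits) {
    if (numSplits < 1) {
      throw new IllegalArgumentException("numSplits must be >= 1");
    }
    BigInteger step =
        MAX_TOKEN.subtract(MIN_TOKEN).divide(BigInteger.valueOf(numSplits));
    List<String> queries = new ArrayList<>();
    for (int i = 0; i < numSplits; i++) {
      boolean last = i == numSplits - 1;
      BigInteger start = MIN_TOKEN.add(step.multiply(BigInteger.valueOf(i)));
      // The last range is closed at MAX_TOKEN so integer division drops no tokens.
      BigInteger end = last ? MAX_TOKEN : start.add(step);
      StringBuilder cql = new StringBuilder()
          .append("SELECT * FROM ").append(table)
          .append(" WHERE token(").append(partitionKey).append(") >= ").append(start)
          .append(" AND token(").append(partitionKey).append(") ")
          .append(last ? "<= " : "< ").append(end);
      if (whereClause != null && !whereClause.trim().isEmpty()) {
        // CQL has no OR, so joining the filter with AND does not change its meaning.
        cql.append(" AND ").append(whereClause);
      }
      queries.add(cql.toString());
    }
    return queries;
  }
}
```

For example, `TokenRangeQueries.build("ks.person", "id", "age > 30", 2)` returns two queries that together cover the whole token ring, each carrying the same filter. Plain strings keep the example free of serialization concerns, which is the trade-off mentioned above.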
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 190068)
Time Spent: 1h 50m (was: 1h 40m)
> CassandraIO.Read - Add the ability to provide a filter to the query
> -------------------------------------------------------------------
>
> Key: BEAM-6324
> URL: https://issues.apache.org/jira/browse/BEAM-6324
> Project: Beam
> Issue Type: Improvement
> Components: io-java-cassandra
> Affects Versions: 2.9.0
> Reporter: Shahar Frank
> Assignee: Shahar Frank
> Priority: Major
> Labels: performance, pull-request-available
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> CassandraIO.Read doesn't support using a WHERE clause to filter the input at
> the source (in Cassandra), which could provide a significant performance boost.
> Already implemented by:
> https://github.com/apache/beam/pull/7340