Hi Vincent!

Do you think you could add some code snippets / pseudocode showing what this would look like? Feel free to share them over email, a gist, a Google Doc, etc.

Best,
-P.
On Thu, Oct 3, 2019 at 4:16 PM Vincent Marquez <vincent.marq...@gmail.com> wrote:
> Currently the CassandraIO connector allows a user to specify a table, and
> the CassandraSource object generates a list of queries based on token
> ranges of the table, along with grouping them by the token ranges.
>
> I often need to run (generated, sometimes a million+) queries against a
> subset of a table. Instead of providing a filter, it is easier and much
> more performant to supply a collection of queries along with their tokens
> to both partition and group by, instead of letting CassandraIO naively run
> over the entire table or with a simple filter.
>
> I propose in addition to the current method of supplying a table and
> filter, also allowing the user to pass in a collection of queries and
> tokens. The current way CassandraSource breaks up the table could be
> modified to build on top of the proposed implementation to reduce code
> duplication as well. If this sounds like an acceptable alternative way of
> using the CassandraIO connector, I don't mind giving it a shot with a pull
> request.
>
> If there is a better way of doing this, I'm eager to hear and learn.
> Thanks for reading!
>
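A rough, hypothetical sketch of the core idea in the proposal, written in plain Java with no Beam or Cassandra driver dependencies: each caller-supplied query carries the partition token it targets, and queries are grouped by the token range containing that token, so each group could become one split. The names `TokenedQuery` and `groupByRange`, the range boundaries, and the token values are all illustrative assumptions (the tokens are made-up numbers, not real Murmur3 hashes), not part of CassandraIO's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TokenGroupingSketch {

    // Hypothetical pairing of a CQL query string with the partition token it targets.
    record TokenedQuery(String cql, long token) {}

    // Groups queries by the token range that contains each query's token.
    // rangeStarts must be non-empty; each range runs from its start (inclusive)
    // up to the next start (exclusive). On a token ring, a token below the
    // first start falls in the last range, which wraps around.
    static Map<Long, List<String>> groupByRange(List<TokenedQuery> queries, long[] rangeStarts) {
        TreeMap<Long, List<String>> groups = new TreeMap<>();
        for (long start : rangeStarts) {
            groups.put(start, new ArrayList<>());
        }
        for (TokenedQuery q : queries) {
            Long key = groups.floorKey(q.token());
            if (key == null) {
                key = groups.lastKey(); // wraparound range
            }
            groups.get(key).add(q.cql());
        }
        return groups;
    }

    public static void main(String[] args) {
        // Two illustrative ranges covering the full signed 64-bit token space.
        long[] rangeStarts = {Long.MIN_VALUE, 0L};
        List<TokenedQuery> queries = List.of(
            new TokenedQuery("SELECT * FROM ks.t WHERE id = 1", -42L),
            new TokenedQuery("SELECT * FROM ks.t WHERE id = 2", 17L),
            new TokenedQuery("SELECT * FROM ks.t WHERE id = 3", 99L));
        Map<Long, List<String>> splits = groupByRange(queries, rangeStarts);
        // Each entry would correspond to one split of the read.
        splits.forEach((start, cqls) ->
            System.out.println("range starting at " + start + ": " + cqls));
    }
}
```

Under this framing, the existing table-based path could in principle generate the same kind of (query, token) list from the table's token ranges and reuse the grouping step, which is the code-duplication reduction the proposal mentions.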