Hi Val,

I don't have any objections - please create a ticket and link it to the root ticket https://issues.apache.org/jira/browse/IGNITE-1371

Igor

On Wed, Oct 12, 2016 at 4:10 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:

> Hi Igor,
>
> 1. I still think we should do this. Loading nothing is very counterintuitive and prevents a new user from getting started quickly. For large tables, when only part of the dataset is needed, the user will explicitly specify the query, of course. Do you have any objections? If not, I will create a ticket.
>
> 2. Got it, thanks.
>
> -Val
>
> On Mon, Oct 10, 2016 at 12:12 AM, Igor Rudyak <irud...@gmail.com> wrote:
>
>> Hi Val,
>>
>> 1) Well, it's not a problem to implement such default behavior, but there is one concern. In most cases, when you are using Cassandra as a persistent store, you are going to store a large amount of data, significantly bigger than the amount of RAM in your Ignite cluster. In such a case it doesn't make sense to launch a CQL query like "select * from my_table" because:
>> a) You still will not be able to keep all the data from the Cassandra table in the Ignite cache
>> b) All the data will be pulled from the Cassandra table using only one thread - which is very slow
>>
>> 2) Unfortunately it's not possible in Cassandra. For JDBC you are splitting the table into chunks of 512 rows each, using sub-queries and ordering by primary keys. Such things are not supported in Cassandra. Probably the only way to load data from a Cassandra table in parallel is to load it from some specified partitions (in parallel for each partition).
>>
>> Igor Rudyak
>>
>> On Fri, Oct 7, 2016 at 1:45 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:
>>
>>> Hi Igor,
>>>
>>> Thanks for the response!
>>>
>>> 1. It's a bit inconsistent with other store implementations we have in the product, and I actually find this counterintuitive. Why don't we just load all the data available in the table? An explicit query is useful when you want to customize this and load a subset of the data based on some criteria. If this is not possible for some reason, then I would at least throw an exception in case the query is not specified.
>>>
>>> 2. Is it possible to automatically split the data into chunks and load them in parallel? We do this in the JDBC store, for example.
>>>
>>> -Val
>>>
>>> On Thu, Oct 6, 2016 at 11:00 PM, Igor Rudyak <irud...@gmail.com> wrote:
>>>
>>>> Hi Val,
>>>>
>>>> 1) If you call loadCache(null) it will do nothing. You need to provide at least one CQL query.
>>>>
>>>> 2) It depends. If you provide more than one CQL query, it will use a separate thread for each of the queries (the max number of threads is limited to the number of CPU cores). But for each provided CQL query it will use only one thread to load all the data returned by the query. Also, it will run the same CQL query from ALL Ignite nodes to load the same data, which is bad. That's because the loadCache method will be executed on each Ignite node. As you see, just specifying a CQL query is not a very efficient way to load data from Cassandra.
>>>>
>>>> The ticket I created is all about how to load data from one table (or from multiple tables as well) in parallel by partitioning it. That way each Ignite node will be responsible for loading data from a specific partition range of the Cassandra table, which is much more efficient. To support this kind of cache warm-up you should design your Cassandra table in a specific way - there should be some mapping from an Ignite partition to the set of Cassandra partitions. Yes, I have plans to implement this.
>>>>
>>>> Igor Rudyak
>>>>
>>>> On Thu, Oct 6, 2016 at 10:19 AM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:
>>>>
>>>>> Hi Igor,
>>>>>
>>>>> I've got a couple of quick questions about the Cassandra store.
>>>>>
>>>>> 1. In [1] you suggested providing an explicit query as a parameter for the loadCache() method, because otherwise the user was always getting an empty result. Is it a requirement to provide the query? What if I just call loadCache(null)?
>>>>> 2. There is a ticket [2] about parallel load in the Cassandra store. Does it mean that currently it loads only in a single-threaded fashion? If so, do you have any plans to implement this improvement?
>>>>>
>>>>> [1] http://apache-ignite-users.70518.x6.nabble.com/Cannot-query-on-a-cache-using-Cassandra-as-a-persistent-store-td7870.html
>>>>> [2] https://gridgain.freshdesk.com/helpdesk/tickets/2180
>>>>>
>>>>> Thanks,
>>>>> Val
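
One way to picture the "load from specified partitions, in parallel for each partition" idea from this thread is to split the Cassandra token ring into contiguous ranges and issue one CQL query per range. The Java sketch below only illustrates that idea and is not how the Ignite Cassandra store is implemented. The class TokenRangeQueries and its build method are hypothetical, my_table and id are placeholder names, and the code assumes the cluster uses Murmur3Partitioner, whose tokens fit in a signed 64-bit long.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: builds one CQL query per slice of the token ring. */
class TokenRangeQueries {
    // Assumption: Murmur3Partitioner, so tokens lie within the signed 64-bit range.
    private static final BigInteger MIN = BigInteger.valueOf(Long.MIN_VALUE);
    private static final BigInteger MAX = BigInteger.valueOf(Long.MAX_VALUE);

    /**
     * Splits the full token range into 'parts' non-overlapping inclusive ranges
     * and returns a "select *" query restricted to each range.
     */
    static List<String> build(String table, String partitionKey, int parts) {
        if (parts < 1)
            throw new IllegalArgumentException("parts must be >= 1");

        // 2^64 possible values; BigInteger avoids long overflow.
        BigInteger total = MAX.subtract(MIN).add(BigInteger.ONE);
        BigInteger step = total.divide(BigInteger.valueOf(parts));

        List<String> queries = new ArrayList<>(parts);
        for (int i = 0; i < parts; i++) {
            BigInteger lo = MIN.add(step.multiply(BigInteger.valueOf(i)));
            // The last range absorbs the division remainder.
            BigInteger hi = (i == parts - 1) ? MAX : lo.add(step).subtract(BigInteger.ONE);
            queries.add(String.format(
                "select * from %s where token(%s) >= %s and token(%s) <= %s",
                table, partitionKey, lo, partitionKey, hi));
        }
        return queries;
    }
}
```

Since the thread says each supplied CQL query is loaded in its own thread (capped at the number of CPU cores), the resulting strings could be passed as the arguments of loadCache(null, ...). As Igor points out, every Ignite node would still execute all of the queries, so this only parallelizes the load within a node and does not divide the work across nodes.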