Here is the ticket: https://issues.apache.org/jira/browse/IGNITE-4075

-Val

On Wed, Oct 12, 2016 at 6:45 PM, Igor Rudyak <irud...@gmail.com> wrote:

> Hi Val,
>
> I don't have any objections - please create a ticket and link it to the
> root ticket https://issues.apache.org/jira/browse/IGNITE-1371
>
> Igor
>
> On Wed, Oct 12, 2016 at 4:10 PM, Valentin Kulichenko <
> valentin.kuliche...@gmail.com> wrote:
>
>> Hi Igor,
>>
>> 1. I still think we should do this. Loading nothing is very
>> counterintuitive and prevents a new user from getting started quickly.
>> For large tables, when only part of the dataset is needed, the user will
>> explicitly specify the query, of course. Do you have objections? If not,
>> I will create a ticket.
>>
>> 2. Got it, thanks.
>>
>> -Val
>>
>> On Mon, Oct 10, 2016 at 12:12 AM, Igor Rudyak <irud...@gmail.com> wrote:
>>
>>> Hi Val,
>>>
>>> 1) Well, it's not a problem to implement such default behavior, but
>>> there is one concern. In most cases, when you are using Cassandra as a
>>> persistent store you are going to store a large amount of data, which is
>>> significantly bigger than the amount of RAM in your Ignite cluster. In
>>> such a case it doesn't make sense to launch a CQL query like "select *
>>> from my_table" because:
>>>      a) You still will not be able to keep all data from the Cassandra
>>> table in the Ignite cache
>>>      b) All the data will be pulled from the Cassandra table using only
>>> one thread - which is very slow
>>>
>>> 2) Unfortunately it's not possible in Cassandra. For JDBC you are
>>> splitting the table into chunks of 512 rows each, using sub-queries and
>>> ordering by primary keys. Such things are not supported in Cassandra.
>>> Probably the only way to load data from a Cassandra table in parallel
>>> is to load it from some specified partitions (in parallel for each
>>> partition).
>>>
>>>
>>> Igor Rudyak
>>>
>>> On Fri, Oct 7, 2016 at 1:45 PM, Valentin Kulichenko <
>>> valentin.kuliche...@gmail.com> wrote:
>>>
>>>> Hi Igor,
>>>>
>>>> Thanks for the response!
>>>>
>>>> 1. It's a bit inconsistent with the other store implementations we
>>>> have in the product, and I actually find this counterintuitive. Why
>>>> don't we just load all the data available in the table? An explicit
>>>> query is useful when you want to customize this and load a subset of
>>>> the data based on some criteria. If this is not possible for some
>>>> reason, then I would at least throw an exception in case the query is
>>>> not specified.
>>>>
>>>> 2. Is it possible to automatically split the data into batches and
>>>> load them in parallel? We do this in the JDBC store, for example.
>>>>
>>>> -Val
>>>>
>>>> On Thu, Oct 6, 2016 at 11:00 PM, Igor Rudyak <irud...@gmail.com> wrote:
>>>>
>>>>> Hi Val,
>>>>>
>>>>> 1) If you call loadCache(null) it will do nothing. You need to
>>>>> provide at least one CQL query.
>>>>>
>>>>> 2) It depends. If you provide more than one CQL query, it will use a
>>>>> separate thread for each of the queries (the max number of threads is
>>>>> limited to the number of CPU cores). But for each provided CQL query
>>>>> it will use only one thread to load all the data returned by the
>>>>> query. Also, it will run the same CQL query from ALL Ignite nodes to
>>>>> load the same data, which is bad. That's because the loadCache method
>>>>> will be executed on each Ignite node. As you see, specifying just a
>>>>> CQL query is not a very efficient way to load data from Cassandra.
>>>>> The ticket I created is all about how to load data from one table (or
>>>>> from multiple tables as well) in parallel by partitioning it. This
>>>>> way each Ignite node will be responsible for loading data from a
>>>>> specific partition range of the Cassandra table, which is much more
>>>>> efficient. To support this kind of cache warm-up you should design
>>>>> your Cassandra table in a specific way - there should be some mapping
>>>>> from an Ignite partition to the set of Cassandra partitions. Yes, I
>>>>> have plans to implement this.
>>>>>
>>>>> Igor Rudyak
>>>>>
>>>>>
>>>>> On Thu, Oct 6, 2016 at 10:19 AM, Valentin Kulichenko <
>>>>> valentin.kuliche...@gmail.com> wrote:
>>>>>
>>>>>> Hi Igor,
>>>>>>
>>>>>> I've got a couple of quick questions about the Cassandra store.
>>>>>>
>>>>>>    1. In [1] you suggested providing an explicit query as a
>>>>>>    parameter for the loadCache() method, because otherwise the user
>>>>>>    was always getting an empty result. Is it a requirement to
>>>>>>    provide the query? What if I just call loadCache(null)?
>>>>>>    2. There is a ticket [2] about parallel load in the Cassandra
>>>>>>    store. Does it mean that currently it loads only in a
>>>>>>    single-threaded fashion? If so, do you have any plans to
>>>>>>    implement this improvement?
>>>>>>
>>>>>> [1] http://apache-ignite-users.70518.x6.nabble.com/Cannot-query-on-a-cache-using-Cassandra-as-a-persistent-store-td7870.html
>>>>>> [2] https://gridgain.freshdesk.com/helpdesk/tickets/2180
>>>>>>
>>>>>> Thanks,
>>>>>> Val
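
For anyone reading this thread later: the point about one thread per CQL query can be illustrated with a small sketch. It splits the token ring into contiguous ranges and builds one CQL query per range, so that several queries can be handed to loadCache() instead of a single "select * from my_table". This is not the design proposed in the ticket (which maps Ignite partitions to Cassandra partitions); it is the generic token-range technique, swapped in for illustration. The class name `TokenRangeQueries`, the method `build`, and the table and key names are hypothetical, and the sketch assumes the table uses Cassandra's default Murmur3Partitioner, whose tokens span the signed 64-bit range.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Ignite or the Cassandra driver. */
final class TokenRangeQueries {
    // Assumption: Murmur3Partitioner, whose tokens span the signed 64-bit range.
    static final BigInteger MIN_TOKEN = BigInteger.valueOf(Long.MIN_VALUE);
    static final BigInteger MAX_TOKEN = BigInteger.valueOf(Long.MAX_VALUE);

    private TokenRangeQueries() {
    }

    /**
     * Builds one CQL query per token range. The ranges are contiguous,
     * non-overlapping and together cover the whole token ring.
     *
     * @param table        fully qualified table name, e.g. "ks.tbl"
     * @param partitionKey partition key column(s), e.g. "id" or "a, b"
     * @param ranges       number of ranges to split the ring into
     */
    static List<String> build(String table, String partitionKey, int ranges) {
        if (ranges < 1)
            throw new IllegalArgumentException("ranges must be >= 1");

        // Number of distinct tokens on the ring: 2^64.
        BigInteger total = MAX_TOKEN.subtract(MIN_TOKEN).add(BigInteger.ONE);
        BigInteger n = BigInteger.valueOf(ranges);

        List<String> queries = new ArrayList<>(ranges);

        for (int i = 0; i < ranges; i++) {
            // Inclusive bounds of the i-th range.
            BigInteger start = MIN_TOKEN.add(
                total.multiply(BigInteger.valueOf(i)).divide(n));
            BigInteger end = MIN_TOKEN.add(
                total.multiply(BigInteger.valueOf(i + 1)).divide(n))
                .subtract(BigInteger.ONE);

            queries.add(String.format(
                "select * from %s where token(%s) >= %s and token(%s) <= %s",
                table, partitionKey, start, partitionKey, end));
        }

        return queries;
    }
}
```

The resulting strings could then be passed as the varargs of IgniteCache.loadCache, for example `cache.loadCache(null, TokenRangeQueries.build("ks.tbl", "id", 8).toArray())`. As described above, the store would use a separate thread per query, but every Ignite node would still run all of the queries, so this only addresses the single-thread concern and not the duplicated work across nodes.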