Re: Cassandra store questions

Igor Rudyak Thu, 13 Oct 2016 17:06:55 -0700

Ok, thanks.

Igor


On Oct 13, 2016 4:37 PM, "Valentin Kulichenko" <
valentin.kuliche...@gmail.com> wrote:

> Here is the ticket: https://issues.apache.org/jira/browse/IGNITE-4075
>
> -Val
>
> On Wed, Oct 12, 2016 at 6:45 PM, Igor Rudyak <irud...@gmail.com> wrote:
>
>> Hi Val,
>>
>> I don't have any objections - please create a ticket and link it to the
>> root ticket https://issues.apache.org/jira/browse/IGNITE-1371
>>
>> Igor
>>
>> On Wed, Oct 12, 2016 at 4:10 PM, Valentin Kulichenko <
>> valentin.kuliche...@gmail.com> wrote:
>>
>>> Hi Igor,
>>>
>>> 1. I still think we should do this. Loading nothing is very
>>> counterintuitive and prevents a new user from getting started quickly.
>>> For large tables, when only part of the dataset is needed, the user will
>>> explicitly specify the query, of course. Do you have objections? If not,
>>> I will create a ticket.
>>>
>>> 2. Got it, thanks.
>>>
>>> -Val
>>>
>>> On Mon, Oct 10, 2016 at 12:12 AM, Igor Rudyak <irud...@gmail.com> wrote:
>>>
>>>> Hi Val,
>>>>
>>>> 1) Well, it's not a problem to implement such default behavior, but
>>>> there is one concern. In most cases, when you are using Cassandra as a
>>>> persistent store, you are going to store a large amount of data, which
>>>> is significantly bigger than the amount of RAM in your Ignite cluster.
>>>> In such a case it doesn't make sense to launch a CQL query like
>>>> "select * from my_table" because:
>>>>    a) You still will not be able to keep all data from the Cassandra
>>>> table in the Ignite cache
>>>>    b) All the data will be pulled from the Cassandra table using only
>>>> one thread - which is very slow
>>>>
>>>> 2) Unfortunately it's not possible in Cassandra. For JDBC you are
>>>> splitting the table into chunks of 512 rows each, using sub-queries and
>>>> ordering by primary keys. This kind of thing is not supported in
>>>> Cassandra. Probably the only way to load data from a Cassandra table in
>>>> parallel is to load it from some specified partitions (in parallel for
>>>> each partition).
>>>>
>>>>
>>>> Igor Rudyak
>>>>
>>>> On Fri, Oct 7, 2016 at 1:45 PM, Valentin Kulichenko <
>>>> valentin.kuliche...@gmail.com> wrote:
>>>>
>>>>> Hi Igor,
>>>>>
>>>>> Thanks for the response!
>>>>>
>>>>> 1. It's a bit inconsistent with other store implementations we have in
>>>>> the product and actually I find this counterintuitive. Why don't we just
>>>>> load all the data available in the table? An explicit query is useful
>>>>> when you want to customize this and load a subset of data based on some
>>>>> criteria. If this is not possible for some reason, then I would at least
>>>>> throw an exception if the query is not specified.
>>>>>
>>>>> 2. Is it possible to automatically split the data into chunks and load
>>>>> them in parallel? We do this in the JDBC store, for example.
>>>>>
>>>>> -Val
>>>>>
>>>>> On Thu, Oct 6, 2016 at 11:00 PM, Igor Rudyak <irud...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Val,
>>>>>>
>>>>>> 1) If you call loadCache(null), it will do nothing. You need to
>>>>>> provide at least one CQL query.
>>>>>>
>>>>>> 2) It depends. If you provide more than one CQL query, it will use a
>>>>>> separate thread for each of the queries (the max number of threads is
>>>>>> limited to the number of CPU cores). But for each provided CQL query it
>>>>>> will use only one thread to load all the data returned by the query.
>>>>>> Also, it will run the same CQL query from ALL Ignite nodes to load the
>>>>>> same data, which is bad. That's because the loadCache method will be
>>>>>> executed on each Ignite node. As you see, it's not a very efficient way
>>>>>> to load data from Cassandra just by specifying a CQL query. The ticket I
>>>>>> created is all about how to load data from one table (or from multiple
>>>>>> tables as well) in parallel by partitioning it. That way each Ignite
>>>>>> node will be responsible for loading data from a specific partition
>>>>>> range of the Cassandra table, which is much more efficient. To support
>>>>>> this kind of cache warm-up you should design your Cassandra table in a
>>>>>> specific way - there should be some mapping from an Ignite partition to
>>>>>> the set of Cassandra partitions. Yes, I have plans to implement this.
>>>>>>
>>>>>> Igor Rudyak
>>>>>>
>>>>>>
>>>>>> On Thu, Oct 6, 2016 at 10:19 AM, Valentin Kulichenko <
>>>>>> valentin.kuliche...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Igor,
>>>>>>>
>>>>>>> I've got a couple of quick questions about the Cassandra store.
>>>>>>>
>>>>>>>    1. In [1] you suggested providing an explicit query as a
>>>>>>>    parameter for the loadCache() method, because otherwise the user
>>>>>>>    was always getting an empty result. Is it a requirement to provide
>>>>>>>    the query? What if I just call loadCache(null)?
>>>>>>>    2. There is a ticket [2] about parallel load in the Cassandra
>>>>>>>    store. Does it mean that currently it loads only in a
>>>>>>>    single-threaded fashion? If so, do you have any plans to implement
>>>>>>>    this improvement?
>>>>>>>
>>>>>>> [1] http://apache-ignite-users.70518.x6.nabble.com/Cannot-query-on-a-cache-using-Cassandra-as-a-persistent-store-td7870.html
>>>>>>> [2] https://gridgain.freshdesk.com/helpdesk/tickets/2180
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Val
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
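
The partition-based idea discussed in the thread (loading a Cassandra
table in parallel from separate partition ranges) might be sketched
roughly as below. This is only an illustration under stated assumptions:
it assumes the table uses Cassandra's default Murmur3Partitioner, whose
tokens span -2^63 to 2^63 - 1, and it builds one CQL query per token
slice using the CQL token() function. The class TokenRangeQueries, the
method build, and the table and key names are hypothetical; they are not
part of Ignite, the Cassandra driver, or the ticket.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not an Ignite or Cassandra API). */
final class TokenRangeQueries {
    // Assumption: Murmur3Partitioner token range is [-2^63, 2^63 - 1].
    private static final BigInteger MIN = BigInteger.valueOf(Long.MIN_VALUE);
    private static final BigInteger MAX = BigInteger.valueOf(Long.MAX_VALUE);

    private TokenRangeQueries() {
    }

    /**
     * Splits the full token range into contiguous, non-overlapping slices
     * and returns one CQL query string per slice.
     */
    static List<String> build(String table, String partitionKey, int slices) {
        if (slices < 1)
            throw new IllegalArgumentException("slices must be >= 1");

        // Total number of tokens in the range: 2^64.
        BigInteger span = MAX.subtract(MIN).add(BigInteger.ONE);
        BigInteger count = BigInteger.valueOf(slices);

        List<String> queries = new ArrayList<>(slices);
        BigInteger lo = MIN;

        for (int i = 1; i <= slices; i++) {
            // Inclusive upper bound of slice i; the last slice ends at MAX.
            BigInteger hi = (i == slices)
                ? MAX
                : MIN.add(span.multiply(BigInteger.valueOf(i)).divide(count))
                     .subtract(BigInteger.ONE);

            queries.add("select * from " + table
                + " where token(" + partitionKey + ") >= " + lo
                + " and token(" + partitionKey + ") <= " + hi);

            // Next slice starts right after this one.
            lo = hi.add(BigInteger.ONE);
        }

        return queries;
    }
}
```

Since, per the thread, each CQL query passed to loadCache is loaded by its
own thread, the resulting strings could be passed as the loadCache
arguments, for example cache.loadCache(null, queries.toArray()). Note that
this would still run every query on every Ignite node, so it does not
address the duplicate-load problem that the ticket is about.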
