Re: Cassandra store questions

Igor Rudyak Wed, 12 Oct 2016 18:46:02 -0700

Hi Val,

I don't have any objections. Please create a ticket and link it to the
root ticket https://issues.apache.org/jira/browse/IGNITE-1371


Igor

On Wed, Oct 12, 2016 at 4:10 PM, Valentin Kulichenko <
[email protected]> wrote:

> Hi Igor,
>
> 1. I still think we should load all the data by default. Loading nothing is
> very counterintuitive and prevents new users from getting started quickly.
> For large tables, when only part of the dataset is needed, the user will
> explicitly specify the query, of course. Do you have any objections? If
> not, I will create a ticket.
>
> 2. Got it, thanks.
>
> -Val
>
> On Mon, Oct 10, 2016 at 12:12 AM, Igor Rudyak <[email protected]> wrote:
>
>> Hi Val,
>>
>> 1) Well, it's not a problem to implement such default behavior, but there
>> is one concern. In most cases, when you use Cassandra as a persistent
>> store, you are going to store a large amount of data, significantly more
>> than the RAM available in your Ignite cluster. In such a case it doesn't
>> make sense to launch a CQL query like "select * from my_table", because:
>>    a) You still won't be able to keep all the data from the Cassandra
>> table in the Ignite cache.
>>    b) All the data will be pulled from the Cassandra table by a single
>> thread, which is very slow.
>>
>> 2) Unfortunately, it's not possible in Cassandra. The JDBC store splits
>> the table into chunks of 512 rows each, using sub-queries and ordering by
>> primary key. Such things are not supported in Cassandra. Probably the only
>> way to load data from a Cassandra table in parallel is to load it from
>> specified partitions (in parallel for each partition).
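Loading "from specified partitions, in parallel" can be sketched with token-range scans, a common Cassandra technique swapped in here for illustration (the thread does not prescribe it). The sketch assumes the Murmur3Partitioner, whose tokens are signed 64-bit values, and a hypothetical table `my_keyspace.my_table` with partition key `id`. It only builds the CQL strings; each one could then be executed by its own thread.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Sketch: split the Murmur3 token ring into contiguous ranges, one CQL query per range.
class TokenRangeSplit {
    // Murmur3Partitioner tokens are signed 64-bit values. Long.MIN_VALUE is the
    // ring's minimum token and is never assigned to a key, so "> MIN" loses no rows.
    static final BigInteger MIN_TOKEN = BigInteger.valueOf(Long.MIN_VALUE);
    static final BigInteger MAX_TOKEN = BigInteger.valueOf(Long.MAX_VALUE);

    // Builds `parts` queries of the form
    //   SELECT * FROM <table> WHERE token(<key>) > lo AND token(<key>) <= hi
    // whose ranges are contiguous and together cover the whole ring.
    static List<String> splitQueries(String table, String partitionKey, int parts) {
        if (parts <= 0)
            throw new IllegalArgumentException("parts must be positive");

        BigInteger span = MAX_TOKEN.subtract(MIN_TOKEN);
        BigInteger n = BigInteger.valueOf(parts);
        List<String> queries = new ArrayList<>(parts);
        BigInteger lo = MIN_TOKEN;

        for (int i = 1; i <= parts; i++) {
            // BigInteger avoids long overflow; for i == parts, hi is exactly MAX_TOKEN.
            BigInteger hi = MIN_TOKEN.add(span.multiply(BigInteger.valueOf(i)).divide(n));
            queries.add(String.format("SELECT * FROM %s WHERE token(%s) > %s AND token(%s) <= %s",
                table, partitionKey, lo, partitionKey, hi));
            lo = hi;
        }
        return queries;
    }

    public static void main(String[] args) {
        // Hypothetical keyspace, table and partition key column.
        for (String query : splitQueries("my_keyspace.my_table", "id", 4))
            System.out.println(query);
    }
}
```

Each range's upper bound is the next range's lower bound, so the queries together read every row exactly once. With a composite partition key, pass all key columns in order as a comma-separated list (e.g. `"company, department, number"`).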
>>
>>
>> Igor Rudyak
>>
>> On Fri, Oct 7, 2016 at 1:45 PM, Valentin Kulichenko <
>> [email protected]> wrote:
>>
>>> Hi Igor,
>>>
>>> Thanks for the response!
>>>
>>> 1. It's a bit inconsistent with the other store implementations we have
>>> in the product, and I actually find it counterintuitive. Why don't we just
>>> load all the data available in the table? An explicit query is useful
>>> when you want to customize this and load a subset of the data based on
>>> some criteria. If this is not possible for some reason, then I would at
>>> least throw an exception when no query is specified.
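Val's two alternatives for a missing query (load the whole table, or fail fast) can be sketched as a small argument-resolution step. `LoadArgs`, `resolveQueries` and the `failIfMissing` flag are hypothetical names, not part of Ignite's Cassandra store, and only `String` arguments are treated as CQL here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: decides which CQL queries loadCache() should run for its arguments.
class LoadArgs {
    // String arguments are taken as CQL queries; anything else (including null) is ignored.
    // With no queries, either fail fast or fall back to loading the whole table.
    static List<String> resolveQueries(String table, boolean failIfMissing, Object... args) {
        List<String> queries = new ArrayList<>();
        if (args != null) {
            for (Object arg : args) {
                if (arg instanceof String)
                    queries.add((String) arg);
            }
        }
        if (queries.isEmpty()) {
            if (failIfMissing)
                throw new IllegalArgumentException("loadCache() needs at least one CQL query");
            queries.add("SELECT * FROM " + table); // default: load all data in the table
        }
        return queries;
    }

    public static void main(String[] args) {
        System.out.println(resolveQueries("my_keyspace.my_table", false));
        System.out.println(resolveQueries("my_keyspace.my_table", false,
            "SELECT * FROM my_keyspace.my_table WHERE id = 1"));
    }
}
```

The fallback favors a quick start, while throwing keeps the cost explicit; Igor's concern about tables larger than the cluster's RAM is the argument for the second option.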
>>>
>>> 2. Is it possible to automatically split the data into batches and load
>>> them in parallel? We do this in the JDBC store, for example.
>>>
>>> -Val
>>>
>>> On Thu, Oct 6, 2016 at 11:00 PM, Igor Rudyak <[email protected]> wrote:
>>>
>>>> Hi Val,
>>>>
>>>> 1) If you call loadCache(null), it will do nothing. You need to
>>>> provide at least one CQL query.
>>>>
>>>> 2) It depends. If you provide more than one CQL query, it will use a
>>>> separate thread for each query (the maximum number of threads is limited
>>>> to the number of CPU cores). But each CQL query is handled by only one
>>>> thread, which loads all the data returned by that query. Also, the same
>>>> CQL query will run on ALL Ignite nodes and load the same data, which is
>>>> bad. That's because the loadCache method is executed on each Ignite node.
>>>> As you can see, just specifying a CQL query is not a very efficient way to
>>>> load data from Cassandra. The ticket I created is all about loading data
>>>> from one table (or from multiple tables as well) in parallel by
>>>> partitioning it. This way, each Ignite node will be responsible for
>>>> loading data from a specific partition range of the Cassandra table,
>>>> which is much more efficient. To support this kind of cache warm-up, you
>>>> need to design your Cassandra table in a specific way: there should be
>>>> some mapping from an Ignite partition to a set of Cassandra partitions.
>>>> Yes, I have plans to implement this.
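The partition-aware warm-up described here can be sketched as follows, assuming the Cassandra table is designed with a hypothetical `ignite_part` column as its whole partition key, holding the Ignite partition number of each row's key. `localPartitions` is a simple stand-in for Ignite's affinity assignment (partition `p` goes to node `p % nodeCount`), and `runQuery` stands in for executing CQL against Cassandra; neither is a real Ignite or driver API. Queries run on a pool capped at the number of CPU cores and each query is consumed by one thread, mirroring the current behavior described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Sketch of partition-aware cache warm-up: each node loads only its own partitions.
class PartitionedLoad {
    // Stand-in for Ignite's affinity assignment: partition p belongs to node p % nodeCount.
    static List<Integer> localPartitions(int partitions, int nodeCount, int nodeIdx) {
        List<Integer> parts = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            if (p % nodeCount == nodeIdx)
                parts.add(p);
        }
        return parts;
    }

    // One CQL query per local partition; assumes the hypothetical column ignite_part
    // is the table's whole Cassandra partition key, so "=" on it is a valid CQL filter.
    static List<String> queriesFor(String table, List<Integer> parts) {
        List<String> queries = new ArrayList<>();
        for (int p : parts)
            queries.add("SELECT * FROM " + table + " WHERE ignite_part = " + p);
        return queries;
    }

    // Runs the queries on a pool capped at the number of CPU cores; each query is
    // consumed by one thread. runQuery stands in for a Cassandra call and returns
    // the number of rows it loaded.
    static long load(List<String> queries, Function<String, Long> runQuery) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, Math.min(queries.size(), cores)));
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (String q : queries)
                futures.add(pool.submit(() -> runQuery.apply(q)));
            long total = 0;
            for (Future<Long> f : futures)
                total += f.get();
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Node 1 of a hypothetical 3-node cluster with 1024 Ignite partitions.
        List<String> queries = queriesFor("my_keyspace.my_table", localPartitions(1024, 3, 1));
        long rows = load(queries, q -> 10L); // fake runner: 10 rows per partition
        System.out.println(queries.size() + " queries, " + rows + " rows");
    }
}
```

Because each node queries only the partitions it owns, the same query is no longer executed on every node, and each row is loaded once across the cluster.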
>>>>
>>>> Igor Rudyak
>>>>
>>>>
>>>> On Thu, Oct 6, 2016 at 10:19 AM, Valentin Kulichenko <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi Igor,
>>>>>
>>>>> I've got a couple of quick questions about the Cassandra store.
>>>>>
>>>>>    1. In [1] you suggested providing an explicit query as a parameter
>>>>>    to the loadCache() method, because otherwise the user was always
>>>>>    getting an empty result. Is it required to provide the query? What if
>>>>>    I just call loadCache(null)?
>>>>>    2. There is a ticket [2] about parallel load in the Cassandra store.
>>>>>    Does it mean that it currently loads only in a single-threaded
>>>>>    fashion? If so, do you have any plans to implement this improvement?
>>>>>
>>>>> [1] http://apache-ignite-users.70518.x6.nabble.com/Cannot-query-on-a-cache-using-Cassandra-as-a-persistent-store-td7870.html
>>>>> [2] https://gridgain.freshdesk.com/helpdesk/tickets/2180
>>>>>
>>>>> Thanks,
>>>>> Val
>>>>>
>>>>
>>>>
>>>
>>
>
