Re: Cassandra store questions

Valentin Kulichenko Wed, 12 Oct 2016 16:12:04 -0700

Hi Igor,

1. I still think we should do this. Loading nothing is very
counterintuitive and prevents a new user from getting started quickly. For
large tables, when only part of the dataset is needed, the user will
explicitly specify the query, of course. Do you have objections? If not, I
will create a ticket.


2. Got it, thanks.

-Val

On Mon, Oct 10, 2016 at 12:12 AM, Igor Rudyak <[email protected]> wrote:

> Hi Val,
>
> 1) Well, it's not a problem to implement such default behavior, but there
> is one concern. In most cases, when you are using Cassandra as a persistent
> store, you are going to store a large amount of data, significantly more
> than the amount of RAM in your Ignite cluster. In such a case it doesn't
> make sense to launch a CQL query like "select * from my_table" because:
>    a) You still will not be able to keep all the data from the Cassandra
> table in the Ignite cache
>    b) All the data will be pulled from the Cassandra table using only one
> thread, which is very slow
>
> 2) Unfortunately, that's not possible in Cassandra. For JDBC you are
> splitting the table into chunks of 512 rows each, using sub-queries and
> ordering by primary keys. That kind of thing is not supported in
> Cassandra. Probably the only way to load data from a Cassandra table in
> parallel is to load it from some specified partitions (in parallel for
> each partition).
>
>
> Igor Rudyak
>
> On Fri, Oct 7, 2016 at 1:45 PM, Valentin Kulichenko <
> [email protected]> wrote:
>
>> Hi Igor,
>>
>> Thanks for the response!
>>
>> 1. It's a bit inconsistent with the other store implementations we have
>> in the product, and I actually find this counterintuitive. Why don't we
>> just load all the data available in the table? An explicit query is useful
>> when you want to customize this and load a subset of the data based on
>> some criteria. If this is not possible for some reason, then I would at
>> least throw an exception if the query is not specified.
>>
>> 2. Is it possible to automatically split the data into chunks and load
>> them in parallel? We do this in the JDBC store, for example.
>>
>> -Val
>>
>> On Thu, Oct 6, 2016 at 11:00 PM, Igor Rudyak <[email protected]> wrote:
>>
>>> Hi Val,
>>>
>>> 1) If you call loadCache(null), it will do nothing. You need to
>>> provide at least one CQL query.
>>>
>>> 2) It depends. If you provide more than one CQL query, it will use a
>>> separate thread for each query (the maximum number of threads is limited
>>> to the number of CPU cores). But for each provided CQL query it will use
>>> only one thread to load all the data returned by that query. It will also
>>> run the same CQL query from ALL Ignite nodes to load the same data, which
>>> is bad. That's because the loadCache method is executed on each Ignite
>>> node. As you can see, specifying a CQL query is not a very efficient way
>>> to load data from Cassandra. The ticket I created is all about how to
>>> load data from one table (or from multiple tables) in parallel by
>>> partitioning it. That way, each Ignite node will be responsible for
>>> loading data from a specific partition range of the Cassandra table,
>>> which is much more efficient. To support this kind of cache warm-up you
>>> should design your Cassandra table in a specific way: there should be
>>> some mapping from each Ignite partition to a set of Cassandra partitions.
>>> Yes, I have plans to implement this.
>>>
>>> Igor Rudyak
>>>
>>>
>>> On Thu, Oct 6, 2016 at 10:19 AM, Valentin Kulichenko <
>>> [email protected]> wrote:
>>>
>>>> Hi Igor,
>>>>
>>>> I've got a couple of quick questions about the Cassandra store.
>>>>
>>>>    1. In [1] you suggested providing an explicit query as a parameter
>>>>    for the loadCache() method, because otherwise the user was always
>>>>    getting an empty result. Is providing the query a requirement? What
>>>>    if I just call loadCache(null)?
>>>>    2. There is a ticket [2] about parallel load in the Cassandra store.
>>>>    Does it mean that currently it loads only in a single-threaded
>>>>    fashion? If so, do you have any plans to implement this improvement?
>>>>
>>>> [1] http://apache-ignite-users.70518.x6.nabble.com/Cannot-query-on-a-cache-using-Cassandra-as-a-persistent-store-td7870.html
>>>> [2] https://gridgain.freshdesk.com/helpdesk/tickets/2180
>>>>
>>>> Thanks,
>>>> Val
>>>>
>>>
>>>
>>
>
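
Igor's point about loading "from some specified partitions (in parallel for
each partition)" could be approximated with token-range queries. The sketch
below is only an illustration, not the Cassandra store's implementation: it
assumes the table uses Cassandra's default Murmur3Partitioner (tokens from
-2^63 to 2^63-1), and the function names, the table "my_table" and the key
column "id" are hypothetical. It splits the token ring into contiguous
ranges, builds one CQL query per range, and gives each node a disjoint
subset of those queries.

```python
# Hypothetical sketch: all names here are illustrative, not Ignite APIs.
MIN_TOKEN = -2**63       # smallest Murmur3Partitioner token
MAX_TOKEN = 2**63 - 1    # largest Murmur3Partitioner token


def split_token_ring(parts):
    """Split the token ring into `parts` contiguous inclusive ranges."""
    if parts < 1:
        raise ValueError("parts must be >= 1")
    total = MAX_TOKEN - MIN_TOKEN + 1
    ranges = []
    start = MIN_TOKEN
    for i in range(parts):
        # Spread the remainder so range sizes differ by at most one token.
        size = total // parts + (1 if i < total % parts else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def range_query(table, key_column, token_range):
    """Build a CQL query that selects only rows in one token range."""
    lo, hi = token_range
    return (f"SELECT * FROM {table} "
            f"WHERE token({key_column}) >= {lo} AND token({key_column}) <= {hi}")


def queries_for_node(table, key_column, node_index, node_count,
                     ranges_per_node=4):
    """Return the disjoint subset of range queries for one node."""
    ranges = split_token_ring(node_count * ranges_per_node)
    mine = ranges[node_index::node_count]  # round-robin assignment
    return [range_query(table, key_column, r) for r in mine]


if __name__ == "__main__":
    for q in queries_for_node("my_table", "id", node_index=0, node_count=2):
        print(q)
```

Since each query passed to loadCache gets its own thread, several range
queries per node would also load in parallel within that node. How to hand a
different query list to each node is left open here, because, as noted in
the thread, loadCache runs the same arguments on every node. Token ranges
also do not line up with Ignite's own partitioning, which is why the thread
mentions designing the table around an Ignite-to-Cassandra partition mapping.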
