Updates alone should not be a problem: Cassandra does not read the existing
row on an UPDATE, so there is no need to pull the data out first.
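
For example, with the DataStax Python driver an update is a plain write and
Cassandra never reads the existing row first. A minimal sketch (keyspace,
table, and column names are hypothetical placeholders):

    from cassandra.cluster import Cluster

    cluster = Cluster(['127.0.0.1'])
    session = cluster.connect('nlp_ks')   # hypothetical keyspace

    # A blind UPDATE: no read-before-write; the new cell simply wins over
    # the old one by timestamp at the next read/compaction.
    update = session.prepare(
        "UPDATE documents SET annotations = ? WHERE doc_id = ?")
    session.execute(update, ('new annotation blob', 42))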

Is that row bigger than your memory capacity (or heap size)? For dealing
with large heaps you can refer to CASSANDRA-8150, which has some nice tuning
tips.
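
In the meantime, the read-then-update loop itself can be kept memory-friendly
by letting the driver page through the result set instead of pulling
everything at once. A rough sketch with the Python driver, just as an
illustration (table name, column names, and the transform are hypothetical):

    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(['127.0.0.1'])
    session = cluster.connect('nlp_ks')   # hypothetical keyspace

    # fetch_size makes the driver fetch rows in pages of 500 instead of
    # materializing the whole result set in memory.
    query = SimpleStatement(
        "SELECT doc_id, body FROM documents", fetch_size=500)
    update = session.prepare(
        "UPDATE documents SET processed_body = ? WHERE doc_id = ?")

    def transform(text):
        # placeholder for the real per-record transformation
        return text.lower()

    for row in session.execute(query):
        session.execute(update, (transform(row.body), row.doc_id))

Executing the updates synchronously, one per row, also throttles the write
rate for free; execute_async is faster, but then you have to bound the number
of in-flight requests yourself.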

It would be good if someone else could share their experience as well.

Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
Tel: 1649
www.pythian.com

On Wed, Feb 11, 2015 at 12:05 PM, Pavel Velikhov <pavel.velik...@gmail.com>
wrote:

> Hi Carlos,
>
>   I tried on a single node and a 4-node cluster. On the 4-node cluster I
> set up the tables with replication factor = 2.
> I usually iterate over a subset, but it can be about ~40% right now. Some
> of my column values could be quite big… I remember I was exporting to csv
> and I had to change the default csv max column length.
>
> If I just update, there are no problems; it's reading and updating that
> kills everything (could it have something to do with the driver?)
>
> I’m using 2.0.8 release right now.
>
> I was trying to tweak memory sizes. If I give Cassandra too much memory
> (>8 or >16 GB) it dies much faster due to GC not being able to keep up. But
> it consistently dies on a specific row in the single-instance case…
>
> Is this enough info to point me somewhere?
>
> Thank you,
> Pavel
>
> On Feb 11, 2015, at 1:48 PM, Carlos Rolo <r...@pythian.com> wrote:
>
> Hello Pavel,
>
> What is the size of the cluster (# of nodes)? And do you need to iterate over
> the full 1TB every time you do the update, or just parts of it?
>
> IMO there is too little information to make any kind of assessment of the
> problem you are having.
>
> I can suggest trying a 2.0.x (or 2.1.1) release to see if you get the same
> problem.
>
> Regards,
>
> Carlos Juzarte Rolo
> Cassandra Consultant
>
> Pythian - Love your data
>
> rolo@pythian | Twitter: cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
> Tel: 1649
> www.pythian.com
>
> On Wed, Feb 11, 2015 at 11:22 AM, Pavel Velikhov <pavel.velik...@gmail.com
> > wrote:
>
>> Hi,
>>
>>   I’m using Cassandra to store NLP data. The dataset is not that huge
>> (about 1TB), but I need to iterate over it quite frequently, updating the
>> full dataset (each record, but not necessarily every column).
>>
>>   I’ve run into two problems (I’m using the latest Cassandra):
>>
>>   1. I was trying to copy from one Cassandra cluster to another via the
>> Python driver; however, the driver confused the two instances.
>>   2. While trying to update the full dataset with a simple transformation
>> (again via the Python driver), both single-node and clustered Cassandra run
>> out of memory no matter what settings I try, even if I put a lot of sleeps
>> into the mix. However, simpler transformations (updating just one column,
>> especially when there is a lot of processing overhead) work just fine.
>>
>> I’m really concerned about #2, since we’re moving all heavy processing to
>> a Spark cluster and will expand it, and I would expect much heavier traffic
>> to/from Cassandra. Any hints, war stories, etc. are very much appreciated!
>>
>> Thank you,
>> Pavel Velikhov
>
>
>
