possible problem with multi-node cluster

2014-01-16 Thread Amalrik Maia
Hi guys,
I have some issues with an application on a Cassandra multi-node cluster.
I need help understanding what is going on.
 
Here is the stack trace from my application:
2014-01-16 12:33:14,176 - root - ERROR - 

2014-01-16 12:33:14,176 - root - ERROR - Task finished processing with 
exception, message will be kept in the queue
2014-01-16 12:33:14,180 - pycassa.pool - INFO - Connection 42979792 
(cassandra.***.com:9160) in pool 42966096 failed: 
2014-01-16 12:33:14,182 - pycassa.pool - INFO - Connection 42979408 
(cassandra.***.com:9160) in pool 42966096 failed: 
2014-01-16 12:33:14,182 - pycassa.pool - INFO - Connection 42966864 
(cassandra.***.com:9160) in pool 42966096 failed: 
2014-01-16 12:33:14,183 - root - ERROR - Exception in user code:
2014-01-16 12:33:14,183 - root - ERROR - 

2014-01-16 12:33:14,183 - root - ERROR - (<class 'pycassa.pool.MaximumRetryException'>, MaximumRetryException('Retried 1 times. Last failure was EOFError: ',), <traceback object at 0x28fe3b0>)
2014-01-16 12:33:14,183 - root - ERROR - Traceback (most recent call last):
  File "acme.py", line 20, in task_run
    run_method_result = run_method(message.get_body())
  File "/opt/acme/acme/acme/apps/stats_updater/stats_updater.py", line 141, in acme_run
    StatsUpdater.get().run(params)
  File "/opt/acme/acme/acme/apps/stats_updater/stats_updater.py", line 94, in run
    result = StatisticsService.get().increment_counters(cookie, counter_type, obj_id, pdate, 1, tag_name)
  File "/opt/acme/local/lib/python2.7/site-packages/s1audservice/statistics_service.py", line 41, in increment_counters
    self.do_increment_counter(False, FrequencyCriteria.HOUR, counter_type_tag, obj_id, date, value)
  File "/opt/acme/local/lib/python2.7/site-packages/s1audservice/statistics_service.py", line 389, in do_increment_counter
    return self.increment_counter(counter_type_tag, f_date, nosql_id, cf_to_update, value_to_increment)
  File "/opt/acme/local/lib/python2.7/site-packages/s1audservice/statistics_service.py", line 397, in increment_counter
    pycassa.ConsistencyLevel.ALL)
  File "/opt/acme/local/lib/python2.7/site-packages/pycassa/columnfamily.py", line 1066, in add
    allow_retries=self._allow_retries)
  File "/opt/acme/local/lib/python2.7/site-packages/pycassa/pool.py", line 577, in execute
    return getattr(conn, f)(*args, **kwargs)
  File "/opt/acme/local/lib/python2.7/site-packages/pycassa/pool.py", line 148, in new_f
    (self._retry_count, exc.__class__.__name__, exc))
MaximumRetryException: Retried 1 times. Last failure was EOFError: 
I'm using DataStax Enterprise version 3.2.2-1, with Cassandra 1.2.12.2.
My application is written in Python 2.7 and connects to Cassandra through pycassa 1.10.0; my Thrift version is 0.9.1.

Here is the code that connects to Cassandra:
__instances = {}
__CONNECTION_POOL_SIZE_PER_HOST = 20

def __init__(self, prefix):
    if not prefix:
        raise ValueError("Invalid prefix for connection")

    self.ip_seeds = ConfigService.get().get_property(prefix + ".host") + ":" \
        + ConfigService.get().get_property(prefix + ".port")
    self.cluster_name = ConfigService.get().get_property(prefix + ".cluster_name")
    self.keyspace_name = ConfigService.get().get_property(prefix + ".keyspace")
    self.conec_pool_name = ConfigService.get().get_property(prefix + ".pool_name")
    self.con_pool = pycassa.ConnectionPool(self.keyspace_name,
                                           server_list=[self.ip_seeds],
                                           timeout=50)
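
For context, server_list above ends up holding a single "host:port" string, so the pool only knows one contact point. This is an untested sketch (keyspace and host names are placeholders, not my real config) of how I understand a pool listing every node would look:

import pycassa

# Untested sketch: list each node as a contact point instead of a
# single seed string, and allow a few retries before the pool gives up.
# Keyspace and host names below are placeholders.
pool = pycassa.ConnectionPool(
    'my_keyspace',
    server_list=['node1.example.com:9160', 'node2.example.com:9160'],
    pool_size=20,       # matches __CONNECTION_POOL_SIZE_PER_HOST above
    max_retries=5,      # pycassa's default
    timeout=50)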


This is the code that triggers the exception:
cf_to_update.add(
    (counter_type_tag, long(str_date_formatted)),
    nosql_id,
    int(value),
    None,
    pycassa.ConsistencyLevel.ALL)
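
For what it's worth, my understanding is that ConsistencyLevel.ALL requires every replica to acknowledge the write, so a single unreachable node would fail the whole increment. This is an untested sketch of the same call with QUORUM instead, which is what I was thinking of trying:

import pycassa

# Untested: QUORUM only needs a majority of replicas to respond, so
# with replication_factor >= 3 it should tolerate one node being down.
cf_to_update.add(
    (counter_type_tag, long(str_date_formatted)),  # row key
    nosql_id,                                      # counter column name
    int(value),                                    # amount to add
    None,                                          # no super column
    write_consistency_level=pycassa.ConsistencyLevel.QUORUM)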


This same code runs successfully, without any errors, against a single-node Cassandra cluster with a very similar configuration:
Python 2.7, Apache Cassandra 1.2.8, pycassa 1.10.0, Thrift 0.9.1.

Any help would be greatly appreciated.

help on backing up a multi-node cluster

2013-12-06 Thread Amalrik Maia
Hey guys, I'm trying to take backups of a multi-node Cassandra cluster and save them on S3.
My idea is simply to SSH into each server, use nodetool to create the snapshots, and then push them to S3.

So, is this approach recommended? My concern is the inconsistency it can lead to, since the snapshots are taken one by one rather than in parallel.
Should I worry about that, or does Cassandra have a way to deal with inconsistencies when doing a restore?

PS: I'm aware that DataStax recommends using pssh to take snapshots in parallel, but I couldn't use pssh because nodetool requires you to specify the hostname:
nodetool -h 10.10.10.1 snapshot thissnapshotname
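
For reference, this is roughly the node-by-node loop I had in mind, in Python (untested; the host list, data directory, and bucket name are placeholders, and it assumes the AWS CLI is installed on each node):

import subprocess

# Untested sketch: SSH into each node in turn, snapshot locally,
# then tar the snapshot directories and stream the tarball to S3.
HOSTS = ['10.10.10.1', '10.10.10.2', '10.10.10.3']  # placeholder IPs
SNAPSHOT = 'thissnapshotname'
DATA_DIR = '/var/lib/cassandra/data'                # default data dir
BUCKET = 's3://my-cassandra-backups'                # placeholder bucket

for host in HOSTS:
    # nodetool runs on the node itself here, so -h localhost works
    subprocess.check_call(
        ['ssh', host, 'nodetool -h localhost snapshot %s' % SNAPSHOT])
    # collect every .../snapshots/<name> directory and upload one
    # tarball per node; assumes the aws CLI exists on the node
    remote_cmd = (
        "tar czf - $(find %s -type d -path '*/snapshots/%s') "
        "| aws s3 cp - %s/%s/%s.tar.gz"
        % (DATA_DIR, SNAPSHOT, BUCKET, host, SNAPSHOT))
    subprocess.check_call(['ssh', host, remote_cmd])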

Any help would be appreciated.
[]'s