Re: Working with legacy data via CQL

2014-11-12 Thread Erik Forsberg
On 2014-11-11 19:40, Alex Popescu wrote:
 On Tuesday, November 11, 2014, Erik Forsberg forsb...@opera.com wrote:

 You'll have better chances to get an answer about the Python driver on
 its own mailing list:
 https://groups.google.com/a/lists.datastax.com/forum/#!forum/python-driver-user

As I said, this also happens when using cqlsh:

cqlsh:test> SELECT column1,value from Users where key =
a6b07340-047c-4d4c-9a02-1b59eabf611c and column1 = 'date_created';

 column1  | value
--+--
 date_created | '\x00\x00\x00\x00Ta\xf3\xe0'

(1 rows)

Failed to decode value '\x00\x00\x00\x00Ta\xf3\xe0' (for column 'value')
as text: 'utf8' codec can't decode byte 0xf3 in position 6: unexpected
end of data

So let me rephrase: How do I work with data where the table has metadata
that makes some columns differ from the main validation class? From
cqlsh, or the python driver, or any driver?

Thanks,
\EF


Re: Better option to load data to cassandra

2014-11-12 Thread Brice Dutheil
On our project we wrote ourselves a custom batch loader to load the data into
Cassandra the way we wanted.

-- Brice

On Tue, Nov 11, 2014 at 2:33 PM, srinivas rao pinnakasrin...@gmail.com
wrote:

 Hi Alexey,

 I tried Sqoop and the DataStax COPY command. Are there any other options we
 can use?

 I have one more question: I have a table with a composite key as the row key,
 so how can I export it using Sqoop or the COPY command?
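
 (For what it's worth, cqlsh COPY can handle a composite row key if you list
 the key columns explicitly; the keyspace, table and column names below are
 made up.)

   COPY mykeyspace.mytable (key_part1, key_part2, col1, col2) TO 'mytable.csv';
   COPY mykeyspace.mytable (key_part1, key_part2, col1, col2) FROM 'mytable.csv';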

 Thanks
 Srinivas


 On Tue, Nov 11, 2014 at 6:28 PM, Plotnik, Alexey aplot...@rhonda.ru
 wrote:

  What have you tried?

 -- Original Message --
 From: srinivas rao pinnakasrin...@gmail.com
 To: Cassandra Users user@cassandra.apache.org
 Sent: 11.11.2014 22:51:54
 Subject: Better option to load data to cassandra


  Hi Team,

 Please suggest better options to load data from NoSQL to Cassandra.



 Thanks
 Srini





Re: nodetool repair stalled

2014-11-12 Thread Eric Stevens
Wouldn't it be a better idea to issue removenode on the crashed node, wipe
the whole data directory (including system) and let it bootstrap cleanly so
that it's not part of the cluster while it gets back up to speed?

On Tue, Nov 11, 2014, 12:32 PM Robert Coli rc...@eventbrite.com wrote:

 On Tue, Nov 11, 2014 at 10:48 AM, venkat sam samvenkat...@outlook.com
 wrote:


 I have a 5 node cluster. On one node, one of the data directory partitions
 crashed. After disk replacement I restarted the Cassandra daemon and ran
 nodetool repair to restore the missing replicas. But nodetool repair is
 getting stuck after syncing one of the column families.


 Yes, nodetool repair often hangs. Search through the archives, but the
 summary is:

 1) try to repair CFs one at a time
 2) it's worse with vnodes
 3) try tuning the phi detector or network stream timeouts

 =Rob
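
(Roughly what (1) and (3) look like in practice, sketched against a 2.x-era
setup; the keyspace/CF names are placeholders and the values only starting
points.)

  nodetool repair my_keyspace my_columnfamily        # one CF at a time

  # cassandra.yaml knobs often tuned when streams hang:
  #   phi_convict_threshold: 12                (default 8)
  #   streaming_socket_timeout_in_ms: 3600000  (default 0, i.e. no timeout)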




Cassandra sort using updatable query

2014-11-12 Thread Chamila Wijayarathna
Hello all,

I have a data set with attributes content and year. I want to put them into
a CF 'words' with attributes ('content', 'year', 'frequency'). The CF should
support the following operations:

   - The frequency attribute of a column can be updated (i.e. I can run a query
   like UPDATE words SET frequency = 2 WHERE content='abc' AND year=1990;),
   where the WHERE clause contains content and year
   - It should support a select query like SELECT content FROM words WHERE year
   = 2010 ORDER BY frequency DESC LIMIT 10; (the WHERE clause only has year),
   where results can be ordered by frequency

Can this kind of requirement be fulfilled using Cassandra? What CF structure
and indexing do I need to use here? What queries should I use to create the
CF and its indexes?


Thank You!



-- 
*Chamila Dilshan Wijayarathna,*
SMIEEE, SMIESL,
Undergraduate,
Department of Computer Science and Engineering,
University of Moratuwa.


Re: Cassandra sort using updatable query

2014-11-12 Thread Jonathan Haddad
With Cassandra you're going to want to model tables to meet the requirements
of your queries, rather than as with a relational database, where you build
tables in 3NF and then optimize afterwards.

For your optimized select query, your table (with caveat, see below) could
start out as:

create table words (
  year int,
  frequency int,
  content text,
  primary key (year, frequency, content)
);

You may want to maintain other tables as well for different types of select
statements.

Your UPDATE statement above won't work; you'll have to DELETE and INSERT,
since you can't change the value of a clustering column.  If you don't know
what your old frequency is ahead of time (to do the delete), you'll need to
keep another table mapping content,year -> frequency.
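
(A rough sketch of what that companion table and the read-then-rewrite path
could look like; the table name word_frequency is made up, not from the thread.)

create table word_frequency (
  content text,
  year int,
  frequency int,
  primary key ((content, year))
);

-- "update" abc/1990 from frequency 1 to 2: read the old value, then move
-- the row in words and record the new frequency
select frequency from word_frequency where content = 'abc' and year = 1990;
delete from words where year = 1990 and frequency = 1 and content = 'abc';
insert into words (year, frequency, content) values (1990, 2, 'abc');
update word_frequency set frequency = 2 where content = 'abc' and year = 1990;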

Now, the tricky part here is that the above model will limit the total
number of partitions you've got to the number of years you're working with,
and will not scale as the cluster increases in size.  Ideally you could
bucket frequencies.  If that feels like too much work (it's starting to for
me), this may be better suited to something like solr, elastic search, or
DSE (cassandra + solr).
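
(One possible reading of the bucketing idea, sketched with made-up names: split
each year across a fixed number of buckets, at the cost of the top-N query
having to fan out over every bucket and merge client-side.)

create table words_by_year_bucket (
  year int,
  bucket int,          -- e.g. computed client-side as hash(content) % 16
  frequency int,
  content text,
  primary key ((year, bucket), frequency, content)
);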

Does that help?

Jon






On Wed Nov 12 2014 at 9:01:44 AM Chamila Wijayarathna 
cdwijayarat...@gmail.com wrote:

 Hello all,

 I have a data set with attributes content and year. I want to put them into
 a CF 'words' with attributes ('content', 'year', 'frequency'). The CF should
 support the following operations:

- The frequency attribute of a column can be updated (i.e. I can run a
query like UPDATE words SET frequency = 2 WHERE content='abc' AND
year=1990;), where the WHERE clause contains content and year
- It should support a select query like SELECT content FROM words WHERE
year = 2010 ORDER BY frequency DESC LIMIT 10; (the WHERE clause only has year),
where results can be ordered by frequency

 Can this kind of requirement be fulfilled using Cassandra? What CF structure
 and indexing do I need to use here? What queries should I use to create the
 CF and its indexes?


 Thank You!



 --
 *Chamila Dilshan Wijayarathna,*
 SMIEEE, SMIESL,
 Undergraduate,
 Department of Computer Science and Engineering,
 University of Moratuwa.



Re: nodetool repair stalled

2014-11-12 Thread Robert Coli
On Wed, Nov 12, 2014 at 6:50 AM, Eric Stevens migh...@gmail.com wrote:

 Wouldn't it be a better idea to issue removenode on the crashed node, wipe
 the whole data directory (including system) and let it bootstrap cleanly so
 that it's not part of the cluster while it gets back up to speed?

Yes, with replace_node. I missed that the entire data dir was lost in my
first response.

=Rob


Cassandra patterns/design for setting up a history/version/change log table?

2014-11-12 Thread Jacob Rhoden
Hi Guys,

Assuming you have, for example, an “account” table, and an “account_history”
table which simply tracks older versions of what a person’s account looks like
when an administrator edits a customer account.

Given that we don’t have the luxury of a safe transaction to update the account 
record, i.e. to do:

 - select account details
 - compare old account details with new account details
 - if there are changes to the account
- copy old account details to account_history table
- update account

How do people deal with this in a multi data centre environment? The closest
thing I can think of is something like this on “save” (a rough sketch follows
the list):

 - insert new record into account_history table
 - update record in account table
 - every hour, look for duplicate rows in the account_history table and remove
those where someone did a save that did not change any fields on the account.
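
(A rough CQL sketch of that shape, with made-up table and column names; the
hourly de-duplication job is left out.)

create table account (
  account_id uuid primary key,
  name text,
  email text
);

create table account_history (
  account_id uuid,
  modified_at timeuuid,
  name text,
  email text,
  primary key (account_id, modified_at)
) with clustering order by (modified_at desc);

-- on every "save": write the history snapshot first, then the live row
insert into account_history (account_id, modified_at, name, email)
  values (123e4567-e89b-12d3-a456-426655440000, now(), 'Ada', 'ada@example.org');
insert into account (account_id, name, email)
  values (123e4567-e89b-12d3-a456-426655440000, 'Ada', 'ada@example.org');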

My biggest problem with the above is: what happens if you want to bulk load a
data file into your account table, and it contains, say, 1 million records but
only actually changes 100 account entries? For bulk loading you could probably
resort to doing a “select before update” just to prevent 1 million pointless
updates to the account_history table, but that feels a bit yucky. Some sort of
Java stored procedure might help here, but surely this is a common enough use
case that we shouldn’t have to write custom Java code for Cassandra, right?

Thanks!
Jacob



Programmatic Cassandra version detection/extraction

2014-11-12 Thread Otis Gospodnetic
Hi,

Is there a way to detect which version of Cassandra one is running?
Is there an API for that, or a constant with this value, or maybe an MBean
or some other way to get to this info?

Here's the use case:
SPM monitors Cassandra (http://sematext.com/spm/), but Cassandra MBeans and
metrics have changed, or may change, over time.
How will the SPM agent know which MBeans to look for, which metrics to extract,
and how to interpret the values it extracts, without knowing which version
of Cassandra it's monitoring?
It could try probing for some known MBeans and deduce Cassandra version
from that, but that feels a little sloppy.
Ideally, we'd be able to grab the version from some MBean and based on that
extract metrics we know are exposed in that version of Cassandra.

Thanks,
Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


Re: Programmatic Cassandra version detection/extraction

2014-11-12 Thread Michael Shuler

On 11/12/2014 04:44 PM, Otis Gospodnetic wrote:

Is there a way to detect which version of Cassandra one is running?
Is there an API for that, or a constant with this value, or maybe an
MBean or some other way to get to this info?


I'm not sure if there are other methods, but this should always work:

  SELECT release_version from system.local;

--
Michael


Re: Programmatic Cassandra version detection/extraction

2014-11-12 Thread Michael Shuler

On 11/12/2014 04:58 PM, Michael Shuler wrote:

On 11/12/2014 04:44 PM, Otis Gospodnetic wrote:

Is there a way to detect which version of Cassandra one is running?
Is there an API for that, or a constant with this value, or maybe an
MBean or some other way to get to this info?


I'm not sure if there are other methods, but this should always work:

   SELECT release_version from system.local;


I asked the devs about where I might find the version in jmx and got the 
hint that I could cheat and look at `nodetool gossipinfo`.


It looks like RELEASE_VERSION is reported as a field in 
org.apache.cassandra.net FailureDetector AllEndpointStates.
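
(That is, if I'm reading it right, the org.apache.cassandra.net:type=FailureDetector
MBean and its AllEndpointStates attribute; the same RELEASE_VERSION field is what
`nodetool gossipinfo` prints per endpoint.)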


--
Michael


Re: Better option to load data to cassandra

2014-11-12 Thread cass savy
Sstableloader works well for large tables if you want to move data from
Cassandra to Cassandra. This works if both C* clusters are on the same version.

Sstable2json and json2sstable are another alternative.
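(Roughly, assuming the sstables for a keyspace ks / table words sit in the
usual data directory layout and 10.0.0.1 is a live node; the path and address
are placeholders.)

  sstableloader -d 10.0.0.1 /var/lib/cassandra/data/ks/words/
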
On Nov 11, 2014 4:53 AM, srinivas rao pinnakasrin...@gmail.com wrote:

 Hi Team,

 Please suggest better options to load data from NoSQL to Cassandra.



 Thanks
 Srini



Re: nodetool repair stalled

2014-11-12 Thread venkat sam
Hi Eric,

The data is stored on JBOD. Only one of the disks crashed; the other 3 disks
still hold the old data. That's why I didn't clean the whole node and issue a
fresh restart.

Thanks Rob. Will try it that way.


From: Eric Stevens
Sent: Wednesday, November 12, 2014 8:21 PM
To: user@cassandra.apache.org

Wouldn't it be a better idea to issue removenode on the crashed node, wipe the
whole data directory (including system) and let it bootstrap cleanly so that
it's not part of the cluster while it gets back up to speed?

On Tue, Nov 11, 2014, 12:32 PM Robert Coli rc...@eventbrite.com wrote:

On Tue, Nov 11, 2014 at 10:48 AM, venkat sam samvenkat...@outlook.com wrote:

I have a 5 node cluster. On one node, one of the data directory partitions
crashed. After disk replacement I restarted the Cassandra daemon and ran
nodetool repair to restore the missing replicas. But nodetool repair is
getting stuck after syncing one of the column families.

Yes, nodetool repair often hangs. Search through the archives, but the summary
is:

1) try to repair CFs one at a time
2) it's worse with vnodes
3) try tuning the phi detector or network stream timeouts

=Rob