Re: Cassandra sort using updatable query

2014-11-13 Thread Chamila Wijayarathna
Hi Jonathan,

Thank you very much, it worked this way.

-- 
*Chamila Dilshan Wijayarathna,*
SMIEEE, SMIESL,
Undergraduate,
Department of Computer Science and Engineering,
University of Moratuwa.


Re: Cassandra sort using updatable query

2014-11-12 Thread Jonathan Haddad
With Cassandra you're going to want to model tables to meet the
requirements of your queries, unlike a relational database where
you build tables in 3NF and then optimize afterwards.

For your optimized select query, your table (with caveat, see below) could
start out as:

create table words (
  year int,
  frequency int,
  content text,
  primary key (year, frequency, content) );
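
To make the effect of that primary key concrete, here is a minimal in-memory sketch in Python. It is a toy model, not Cassandra itself: `partitions`, `insert` and `top_words` are hypothetical names. It assumes only what the table above states, namely that `year` is the partition key and rows inside a partition are kept sorted by `(frequency, content)`.

```python
import bisect
from collections import defaultdict

# Toy stand-in for the words table: one sorted list per partition key (year).
partitions = defaultdict(list)  # year -> sorted list of (frequency, content)


def insert(year, frequency, content):
    """Mimic INSERT: keep rows in clustering order inside their partition."""
    row = (frequency, content)
    rows = partitions[year]
    i = bisect.bisect_left(rows, row)
    if i == len(rows) or rows[i] != row:  # a primary key is unique
        rows.insert(i, row)


def top_words(year, limit):
    """Mimic: SELECT content FROM words WHERE year = ?
    ORDER BY frequency DESC LIMIT ?"""
    rows = partitions.get(year, [])
    # Reading the already-sorted partition backwards gives descending order.
    return [content for _, content in rows[::-1][:limit]]


insert(2010, 5, "a")
insert(2010, 9, "b")
insert(2010, 1, "c")
insert(1990, 7, "d")
print(top_words(2010, 2))
```

The read touches a single partition and needs no sorting at query time, which is why the select can order by frequency. If the descending read is the common case, declaring the table WITH CLUSTERING ORDER BY (frequency DESC, content ASC) is an option worth considering.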

You may want to maintain other tables as well for different types of select
statements.

Your UPDATE statement above won't work: you'll have to DELETE and INSERT,
since you can't change the value of a clustering column.  If you don't know
what your old frequency is ahead of time (to do the delete), you'll need to
keep another table mapping (content, year) -> frequency.
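
The DELETE and INSERT sequence could be sketched as below. This is only an illustration under stated assumptions: the lookup table name `word_frequency`, its layout, and the helper `change_frequency_statements` are hypothetical, and the `%s` placeholders assume the statements are later passed, with their parameters, to a client such as the DataStax Python driver. The helper only builds the statements, it does not talk to a cluster.

```python
# Hypothetical lookup table: (content, year) -> frequency.
LOOKUP_TABLE_DDL = """
CREATE TABLE word_frequency (
  content text,
  year int,
  frequency int,
  PRIMARY KEY ((content, year))
);
"""

# Step 1: read the old frequency, since it is needed for the DELETE.
SELECT_OLD = "SELECT frequency FROM word_frequency WHERE content = %s AND year = %s"


def change_frequency_statements(content, year, old_frequency, new_frequency):
    """Return (statement, parameters) pairs that move a word to a new frequency."""
    return [
        # Remove the row stored under the old clustering value.
        ("DELETE FROM words WHERE year = %s AND frequency = %s AND content = %s",
         (year, old_frequency, content)),
        # Re-insert it under the new clustering value.
        ("INSERT INTO words (year, frequency, content) VALUES (%s, %s, %s)",
         (year, new_frequency, content)),
        # Keep the lookup table in step so the next change can find it.
        ("UPDATE word_frequency SET frequency = %s WHERE content = %s AND year = %s",
         (new_frequency, content, year)),
    ]


for statement, params in change_frequency_statements("abc", 1990, 1, 2):
    print(statement, params)
```

Note that the read followed by the writes is not atomic, so two clients changing the same word at the same time can leave a stale row behind; the sketch does not address that.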

Now, the tricky part here is that the above model limits the total
number of partitions to the number of years you're working with, so it
will not scale as the cluster grows.  Ideally you could
bucket frequencies.  If that feels like too much work (it's starting to for
me), this may be better suited to something like Solr, Elasticsearch, or
DSE (Cassandra + Solr).
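
One possible reading of "bucket frequencies" is to make the partition key (year, bucket), where the bucket is derived from the frequency, and to read buckets from the highest downward until enough rows are found. The Python sketch below models that in memory. `BUCKET_WIDTH`, `bucket_of`, and the `max_frequency` argument are assumptions made for illustration, not something specified in this thread.

```python
from collections import defaultdict

BUCKET_WIDTH = 100  # hypothetical bucket width; would need tuning to the data


def bucket_of(frequency):
    """Map a frequency to its bucket number."""
    return frequency // BUCKET_WIDTH


# Toy stand-in for a table with PRIMARY KEY ((year, bucket), frequency, content).
partitions = defaultdict(list)  # (year, bucket) -> list of (frequency, content)


def insert(year, frequency, content):
    partitions[(year, bucket_of(frequency))].append((frequency, content))


def top_words(year, limit, max_frequency):
    """Walk buckets from the highest down, one partition read per bucket."""
    results = []
    for bucket in range(bucket_of(max_frequency), -1, -1):
        # Cassandra would return these rows already sorted; the toy model sorts here.
        rows = sorted(partitions.get((year, bucket), []), reverse=True)
        results.extend(content for _, content in rows)
        if len(results) >= limit:
            break
    return results[:limit]


insert(2010, 250, "a")
insert(2010, 120, "b")
insert(2010, 130, "c")
insert(2010, 5, "d")
insert(1990, 999, "z")
print(top_words(2010, 3, 1000))
```

The trade-off is that a year is now spread over several partitions, at the cost of possibly reading more than one partition per query and of needing to know (or track) an upper bound on the frequency.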

Does that help?

Jon

On Wed Nov 12 2014 at 9:01:44 AM Chamila Wijayarathna 
cdwijayarat...@gmail.com wrote:

 Hello all,

 I have a data set with attributes content and year. I want to put them
 into a CF 'words' with attributes ('content','year','frequency'). The CF should
 support the following operations.

 - The frequency attribute of a row can be updated, i.e. I can run a query
   like UPDATE words SET frequency = 2 WHERE content='abc' AND year=1990;
   (the WHERE clause should contain content and year).
 - It should support a select query like SELECT content FROM words WHERE
   year = 2010 ORDER BY frequency DESC LIMIT 10; (the WHERE clause only has
   year), where results can be ordered by frequency.

 Can this kind of requirement be fulfilled using Cassandra? What CF
 structure and indexing do I need to use here? What queries should I use to
 create the CF and the indexes?


 Thank You!


