data model question : finding out the n most recent changes items

2013-07-10 Thread Jimmy Lin
I have an application that needs to find the n most recently modified files for a given user id. I started out with a few tables but still couldn't get what I want; I hope someone can point me in the right direction... See my tables below. #1 won't work, because file_id's timeuuid contains creation

Re: data model question : finding out the n most recent changes items

2013-07-11 Thread Jimmy Lin
series of modification timestamp for the same directory. Not sure I understand the problem. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 10/07/2013, at 6:51 PM, Jimmy Lin y2klyf+w...@gmail.com wrote: I

Re: data model question : finding out the n most recent changes items

2013-07-11 Thread Jimmy Lin
- From: y2k...@gmail.com on behalf of Jimmy Lin Sent: Thu 11-Jul-13 13:09 To: user@cassandra.apache.org Subject: Re: data model question : finding out the n most recent changes items what I mean is, I really just want the last modified date instead of series of timestamp and still able
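The requirement described in this thread (keep only the latest modification time per file rather than a series of timestamps, then take the n newest for one user) can be sketched client-side in Python. This is only an illustration of the access pattern, not a Cassandra schema and not the approach the thread settled on; the tuple shape and the name `n_most_recent` are hypothetical.

```python
import heapq

def n_most_recent(events, user_id, n):
    """Return the n most recently modified file ids for one user, newest first.

    events: iterable of (user_id, file_id, modified_ts) tuples (hypothetical shape).
    """
    latest = {}  # file_id -> newest modification timestamp seen so far
    for uid, file_id, ts in events:
        if uid == user_id and ts > latest.get(file_id, float("-inf")):
            latest[file_id] = ts
    # Pick the n files with the largest "last modified" value.
    top = heapq.nlargest(n, latest.items(), key=lambda kv: kv[1])
    return [file_id for file_id, _ in top]

events = [
    ("u1", "a", 10), ("u1", "b", 20), ("u1", "a", 30),
    ("u2", "c", 40), ("u1", "d", 5),
]
print(n_most_recent(events, "u1", 2))
```

Collapsing to one timestamp per file before ranking is what distinguishes "last modified date" from a full modification history.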

Re: get all row keys of a table using CQL3

2013-07-23 Thread Jimmy Lin
function: http://www.datastax.com/docs/1.1/dml/using_cql#paging-through-non-ordered-partitioner-results You can use it to page through your rows. Blake On Jul 23, 2013, at 10:18 PM, Jimmy Lin wrote: hi, I want to fetch all the row keys of a table using CQL3: e.g select id from mytable

CQL consistency level using astyanax

2013-09-20 Thread Jimmy Lin
hi, I am using astyanax to access a multi-node cassandra cluster. In my connection configuration setup, I already declared a global read/write consistency level by setting: AstyanaxConfiguration.setDefaultWriteConsistencyLevel() AstyanaxConfiguration.setDefaultReadConsistencyLevel() however,

changing the primary key type of a table

2013-09-27 Thread Jimmy Lin
hi, we have a table that its primary key is uuid type. Now we decide that we need to use text type as it is more flexible for our application. #1 is there any downside using text as primary key? any performance impact on the partition ? #2 There is no way to alter a table's primary key with a

paging through a table with timeuuid primary key

2013-10-01 Thread Jimmy Lin
i have a table like the following: CREATE TABLE log ( mykey timeuuid, type text, msg text, primary key(mykey, type) ); I want to page through all the results from the table using select * from log where token(mykey) > token(maxTimeuuid(x)) limit 100; (where x is 0 for the first query, and

Re: paging through a table with timeuuid primary key

2013-10-01 Thread Jimmy Lin
Algermissen jan.algermis...@nordsc.com wrote: Jimmy, On 01.10.2013, at 17:26, Jimmy Lin y2klyf+w...@gmail.com wrote: i have a table like the following: CREATE TABLE log ( mykey timeuuid, type text, msg text, primary key(mykey, type) ); I want to page through all the results

Re: paging through a table with timeuuid primary key

2013-10-01 Thread Jimmy Lin
that your 'pages' can get truncated in the middle of a wide row. See https://groups.google.com/a/lists.datastax.com/d/msg/java-driver-user/lHQ3wKAZgM4/DnlXT4IzqsQJ Jan On 01.10.2013, at 18:12, Jimmy Lin y2klyf+w...@gmail.com wrote: unfortunately, i have to stick with 1.2 for now

Re: paging through a table with timeuuid primary key

2013-10-01 Thread Jimmy Lin
last key, but doesn't do anything good to the token function. The argument to the token should really be the actual key value. On Tue, Oct 1, 2013 at 9:32 AM, Jimmy Lin y2klyf+w...@gmail.com wrote: thanks, yea i am aware of that, and have already taken care. I just also found out a similar
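The point made in this thread, that the argument to token() should be the actual last key seen rather than a derived value, can be illustrated with a small in-memory simulation in Python. This is a sketch under assumptions: `token` here is an MD5-based stand-in and not Cassandra's partitioner, and `fetch_page` and `page_all` are hypothetical helpers that only emulate the shape of `SELECT ... WHERE token(mykey) > token(<last key>) LIMIT n`.

```python
import hashlib

def token(key: str) -> int:
    # Stand-in for the partitioner's token function (NOT the real partitioner hash).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big", signed=True)

def fetch_page(keys, last_key, limit):
    """Emulate: SELECT ... WHERE token(mykey) > token(last_key) LIMIT limit."""
    ordered = sorted(keys, key=token)  # rows come back in token order
    if last_key is None:
        return ordered[:limit]
    last_token = token(last_key)
    return [k for k in ordered if token(k) > last_token][:limit]

def page_all(keys, limit):
    """Walk the whole table, feeding the last key of each page into the next query."""
    seen, last_key = [], None
    while True:
        page = fetch_page(keys, last_key, limit)
        if not page:
            return seen
        seen.extend(page)
        last_key = page[-1]  # the actual key value, as the thread recommends

keys = [f"key-{i}" for i in range(25)]
result = page_all(keys, 10)
print(len(result))
```

The simulation pages over partition keys only. With a compound primary key such as (mykey, type), a LIMIT can cut a page in the middle of a wide row, which is the caveat raised earlier in the thread and is not handled here.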

question about secondary index or not

2014-01-28 Thread Jimmy Lin
I have a simple column family like the following create table people( company_id text, employee_id text, gender text, primary key(company_id, employee_id) ); if I want to find out all the male employee given a company id, I can do 1/ select * from people where company_id=' and loop through

Re: question about secondary index or not

2014-01-28 Thread Jimmy Lin
indexes on binary fields true/false male/female are not terribly effective. On Tue, Jan 28, 2014 at 12:40 PM, Jimmy Lin y2klyf+w...@gmail.com wrote: I have a simple column family like the following create table people( company_id text, employee_id text, gender text, primary key(company_id

cql IN clause question

2014-01-29 Thread Jimmy Lin
select * from mytable where mykey IN('xxx', 'yyy', 'zzz','111','222','333') is there a limit on how many items you can specify inside an IN clause? The CQL IN clause will help reduce the round-trip traffic otherwise needed if using multiple select statements, correct? but how about the coordinator node that

fixed size collection possible?

2014-04-22 Thread Jimmy Lin
hi, look at the collection type support in cql3, e.g http://www.datastax.com/documentation/cql/3.0/cql/cql_using/use_list_t.html we can append or remove using + and - operator UPDATE users SET top_places = top_places + [ 'mordor' ] WHERE user_id = 'frodo'; UPDATE users SET top_places =

row caching for frequently updated column

2014-04-28 Thread Jimmy Lin
I am wondering if there is any negative impact on Cassandra write operation, if I turn on row caching for a table that has mostly 'static columns' but few frequently write columns (like timestamp). The application will frequently write to a few columns, and the application will also frequently

Re: row caching for frequently updated column

2014-04-29 Thread Jimmy Lin
and page cache, but I don't believe this is possible for row cache. Hope that helps. Jonathan Jonathan Lacefield Solutions Architect, DataStax (404) 822 3487 http://www.linkedin.com/in/jlacefield http://www.datastax.com/cassandrasummit14 On Mon, Apr 28, 2014 at 10:27 PM, Jimmy Lin y2klyf

Re: row caching for frequently updated column

2014-04-29 Thread Jimmy Lin
thanks all for the pointers. let me see if I can put the sequence of events together 1.2 people misunderstand/misuse row cache, in that cassandra cached the entire row of data even if you are only looking for a small subset of the row data. e.g select single_column from a_wide_row_table will

frequently update/read table and level compaction

2014-10-20 Thread Jimmy Lin
Hi, I have a column family/ table that has frequent update on one of the column, and one column that has infrequent update. Rest of the columns never changed. Our application also read frequently on this table. We have seen some read latency issue on this table and plan to switch to use level

tuning concurrent_reads param

2014-10-29 Thread Jimmy Lin
Hi, looking at the docs, the default value for concurrent_reads is 32, which seems a bit small to me (compared to, say, an http server)? because if my node is receiving even slight traffic, any more than 32 concurrent read queries will have to wait.(?) The recommended rule is 16 * number of drives. Would that be
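The rule of thumb quoted in this message (16 times the number of drives) is simple arithmetic; a minimal Python sketch follows. The function name is hypothetical, and the result is only a starting point for tuning, not a guaranteed optimum.

```python
def rule_of_thumb_concurrent_reads(num_drives: int) -> int:
    """Apply the '16 * number of drives' guideline quoted in the message."""
    if num_drives < 1:
        raise ValueError("num_drives must be at least 1")
    return 16 * num_drives

# Two data drives reproduce the default of 32 mentioned above.
print(rule_of_thumb_concurrent_reads(2))
```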

Re: tuning concurrent_reads param

2014-11-05 Thread Jimmy Lin
or not. If its near 32 (or whatever you set it at) all the time it may be a bottleneck. --- Chris Lohfink On Wed, Oct 29, 2014 at 10:41 PM, Jimmy Lin y2klyf+w...@gmail.com wrote: Hi, looking at the docs, the default value for concurrent_reads is 32, which seems bit small to me (comparing

Re: tuning concurrent_reads param

2014-11-06 Thread Jimmy Lin
I see, thanks for explaining what that means. If we are using SSD, then reordering/merging has less impact than traditional mechanical hard disk, so using SSD drive probably can deal with increased concurrent_read better. (?)

query tracing

2014-11-07 Thread Jimmy Lin
is there any significant performance penalty if one turn on Cassandra query tracing, through DataStax java driver (say, per every query request of some trouble query)? More sampling seems better but then doing so may also slow down the system in some other ways? thanks

Re: query tracing

2014-11-15 Thread Jimmy Lin
on the load impact it will provide a lot of insight and you can control the cost. --- Chris Lohfink On Fri, Nov 7, 2014 at 11:35 AM, Jimmy Lin y2klyf+w...@gmail.com wrote: is there any significant performance penalty if one turn on Cassandra query tracing, through DataStax java driver (say

Re: query tracing

2014-11-15 Thread Jimmy Lin
On Sat, Nov 15, 2014 at 9:40 AM, Jimmy Lin y2klyf+w...@gmail.com wrote: Well we are able to do the tracing under normal load, but not yet able to turn on tracing on demand during heavy load from client side(due to hard to predict traffic pattern

read repair across DC and latency

2014-11-16 Thread Jimmy Lin
I have a CF that use the default, read_repair_chance (0.1) and dc_read_repair_chance(0). Our read and write is all local_quorum, on one of the 2 DC, replication of 3. so a read will have 10% chance trigger a read repair to other DC. # I have read that read repair suppose to be running as

Re: read repair across DC and latency

2014-11-19 Thread Jimmy Lin
, Nov 16, 2014 at 5:13 PM, Jimmy Lin y2klyf+w...@gmail.com wrote: I have read that read repair suppose to be running as background, but does the co-ordinator node need to wait for the response(along with other normal read tasks) before return the entire result back to the caller? For the 10
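How the two settings from this thread interact can be sketched in Python. The sketch assumes, as I understand Cassandra versions of that period, that a single random draw per read is compared first against read_repair_chance (repair across all data centers) and then against dc_local_read_repair_chance (repair within the local data center). The function name, the return labels, and the `draw` parameter are illustrative, not driver or server API.

```python
import random

def read_repair_decision(read_repair_chance, dc_local_chance, draw=None):
    """Decide whether a read triggers read repair, and with what scope.

    draw: optional fixed value in [0, 1) so the decision can be tested
    deterministically; by default a fresh random number is used.
    """
    if draw is None:
        draw = random.random()
    if read_repair_chance > draw:
        return "GLOBAL"    # replicas in all data centers are contacted
    if dc_local_chance > draw:
        return "DC_LOCAL"  # only replicas in the local data center
    return "NONE"

# Settings from the thread: read_repair_chance=0.1, dc_read_repair_chance=0.
print(read_repair_decision(0.1, 0.0, draw=0.05))
print(read_repair_decision(0.1, 0.0, draw=0.50))
```

Under this assumption, the settings in the thread give roughly one read in ten a cross-DC repair and never a DC-local-only one, which matches the 10% figure in the original question.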

timeout when using secondary index

2015-03-06 Thread Jimmy Lin
Hi, Ran into RPC timeout exception when execution a query that involve secondary index of a Boolean column when for example the company has more than 1k person. select * from company where company_id= and isMale = true; such extreme low cardinality of secondary index like the other docs

Re: timeout creating table

2015-04-20 Thread Jimmy Lin
On Mon, Apr 20, 2015 at 12:19 PM, Jimmy Lin y2klyf+w...@gmail.com wrote: Yes, sometimes it is create table and sometime it is create

Re: timeout creating table

2015-04-23 Thread Jimmy Lin
wrote: That is a problem, you should not have RF > N. Do an alter table to fix it. This will affect your reads and writes if you're doing anything CL > 1 -- timeouts. On Apr 23, 2015 4:35 AM, Jimmy Lin y2klyf+w...@gmail.com wrote: Also I am not sure it matters, but I just realized

Re: timeout creating table

2015-04-23 Thread Jimmy Lin
Also I am not sure it matters, but I just realized the keyspace created has replication factor of 2 when my Cassandra is really just a single node. Is Cassandra smart enough to ignore the RF of 2 and work with only 1 single node? On Mon, Apr 20, 2015 at 8:23 PM, Jimmy Lin y2klyf+w...@gmail.com
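Why a replication factor of 2 on a single node matters can be worked through with the usual quorum arithmetic, quorum = floor(RF / 2) + 1. The Python sketch below is a simplified model, not Cassandra code: it covers only ONE, QUORUM and ALL, and the helper names are hypothetical.

```python
def quorum(rf: int) -> int:
    """Number of replicas a QUORUM operation needs for replication factor rf."""
    return rf // 2 + 1

def required_replicas(consistency_level: str, rf: int) -> int:
    """Replica acknowledgements needed for a few common consistency levels."""
    return {"ONE": 1, "QUORUM": quorum(rf), "ALL": rf}[consistency_level]

def can_satisfy(consistency_level: str, rf: int, live_nodes: int) -> bool:
    """True if enough replicas can exist and respond on a cluster of live_nodes."""
    available = min(live_nodes, rf)  # a node holds at most one replica of a row
    return available >= required_replicas(consistency_level, rf)

# The situation in the thread: RF=2 on a single-node cluster.
print(can_satisfy("ONE", 2, 1))     # one replica is enough
print(can_satisfy("QUORUM", 2, 1))  # needs 2 replicas, only 1 node exists
```

In this model, operations at consistency level ONE still succeed on the single node, while anything needing more than one replica cannot be satisfied.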

timeout creating table

2015-04-19 Thread Jimmy Lin
hi, we have some unit tests that run in parallel that will create tmp keyspaces and tables and then drop them after tests are done. From time to time, our create table statement runs into an All host(s) for query failed... Timeout during read (from datastax driver) error. We later turned on tracing, and

Re: timeout creating table

2015-04-20 Thread Jimmy Lin
in Test | jim.witsc...@datastax.com On Sun, Apr 19, 2015 at 7:13 PM, Jimmy Lin y2klyf+w...@gmail.com wrote: hi, we have some unit tests that run parallel that will create tmp keyspace, and tables and then drop them after tests are done. From time to time, our create table statement run

Re: how to read parent_repair_history table?

2016-02-25 Thread Jimmy Lin
a good repair run in recent days? > > Sounds good. > > You can check https://issues.apache.org/jira/browse/CASSANDRA-5839 for more > information. > > > 2016-02-25 3:13 GMT-03:00 Jimmy Lin <y2klyf+w...@gmail.com>: >> >> hi all, >> few questions

Re: how to read parent_repair_history table?

2016-02-25 Thread Jimmy Lin
> > cluster? > > Check if repair is being executed on all nodes within gc_grace_seconds, and > tune that value or troubleshoot problems otherwise. > > > Scanning through parent_repair_history and making sure all the known > > keyspaces has a

Re: how to read parent_repair_history table?

2016-02-25 Thread Jimmy Lin
right, because repair sessions in different keyspaces will have different > repair session ids. > > 2016-02-25 15:04 GMT-03:00 Jimmy Lin <y2k...@gmail.com>: >> hi Paulo, >> follow up on the # of entries question... >> why each job repair execution will have

Checking replication status

2016-02-25 Thread Jimmy Lin
hi all, what are the better ways to check replication overall status of cassandra cluster? within a single DC, unless a node is down for long time, most of the time i feel it is pretty much non-issue and things are replicated pretty fast. But when a node come back from a long offline, is

how to read parent_repair_history table?

2016-02-24 Thread Jimmy Lin
hi all, few questions regarding how to read or digest the system_distributed.parent_repair_history CF, which I am very interested in using to find out our repair status... - Will every invocation of nodetool repair be recorded as one entry in the parent_repair_history CF regardless if it is

Re: how to read parent_repair_history table?

2016-02-25 Thread Jimmy Lin
select * from repair_history where keyspace = 'ks' columnfamily_name = 'cf' and id > mintimeuuid(now() - gc_grace_seconds/2) AND participants CONTAINS 'node_IP'; 2016-02-25 16:22 GMT-03:00 Jimmy Lin <y2k...@gmail.com>: hi Paulo,

Re: Checking replication status

2016-02-25 Thread Jimmy Lin
Daemeon C.M. Reiydelle USA (+1) 415.501.0198 London (+44) (0) 20 8144 9872 On Thu, Feb 25, 2016 at 11:36 AM, Jimmy Lin <y2k...@gmail.com> wrote: hi all,

Re: Checking replication status

2016-02-29 Thread Jimmy Lin
data consistency checks and updates _on > the query being performed_. > 3) Repair. > > If a machine goes down for longer than max_hint_window_in_ms, AFAIK you > _will_ have missing data. If you cannot tolerate this situation, you need > to take a look at your tunable consistency and/or

Re: how to read parent_repair_history table?

2016-02-29 Thread Jimmy Lin
ect * from repair_history where keyspace = 'ks' columnfamily_name = 'cf' and id > mintimeuuid(now() - gc_grace_seconds/2) AND participants CONTAINS 'node_IP'; 2016-02-25 16:22 GMT-03:00 Jimmy Lin <y2k...@gmail.com>: hi Paulo, one more follo

datastax java driver Batch vs BatchStatement

2016-03-24 Thread Jimmy Lin
Hi all, What is the difference between the datastax driver Batch and BatchStatement? In particular, BatchStatement calls out that it needs native protocol version 2 or above. What is the advantage of using native protocol 2.0 for batch execution? Are either of these two APIs smart enough to split a big

how expensive is light weight transaction: if not exists

2016-04-27 Thread Jimmy Lin
hi all, we like to consider using light weight transaction like the following: begin batch: update table set x=y where id=A if not exists; update table set x=y where id=B if not exists; update table set x=y where id=C if not exists; update table set x=y where id=D if not exists; apply batch (using

Limit 1

2016-04-20 Thread Jimmy Lin
I have a following table(using default sized tier compaction) that its column get TTLed every hour(as we want to keep only the last 1 hour events) And I do Select * from mytable where object_id = ‘’ LIMIT 1; And since query only interested in last/latest value, will cassandra need to scan

nodetool repair of large partition

2017-01-30 Thread Jimmy Lin
hi, if i have a row in a table that contain large data (not necessary super wide row), say 10 G and a replication factor of 3. During a repair, if the data of the row in each of the node is simply off by 1 byte, is cassandra smart enough to stream only partial of the data (maybe based on a range

testing retry policy

2016-08-31 Thread Jimmy Lin
hi all, I have some customized retry policies that I want to test. In my single node local cluster, is there any way to simulate the read/write timeout and/or unavailable exception? I tried to kill the Cassandra process but it won't result in an unavailable exception but a no host available exception and