I have an application that needs to find out the n most recently modified
files for a given user id. I started out with a few tables but still couldn't get
what I want. I hope someone can point me in the right direction...
See my tables below.
#1 won't work, because file_id's timeuuid contains creation
series of
modification timestamp for the same directory.
Not sure I understand the problem.
Cheers
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com
On 10/07/2013, at 6:51 PM, Jimmy Lin y2klyf+w...@gmail.com wrote:
I
-
From: y2k...@gmail.com on behalf of Jimmy Lin
Sent: Thu 11-Jul-13 13:09
To: user@cassandra.apache.org
Subject: Re: data model question : finding out the n most recent changes
items
what I mean is, I really just want the last modified date instead of a series
of timestamps and still be able
function:
http://www.datastax.com/docs/1.1/dml/using_cql#paging-through-non-ordered-partitioner-results
You can use it to page through your rows.
Blake
On Jul 23, 2013, at 10:18 PM, Jimmy Lin wrote:
hi,
I want to fetch all the row keys of a table using CQL3:
e.g
select id from mytable
hi,
I am using Astyanax to access a multi-node Cassandra cluster.
In my connection configuration setup, I already declared a global
read/write consistency level by setting:
AstyanaxConfiguration.setDefaultWriteConsistencyLevel()
AstyanaxConfiguration.setDefaultReadConsistencyLevel()
however,
hi,
we have a table whose primary key is of uuid type. Now we have decided that we
need to use the text type, as it is more flexible for our application.
#1
is there any downside using text as primary key? any performance impact on
the partition ?
#2
There is no way to alter a table's primary key with a
i have a table like the following:
CREATE TABLE log (
mykey timeuuid,
type text,
msg text,
primary key(mykey, type)
);
I want to page through all the results from the table using
select * from log where token(mykey) > token(maxTimeuuid(x)) limit 100;
(where xxx is 0 for the first query, and
Algermissen
jan.algermis...@nordsc.com wrote:
Jimmy,
On 01.10.2013, at 17:26, Jimmy Lin y2klyf+w...@gmail.com wrote:
i have a table like the following:
CREATE TABLE log (
mykey timeuuid,
type text,
msg text,
primary key(mykey, type)
);
I want to page through all the results
that your 'pages' can get truncated in
the middle of a wide row.
See
https://groups.google.com/a/lists.datastax.com/d/msg/java-driver-user/lHQ3wKAZgM4/DnlXT4IzqsQJ
Jan
On 01.10.2013, at 18:12, Jimmy Lin y2klyf+w...@gmail.com wrote:
unfortunately, i have to stick with 1.2 for now
last key, but doesn't do
anything good to the token function. The argument to the token should
really be the actual key value.
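
To illustrate the point about passing the actual key to token(), here is a minimal Python sketch that simulates the paging loop in memory. It is only a model under stated assumptions: the `token` function below is a stand-in built on MD5, not Cassandra's partitioner, and `fetch_page` and `page_all` are hypothetical helper names. It pages over partition keys only, so it does not model a page being cut off in the middle of a wide row.

```python
import hashlib


def token(key: str) -> int:
    # Stand-in hash for illustration only; NOT Cassandra's partitioner.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def fetch_page(keys, last_key=None, limit=100):
    # Models: select ... where token(mykey) > token(<last key>) limit <limit>
    ordered = sorted(keys, key=token)
    if last_key is not None:
        ordered = [k for k in ordered if token(k) > token(last_key)]
    return ordered[:limit]


def page_all(keys, limit=100):
    # Repeatedly fetch pages, feeding the last key seen into the next query.
    seen, last_key = [], None
    while True:
        page = fetch_page(keys, last_key, limit)
        if not page:
            return seen
        seen.extend(page)
        last_key = page[-1]
```

The value carried from one page to the next is the last key actually returned, which is what gets hashed by `token` on the following request.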
On Tue, Oct 1, 2013 at 9:32 AM, Jimmy Lin y2klyf+w...@gmail.com wrote:
thanks, yea i am aware of that, and have already taken care.
I just also found out a similar
I have a simple column family like the following
create table people(
company_id text,
employee_id text,
gender text,
primary key(company_id, employee_id)
);
if I want to find out all the male employees for a given company id, I can do
1/
select * from people where company_id='
and loop through
indexes on binary fields true/false male/female are not
terribly effective.
On Tue, Jan 28, 2014 at 12:40 PM, Jimmy Lin y2klyf+w...@gmail.com wrote:
I have a simple column family like the following
create table people(
company_id text,
employee_id text,
gender text,
primary key(company_id
select * from mytable where mykey IN ('xxx', 'yyy', 'zzz', '111', '222', '333')
is there a limit on how many items you can specify inside an IN clause?
The CQL IN clause will help reduce the round-trip traffic otherwise needed if
we use multiple select statements, correct?
but how about the coordinator node that
hi,
looking at the collection type support in cql3,
e.g
http://www.datastax.com/documentation/cql/3.0/cql/cql_using/use_list_t.html
we can append or remove using the + and - operators
UPDATE users
SET top_places = top_places + [ 'mordor' ] WHERE user_id = 'frodo';
UPDATE users
SET top_places =
I am wondering if there is any negative impact on Cassandra write
operations if I turn on row caching for a table that has mostly 'static
columns' but a few frequently written columns (like a timestamp).
The application will frequently write to a few columns, and the application
will also frequently
and page cache, but I
don't believe this is possible for row cache.
Hope that helps.
Jonathan
Jonathan Lacefield
Solutions Architect, DataStax
(404) 822 3487
http://www.linkedin.com/in/jlacefield
http://www.datastax.com/cassandrasummit14
On Mon, Apr 28, 2014 at 10:27 PM, Jimmy Lin y2klyf
thanks all for the pointers.
let me see if I can put the sequence of events together
1.2
people mis-understand/mis-use row cache, that cassandra cached the entire
row of data even if you are only looking for small subset of the row data.
e.g
select single_column from a_wide_row_table
will
Hi,
I have a column family/table that has frequent updates on one of the
columns, and one column that has infrequent updates. The rest of the columns
never change. Our application also reads frequently from this table.
We have seen some read latency issues on this table and plan to switch to
use level
Hi,
looking at the docs, the default value for concurrent_reads is 32, which
seems a bit small to me (compared to, say, an HTTP server)? Because if my node is
receiving even slight traffic, any more than 32 concurrent read queries will have
to wait.(?)
The recommended rule is 16 * number of drives. Would that be
or not. If it's near 32 (or whatever you set it at) all the time it
may be a bottleneck.
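
The two rules of thumb in this thread (16 * number of drives, and checking whether the observed value sits near the configured limit all the time) can be put into a small Python sketch. The snippet above is cut off before it names what is being observed, so the sketch assumes it is a count of in-flight reads sampled over time. The helper names, the 0.9 threshold, and the 0.8 fraction are assumptions made for illustration, not settings from Cassandra.

```python
def suggested_concurrent_reads(num_drives: int) -> int:
    # Rule of thumb quoted in the thread: 16 * number of drives.
    return 16 * num_drives


def looks_saturated(active_samples, limit=32, threshold=0.9, fraction=0.8):
    """Return True if most samples sit close to the configured limit.

    active_samples: assumed counts of in-flight reads taken over time.
    threshold: how close to the limit counts as "near" (assumed value).
    fraction: share of samples that must be near the limit (assumed value).
    """
    if not active_samples:
        return False
    near = sum(1 for a in active_samples if a >= threshold * limit)
    return near / len(active_samples) >= fraction
```

For example, a node with two drives gets a suggestion of 32, and a series of samples that hovers around 30 would be flagged, while occasional spikes would not.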
---
Chris Lohfink
On Wed, Oct 29, 2014 at 10:41 PM, Jimmy Lin y2klyf+w...@gmail.com wrote:
Hi,
looking at the docs, the default value for concurrent_reads is 32, which
seems bit small to me (comparing
I see, thanks for explaining what that means.
If we are using SSD, then reordering/merging has less impact than
traditional mechanical hard disk, so using SSD drive probably can deal
with increased concurrent_read better. (?)
is there any significant performance penalty if one turns on Cassandra
query tracing through the DataStax Java driver (say, for every query request
of some troublesome query)?
More sampling seems better but then doing so may also slow down the system
in some other ways?
thanks
on the load impact it will provide a lot of insight and you can
control the cost.
---
Chris Lohfink
On Fri, Nov 7, 2014 at 11:35 AM, Jimmy Lin y2klyf+w...@gmail.com
wrote:
is there any significant performance penalty if one turn on Cassandra
query tracing, through DataStax java driver (say
On Sat, Nov 15, 2014 at 9:40 AM, Jimmy Lin y2klyf+w...@gmail.com wrote:
Well, we are able to do the tracing under normal load, but not yet able
to turn on tracing on demand during heavy load from the client side (due to hard
to predict traffic pattern
I have a CF that uses the defaults: read_repair_chance (0.1) and
dc_read_repair_chance (0).
Our reads and writes are all LOCAL_QUORUM, in one of the 2 DCs, with a replication factor of
3.
so a read will have a 10% chance of triggering a read repair to the other DC.
#
I have read that read repair suppose to be running as
, Nov 16, 2014 at 5:13 PM, Jimmy Lin y2klyf+w...@gmail.com wrote:
I have read that read repair is supposed to run in the background, but
does the coordinator node need to wait for the response (along with other
normal read tasks) before returning the entire result to the caller?
For the 10
Hi,
We ran into an RPC timeout exception when executing a query that involves a
secondary index on a Boolean column when, for example, the company has more
than 1k people.
select * from company where company_id= and isMale = true;
such extreme low cardinality of secondary index like the other docs
On Mon, Apr 20, 2015 at 12:19 PM, Jimmy Lin y2klyf+w...@gmail.com wrote:
Yes, sometimes it is create table and sometime it is create
wrote:
That is a problem, you should not have RF > N.
Do an alter table to fix it.
This will affect your reads and writes if you're doing anything CL > 1 --
timeouts.
On Apr 23, 2015 4:35 AM, Jimmy Lin y2klyf+w...@gmail.com wrote:
Also I am not sure it matters, but I just realized
Also I am not sure it matters, but I just realized the keyspace created has
replication factor of 2 when my Cassandra is really just a single node.
Is Cassandra smart enough to ignore the RF of 2 and work with only 1 single
node?
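
As a rough illustration of why a replication factor larger than the node count matters, the following Python sketch does the replica arithmetic. It assumes the usual quorum formula, floor(RF / 2) + 1, and the function names are made up for this example. It models only whether enough replicas exist to meet a consistency level, not how a particular driver reports the failure.

```python
def quorum(rf: int) -> int:
    # Usual quorum size: a strict majority of the replicas.
    return rf // 2 + 1


def required_replicas(cl: str, rf: int) -> int:
    # Number of replicas that must respond for the given consistency level.
    cl = cl.upper()
    if cl == "ONE":
        return 1
    if cl == "TWO":
        return 2
    if cl == "QUORUM":
        return quorum(rf)
    if cl == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {cl}")


def can_satisfy(cl: str, rf: int, live_nodes: int) -> bool:
    # A node holds at most one replica, so live replicas are capped
    # by the number of live nodes.
    available = min(rf, live_nodes)
    return available >= required_replicas(cl, rf)
```

With RF = 2 on a single node, `can_satisfy("ONE", 2, 1)` is true, while QUORUM and ALL cannot be met because each needs two replicas and only one can exist.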
On Mon, Apr 20, 2015 at 8:23 PM, Jimmy Lin y2klyf+w...@gmail.com
hi,
we have some unit tests that run in parallel and create a tmp keyspace
and tables, then drop them after the tests are done.
From time to time, our create table statement runs into an "All host(s) for
query failed... Timeout during read" error (from the DataStax driver).
We later turned on tracing, and
in Test | jim.witsc...@datastax.com
On Sun, Apr 19, 2015 at 7:13 PM, Jimmy Lin y2klyf+w...@gmail.com wrote:
hi,
we have some unit tests that run parallel that will create tmp keyspace,
and
tables and then drop them after tests are done.
From time to time, our create table statement run
a good repair run in recent days?
>
> Sounds good.
>
> You can check https://issues.apache.org/jira/browse/CASSANDRA-5839 for more
> information.
>
>
> 2016-02-25 3:13 GMT-03:00 Jimmy Lin <y2klyf+w...@gmail.com>:
>>
>> hi all,
>> few questions
> > cluster?
>
> Check if repair is being executed on all nodes within gc_grace_seconds, and
> tune that value or troubleshoot problems otherwise.
>
> > Scanning through parent_repair_history and making sure all the known
> > keyspaces has a
right, because repair sessions in different keyspaces will have different
> repair session ids.
>
> 2016-02-25 15:04 GMT-03:00 Jimmy Lin <y2k...@gmail.com>:
>> hi Paulo,
>> follow up on the # of entries question...
>> why each job repair execution will have
hi all,
what are better ways to check the overall replication status of a Cassandra
cluster?
within a single DC, unless a node is down for a long time, most of the time I
feel it is pretty much a non-issue and things are replicated pretty fast. But
when a node comes back from a long time offline, is
hi all,
a few questions regarding how to read or digest the
system_distributed.parent_repair_history CF, which I am very interested in
using to find out our repair status...
-
Will every invocation of nodetool repair be recorded as one
entry in the parent_repair_history CF regardless of whether it is
select * from repair_history where keyspace = 'ks' columnfamily_name =
> 'cf' and id > mintimeuuid(now() - gc_grace_seconds/2) AND participants
> CONTAINS 'node_IP';
>
>
>
> 2016-02-25 16:22 GMT-03:00 Jimmy Lin <y2k...@gmail.com>:
>
>> hi Paulo,
>>
>>
>
>
> *Daemeon C.M. Reiydelle
> USA (+1) 415.501.0198
> London (+44) (0) 20 8144 9872*
>
> On Thu, Feb 25, 2016 at 11:36 AM, Jimmy Lin <y2k...@gmail.com> wrote:
>
>> hi all,
data consistency checks and updates _on
> the query being performed_.
> 3) Repair.
>
> If a machine goes down for longer than max_hint_window_in_ms, AFAIK you
> _will_ have missing data. If you cannot tolerate this situation, you need
> to take a look at your tunable consistency and/or
select * from repair_history where keyspace = 'ks' columnfamily_name =
> 'cf' and id > mintimeuuid(now() - gc_grace_seconds/2) AND participants
> CONTAINS 'node_IP';
>
>
>
> 2016-02-25 16:22 GMT-03:00 Jimmy Lin <y2k...@gmail.com>:
>
>> hi Paulo,
>>
>> one more follo
Hi all,
What is the difference between the DataStax driver's Batch and BatchStatement?
In particular, BatchStatement calls out that it needs native protocol
version 2 or above.
What is the advantage of using native protocol 2.0 for batch execution?
Will either of these two APIs be smart enough to split a big
hi all,
we would like to consider using lightweight transactions like the following:
begin batch:
update table set x=y where id=A if not exists;
update table set x=y where id=B if not exists;
update table set x=y where id=C if not exists;
update table set x=y where id=D if not exists;
apply batch
(using
I have the following table (using the default size-tiered compaction) whose columns
get TTLed every hour (as we want to keep only the last hour of events)
And I do
Select * from mytable where object_id = '' LIMIT 1;
And since the query is only interested in the last/latest value, will Cassandra need to
scan
hi,
if I have a row in a table that contains large data (not necessarily a super
wide row), say 10 G, and a replication factor of 3.
During a repair, if the data of the row on each of the nodes is simply off
by 1 byte, is Cassandra smart enough to stream only part of the data
(maybe based on a range
hi all,
I have some customized retry policies that I want to test.
In my single-node local cluster, is there any way to simulate the read/write
timeout and/or unavailable exceptions?
I tried to kill the Cassandra process, but that results not in an unavailable
exception but in a no host available exception and