You could decrease chunk_length_in_kb to 16 or 8 and repeat the test.
How large is your row? You may meet reading wide row problem.
-Simon
From: Laxmikant Upadhyay
Date: 2018-09-05 01:01
To: user
Subject: High IO and poor read performance on 3.11.2 cassandra cluster
We have a 3-node Cassandra cluster (3.11.2) in a single DC.
We have written 450 million records to the table with LCS. The write
latency is fine. After the write we perform read and update operations.
When we run read+update operations on newly inserted 1 million records (on
top of 450 m records) then
Hi,
Would you share some more context with us?
- What Cassandra version do you use?
- What is the data size per node?
- How much RAM does the hardware have?
- Does your client use paging?
A few ideas to explore:
- Try tracing the query, see what's taking time (and resources)
- From the
When reading 90 partitions concurrently (each about 200 MB in size), my single
node Apache Cassandra became unresponsive;
no reads or writes worked for almost 10 minutes.
I'm using these configs:
memtable_allocation_type: offheap_buffers
gc: G1GC
heap: 128GB
concurrent_reads: 128 (having more
>
> First, if you don't care about insertion order it's better to use a Set
> rather than a List. The List implementation requires a read before write for
> some operations.
>
> Second, the read performance of the collection itself depends on 2 factors
> :
>
> 1) collection car
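The Set-vs-List point can be sketched like this (table and column names are made up for illustration, not from the thread):

```sql
-- Adding an element to a set<> is a blind write: no read needed.
CREATE TABLE users_with_set (
    user_id int PRIMARY KEY,
    tags set<text>
);
UPDATE users_with_set SET tags = tags + {'new-tag'} WHERE user_id = 1;

-- Appending to a list<> is also a blind write, but setting or
-- deleting by index forces Cassandra to read the list first.
CREATE TABLE users_with_list (
    user_id int PRIMARY KEY,
    tags list<text>
);
UPDATE users_with_list SET tags[0] = 'replaced' WHERE user_id = 1;  -- read-before-write
```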
AND time_bucket = ?
>
> the cluster is running on servers with 16 GB RAM, and 4 CPU cores and 3
> 100GB datastores, the storage is not local and these VMs are being managed
> through openstack. There are roughly 200 million records being written per
> day (1 time_bucket) and maybe a few thousand records per partition (time_bucket,
ad_id) at most. The amount of writes is not having a significant effect on our
read performance as when writes are stopped, the read response time does not
improve noticeably. I have attached a trace of one query i ran which took
around 3 seconds which i would expect
How many SSTables were there? What compaction strategy are you using? These
properties determine how many disk reads Cassandra may have to do to get
all the data you need, depending on which SSTables have data for your
partition key.
On Fri, May 8, 2015 at 6:25 PM, Alprema alpr...@alprema.com
According to the trace log, only one was read, the compaction strategy is
size tiered.
I attached a more readable version of my trace for details.
On Mon, May 11, 2015 at 11:35 AM, Anishek Agarwal anis...@gmail.com wrote:
I was planning on using a more server-friendly strategy anyway (by
parallelizing my workload on multiple metrics) but my concern here is more
about the raw numbers.
According to the trace and my estimation of the data size, the read from
disk was done at about 30MByte/s and the transfer between
Try breaking it up into smaller chunks using multiple threads and token
ranges. 86400 is pretty large. I found ~1000 results per query is good.
This will spread the burden across all servers a little more evenly.
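A rough sketch of the token-range idea (table and column names assumed): each thread scans one slice of the Murmur3 ring, so no single query has to page through 86400 rows at once.

```sql
-- The full Murmur3 ring spans -2^63 .. 2^63-1; split it into N slices,
-- one bounded query per thread.
SELECT metric_id, ts, value
FROM   metrics
WHERE  token(metric_id) >  -9223372036854775808
AND    token(metric_id) <= -4611686018427387904;
-- next thread: token(metric_id) > -4611686018427387904 AND <= 0, etc.
```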
On Thu, May 7, 2015 at 4:27 AM, Alprema alpr...@alprema.com wrote:
Hi,
I am writing an application that will periodically read big amounts of data
from Cassandra and I am experiencing odd performances.
My column family is a classic time series one, with series ID and Day as
partition key and a timestamp as clustering key, the value being a double.
The query I
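In CQL terms the column family described would look roughly like this (names assumed; on pre-2.2 versions the day bucket would be a text or timestamp column rather than date):

```sql
CREATE TABLE timeseries (
    series_id text,
    day       date,       -- day bucket in the partition key bounds partition size
    ts        timestamp,  -- clustering key: rows ordered by time within a partition
    value     double,
    PRIMARY KEY ((series_id, day), ts)
);
```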
Hi Diane,
On 17/07/14 06:19, Diane Griffith wrote:
We have been struggling to prove out linear read performance with our cassandra
configuration, i.e. that it scales horizontally. Wondering if anyone has any
suggestions for what minimal configuration and approach to use to demonstrate
this.
We
is not
long enough given the fact we are not doing different offsets for each
client thread.
Thanks,
Diane
On Thu, Jul 17, 2014 at 3:53 AM, Duncan Sands duncan.sa...@gmail.com
wrote:
tune Cassandra for single-node performance, but that seems like a
lot of extra work, to me, compared to adding more cheap nodes.
-- Jack Krupansky
From: Diane Griffith
Sent: Thursday, July 17, 2014 9:31 AM
To: user
Subject: Re: trouble showing cluster scalability for read performance
Duncan
Thanks for that feedback. I'll give a bit more info and then ask some
more questions.
*Our Goal*: Not to produce
We have been struggling proving out linear read performance with our
cassandra configuration, that it is horizontally scaling. Wondering if
anyone has any suggestions for what minimal configuration and approach to
use to demonstrate this.
We were trying to go for a simple set up, so
http://www.datastax.com/documentation/developer/java-driver/2.0/java-driver/tracing_t.html
On Fri, Apr 4, 2014 at 11:34 AM, Apoorva Gaurav
apoorva.gau...@myntra.comwrote:
On Fri, Apr 4, 2014 at 9:37 PM, Tyler Hobbs ty...@datastax.com wrote:
On Fri, Apr 4, 2014 at 12:41 AM, Apoorva Gaurav
Hi Apoorva,
As per the cfhistogram there are some rows which have more than 75k columns
and around 150k reads hit 2 SSTables.
Are you sure that you are seeing more than 500ms latency? The cfhistogram
shows the worst read performance was around 51ms,
which looks reasonable with many reads hitting 2 SSTables.
Thanks,
Shrikar
On Wed, Apr 2, 2014 at 11:30 PM, Apoorva Gaurav
apoorva.gau...@myntra.com wrote:
Hello Shrikar,
We are still facing read latency issue, here is the histogram
http://pastebin.com/yEvMuHYh
On Thu, Apr 3, 2014 at 12:20 AM, Apoorva Gaurav
apoorva.gau...@myntra.comwrote:
At the client side we are getting a latency of ~350ms, we are using
datastax driver 2.0.0 and have kept the fetch size as 500. And these are
coming while reading rows having ~200 columns.
And you're sure that the
I've observed that reducing fetch size results in better latency (isn't
that obvious :-)), tried from fetch size varying from 100 to 1, seeing
a lot of errors for 1. Haven't tried modifying the number of columns.
Let me start a new thread focused on fetch size.
On Wed, Apr 2, 2014 at
On Mon, Mar 31, 2014 at 9:13 PM, Apoorva Gaurav
apoorva.gau...@myntra.comwrote:
Thanks Robert, Is there a workaround, as in our test setups we keep
dropping and recreating tables.
Use unique keyspace (or table) names for each test? That's the approach
they're taking in 5202...
=Rob
Thanks Sourabh,
I've modelled my table as studentID int, subjectID int, marks int, PRIMARY
KEY(studentID, subjectID) as primarily I'll be querying using studentID
and sometimes using studentID and subjectID.
I've tried driver 2.0.0 and its giving good results. Also using its auto
paging feature.
From the doc : The fetch size controls how much resulting rows will be
retrieved simultaneously.
So, I guess it does not depend on the number of columns as such. As all the
columns for a key reside on the same node, I think it wouldn't matter much
whatever be the number of columns as long as we
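For what it's worth, the fetch size is a driver/cqlsh paging knob, not a schema property; in cqlsh the equivalent experiment looks like this (the 500 mirrors the value mentioned earlier in the thread, and the studentID value is arbitrary):

```sql
-- Ask the server for at most 500 rows per page instead of the
-- whole result set in one response.
PAGING 500;
SELECT * FROM marks_table WHERE studentID = 42;
```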
Thanks Robert, Is there a workaround, as in our test setups we keep
dropping and recreating tables.
On Mon, Mar 31, 2014 at 11:51 PM, Robert Coli rc...@eventbrite.com wrote:
Hi,
I don't think there is a problem with the driver.
Regarding the schema, you may want to choose between wide rows and skinny
rows.
http://stackoverflow.com/questions/19039123/cassandra-wide-vs-skinny-rows-for-large-columns
http://thelastpickle.com/blog/2013/01/11/primary-keys-in-cql.html
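The contrast in those links boils down to something like this (hypothetical example tables):

```sql
-- Skinny rows: the whole primary key is the partition key,
-- one small partition per (user, day).
CREATE TABLE events_skinny (
    user_id bigint,
    day     text,
    payload text,
    PRIMARY KEY ((user_id, day))
);

-- Wide rows: one partition per user, events as clustered rows.
-- Fewer partitions to hit per read, but partitions grow unbounded
-- unless bucketed.
CREATE TABLE events_wide (
    user_id  bigint,
    event_ts timestamp,
    payload  text,
    PRIMARY KEY (user_id, event_ts)
);
```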
Hi Apoorva,
Do you always query on studentID only or do you need to query on both
studentID and subjectID?
Also, I think using the latest driver (2.x) can make querying large number
of rows efficient.
http://www.datastax.com/dev/blog/client-side-improvements-in-cassandra-2-0
On Sat, Mar 29,
Hello Sourabh,
I'd prefer to do query like select * from marks_table where studentID = ?
and subjectID in (?, ?, ??) but if it's costly then I can happily delegate
the responsibility to the application layer.
Haven't tried 2.x java driver for this specific issue but tried it once
earlier and
Hello All,
We've a schema which can be modeled as (studentID, subjectID, marks) where
combination of studentID and subjectID is unique. Number of studentID can
go up to 100 million and for each studentID we can have up to 10k
subjectIDs.
We are using Apache Cassandra 2.0.4 and the DataStax Java
Hi Apoorva,
I assume this is the table with studentId and subjectId as the primary key,
and no other column like marks in it.
create table marks_table(studentId int, subjectId int, marks int, PRIMARY
KEY(studentId,subjectId));
Also could you give the cfhistogram stats?
nodetool cfhistograms your
Hello Shrikar,
Yes primary key is (studentID, subjectID). I had dropped the test table,
recreating and populating it post which will share the cfhistogram. In such
case is there any practical limit on the rows I should fetch, for e.g.
should I do
select * from marks_table where studentID =
Trying to find out why a cassandra read is taking so long, I used tracing and
limited the number of rows. Strangely, when I query 600 rows, I get results in
~50 milliseconds. But 610 rows takes nearly 1 second!
cqlsh select containerdefinitionid from containerdefinition limit 600;
... lots of
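The comparison above can be reproduced in cqlsh with tracing enabled (a sketch; the timings in the comments are the ones reported above, not guaranteed):

```sql
TRACING ON;
SELECT containerdefinitionid FROM containerdefinition LIMIT 600;  -- ~50 ms reported
SELECT containerdefinitionid FROM containerdefinition LIMIT 610;  -- ~1 s reported
TRACING OFF;
```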
I would say no. If you design around the row cache and your data access
patterns change, your assertions will be invalidated and your performance
may be worse over time.
I would use KISS here: keep it simple using one column family.
Experiment with size-tiered vs leveled compaction.
On Thursday,
Stupid cell phone.
I would say no. If you design around row cache and your data access
patterns change, the original assertions may be invalidated and the
performance might be worst then the simple design.
On Mon, Oct 21, 2013 at 12:03 PM, Edward Capriolo edlinuxg...@gmail.comwrote:
I would
Hi,
my rows consist of ~70 columns each, some containing small values, some
containing larger amounts of content (think small documents).
My data is occasionally updated and read several times per day as complete
paging through all rows.
The updates usually affect only about 10% of the small
the OOTB chunk size and BF settings)? I just decreased the sstable size to 5
MB and am waiting for compactions to complete to see if that makes a difference.
Thanks!
Relevant table definition if helpful (note that I also changed to the LZ4
compressor expecting better read performance and I decreased the crc check
chance again to minimize read latency):
CREATE TABLE global_user (
    user_id BIGINT,
    app_id INT,
    type TEXT,
    name TEXT,
    last TIMESTAMP,
    paid BOOLEAN,
    values map<TIMESTAMP,FLOAT>,
    sku_time map<TEXT
: SSTable size versus read performance
I am not sure if the new default is to use compression, but I do not believe
compression is a good default. I find compression is better for larger column
families that are sparsely read. For high throughput CF's I feel that
decompressing larger blocks hurts
Date: Thursday, May 16, 2013 10:23 AM
To: user@cassandra.apache.org user@cassandra.apache.org
Subject: Re: SSTable size versus read performance
Subject: Re: SSTable size versus read performance
When you use compression you should play with your block size. I believe the
default may be 32K but I had more success with 8K, nearly same compression
ratio, less young gen memory pressure.
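Applied to the table in this thread, the block-size change would look something like this (pre-3.0 compression syntax; the 8K value is the one suggested above):

```sql
-- Smaller chunks mean less data decompressed per point read,
-- at the cost of a slightly worse compression ratio.
ALTER TABLE global_user WITH compression = {
    'sstable_compression': 'LZ4Compressor',
    'chunk_length_kb': '8'
};
```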
On Thu, May 16
My 5 cents: I'd check blockdev --getra for data drives - too high values
for readahead (default to 256 for debian) can hurt read performance.
On 05/16/2013 05:14 PM, Keith Wright wrote:
Hi all,
I currently have 2 clusters, one running on 1.1.10 using CQL2 and
one running on 1.2.4 using
SSTable size to 5 MB and changing the chunk size to 8 kb
From: Igor i...@4friends.od.ua
Reply-To: user@cassandra.apache.org user@cassandra.apache.org
Date: Thursday, May 16, 2013 1:55 PM
To: user@cassandra.apache.org user@cassandra.apache.org
Subject: Re: SSTable size versus read performance
just in case it will be useful to somebody - here is my checklist for
better read performance from SSD
1. limit read-ahead to 16 or 32
2. enable 'trickle_fsync' (available starting from cassandra 1.1.x)
3. use 'deadline' io-scheduler (much more important for rotational
drives then for SSD)
4
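Items 1-3 of the checklist translate to OS and cassandra.yaml settings along these lines (a sketch; /dev/sdX is a placeholder for the data drive, and 10240 is the usual trickle_fsync interval default):

```
# 1. limit read-ahead on the data drive (value is in 512-byte sectors)
blockdev --setra 32 /dev/sdX

# 2. cassandra.yaml (1.1.x and later)
trickle_fsync: true
trickle_fsync_interval_in_kb: 10240

# 3. switch the io-scheduler for the data drive
echo deadline > /sys/block/sdX/queue/scheduler
```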
',
'sstable_compression': 'LZ4Compressor'};
of different query techniques at Cassandra SF:
http://www.datastax.com/events/cassandrasummit2012/presentations
1. Consider Leveled compaction instead of Size Tiered. LCS improves
read performance at the cost of more writes.
I would look at other options first.
If you want to know how many SSTables
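Switching a table from Size-Tiered to Leveled is a one-line change (a sketch; my_cf is a placeholder name, and sstable_size_in_mb defaulted to 5 in early versions before the larger 160 MB recommendation):

```sql
ALTER TABLE my_cf WITH compaction = {
    'class': 'LeveledCompactionStrategy',
    'sstable_size_in_mb': '160'
};
```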
Hi,
I have a small 2 node cassandra cluster that seems to be constrained by
read throughput. There are about 100 writes/s and 60 reads/s mostly against
a skinny column family. Here's the cfstats for that family:
SSTable count: 13
Space used (live): 231920026568
Space used (total):
2. You said skinny column family which I took to mean not a lot of
columns/row. See if you can organize your data into wider rows which
allow reading fewer rows and thus fewer queries/disk seeks.
3
I have a two node cluster hosting a 45 gig dataset. I periodically have to
read a high fraction (20% or so) of my 'rows', grabbing a few thousand at a
time and then processing them.
This used to result in about 300-500 reads a second which seemed quite
good. Recently that number has plummeted
did the amount of data finally exceed your per machine RAM capacity?
is it the same 20% each time you read? or do your periodic reads
eventually work through the entire dataset?
if you are essentially table scanning your data set, and the size
exceeds available RAM, then a degradation like that
before/after stats to answer your remaining
questions.
Thanks Aaron
On Wed, Sep 26, 2012 at 2:51 PM, aaron morton aa...@thelastpickle.comwrote:
Sounds very odd.
Is read performance degrading _after_ repair and compactions that normally
result have completed ?
What Compaction Strategy ?
What OS and JVM ?
What are are the bloom filter false positive stats from cf stats ?
Do you have some read latency numbers from cfstats ?
Also
Hey guys,
I've begun to notice that read operations take a performance nose-dive
after a standard (full) repair of a fairly large column family: ~11 million
records. Interestingly, I've then noticed that read performance returns to
normal after a full scrub of the column family. Is it possible
I would bucket the time stats as well.
If you write all the attributes at the same time, and always want to read them
together, storing them in something like a JSON blob is a legitimate approach.
Other Aaron, can you elaborate on
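The write-together/read-together case can be sketched as a single blob column (illustrative names; the application does the JSON encoding and decoding):

```sql
-- One column holding all the attributes serialized as JSON:
-- a full read touches one cell instead of one per attribute.
CREATE TABLE documents (
    doc_id text,
    body   text,  -- JSON blob of all attributes
    PRIMARY KEY (doc_id)
);
```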
I'm not using composite row keys (it's just
AsciiType) as
On Thu, May 17, 2012 at 8:55 AM, jason kowalewski
jay.kowalew...@gmail.com wrote:
We have been attempting to change our data model to provide more
performance in our cluster.
Currently there are a couple ways to model the data and i was
wondering if some people out there could help us out.
If you run a manual compaction with nodetool each CF will be compacted to a
single SSTable.
Note that this is not normally recommended as it means that automatic compaction
will take a long time to get to the file.
Take a look at nodetool cfhistograms to get an idea of how spread out your
Dne 1.12.2011 23:30, Bill napsal(a):
Our largest dataset has 1200 billion rows.
Radim, out of curiosity, how many nodes is that running across?
32
average 4 iops per read in
cassandra on cold system.
After OS cache warms enough to cache indirect seek blocks it gets faster
to almost ideal:
Workload took 79.76 seconds, thruput 200.59 ops/sec
Ideal cassandra read performance (without caches) is 2 IOPS per read
- one io to read index, second to data.
pure write workload:
Running workload in 40 threads 10 ops each.
Workload took 302.51 seconds, thruput 13222.62 ops/sec
write is slow here
@cassandra.apache.org; Kent Tong freemant2...@yahoo.com
Sent: Monday, November 21, 2011 5:22 AM
Subject: Re: read performance problem
There is something wrong with the system. Your benchmarks are way off. How
are you benchmarking? Are you using the stress lib included?
On Nov 19, 2011 8:58 PM, Kent Tong freemant2...@yahoo.com wrote:
Hi,
On my computer with 2G RAM and a core 2 duo CPU E4600 @ 2.40GHz, I am testing
the
performance of Cassandra. The write performance is good: It can write a million
records
in 10 minutes. However, the query performance is poor and it takes 10 minutes
to read
10K records with sequential
Try to see if there is a lot of paging going on,
and run some benchmarks on the disk itself.
Are you running Windows or Linux? Do you think
the disk may be fragmented?
Maxim
On 11/19/2011 8:58 PM, Kent Tong wrote:
Hi Everyone
I have a question with regards read performance and schema design if someone
could help please.
Our requirement is to store per user, many unique results (which is basically
an attempt at some questions ..) so I had thought of having the userid as the
row key and the result id
On Wed, Oct 26, 2011 at 9:35 PM, Ben Gambley ben.gamb...@intoscience.comwrote:
On Wed, Oct 26, 2011 at 7:35 PM, Ben Gambley ben.gamb...@intoscience.comwrote:
Our requirement is to store per user, many unique results (which is
basically an attempt at some questions ..) so I had thought of having the
userid as the row key and the result id as columns.
The keys for the
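In CQL terms, Ben's userid-row-key / result-id-columns layout is a clustered table (a sketch; names and types assumed):

```sql
CREATE TABLE results (
    user_id   bigint,
    result_id timeuuid,  -- one clustered row per unique attempt
    answers   text,
    PRIMARY KEY (user_id, result_id)  -- one partition per user
);
```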
at most 20-30 rows per ms and sometimes I get
My questions:
1) Any idea where the discrepancy can come from ?
I'd like to believe there is some magic setting that will x10 my read
performance...
2) How do you recommend allocating memory ? Should I give the OS cache as
much as possible or should I
Well, folks, I'm feeling a little stupid right now (adding to the injury
inflicted by one Mr. Stump :-P).
So, here's the story. The cache hit rate is up around 97% now. The ruby code
is down to around 20-25ms to multiget the 20 rows. I did some profiling,
though, and realized that a lot of time
On Thu, Apr 1, 2010 at 9:37 PM, James Golick jamesgol...@gmail.com wrote:
Yes.
J.
Sent from my iPhone.
On 2010-04-01, at 9:21 PM, Brandon Williams dri...@gmail.com wrote:
On Thu, Apr 1, 2010 at 9:37 PM, James Golick jamesgol...@gmail.com
wrote:
We are starting to use cassandra to power our activity feed. The way we
organize our data is simple. Events live in a CF called Events and are
keyed by a UUID. The timelines themselves live in a CF called Timelines,
which is keyed by user id (i.e. 1229) and contains a event uuids as column
names
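In today's CQL terms, the two column families described would look roughly like this (a sketch; types assumed):

```sql
-- Events, keyed by UUID.
CREATE TABLE events (
    event_id uuid PRIMARY KEY,
    payload  text
);

-- Timelines: one partition per user id, event uuids as column names,
-- time-ordered when the uuids are timeuuids.
CREATE TABLE timelines (
    user_id  bigint,
    event_id timeuuid,
    PRIMARY KEY (user_id, event_id)
);
```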
What is your row cache hit rate?
By still slow do you mean no change observed or faster but not
fast enough?
On Tue, Mar 30, 2010 at 10:47 PM, James Golick jamesgol...@gmail.com wrote: