Re: [EXTERNAL] fine tuning for wide rows and mixed workload system

2019-01-11 Thread Marco Gasparini
Hi Sean,

> I will start – knowing that others will have additional help/questions
I hope so, I really need help with this :)

> What heap size are you using? Sounds like you are using the CMS garbage
> collector.

Yes, I'm using the CMS garbage collector. I have not used G1 because I read it
isn't recommended, but if you are saying it is going to help with my
use case I have no objection to using it. I will try.
I have 3 nodes: node1 has 32 GB, node2 and node3 have 16 GB. I'm currently
using 50% of the RAM as heap on each node.


> Spinning disks are a problem, too. Can you tell if the IO is getting
> overwhelmed? SSDs are much preferred.

I'm not sure about it; 'dstat' and 'iostat' tell me that rMB/s is
constantly above 100MB/s and %util is close to 100%, and in these
conditions the node is frozen.
The HDD specs say that the maximum transfer rate is 175MB/s for node1 and
155MB/s for node2 and node3.
Unfortunately switching from spinning disks to SSDs is not an option.



> Read before write is usually an anti-pattern for Cassandra. From your
> queries, it seems you have a partition key and clustering key.
> Can you give us the table schema? I’m also concerned about the IF EXISTS in
> your delete.
> I think that invokes a light weight transaction – costly for performance.
> Is it really required for your use case?

I don't need the 'IF EXISTS' clause. It is actually a leftover
from an old query and I can try to remove it.
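For reference, a minimal sketch of the plain delete (same table and placeholders
as before, just without the IF EXISTS clause, so no lightweight transaction
should be triggered):

-- plain delete, no lightweight transaction (Paxos) round
DELETE FROM my_keyspace.my_table WHERE pkey = ? AND event_datetime = ?;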

Here is the schema:

CREATE KEYSPACE my_keyspace WITH replication = {'class':
'NetworkTopologyStrategy', 'DC1': '3'}  AND durable_writes = false;
CREATE TABLE my_keyspace.my_table (
pkey text,
event_datetime timestamp,
f1 text,
f2 text,
f3 text,
f4 text,
f5 int,
f6 bigint,
f7 bigint,
f8 text,
f9 text,
PRIMARY KEY (pkey, event_datetime)
) WITH CLUSTERING ORDER BY (event_datetime DESC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class':
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class':
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 9
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';


Thank you very much
Marco

On Fri, 11 Jan 2019 at 16:14, Durity, Sean R <
sean_r_dur...@homedepot.com> wrote:

> I will start – knowing that others will have additional help/questions.
>
>
>
> What heap size are you using? Sounds like you are using the CMS garbage
> collector. That takes some arcane knowledge and lots of testing to tune. I
> would start with G1 and using ½ the available RAM as the heap size. I would
> want 32 GB RAM as a minimum on the hosts.
>
>
>
> Spinning disks are a problem, too. Can you tell if the IO is getting
> overwhelmed? SSDs are much preferred.
>
>
>
> Read before write is usually an anti-pattern for Cassandra. From your
> queries, it seems you have a partition key and clustering key. Can you give
> us the table schema? I’m also concerned about the IF EXISTS in your delete.
> I think that invokes a light weight transaction – costly for performance.
> Is it really required for your use case?
>
>
>
>
>
> Sean Durity
>
>
>
> *From:* Marco Gasparini 
> *Sent:* Friday, January 11, 2019 8:20 AM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] fine tuning for wide rows and mixed workload system
>
>
>
> Hello everyone,
>
>
>
> I need some advice in order to solve my use case problem. I have already
> tried some solutions but they didn't work out.
>
> Can you help me with the following configuration please? Any help is very
> appreciated
>
>
>
> I'm using:
>
> - Cassandra 3.11.3
>
> - java version "1.8.0_191"
>
>
>
> My use case has the following constraints:
>
> - about 1M reads per day (it is going to rise up)
>
> - about 2M writes per day (it is going to rise up)
>
> - there is a high peak of requests in less than 2 hours in which the
> system receives half of the whole day's traffic (500K reads, 1M writes)
>
> - each request consists of 1 read and 2 writes (1 delete + 1 write)
>
>
>
> * the read query selects max 3 records based on the primary
> key (select * from my_keyspace.my_table where pkey = ? limit 3)
>
> * then one record is deleted (delete from
> my_keyspace.my_table where pkey = ? and event_datetime = ? IF EXISTS)
>
> * finally the new data is stored (insert into my_keyspace.my_table
> (event_datetime, pkey, agent, some_id, ft, ftt..) values (?,?,?,?,?,?...))

RE: [EXTERNAL] fine tuning for wide rows and mixed workload system

2019-01-11 Thread Durity, Sean R
I will start – knowing that others will have additional help/questions.

What heap size are you using? Sounds like you are using the CMS garbage 
collector. That takes some arcane knowledge and lots of testing to tune. I 
would start with G1 and using ½ the available RAM as the heap size. I would 
want 32 GB RAM as a minimum on the hosts.

Spinning disks are a problem, too. Can you tell if the IO is getting 
overwhelmed? SSDs are much preferred.

Read before write is usually an anti-pattern for Cassandra. From your queries, 
it seems you have a partition key and clustering key. Can you give us the table 
schema? I’m also concerned about the IF EXISTS in your delete. I think that 
invokes a light weight transaction – costly for performance. Is it really 
required for your use case?


Sean Durity

From: Marco Gasparini 
Sent: Friday, January 11, 2019 8:20 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] fine tuning for wide rows and mixed workload system

Hello everyone,

I need some advice in order to solve my use case problem. I have already tried
some solutions but they didn't work out.
Can you help me with the following configuration please? Any help is very
appreciated

I'm using:
- Cassandra 3.11.3
- java version "1.8.0_191"

My use case has the following constraints:
- about 1M reads per day (it is going to rise up)
- about 2M writes per day (it is going to rise up)
- there is a high peak of requests in less than 2 hours in which the system
receives half of the whole day's traffic (500K reads, 1M writes)
- each request consists of 1 read and 2 writes (1 delete + 1 write)

* the read query selects max 3 records based on the primary key 
(select * from my_keyspace.my_table where pkey = ? limit 3)
* then one record is deleted (delete from
my_keyspace.my_table where pkey = ? and event_datetime = ? IF EXISTS)
* finally the new data is stored (insert into my_keyspace.my_table 
(event_datetime, pkey, agent, some_id, ft, ftt..) values (?,?,?,?,?,?...))

- each row is pretty wide. I don't really know the exact size because there are
2 dynamic text columns that store between 1MB and 50MB of data each.
  So, reads are going to be huge because I read 3 records of that size
every time. Writes are heavy as well because each row is that wide.

Currently, I own 3 nodes with the following properties:
- node1:
* Intel Core i7-3770
* 2x HDD SATA 3,0 TB
* 4x RAM 8192 MB DDR3
* nominal bit rate 175MB/s
# blockdev --report /dev/sd[ab]
RO  RA    SSZ   BSZ   StartSec   Size            Device
rw  256   512   4096  0          3000592982016   /dev/sda
rw  256   512   4096  0          3000592982016   /dev/sdb

- node2,3:
* Intel Core i7-2600
* 2x HDD SATA 3,0 TB
* 4x RAM 4096 MB DDR3
* nominal bit rate 155MB/s
# blockdev --report /dev/sd[ab]
RO  RA    SSZ   BSZ   StartSec   Size            Device
rw  256   512   4096  0          3000592982016   /dev/sda
rw  256   512   4096  0          3000592982016   /dev/sdb

Each node has 2 disks but I have disabled the RAID option and created a
single virtual disk in order to get more free space.
Can this configuration create issues?

I have already tried some configurations in order to make it work, like:
1) straightforward attempt
- default Cassandra configuration (cassandra.yaml)
- RF=1
- SizeTieredCompactionStrategy  (write strategy)
- no row cache (because of the wide row size it is better to have no
row cache)
- gc_grace_seconds = 1 day (unfortunately, I did no repair schedule 
at all)
results:
too many timeouts, losing data

2)
- added repair schedules
- RF=3 (in order to increase read speed)
results:
- too many timeouts, losing data
- high I/O consumption on each node (iostat shows 100%
in %util on each node, dstat shows hundreds of MB read for each iteration)
- node2 frozen until I stopped data writes.
- node3 almost frozen
- many pending MutationStage events in TPSTATS on node2
- many full GC
- many HintsDispatchExecutor events in system.log

actual)
- added repair schedules
- RF=3
- set durable_writes = false in order to speed up writes
- increased young heap
- decreased SurvivorRatio in order to get more young generation space
available because of the wide row data
- increased MaxTenuringThreshold from 1 to 3 in order to decrease
read latency
-

fine tuning for wide rows and mixed workload system

2019-01-11 Thread Marco Gasparini
Hello everyone,

I need some advice in order to solve my use case problem. I have already
tried some solutions but they didn't work out.
Can you help me with the following configuration please? Any help is very
appreciated

I'm using:
- Cassandra 3.11.3
- java version "1.8.0_191"

My use case has the following constraints:
- about 1M reads per day (it is going to rise up)
- about 2M writes per day (it is going to rise up)
- there is a high peak of requests in less than 2 hours in which the system
receives half of the whole day's traffic (500K reads, 1M writes)
- each request consists of 1 read and 2 writes (1 delete + 1 write)
* the read query selects max 3 records based on the primary key (select *
from my_keyspace.my_table where pkey = ? limit 3)
* then one record is deleted (delete from
my_keyspace.my_table where pkey = ? and event_datetime = ? IF EXISTS)
* finally the new data is stored (insert into my_keyspace.my_table
(event_datetime, pkey, agent, some_id, ft, ftt..) values (?,?,?,?,?,?...))

- each row is pretty wide. I don't really know the exact size because there
are 2 dynamic text columns that store between 1MB and 50MB of data
each.
  So, reads are going to be huge because I read 3 records of that size
every time. Writes are heavy as well because each row is that wide.

Currently, I own 3 nodes with the following properties:
- node1:
* Intel Core i7-3770
* 2x HDD SATA 3,0 TB
* 4x RAM 8192 MB DDR3
* nominal bit rate 175MB/s
# blockdev --report /dev/sd[ab]
RO  RA    SSZ   BSZ   StartSec   Size            Device
rw  256   512   4096  0          3000592982016   /dev/sda
rw  256   512   4096  0          3000592982016   /dev/sdb
- node2,3:
* Intel Core i7-2600
* 2x HDD SATA 3,0 TB
* 4x RAM 4096 MB DDR3
* nominal bit rate 155MB/s
# blockdev --report /dev/sd[ab]
RO  RA    SSZ   BSZ   StartSec   Size            Device
rw  256   512   4096  0          3000592982016   /dev/sda
rw  256   512   4096  0          3000592982016   /dev/sdb
Each node has 2 disks but I have disabled the RAID option and created a
single virtual disk in order to get more free space.
Can this configuration create issues?

I have already tried some configurations in order to make it work, like:
1) straightforward attempt
- default Cassandra configuration (cassandra.yaml)
- RF=1
- SizeTieredCompactionStrategy  (write strategy)
- no row cache (because of the wide row size it is better to have no row
cache)
- gc_grace_seconds = 1 day (unfortunately, I did no repair schedule at all)
results:
too many timeouts, losing data

2)
- added repair schedules
- RF=3 (in order to increase read speed)
results:
- too many timeouts, losing data
- high I/O consumption on each node (iostat shows 100% in %util on each
node, dstat shows hundreds of MB read for each iteration)
- node2 frozen until I stopped data writes.
- node3 almost frozen
- many pending MutationStage events in TPSTATS on node2
- many full GC
- many HintsDispatchExecutor events in system.log
actual)
- added repair schedules
- RF=3
- set durable_writes = false in order to speed up writes
- increased young heap
- decreased SurvivorRatio in order to get more young generation space available
because of the wide row data
- increased MaxTenuringThreshold from 1 to 3 in order to decrease read
latency
- increased Cassandra's memtable onheap and offheap sizes because of the
wide row data
- changed memtable_allocation_type to offheap_objects because of the wide row
data
results:
- better GC performance on node1 and node3
- still high I/O consumption on each node (iostat shows 100% in %util on
each node, dstat shows hundreds of MB read for each iteration)
- still node2 completely frozen
- many pending MutationStage events in TPSTATS on node2
- many HintsDispatchExecutor events in system.log on each node

I cannot go to AWS; I can only get dedicated servers.
Do you have any suggestions to fine tune the system for this use case?

Thank you
Marco


Re: Performance Of IN Queries On Wide Rows

2018-02-21 Thread Jeff Jirsa
Slight nuance: we don't load the whole row into memory, but the column
index (and the result set, and the tombstones in the partition), which can
still spike your GC/heap (and potentially overflow the row cache, if you
have it on, which is atypical).

On Wed, Feb 21, 2018 at 1:35 PM, Carl Mueller <carl.muel...@smartthings.com>
wrote:

> Cass 2.1.14 is missing some wide row optimizations done in later cass
> releases IIRC.
>
> Speculation: IN won't matter, it will load the entire wide row into memory
> regardless which might spike your GC/heap and overflow the rowcache
>
> On Wed, Feb 21, 2018 at 2:16 PM, Gareth Collins <
> gareth.o.coll...@gmail.com> wrote:
>
>> Thanks for the response!
>>
>> I could understand that being the case if the Cassandra cluster is not
>> loaded. Splitting the work across multiple nodes would obviously make
>> the query faster.
>>
>> But if this was just a single node, shouldn't one IN query be faster
>> than multiple due to the fact that, if I understand correctly,
>> Cassandra should need to do less work?
>>
>> thanks in advance,
>> Gareth
>>
>> On Wed, Feb 21, 2018 at 7:27 AM, Rahul Singh
>> <rahul.xavier.si...@gmail.com> wrote:
>> > That depends on the driver you use but separate queries asynchronously
>> > around the cluster would be faster.
>> >
>> >
>> > --
>> > Rahul Singh
>> > rahul.si...@anant.us
>> >
>> > Anant Corporation
>> >
>> > On Feb 20, 2018, 6:48 PM -0500, Eric Stevens <migh...@gmail.com>,
>> wrote:
>> >
>> > Someone can correct me if I'm wrong, but I believe if you do a large
>> IN() on
>> > a single partition's cluster keys, all the reads are going to be served
>> from
>> > a single replica.  Compared to many concurrent individual equal
>> statements
>> > you can get the performance gain of leaning on several replicas for
>> > parallelism.
>> >
>> > On Tue, Feb 20, 2018 at 11:43 AM Gareth Collins <
>> gareth.o.coll...@gmail.com>
>> > wrote:
>> >>
>> >> Hello,
>> >>
>> >> When querying large wide rows for multiple specific values is it
>> >> better to do separate queries for each value...or do it with one query
>> >> and an "IN"? I am using Cassandra 2.1.14
>> >>
>> >> I am asking because I had changed my app to use 'IN' queries and it
>> >> **appears** to be slower rather than faster. I had assumed that the
>> >> "IN" query should be faster...as I assumed it only needs to go down
>> >> the read path once (i.e. row cache -> memtable -> key cache -> bloom
>> >> filter -> index summary -> index -> compaction -> sstable) rather than
>> >> once for each entry? Or are there some additional caveats that I
>> >> should be aware of for 'IN' query performance (e.g. ordering of 'IN'
>> >> query entries, closeness of 'IN' query values in the SSTable etc.)?
>> >>
>> >> thanks in advance,
>> >> Gareth Collins
>> >>
>> >> -
>> >> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> >> For additional commands, e-mail: user-h...@cassandra.apache.org
>> >>
>> >
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>
>>
>


Re: Performance Of IN Queries On Wide Rows

2018-02-21 Thread Carl Mueller
Cass 2.1.14 is missing some wide row optimizations done in later cass
releases IIRC.

Speculation: IN won't matter, it will load the entire wide row into memory
regardless which might spike your GC/heap and overflow the rowcache

On Wed, Feb 21, 2018 at 2:16 PM, Gareth Collins <gareth.o.coll...@gmail.com>
wrote:

> Thanks for the response!
>
> I could understand that being the case if the Cassandra cluster is not
> loaded. Splitting the work across multiple nodes would obviously make
> the query faster.
>
> But if this was just a single node, shouldn't one IN query be faster
> than multiple due to the fact that, if I understand correctly,
> Cassandra should need to do less work?
>
> thanks in advance,
> Gareth
>
> On Wed, Feb 21, 2018 at 7:27 AM, Rahul Singh
> <rahul.xavier.si...@gmail.com> wrote:
> > That depends on the driver you use but separate queries asynchronously
> > around the cluster would be faster.
> >
> >
> > --
> > Rahul Singh
> > rahul.si...@anant.us
> >
> > Anant Corporation
> >
> > On Feb 20, 2018, 6:48 PM -0500, Eric Stevens <migh...@gmail.com>, wrote:
> >
> > Someone can correct me if I'm wrong, but I believe if you do a large
> IN() on
> > a single partition's cluster keys, all the reads are going to be served
> from
> > a single replica.  Compared to many concurrent individual equal
> statements
> > you can get the performance gain of leaning on several replicas for
> > parallelism.
> >
> > On Tue, Feb 20, 2018 at 11:43 AM Gareth Collins <
> gareth.o.coll...@gmail.com>
> > wrote:
> >>
> >> Hello,
> >>
> >> When querying large wide rows for multiple specific values is it
> >> better to do separate queries for each value...or do it with one query
> >> and an "IN"? I am using Cassandra 2.1.14
> >>
> >> I am asking because I had changed my app to use 'IN' queries and it
> >> **appears** to be slower rather than faster. I had assumed that the
> >> "IN" query should be faster...as I assumed it only needs to go down
> >> the read path once (i.e. row cache -> memtable -> key cache -> bloom
> >> filter -> index summary -> index -> compaction -> sstable) rather than
> >> once for each entry? Or are there some additional caveats that I
> >> should be aware of for 'IN' query performance (e.g. ordering of 'IN'
> >> query entries, closeness of 'IN' query values in the SSTable etc.)?
> >>
> >> thanks in advance,
> >> Gareth Collins
> >>
> >> -
> >> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> >> For additional commands, e-mail: user-h...@cassandra.apache.org
> >>
> >
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Re: Performance Of IN Queries On Wide Rows

2018-02-21 Thread Gareth Collins
Thanks for the response!

I could understand that being the case if the Cassandra cluster is not
loaded. Splitting the work across multiple nodes would obviously make
the query faster.

But if this was just a single node, shouldn't one IN query be faster
than multiple due to the fact that, if I understand correctly,
Cassandra should need to do less work?

thanks in advance,
Gareth

On Wed, Feb 21, 2018 at 7:27 AM, Rahul Singh
<rahul.xavier.si...@gmail.com> wrote:
> That depends on the driver you use but separate queries asynchronously
> around the cluster would be faster.
>
>
> --
> Rahul Singh
> rahul.si...@anant.us
>
> Anant Corporation
>
> On Feb 20, 2018, 6:48 PM -0500, Eric Stevens <migh...@gmail.com>, wrote:
>
> Someone can correct me if I'm wrong, but I believe if you do a large IN() on
> a single partition's cluster keys, all the reads are going to be served from
> a single replica.  Compared to many concurrent individual equal statements
> you can get the performance gain of leaning on several replicas for
> parallelism.
>
> On Tue, Feb 20, 2018 at 11:43 AM Gareth Collins <gareth.o.coll...@gmail.com>
> wrote:
>>
>> Hello,
>>
>> When querying large wide rows for multiple specific values is it
>> better to do separate queries for each value...or do it with one query
>> and an "IN"? I am using Cassandra 2.1.14
>>
>> I am asking because I had changed my app to use 'IN' queries and it
>> **appears** to be slower rather than faster. I had assumed that the
>> "IN" query should be faster...as I assumed it only needs to go down
>> the read path once (i.e. row cache -> memtable -> key cache -> bloom
>> filter -> index summary -> index -> compaction -> sstable) rather than
>> once for each entry? Or are there some additional caveats that I
>> should be aware of for 'IN' query performance (e.g. ordering of 'IN'
>> query entries, closeness of 'IN' query values in the SSTable etc.)?
>>
>> thanks in advance,
>> Gareth Collins
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>
>

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Performance Of IN Queries On Wide Rows

2018-02-21 Thread Rahul Singh
That depends on the driver you use but separate queries asynchronously around 
the cluster would be faster.


--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Feb 20, 2018, 6:48 PM -0500, Eric Stevens <migh...@gmail.com>, wrote:
> Someone can correct me if I'm wrong, but I believe if you do a large IN() on 
> a single partition's cluster keys, all the reads are going to be served from 
> a single replica.  Compared to many concurrent individual equal statements 
> you can get the performance gain of leaning on several replicas for 
> parallelism.
>
> > On Tue, Feb 20, 2018 at 11:43 AM Gareth Collins 
> > <gareth.o.coll...@gmail.com> wrote:
> > > Hello,
> > >
> > > When querying large wide rows for multiple specific values is it
> > > better to do separate queries for each value...or do it with one query
> > > and an "IN"? I am using Cassandra 2.1.14
> > >
> > > I am asking because I had changed my app to use 'IN' queries and it
> > > **appears** to be slower rather than faster. I had assumed that the
> > > "IN" query should be faster...as I assumed it only needs to go down
> > > the read path once (i.e. row cache -> memtable -> key cache -> bloom
> > > filter -> index summary -> index -> compaction -> sstable) rather than
> > > once for each entry? Or are there some additional caveats that I
> > > should be aware of for 'IN' query performance (e.g. ordering of 'IN'
> > > query entries, closeness of 'IN' query values in the SSTable etc.)?
> > >
> > > thanks in advance,
> > > Gareth Collins
> > >
> > > -
> > > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> > > For additional commands, e-mail: user-h...@cassandra.apache.org
> > >


Re: Performance Of IN Queries On Wide Rows

2018-02-20 Thread Eric Stevens
Someone can correct me if I'm wrong, but I believe if you do a large IN()
on a single partition's cluster keys, all the reads are going to be served
from a single replica.  Compared to many concurrent individual equal
statements you can get the performance gain of leaning on several replicas
for parallelism.
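
To make the two shapes concrete, a rough sketch with illustrative table and
column names (the individual statements would be issued concurrently from the
client, e.g. via the driver's async execution):

-- one statement, served for the whole result by a single replica
SELECT * FROM ks.my_table WHERE pk = 'abc' AND ck IN (1, 2, 3);

-- separate statements against the same partition, issued concurrently,
-- so different replicas of that partition can serve them in parallel
SELECT * FROM ks.my_table WHERE pk = 'abc' AND ck = 1;
SELECT * FROM ks.my_table WHERE pk = 'abc' AND ck = 2;
SELECT * FROM ks.my_table WHERE pk = 'abc' AND ck = 3;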

On Tue, Feb 20, 2018 at 11:43 AM Gareth Collins <gareth.o.coll...@gmail.com>
wrote:

> Hello,
>
> When querying large wide rows for multiple specific values is it
> better to do separate queries for each value...or do it with one query
> and an "IN"? I am using Cassandra 2.1.14
>
> I am asking because I had changed my app to use 'IN' queries and it
> **appears** to be slower rather than faster. I had assumed that the
> "IN" query should be faster...as I assumed it only needs to go down
> the read path once (i.e. row cache -> memtable -> key cache -> bloom
> filter -> index summary -> index -> compaction -> sstable) rather than
> once for each entry? Or are there some additional caveats that I
> should be aware of for 'IN' query performance (e.g. ordering of 'IN'
> query entries, closeness of 'IN' query values in the SSTable etc.)?
>
> thanks in advance,
> Gareth Collins
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Performance Of IN Queries On Wide Rows

2018-02-20 Thread Gareth Collins
Hello,

When querying large wide rows for multiple specific values is it
better to do separate queries for each value...or do it with one query
and an "IN"? I am using Cassandra 2.1.14

I am asking because I had changed my app to use 'IN' queries and it
**appears** to be slower rather than faster. I had assumed that the
"IN" query should be faster...as I assumed it only needs to go down
the read path once (i.e. row cache -> memtable -> key cache -> bloom
filter -> index summary -> index -> compaction -> sstable) rather than
once for each entry? Or are there some additional caveats that I
should be aware of for 'IN' query performance (e.g. ordering of 'IN'
query entries, closeness of 'IN' query values in the SSTable etc.)?

thanks in advance,
Gareth Collins

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Wide rows splitting

2017-09-18 Thread Stefano Ortolani
You might find this interesting:
https://medium.com/@foundev/synthetic-sharding-in-cassandra-to-deal-with-large-partitions-2124b2fd788b

Cheers,
Stefano

On Mon, Sep 18, 2017 at 5:07 AM, Adam Smith  wrote:

> Dear community,
>
> I have a table with inlinks to URLs, i.e. many URLs point to
> http://google.com, less URLs point to http://somesmallweb.page.
>
> It has very wide and very skinny rows - the distribution is following a
> power law. I do not know a priori how many columns a row has. Also, I can't
> identify a schema to introduce a good partitioning.
>
> Currently, I am thinking about introducing splits by: pk is like (URL,
> splitnumber), where splitnumber is initially 1 and  hash URL mod
> splitnumber would determine the splitnumber on insert. I would need a
> separate table to maintain the splitnumber and a spark-cassandra-connector
> job counts the columns and and increases/doubles the number of splits on
> demand. This means then that I would have to move e.g. (URL1,0) -> (URL1,1)
> when splitnumber would be 2.
>
> Would you do the same? Is there a better way?
>
> Thanks!
> Adam
>


Wide rows splitting

2017-09-17 Thread Adam Smith
Dear community,

I have a table with inlinks to URLs, i.e. many URLs point to
http://google.com, less URLs point to http://somesmallweb.page.

It has very wide and very skinny rows - the distribution is following a
power law. I do not know a priori how many columns a row has. Also, I can't
identify a schema to introduce a good partitioning.

Currently, I am thinking about introducing splits by: pk is like (URL,
splitnumber), where splitnumber is initially 1 and  hash URL mod
splitnumber would determine the splitnumber on insert. I would need a
separate table to maintain the splitnumber and a spark-cassandra-connector
job counts the columns and increases/doubles the number of splits on
demand. This means then that I would have to move e.g. (URL1,0) -> (URL1,1)
when splitnumber would be 2.

Would you do the same? Is there a better way?

Thanks!
Adam
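
For what it's worth, here is a minimal CQL sketch of the split scheme I have in
mind (table and column names are illustrative; the split value would be chosen
at insert time, e.g. a hash of the inlink mod the current splitnumber):

CREATE TABLE inlinks_by_url (
    url text,
    split int,        -- which split this inlink was assigned to at insert time
    inlink text,
    PRIMARY KEY ((url, split), inlink)
);

CREATE TABLE url_splits (
    url text PRIMARY KEY,
    splitnumber int   -- current split count, doubled by the Spark job on demand
);

Reading all inlinks of a URL would then fan out over split = 0 .. splitnumber - 1.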


Re: wide rows

2016-10-18 Thread Yabin Meng
With CQL data modeling, everything is called a "row". But really in CQL, a
row is just a logical concept. So if you think of "wide partition" instead
of "wide row" (partition is what is determined by the has index of the
partition key), it will help the understanding a bit: one wide-partition
may contain multiple logical CQL rows - each CQL row just represents one
actual storage column of the partition.

Time-series data is usually a good fit for "wide-partition" data modeling,
but please remember not to go too crazy with it.

Cheers,

Yabin

On Tue, Oct 18, 2016 at 11:23 AM, DuyHai Doan  wrote:

> // user table: skinny partition
> CREATE TABLE user (
> user_id uuid,
> firstname text,
> lastname text,
> 
> PRIMARY KEY ((user_id))
> );
>
> // sensor_data table: wide partition
> CREATE TABLE sensor_data (
>  sensor_id uuid,
>  date timestamp,
>  value double,
>  PRIMARY KEY ((sensor_id),  date)
> );
>
> On Tue, Oct 18, 2016 at 5:07 PM, S Ahmed  wrote:
>
>> Hi,
>>
>> Can someone clarify how you would model a "wide" row cassandra table?
>> From what I understand, a wide row table is where you keep appending
>> columns to a given row.
>>
>> The other way to model a table would be the "regular" style where each
>> row contains data, so during a SELECT you would want multiple rows
>> as opposed to a wide row where you would get a single row, but a subset of
>> columns.
>>
>> Can someone show a simple data model that compares both styles?
>>
>> Thanks.
>>
>
>


Re: wide rows

2016-10-18 Thread DuyHai Doan
// user table: skinny partition
CREATE TABLE user (
user_id uuid,
firstname text,
lastname text,

PRIMARY KEY ((user_id))
);

// sensor_data table: wide partition
CREATE TABLE sensor_data (
 sensor_id uuid,
 date timestamp,
 value double,
 PRIMARY KEY ((sensor_id),  date)
);

On Tue, Oct 18, 2016 at 5:07 PM, S Ahmed  wrote:

> Hi,
>
> Can someone clarify how you would model a "wide" row cassandra table?
> From what I understand, a wide row table is where you keep appending
> columns to a given row.
>
> The other way to model a table would be the "regular" style where each row
> contains data, so during a SELECT you would want multiple rows as
> opposed to a wide row where you would get a single row, but a subset of
> columns.
>
> Can someone show a simple data model that compares both styles?
>
> Thanks.
>


RE: wide rows

2016-10-18 Thread S Ahmed
Hi,

Can someone clarify how you would model a "wide" row cassandra table?  From
what I understand, a wide row table is where you keep appending columns to
a given row.

The other way to model a table would be the "regular" style where each row
contains data, so during a SELECT you would want multiple rows as
opposed to a wide row where you would get a single row, but a subset of
columns.

Can someone show a simple data model that compares both styles?

Thanks.


Re: Do partition keys create skinny or wide rows?

2016-10-08 Thread Vladimir Yudovin
> querying them would be inefficient (impossible?)
Impossible. In the case of a multi-column partition key all of them must be
restricted in the WHERE clause:

CREATE TABLE data.table (id1 int, id2 int, primary KEY ((id1,id2)));
SELECT * FROM data.table WHERE id1 = 0;
InvalidRequest: Error from server: code=2200 [Invalid query] message="Partition 
key parts: id2 must be restricted as other parts are"



Best regards, Vladimir Yudovin, 
Winguzone - Hosted Cloud Cassandra on Azure and SoftLayer.
Launch your cluster in minutes.




On Sun, 09 Oct 2016 00:27:12 -0400 Graham Sanderson gra...@vast.com
wrote:

No the employees would end up in arbitrary partitions, and querying them would 
be inefficient (impossible? - I am levels back on C* so don’t know if ALLOW 
FILTERING even works for this).

I would be tempted to use organization_id only or organization_Id and maybe a 
few shard bits (if you are worried about huge orgs) from the employee_Id to 
make the partition key, but it really depends what other queries you will be 
making
On Oct 8, 2016, at 11:19 PM, Ali Akhtar ali.rac...@gmail.com wrote:

In the case of PRIMARY KEY((organization_id, employee_id)), could I still do a 
query like Select ... where organization_id = x, to get all employees in a 
particular organization?

And, this will put all those employees in the same node, right?


On Sun, Oct 9, 2016 at 9:17 AM, Graham Sanderson gra...@vast.com wrote:
Nomenclature is tricky, but PRIMARY KEY((organization_id, employee_id)) will 
make organization_id, employee_id the partition key which equates roughly to 
your latter sentence (I’m not sure about the 4 billion limit - that may be the 
new actual limit, but probably not a good idea).
On Oct 8, 2016, at 8:35 PM, Ali Akhtar ali.rac...@gmail.com wrote:

the last '4 billion rows' should say '4 billion columns / cells'

On Sun, Oct 9, 2016 at 6:34 AM, Ali Akhtar ali.rac...@gmail.com wrote:
Say I have the following primary key:

PRIMARY KEY((organization_id, employee_id))


Will this create 1 row whose primary key is the organization id, but it has a 4 
billion column / cell limit?


Or will this create 1 row for each employee in the same organization, so if i 
have 5 employees, they will each have their own 5 rows, and each of those 5 
rows will have their own 4 billion rows?


Thank you.


Re: Do partition keys create skinny or wide rows?

2016-10-08 Thread Graham Sanderson
No the employees would end up in arbitrary partitions, and querying them would 
be inefficient (impossible? - I am levels back on C* so don’t know if ALLOW 
FILTERING even works for this).

I would be tempted to use organization_id only or organization_Id and maybe a 
few shard bits (if you are worried about huge orgs) from the employee_Id to 
make the partition key, but it really depends what other queries you will be 
making
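
Roughly something like this, as a sketch (names are illustrative; the shard
value would be computed client-side, e.g. from a few bits of a hash of
employee_id, to bound the partition size):

CREATE TABLE employees_by_org (
    organization_id uuid,
    shard int,          -- e.g. hash(employee_id) % 16
    employee_id uuid,
    name text,
    PRIMARY KEY ((organization_id, shard), employee_id)
);

-- listing an org's employees means querying each shard value, e.g.
SELECT * FROM employees_by_org WHERE organization_id = ? AND shard = 0;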
> On Oct 8, 2016, at 11:19 PM, Ali Akhtar  wrote:
> 
> In the case of PRIMARY KEY((organization_id, employee_id)), could I still do 
> a query like Select ... where organization_id = x, to get all employees in a 
> particular organization?
> 
> And, this will put all those employees in the same node, right?
> 
> On Sun, Oct 9, 2016 at 9:17 AM, Graham Sanderson wrote:
> Nomenclature is tricky, but PRIMARY KEY((organization_id, employee_id)) will 
> make organization_id, employee_id the partition key which equates roughly to 
> your latter sentence (I’m not sure about the 4 billion limit - that may be 
> the new actual limit, but probably not a good idea).
> 
>> On Oct 8, 2016, at 8:35 PM, Ali Akhtar wrote:
>> 
>> the last '4 billion rows' should say '4 billion columns / cells'
>> 
>> On Sun, Oct 9, 2016 at 6:34 AM, Ali Akhtar wrote:
>> Say I have the following primary key:
>> PRIMARY KEY((organization_id, employee_id))
>> 
>> Will this create 1 row whose primary key is the organization id, but it has 
>> a 4 billion column / cell limit?
>> 
>> Or will this create 1 row for each employee in the same organization, so if 
>> i have 5 employees, they will each have their own 5 rows, and each of those 
>> 5 rows will have their own 4 billion rows?
>> 
>> Thank you.
>> 
> 
> 





Re: Do partition keys create skinny or wide rows?

2016-10-08 Thread Ali Akhtar
In the case of PRIMARY KEY((organization_id, employee_id)), could I still
do a query like Select ... where organization_id = x, to get all employees
in a particular organization?

And, this will put all those employees in the same node, right?

On Sun, Oct 9, 2016 at 9:17 AM, Graham Sanderson  wrote:

> Nomenclature is tricky, but PRIMARY KEY((organization_id, employee_id))
> will make organization_id, employee_id the partition key which equates
> roughly to your latter sentence (I’m not sure about the 4 billion limit -
> that may be the new actual limit, but probably not a good idea).
>
> On Oct 8, 2016, at 8:35 PM, Ali Akhtar  wrote:
>
> the last '4 billion rows' should say '4 billion columns / cells'
>
> On Sun, Oct 9, 2016 at 6:34 AM, Ali Akhtar  wrote:
>
>> Say I have the following primary key:
>> PRIMARY KEY((organization_id, employee_id))
>>
>> Will this create 1 row whose primary key is the organization id, but it
>> has a 4 billion column / cell limit?
>>
>> Or will this create 1 row for each employee in the same organization, so
>> if i have 5 employees, they will each have their own 5 rows, and each of
>> those 5 rows will have their own 4 billion rows?
>>
>> Thank you.
>>
>
>
>


Re: Do partition keys create skinny or wide rows?

2016-10-08 Thread Graham Sanderson
Nomenclature is tricky, but PRIMARY KEY((organization_id, employee_id)) will 
make organization_id, employee_id the partition key which equates roughly to 
your latter sentence (I’m not sure about the 4 billion limit - that may be the 
new actual limit, but probably not a good idea).

> On Oct 8, 2016, at 8:35 PM, Ali Akhtar  wrote:
> 
> the last '4 billion rows' should say '4 billion columns / cells'
> 
> On Sun, Oct 9, 2016 at 6:34 AM, Ali Akhtar wrote:
> Say I have the following primary key:
> PRIMARY KEY((organization_id, employee_id))
> 
> Will this create 1 row whose primary key is the organization id, but it has a 
> 4 billion column / cell limit?
> 
> Or will this create 1 row for each employee in the same organization, so if i 
> have 5 employees, they will each have their own 5 rows, and each of those 5 
> rows will have their own 4 billion rows?
> 
> Thank you.
> 





Re: Do partition keys create skinny or wide rows?

2016-10-08 Thread Ali Akhtar
the last '4 billion rows' should say '4 billion columns / cells'

On Sun, Oct 9, 2016 at 6:34 AM, Ali Akhtar  wrote:

> Say I have the following primary key:
> PRIMARY KEY((organization_id, employee_id))
>
> Will this create 1 row whose primary key is the organization id, but it
> has a 4 billion column / cell limit?
>
> Or will this create 1 row for each employee in the same organization, so
> if i have 5 employees, they will each have their own 5 rows, and each of
> those 5 rows will have their own 4 billion rows?
>
> Thank you.
>


Do partition keys create skinny or wide rows?

2016-10-08 Thread Ali Akhtar
Say I have the following primary key:
PRIMARY KEY((organization_id, employee_id))

Will this create 1 row whose primary key is the organization id, but it has
a 4 billion column / cell limit?

Or will this create 1 row for each employee in the same organization, so if
i have 5 employees, they will each have their own 5 rows, and each of those
5 rows will have their own 4 billion rows?

Thank you.


Re: Partition Key - Wide rows?

2016-10-06 Thread Saladi Naidu
It depends on the partition/primary key design. In order to execute all 3 queries,
the partition key is org id and the others are clustering keys. If there are many
orgs it will be ok, but if it is one org then a single partition will hold all the
data, and that's not good.

Naidu Saladi

On Thursday, October 6, 2016 12:14 PM, Ali Akhtar <ali.rac...@gmail.com>
wrote:

Thanks, Phil.
1- In my use-case, its probably okay to partition all the org data together. 
This is for a b2b enterprise SaaS application, the customers will be 
organizations.
So it is probably okay to store each org's data next to each other, right?
2- I'm thinking of having the primary key be: (org_id, team_id, project_id, 
issue_id). 
In the above case, will there be a skinny row per issue, or a wide row per org 
/ team / project?
3- Just to double check, with the above primary key, can I still query using 
just the org_id, org + team id, and org + team + project id?
4- If I wanted to refer to a particular issue, it looks like I'd need to send 
all 4 parameters. That may be problematic. Is there a better way of modeling 
this data?


On Thu, Oct 6, 2016 at 9:30 PM, Philip Persad <philip.per...@gmail.com> wrote:



1) No.  Your first 3 queries will work but not the last one (get issue by id).  
In Cassandra when you query you must include every preceding portion of the 
primary key.

2) 64 bytes (16 * 4), or somewhat more if storing as strings?  I don't think 
that's something I'd worry too much about.

3) Depends on how you build your partition key.  If partition key is (org id), 
then you get one partition per org (probably bad depending on your dataset).  
If partition key is (org id, team id, project id) then you will have one 
partition per project which is probably fine ( again, depending on your 
dataset).

Cheers,

-Phil
From: Ali Akhtar
Sent: 2016-10-06 9:04 AM
To: user@cassandra.apache.org
Subject: Partition Key - Wide rows?

Heya,
I'm designing some tables, where data needs to be stored in the following
hierarchy:
Organization -> Team -> Project -> Issues
I need to be able to retrieve issues:
- For the whole org - using org id
- For a team (org id + team id)
- For a project (org id + team id + project id)
- If possible, by using just the issue id
I'm considering using all 4 ids as the primary key. The first 3 will use UUIDs,
except issue id which will be an alphanumeric string, unique per project.
1) Will this setup allow using all 4 query scenarios?
2) Will this make the primary key really long, 3 UUIDs + similar length'd issue id?
3) Will this store issues as skinny rows, or wide rows? If an org has a lot of
teams, which have a lot of projects, which have a lot of issues, etc, could I
have issues w/ running out of the column limit of wide rows?
4) Is there a better way of achieving this scenario?


Re: Partition Key - Wide rows?

2016-10-06 Thread Jonathan Haddad
>  In my use-case, its probably okay to partition all the org data together.


Maybe, maybe not.  Cassandra doesn't handle really big partitions very well
right now.  If you've got more than 100MB of data per org, you're better
off breaking it up (by project or team) and doing multiple queries to
stitch the data together client side.
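
For example, a sketch of splitting by project (illustrative names); fetching a
whole org then means one query per project, stitched together client side:

CREATE TABLE issues_by_project (
    org_id uuid,
    team_id uuid,
    project_id uuid,
    issue_id text,
    title text,
    PRIMARY KEY ((org_id, team_id, project_id), issue_id)
);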



On Thu, Oct 6, 2016 at 10:14 AM Ali Akhtar <ali.rac...@gmail.com> wrote:

> Thanks, Phil.
>
> 1- In my use-case, its probably okay to partition all the org data
> together. This is for a b2b enterprise SaaS application, the customers will
> be organizations.
>
> So it is probably okay to store each org's data next to each other, right?
>
> 2- I'm thinking of having the primary key be: (org_id, team_id,
> project_id, issue_id).
>
> In the above case, will there be a skinny row per issue, or a wide row per
> org / team / project?
>
> 3- Just to double check, with the above primary key, can I still query
> using just the org_id, org + team id, and org + team + project id?
>
> 4- If I wanted to refer to a particular issue, it looks like I'd need to
> send all 4 parameters. That may be problematic. Is there a better way of
> modeling this data?
>
>
>
> On Thu, Oct 6, 2016 at 9:30 PM, Philip Persad <philip.per...@gmail.com>
> wrote:
>
>
>
> 1) No.  Your first 3 queries will work but not the last one (get issue by
> id).  In Cassandra when you query you must include every preceding portion
> of the primary key.
>
> 2) 64 bytes (16 * 4), or somewhat more if storing as strings?  I don't
> think that's something I'd worry too much about.
>
> 3) Depends on how you build your partition key.  If partition key is (org
> id), then you get one partition per org (probably bad depending on your
> dataset).  If partition key is (org id, team id, project id) then you will
> have one partition per project which is probably fine ( again, depending on
> your dataset).
>
> Cheers,
>
> -Phil
> --
> From: Ali Akhtar <ali.rac...@gmail.com>
> Sent: 2016-10-06 9:04 AM
> To: user@cassandra.apache.org
> Subject: Partition Key - Wide rows?
>
> Heya,
>
> I'm designing some tables, where data needs to be stored in the following
> hierarchy:
>
> Organization -> Team -> Project -> Issues
>
> I need to be able to retrieve issues:
>
> - For the whole org - using org id
> - For a team (org id + team id)
> - For a project (org id + team id + project id)
> - If possible, by using just the issue id
>
> I'm considering using all 4 ids as the primary key. The first 3 will use
> UUIDs, except issue id which will be an alphanumeric string, unique per
> project.
>
> 1) Will this setup allow using all 4 query scenarios?
> 2) Will this make the primary key really long, 3 UUIDs + similar length'd
> issue id?
> 3) Will this store issues as skinny rows, or wide rows? If an org has a
> lot of teams, which have a lot of projects, which have a lot of issues,
> etc, could I have issues w/ running out of the column limit of wide rows?
> 4) Is there a better way of achieving this scenario?
>
>
>
>
>
>


Re: Partition Key - Wide rows?

2016-10-06 Thread Ali Akhtar
Thanks, Phil.

1- In my use-case, its probably okay to partition all the org data
together. This is for a b2b enterprise SaaS application, the customers will
be organizations.

So it is probably okay to store each org's data next to each other, right?

2- I'm thinking of having the primary key be: (org_id, team_id, project_id,
issue_id).

In the above case, will there be a skinny row per issue, or a wide row per
org / team / project?

3- Just to double check, with the above primary key, can I still query
using just the org_id, org + team id, and org + team + project id?

4- If I wanted to refer to a particular issue, it looks like I'd need to
send all 4 parameters. That may be problematic. Is there a better way of
modeling this data?



On Thu, Oct 6, 2016 at 9:30 PM, Philip Persad <philip.per...@gmail.com>
wrote:

>
>
> 1) No.  Your first 3 queries will work but not the last one (get issue by
> id).  In Cassandra when you query you must include every preceding portion
> of the primary key.
>
> 2) 64 bytes (16 * 4), or somewhat more if storing as strings?  I don't
> think that's something I'd worry too much about.
>
> 3) Depends on how you build your partition key.  If partition key is (org
> id), then you get one partition per org (probably bad depending on your
> dataset).  If partition key is (org id, team id, project id) then you will
> have one partition per project which is probably fine ( again, depending on
> your dataset).
>
> Cheers,
>
> -Phil
> --
> From: Ali Akhtar <ali.rac...@gmail.com>
> Sent: 2016-10-06 9:04 AM
> To: user@cassandra.apache.org
> Subject: Partition Key - Wide rows?
>
> Heya,
>
> I'm designing some tables, where data needs to be stored in the following
> hierarchy:
>
> Organization -> Team -> Project -> Issues
>
> I need to be able to retrieve issues:
>
> - For the whole org - using org id
> - For a team (org id + team id)
> - For a project (org id + team id + project id)
> - If possible, by using just the issue id
>
> I'm considering using all 4 ids as the primary key. The first 3 will use
> UUIDs, except issue id which will be an alphanumeric string, unique per
> project.
>
> 1) Will this setup allow using all 4 query scenarios?
> 2) Will this make the primary key really long, 3 UUIDs + similar length'd
> issue id?
> 3) Will this store issues as skinny rows, or wide rows? If an org has a
> lot of teams, which have a lot of projects, which have a lot of issues,
> etc, could I have issues w/ running out of the column limit of wide rows?
> 4) Is there a better way of achieving this scenario?
>
>
>
>
>


Partition Key - Wide rows?

2016-10-06 Thread Ali Akhtar
Heya,

I'm designing some tables, where data needs to be stored in the following
hierarchy:

Organization -> Team -> Project -> Issues

I need to be able to retrieve issues:

- For the whole org - using org id
- For a team (org id + team id)
- For a project (org id + team id + project id)
- If possible, by using just the issue id

I'm considering using all 4 ids as the primary key. The first 3 will use
UUIDs, except issue id which will be an alphanumeric string, unique per
project.

1) Will this setup allow using all 4 query scenarios?
2) Will this make the primary key really long, 3 UUIDs + similar length'd
issue id?
3) Will this store issues as skinny rows, or wide rows? If an org has a lot
of teams, which have a lot of projects, which have a lot of issues, etc,
could I have issues w/ running out of the column limit of wide rows?
4) Is there a better way of achieving this scenario?


Re: STCS Compaction with wide rows & TTL'd data

2016-09-02 Thread Kevin O'Connor
On Fri, Sep 2, 2016 at 9:33 AM, Mark Rose  wrote:

> Hi Kevin,
>
> The tombstones will live in an sstable until it gets compacted. Do you
> have a lot of pending compactions? If so, increasing the number of
> parallel compactors may help.


Nope, we are pretty well managed on compactions. Only ever 1 or 2 running
at a time per node.


> You may also be able to tun the STCS
> parameters. Here's a good explanation of how it works:
> https://shrikantbang.wordpress.com/2014/04/22/size-
> tiered-compaction-strategy-in-apache-cassandra/


Yeah interesting - I'd like to try that. Is there a way to verify what the
settings are before changing them? DESCRIBE TABLE doesn't seem to show the
compaction subproperties.


> Anyway, LCS would probably be a better fit for your use case. LCS
> would help with eliminating tombstones, but it may also result in
> dramatically higher CPU usage for compaction. If LCS compaction can
> keep up, in addition to getting ride of tombstones faster, LCS should
> reduce the number of sstables that must be read to return the row and
> have a positive impact on read latency. STCS is a bad fit for rows
> that are updated frequently (which includes rows with TTL'ed data).
>

Thanks - that may end up being where we go with this.
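
If we do go that way, my understanding is that the switch itself is just a table
property change, roughly like this (keyspace name is a placeholder, LCS options
left at defaults):

ALTER TABLE my_keyspace."OAuth2AccessTokensByUser"
    WITH compaction = {'class': 'LeveledCompactionStrategy'};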

> Also, you may have an error in your application design. OAuth Access
> Tokens are designed to have a very short lifetime of seconds or
> minutes. On access token expiry, a Refresh Token should be used to get
> a new access token. A long-lived access token is a dangerous thing as
> there is no way to disable it (refresh tokens should be disabled to
> prevent the creation of new access tokens).
>

Yeah, noted. We only allow longer lived access tokens in some very specific
scenarios, so they are much less likely to be in that CF than the standard
3600s ones, but they're there.


>
> -Mark
>
> On Thu, Sep 1, 2016 at 3:53 AM, Kevin O'Connor  wrote:
> > We're running C* 1.2.11 and have two CFs, one called OAuth2AccessToken
> and
> > one OAuth2AccessTokensByUser. OAuth2AccessToken has the token as the row
> > key, and the columns are some data about the OAuth token. There's a TTL
> set
> > on it, usually 3600, but can be higher (up to 1 month).
> > OAuth2AccessTokensByUser has the user as the row key, and then all of the
> > user's token identifiers as column values. Each of the column values has
> a
> > TTL that is set to the same as the access token it corresponds to.
> >
> > The OAuth2AccessToken CF takes up around ~6 GB on disk, whereas the
> > OAuth2AccessTokensByUser CF takes around ~110 GB. If I use
> sstablemetadata,
> > I can see the droppable tombstones ratio is around 90% for the larger
> > sstables.
> >
> > My question is - why aren't these tombstones getting compacted away? I'm
> > guessing that it's because we use STCS and the large sstables that have
> > built up over time are never considered for compaction. Would LCS be a
> > better fit for the issue of trying to keep the tombstones in check?
> >
> > I've also tried forceUserDefinedCompaction via JMX on some of the largest
> > sstables and it just creates a new sstable of the exact same size, which
> was
> > pretty surprising. Why would this explicit request to compact an sstable
> not
> > remove tombstones?
> >
> > Thanks!
> >
> > Kevin
>


Re: STCS Compaction with wide rows & TTL'd data

2016-09-02 Thread Jonathan Haddad
Also, if you can get to at least 2.0 you can use
TimeWindowCompactionStrategy which works a lot better with time series data
w/ TTLs than STCS.
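
Roughly, as a sketch (assuming the TWCS class is available for your version;
the window settings are just an example and the keyspace/table names are
placeholders):

ALTER TABLE my_keyspace."OAuth2AccessTokensByUser"
    WITH compaction = {'class': 'TimeWindowCompactionStrategy',
                       'compaction_window_unit': 'DAYS',
                       'compaction_window_size': '1'};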

On Fri, Sep 2, 2016 at 9:53 AM Jonathan Haddad  wrote:

> What's your gc_grace_seconds set to?  Is it possible you have a lot of
> tombstones that haven't reached the GC grace time yet?
>
>
> On Thu, Sep 1, 2016 at 12:54 AM Kevin O'Connor  wrote:
>
>> We're running C* 1.2.11 and have two CFs, one called OAuth2AccessToken
>> and one OAuth2AccessTokensByUser. OAuth2AccessToken has the token as the
>> row key, and the columns are some data about the OAuth token. There's a TTL
>> set on it, usually 3600, but can be higher (up to 1 month).
>> OAuth2AccessTokensByUser has the user as the row key, and then all of the
>> user's token identifiers as column values. Each of the column values has a
>> TTL that is set to the same as the access token it corresponds to.
>>
>> The OAuth2AccessToken CF takes up around ~6 GB on disk, whereas the
>> OAuth2AccessTokensByUser CF takes around ~110 GB. If I use sstablemetadata,
>> I can see the droppable tombstones ratio is around 90% for the larger
>> sstables.
>>
>> My question is - why aren't these tombstones getting compacted away? I'm
>> guessing that it's because we use STCS and the large sstables that have
>> built up over time are never considered for compaction. Would LCS be a
>> better fit for the issue of trying to keep the tombstones in check?
>>
>> I've also tried forceUserDefinedCompaction via JMX on some of the largest
>> sstables and it just creates a new sstable of the exact same size, which
>> was pretty surprising. Why would this explicit request to compact an
>> sstable not remove tombstones?
>>
>> Thanks!
>>
>> Kevin
>>
>


Re: STCS Compaction with wide rows & TTL'd data

2016-09-02 Thread Jonathan Haddad
What's your gc_grace_seconds set to?  Is it possible you have a lot of
tombstones that haven't reached the GC grace time yet?

On Thu, Sep 1, 2016 at 12:54 AM Kevin O'Connor  wrote:

> We're running C* 1.2.11 and have two CFs, one called OAuth2AccessToken and
> one OAuth2AccessTokensByUser. OAuth2AccessToken has the token as the row
> key, and the columns are some data about the OAuth token. There's a TTL set
> on it, usually 3600, but can be higher (up to 1 month).
> OAuth2AccessTokensByUser has the user as the row key, and then all of the
> user's token identifiers as column values. Each of the column values has a
> TTL that is set to the same as the access token it corresponds to.
>
> The OAuth2AccessToken CF takes up around ~6 GB on disk, whereas the
> OAuth2AccessTokensByUser CF takes around ~110 GB. If I use sstablemetadata,
> I can see the droppable tombstones ratio is around 90% for the larger
> sstables.
>
> My question is - why aren't these tombstones getting compacted away? I'm
> guessing that it's because we use STCS and the large sstables that have
> built up over time are never considered for compaction. Would LCS be a
> better fit for the issue of trying to keep the tombstones in check?
>
> I've also tried forceUserDefinedCompaction via JMX on some of the largest
> sstables and it just creates a new sstable of the exact same size, which
> was pretty surprising. Why would this explicit request to compact an
> sstable not remove tombstones?
>
> Thanks!
>
> Kevin
>


STCS Compaction with wide rows & TTL'd data

2016-09-01 Thread Kevin O'Connor
We're running C* 1.2.11 and have two CFs, one called OAuth2AccessToken and
one OAuth2AccessTokensByUser. OAuth2AccessToken has the token as the row
key, and the columns are some data about the OAuth token. There's a TTL set
on it, usually 3600, but can be higher (up to 1 month).
OAuth2AccessTokensByUser has the user as the row key, and then all of the
user's token identifiers as column values. Each of the column values has a
TTL that is set to the same as the access token it corresponds to.

The OAuth2AccessToken CF takes up around ~6 GB on disk, whereas the
OAuth2AccessTokensByUser CF takes around ~110 GB. If I use sstablemetadata,
I can see the droppable tombstones ratio is around 90% for the larger
sstables.

My question is - why aren't these tombstones getting compacted away? I'm
guessing that it's because we use STCS and the large sstables that have
built up over time are never considered for compaction. Would LCS be a
better fit for the issue of trying to keep the tombstones in check?

I've also tried forceUserDefinedCompaction via JMX on some of the largest
sstables and it just creates a new sstable of the exact same size, which
was pretty surprising. Why would this explicit request to compact an
sstable not remove tombstones?

Thanks!

Kevin


Performance impact of wide rows on read heavy workload

2016-07-21 Thread Bhuvan Rawal
Hi,

We are trying to evaluate the read performance impact of having a wide row by
pushing a partition column out into a clustering column. From all the information I
could gather [1][2][3], the Key Cache as well
as the Partition Index point to the block location of the partition on disk.

In case we have a schema like the one below, which would result in a wide table
if pk is of high cardinality (say Month in time series data):

CREATE TABLE ks.wide_row_table (
pk int,
ck1 bigint,
ck2 text,
v1 text,
v2 text,
v3 bigint,
PRIMARY KEY (pk, ck1, ck2)
);

Suppose there is only one SSTable for this table at this instant and a specific
partition has reached 100MB: is reading the first row of the partition (the 0th
row) the same cost as reading the last row of the partition (at the 100 MB mark)?

In other words, once the partition key is specified, is there any heuristic to
determine the disk offset of a row from its clustering columns and seek to the
right block on disk, or, in the second case, does the complete 100MB partition
have to be scanned in order to figure out the relevant row? For simplicity's
sake, let's assume that the row cache and OS page cache are disabled and all
reads are hitting disk.
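
For what it's worth, here is a sketch of the kind of point read in question (the
literal values are made up):

SELECT v1, v2, v3 FROM ks.wide_row_table
WHERE pk = 201607 AND ck1 = 1469100000 AND ck2 = 'sensor-42';

As I understand it, in addition to the partition index there is a per-partition
column index, with one entry for every column_index_size_in_kb (64 KB by default)
of serialized row data, so a read like this can seek close to the matching
clustering range rather than scanning the whole 100MB partition; corrections
welcome from anyone closer to the storage engine internals.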

Thanks & Regards,
Bhuvan


Re: Efficient Paging Option in Wide Rows

2016-04-24 Thread Clint Martin
I tend to agree with Carlos. Having multiple row keys and parallelizing
your queries will tend to result in faster responses. Keeping partitions
relatively small will also help your cluster manage your data more
efficiently, resulting in better performance.

One thing I would recommend is to denormalise your tables. Rather than
having an index table, just store a copy of your data. That way instead of
reading a bunch of index entries and then having to read each record from the
main table, you can just read the data you are after all at once.

This trades disk storage space for performance. So you will need to
calculate the benefit of speed vs the cost of additional storage.
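
A minimal sketch of that denormalisation, with made-up names since the main
table's schema isn't shown in this thread:

CREATE TABLE main_data_by_col_value (
    col_value text,
    bucket int,
    rowkey text,
    payload text,   -- a copy of the main table's data, not just its key
    PRIMARY KEY ((col_value, bucket), rowkey)
);

Reads for a given col_value then hit this table directly, one query per bucket,
with no second round trip into the main table.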

Clint
On Apr 24, 2016 1:44 PM, "Carlos Alonso"  wrote:

> Hi Anuj,
>
> That's a very good question and I'd like to hear an answer from anyone who
> can give a detailed answer, but in the mean time I'll try to give my two
> cents.
>
> First of all I think I'd rather split all the values into different
> partition keys for two reasons:
> 1.- If you're sure you're accessing all data at the same time you'll be
> able to parallelize the queries by hitting more nodes on your cluster
> rather than creating a hotspot on the owner(s) of the data.
> 2.- It is a recommended good practice to keep partitions small enough.
> Check if your partition would fit in the good practice by applying the
> formulae from this video:
> https://academy.datastax.com/courses/ds220-data-modeling/physical-partition-size
>
> Cheers!
>
> Carlos Alonso | Software Engineer | @calonso 
>
> On 23 April 2016 at 20:25, Anuj Wadehra  wrote:
>
>> Hi,
>>
>> Can anyone take this question?
>>
>> Thanks
>> Anuj
>>
>> Sent from Yahoo Mail on Android
>> 
>>
>> On Sat, 23 Apr, 2016 at 2:30 PM, Anuj Wadehra
>>  wrote:
>> I think I complicated the question..so I am trying to put the question
>> crisply..
>>
>> We have a table defined with a clustering key/column. We have 50,000
>> different clustering key values.
>>
>> If we want to fetch all 50,000 rows, which query option would be faster and
>> why?
>>
>> 1. Given a single primary key/partition key with 50,000 clustering
>> keys, we will page through the single partition 500 records at a time. Thus,
>> we will do 50,000/500 = 100 db hits, but for the same partition key.
>>
>> 2. Given 100 different primary keys with each primary key having just 500
>> clustering key columns. Here also we will need 100 db hits but for
>> different partitions.
>>
>>
>> Basically I want to understand any optimizations built into CQL/Cassandra
>> which make paging through a single partition more efficient than querying
>> data from different partitions.
>>
>>
>> Thanks
>> Anuj
>>
>> Sent from Yahoo Mail on Android
>> 
>>
>> On Fri, 22 Apr, 2016 at 8:27 PM, Anuj Wadehra
>>  wrote:
>> Hi,
>>
>> I have a wide row index table so that I can fetch all row keys
>> corresponding to a column value.
>>
>> Row of index_table will look like:
>>
>> ColValue1:bucket1 >> rowkey1, rowkey2.. rowkeyn
>> ..
>> ColValue1:bucketn>> rowkey1, rowkey2.. rowkeyn
>>
>> We will have buckets to avoid hotspots. Row keys of main table are random
>> numbers and we will never do column slice like:
>>
>> Select * from index_table where key=xxx and
>> Col > rowkey1 and col < rowkey10
>>
>> Also, we will ALWAYS fetch all data for a given value of index column.
>> Thus all buckets have to be read.
>>
>> Each index column value can map to thousands-millions of row keys in main
>> table.
>>
>> Based on our use case, there are two design choices in front of me:
>>
>> 1. Have large number of buckets/rows for an index column value and have
>> lesser data ( around few thousands) in each row.
>>
>> Thus, every time we want to fetch all row keys for an index col value, we
>> will query more rows and for each row we will have to page through data 500
>> records at a time.
>>
>> 2. Have fewer buckets/rows for an index column value.
>>
>> Every time we want to fetch all row keys for an index col value, we will
>> query data from a smaller number of wider rows and then page through each wide row
>> reading 500 columns at a time.
>>
>>
>> Which approach is more efficient?
>>
>>  Approach1: More number of rows with less data in each row.
>>
>>
>> OR
>>
>> Approach 2: less number of  rows with more data in each row
>>
>>
>> Either way, we are fetching only 500 records at a time in a query. Even
>> in approach 2 (wider rows), we can query only 500 records at a time.
>>
>>
>> Thanks
>> Anuj
>>
>>
>>
>>
>>
>>
>


Re: Efficient Paging Option in Wide Rows

2016-04-24 Thread Carlos Alonso
Hi Anuj,

That's a very good question and I'd like to hear an answer from anyone who
can give a detailed answer, but in the mean time I'll try to give my two
cents.

First of all I think I'd rather split all the values into different
partition keys for two reasons:
1.- If you're sure you're accessing all data at the same time you'll be
able to parallelize the queries by hitting more nodes on your cluster
rather than creating a hotspot on the owner(s) of the data.
2.- It is a recommended good practice to keep partitions small enough.
Check if your partition would fit in the good practice by applying the
formulae from this video:
https://academy.datastax.com/courses/ds220-data-modeling/physical-partition-size
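
To make point 1 concrete, here is a sketch of the per-bucket fan-out against
Anuj's index_table (the bucket naming, column name and fixed bucket count are
assumptions):

SELECT col FROM index_table WHERE key = 'ColValue1:bucket1';
SELECT col FROM index_table WHERE key = 'ColValue1:bucket2';
-- ...one async query per bucket, up to the fixed bucket count,
-- each one paged 500 rows at a time by the driver.

Each statement can land on a different replica set, so the work spreads across
the cluster instead of concentrating on the owners of one wide partition.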

Cheers!

Carlos Alonso | Software Engineer | @calonso 

On 23 April 2016 at 20:25, Anuj Wadehra  wrote:

> Hi,
>
> Can anyone take this question?
>
> Thanks
> Anuj
>
> Sent from Yahoo Mail on Android
> 
>
> On Sat, 23 Apr, 2016 at 2:30 PM, Anuj Wadehra
>  wrote:
> I think I complicated the question..so I am trying to put the question
> crisply..
>
> We have a table defined with a clustering key/column. We have 50,000
> different clustering key values.
>
> If we want to fetch all 50,000 rows, which query option would be faster and
> why?
>
> 1. Given a single primary key/partition key with 50,000 clustering keys, we
> will page through the single partition 500 records at a time. Thus, we will
> do 50,000/500 = 100 db hits, but for the same partition key.
>
> 2. Given 100 different primary keys with each primary key having just 500
> clustering key columns. Here also we will need 100 db hits but for
> different partitions.
>
>
> Basically I want to understand any optimizations built into CQL/Cassandra
> which make paging through a single partition more efficient than querying
> data from different partitions.
>
>
> Thanks
> Anuj
>
> Sent from Yahoo Mail on Android
> 
>
> On Fri, 22 Apr, 2016 at 8:27 PM, Anuj Wadehra
>  wrote:
> Hi,
>
> I have a wide row index table so that I can fetch all row keys
> corresponding to a column value.
>
> Row of index_table will look like:
>
> ColValue1:bucket1 >> rowkey1, rowkey2.. rowkeyn
> ..
> ColValue1:bucketn>> rowkey1, rowkey2.. rowkeyn
>
> We will have buckets to avoid hotspots. Row keys of main table are random
> numbers and we will never do column slice like:
>
> Select * from index_table where key=xxx and
> Col > rowkey1 and col < rowkey10
>
> Also, we will ALWAYS fetch all data for a given value of index column.
> Thus all buckets have to be read.
>
> Each index column value can map to thousands-millions of row keys in main
> table.
>
> Based on our use case, there are two design choices in front of me:
>
> 1. Have large number of buckets/rows for an index column value and have
> lesser data ( around few thousands) in each row.
>
> Thus, every time we want to fetch all row keys for an index col value, we
> will query more rows and for each row we will have to page through data 500
> records at a time.
>
> 2. Have fewer buckets/rows for an index column value.
>
> Every time we want to fetch all row keys for an index col value, we will
> query data from a smaller number of wider rows and then page through each wide row
> reading 500 columns at a time.
>
>
> Which approach is more efficient?
>
>  Approach1: More number of rows with less data in each row.
>
>
> OR
>
> Approach 2: less number of  rows with more data in each row
>
>
> Either way, we are fetching only 500 records at a time in a query. Even
> in approach 2 (wider rows), we can query only 500 records at a time.
>
>
> Thanks
> Anuj
>
>
>
>
>
>


Re: Efficient Paging Option in Wide Rows

2016-04-23 Thread Anuj Wadehra
Hi,
Can anyone take this question?
Thanks
Anuj

Sent from Yahoo Mail on Android 
 
  On Sat, 23 Apr, 2016 at 2:30 PM, Anuj Wadehra wrote:  
I think I complicated the question, so I am trying to put it crisply.

We have a table defined with a clustering key/column. We have 50,000 different
clustering key values.

If we want to fetch all 50,000 rows, which query option would be faster, and why?

1. Given a single primary key/partition key with 50,000 clustering keys, we will
page through the single partition 500 records at a time. Thus, we will do
50,000/500 = 100 db hits, but for the same partition key.

2. Given 100 different primary keys, with each primary key having just 500
clustering key columns. Here also we will need 100 db hits, but for different
partitions.

Basically I want to understand any optimizations built into CQL/Cassandra which
make paging through a single partition more efficient than querying data from
different partitions.

Thanks
Anuj
Sent from Yahoo Mail on Android 
 
  On Fri, 22 Apr, 2016 at 8:27 PM, Anuj Wadehra wrote:  
Hi,

I have a wide row index table so that I can fetch all row keys corresponding to
a column value.

A row of index_table will look like:

ColValue1:bucket1 >> rowkey1, rowkey2 .. rowkeyn
..
ColValue1:bucketn >> rowkey1, rowkey2 .. rowkeyn

We will have buckets to avoid hotspots. Row keys of the main table are random
numbers and we will never do a column slice like:

Select * from index_table where key=xxx and Col > rowkey1 and col < rowkey10

Also, we will ALWAYS fetch all data for a given value of the index column. Thus
all buckets have to be read.

Each index column value can map to thousands to millions of row keys in the main
table.

Based on our use case, there are two design choices in front of me:

1. Have a large number of buckets/rows for an index column value and have less
data (around a few thousand entries) in each row.

Thus, every time we want to fetch all row keys for an index col value, we will
query more rows, and for each row we will have to page through data 500 records
at a time.

2. Have fewer buckets/rows for an index column value.

Every time we want to fetch all row keys for an index col value, we will query
data from a smaller number of wider rows and then page through each wide row
reading 500 columns at a time.

Which approach is more efficient?

Approach 1: more rows with less data in each row.

OR

Approach 2: fewer rows with more data in each row.

Either way, we are fetching only 500 records at a time in a query. Even in
approach 2 (wider rows), we can query only 500 records at a time.

Thanks
Anuj




  
  


Re: Efficient Paging Option in Wide Rows

2016-04-23 Thread Anuj Wadehra
I think I complicated the question, so I am trying to put it crisply.

We have a table defined with a clustering key/column. We have 50,000 different
clustering key values.

If we want to fetch all 50,000 rows, which query option would be faster, and why?

1. Given a single primary key/partition key with 50,000 clustering keys, we will
page through the single partition 500 records at a time. Thus, we will do
50,000/500 = 100 db hits, but for the same partition key.

2. Given 100 different primary keys, with each primary key having just 500
clustering key columns. Here also we will need 100 db hits, but for different
partitions.

Basically I want to understand any optimizations built into CQL/Cassandra which
make paging through a single partition more efficient than querying data from
different partitions.

Thanks
Anuj
Sent from Yahoo Mail on Android 
 
  On Fri, 22 Apr, 2016 at 8:27 PM, Anuj Wadehra wrote:  
Hi,

I have a wide row index table so that I can fetch all row keys corresponding to
a column value.

A row of index_table will look like:

ColValue1:bucket1 >> rowkey1, rowkey2 .. rowkeyn
..
ColValue1:bucketn >> rowkey1, rowkey2 .. rowkeyn

We will have buckets to avoid hotspots. Row keys of the main table are random
numbers and we will never do a column slice like:

Select * from index_table where key=xxx and Col > rowkey1 and col < rowkey10

Also, we will ALWAYS fetch all data for a given value of the index column. Thus
all buckets have to be read.

Each index column value can map to thousands to millions of row keys in the main
table.

Based on our use case, there are two design choices in front of me:

1. Have a large number of buckets/rows for an index column value and have less
data (around a few thousand entries) in each row.

Thus, every time we want to fetch all row keys for an index col value, we will
query more rows, and for each row we will have to page through data 500 records
at a time.

2. Have fewer buckets/rows for an index column value.

Every time we want to fetch all row keys for an index col value, we will query
data from a smaller number of wider rows and then page through each wide row
reading 500 columns at a time.

Which approach is more efficient?

Approach 1: more rows with less data in each row.

OR

Approach 2: fewer rows with more data in each row.

Either way, we are fetching only 500 records at a time in a query. Even in
approach 2 (wider rows), we can query only 500 records at a time.

Thanks
Anuj




  


Efficient Paging Option in Wide Rows

2016-04-22 Thread Anuj Wadehra
Hi,

I have a wide row index table so that I can fetch all row keys corresponding to
a column value.

A row of index_table will look like:

ColValue1:bucket1 >> rowkey1, rowkey2 .. rowkeyn
..
ColValue1:bucketn >> rowkey1, rowkey2 .. rowkeyn

We will have buckets to avoid hotspots. Row keys of the main table are random
numbers and we will never do a column slice like:

Select * from index_table where key=xxx and Col > rowkey1 and col < rowkey10

Also, we will ALWAYS fetch all data for a given value of the index column. Thus
all buckets have to be read.

Each index column value can map to thousands to millions of row keys in the main
table.

Based on our use case, there are two design choices in front of me:

1. Have a large number of buckets/rows for an index column value and have less
data (around a few thousand entries) in each row.

Thus, every time we want to fetch all row keys for an index col value, we will
query more rows, and for each row we will have to page through data 500 records
at a time.

2. Have fewer buckets/rows for an index column value.

Every time we want to fetch all row keys for an index col value, we will query
data from a smaller number of wider rows and then page through each wide row
reading 500 columns at a time.

Which approach is more efficient?

Approach 1: more rows with less data in each row.

OR

Approach 2: fewer rows with more data in each row.

Either way, we are fetching only 500 records at a time in a query. Even in
approach 2 (wider rows), we can query only 500 records at a time.

Thanks
Anuj






Re: Strategy for dividing wide rows beyond just adding to the partition key

2016-03-14 Thread Carlos Alonso
Hi.

+1 to this @Jack's sentence 'Generally, Cassandra is ideal for only two use
cases (access patterns really): 1) retrieval by a specific key, and 2)
retrieval of a relatively narrow slice of contiguous data, beginning with a
specific key.'

So I think you're modelling it properly (to have fairly narrow rows). I
think you can then store the initial bucket for a sensor in another table
and either not store the end one (taking advantage of the fact that Cassandra
is very quick at finding empty partitions) and query until today, or, given
that your bucketing is per week, only update the 'last partition' entry for a
sensor if we're really one week past the latest saved value. That will generate
a single tombstone per sensor, and that doesn't sound scary, I think.
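
A minimal sketch of that bucket-bounds lookup, reusing Jason's column names (the
table name and exact layout are assumptions, not something specified in the
thread):

CREATE TABLE sensorShardBounds (
    sensorUnitId int,
    sensorId int,
    firstTimeShard int,
    lastTimeShard int,
    PRIMARY KEY ((sensorUnitId, sensorId))
);

-- written once when a sensor first reports, then at most once per week:
UPDATE sensorShardBounds SET lastTimeShard = ? WHERE sensorUnitId = ? AND sensorId = ?;

The full-scan path reads this single row first and then iterates timeShard from
firstTimeShard to lastTimeShard, issuing one (async) query per shard.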

On the other hand, did you consider offloading the historical data to a
better data warehouse?

Regards

Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>

On 12 March 2016 at 16:59, Jack Krupansky <jack.krupan...@gmail.com> wrote:

> Generally, secondary indexes are not recommended in Cassandra. Query
> tables and/or materialized views are the recommended alternative. But it
> all depends on the specific nature of the queries and the cardinality of
> the data.
>
> Generally, Cassandra is ideal for only two use cases (access patterns
> really): 1) retrieval by a specific key, and 2) retrieval of a relatively
> narrow slice of contiguous data, beginning with a specific key.
>
> Bulk retrieval is not a great access pattern for Cassandra. The emphasis
> is on being a database (that's why CQL is so similar to SQL) rather than a
> raw data store.
>
> Sure, technically you can do bulk retrieval, but essentially that requires
> modeling and accessing using relatively narrow slices.
>
> Closing the circle, Cassandra is always enhancing its capabilities and
> there is indeed that effort underway to support wider rows, but the
> emphasis of modeling still needs to be centered on point queries and narrow
> contiguous slices.
>
> Even with Spark and analytics that may indeed need to do a full scan of a
> large amount of data, the model needs to be that the big scan is done in
> small chunks.
>
>
> -- Jack Krupansky
>
> On Sat, Mar 12, 2016 at 10:23 AM, Jason Kania <jason.ka...@ymail.com>
> wrote:
>
>> Our analytics currently pulls in all the data for a single sensor reading
>> as we use it in its entirety during signal processing. We may add secondary
>> indices to the table in the future to pull in broadly classified data, but
>> right now, our only goal is this bulk retrieval.
>>
>> --
>> *From:* Jack Krupansky <jack.krupan...@gmail.com>
>> *To:* user@cassandra.apache.org
>> *Sent:* Friday, March 11, 2016 7:25 PM
>>
>> *Subject:* Re: Strategy for dividing wide rows beyond just adding to the
>> partition key
>>
>> Thanks, that level of query detail gives us a better picture to focus on.
> I'll think through this some more over the weekend.
>>
>> Also, these queries focus on raw, bulk retrieval of sensor data readings,
>> but do you have reading-based queries, such as range of an actual sensor
>> reading?
>>
>> -- Jack Krupansky
>>
>> On Fri, Mar 11, 2016 at 7:08 PM, Jason Kania <jason.ka...@ymail.com>
>> wrote:
>>
>> The 5000 readings mentioned would be against a single sensor on a single
>> sensor unit.
>>
>> The scope of the queries on this table is intended to be fairly simple.
>> Here are some example queries, without 'sharding', that we would perform on
>> this table:
>>
>> SELECT "time","readings" FROM "sensorReadings"
>> WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time<=?
>> ORDER BY time DESC LIMIT 5000
>>
>> SELECT "time","readings" FROM "sensorReadings"
>> WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time>=?
>> ORDER BY time LIMIT 5000
>>
>> SELECT "time","readings" FROM "sensorReadings"
>> WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time<=? AND
>> classification=?
>> ORDER BY time DESC LIMIT 5000
>>
>> where 'classification' is secondary index that we expect to add.
>>
>> In some cases, we have to revisit all values too so a complete table scan
>> is needed:
>>
>> SELECT "time","readings" FROM "sensorReadings"
>>
>> Getting the "next" and "previous" 5000 readings is also something we do,
>> but is manageable from our standpoint as we can look at th

Re: Strategy for dividing wide rows beyond just adding to the partition key

2016-03-12 Thread Jack Krupansky
Generally, secondary indexes are not recommended in Cassandra. Query tables
and/or materialized views are the recommended alternative. But it all
depends on the specific nature of the queries and the cardinality of the
data.
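
For the 'classification' lookup mentioned elsewhere in these threads, a query
table along these lines is one possibility (the layout reuses Jason's column
names but is otherwise an assumption, including the int type for classification):

CREATE TABLE sensorReadingsByClassification (
    sensorUnitId int,
    sensorId int,
    timeShard int,
    classification int,
    time timestamp,
    readings blob,
    PRIMARY KEY ((sensorUnitId, sensorId, timeShard), classification, time)
) WITH CLUSTERING ORDER BY (classification ASC, time DESC);

SELECT time, readings FROM sensorReadingsByClassification
WHERE sensorUnitId = ? AND sensorId = ? AND timeShard = ? AND classification = ?
LIMIT 5000;

The application (or a 3.0+ materialized view) keeps it in sync with the base
table on every write.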

Generally, Cassandra is ideal for only two use cases (access patterns
really): 1) retrieval by a specific key, and 2) retrieval of a relatively
narrow slice of contiguous data, beginning with a specific key.

Bulk retrieval is not a great access pattern for Cassandra. The emphasis is
on being a database (that's why CQL is so similar to SQL) rather than a raw
data store.

Sure, technically you can do bulk retrieval, but essentially that requires
modeling and accessing using relatively narrow slices.

Closing the circle, Cassandra is always enhancing its capabilities and
there is indeed that effort underway to support wider rows, but the
emphasis of modeling still needs to be centered on point queries and narrow
contiguous slices.

Even with Spark and analytics that may indeed need to do a full scan of a
large amount of data, the model needs to be that the big scan is done in
small chunks.


-- Jack Krupansky

On Sat, Mar 12, 2016 at 10:23 AM, Jason Kania <jason.ka...@ymail.com> wrote:

> Our analytics currently pulls in all the data for a single sensor reading
> as we use it in its entirety during signal processing. We may add secondary
> indices to the table in the future to pull in broadly classified data, but
> right now, our only goal is this bulk retrieval.
>
> --
> *From:* Jack Krupansky <jack.krupan...@gmail.com>
> *To:* user@cassandra.apache.org
> *Sent:* Friday, March 11, 2016 7:25 PM
>
> *Subject:* Re: Strategy for dividing wide rows beyond just adding to the
> partition key
>
> Thanks, that level of query detail gives us a better picture to focus on.
> I'll think through this some more over the weekend.
>
> Also, these queries focus on raw, bulk retrieval of sensor data readings,
> but do you have reading-based queries, such as range of an actual sensor
> reading?
>
> -- Jack Krupansky
>
> On Fri, Mar 11, 2016 at 7:08 PM, Jason Kania <jason.ka...@ymail.com>
> wrote:
>
> The 5000 readings mentioned would be against a single sensor on a single
> sensor unit.
>
> The scope of the queries on this table is intended to be fairly simple.
> Here are some example queries, without 'sharding', that we would perform on
> this table:
>
> SELECT "time","readings" FROM "sensorReadings"
> WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time<=?
> ORDER BY time DESC LIMIT 5000
>
> SELECT "time","readings" FROM "sensorReadings"
> WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time>=?
> ORDER BY time LIMIT 5000
>
> SELECT "time","readings" FROM "sensorReadings"
> WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time<=? AND
> classification=?
> ORDER BY time DESC LIMIT 5000
>
> where 'classification' is secondary index that we expect to add.
>
> In some cases, we have to revisit all values too so a complete table scan
> is needed:
>
> SELECT "time","readings" FROM "sensorReadings"
>
> Getting the "next" and "previous" 5000 readings is also something we do,
> but is manageable from our standpoint as we can look at the range-end
> timestamps that are returned and use those in the subsequent queries.
>
> SELECT "time","readings" FROM "sensorReadings"
> WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time>=? AND time<=?
> ORDER BY time LIMIT 5000
>
> Splitting the bulk content out of the main table is something we
> considered too but we didn't find any detail on whether that would solve
> our timeout problem. If there is a reference for using this approach, it
> would be of interest to us to avoid any assumptions on how we would
> approach it.
>
> A question: Is the probability of a timeout directly linked to a longer
> seek time in reading through a partition's contents? If that is the case,
> splitting the partition keys into a separate table would be straightforward.
>
> Regards,
>
> Jason
>
> --
> *From:* Jack Krupansky <jack.krupan...@gmail.com>
> *To:* user@cassandra.apache.org; Jason Kania <jason.ka...@ymail.com>
> *Sent:* Friday, March 11, 2016 6:22 PM
>
> *Subject:* Re: Strategy for dividing wide rows beyond just adding to the
> partition key
>
> Thanks for the additional information, but there is still not enough color
> on the queries and too much focus on a premature data model.
>

Re: Strategy for dividing wide rows beyond just adding to the partition key

2016-03-12 Thread Jason Kania
Our analytics currently pulls in all the data for a single sensor reading as we 
use it in its entirety during signal processing. We may add secondary indices 
to the table in the future to pull in broadly classified data, but right now, 
our only goal is this bulk retrieval.
  From: Jack Krupansky <jack.krupan...@gmail.com>
 To: user@cassandra.apache.org 
 Sent: Friday, March 11, 2016 7:25 PM
 Subject: Re: Strategy for dividing wide rows beyond just adding to the 
partition key
  
Thanks, that level of query detail gives us a better picture to focus on. I'll
think through this some more over the weekend.
Also, these queries focus on raw, bulk retrieval of sensor data readings, but 
do you have reading-based queries, such as range of an actual sensor reading?
-- Jack Krupansky
On Fri, Mar 11, 2016 at 7:08 PM, Jason Kania <jason.ka...@ymail.com> wrote:

The 5000 readings mentioned would be against a single sensor on a single sensor 
unit.

The scope of the queries on this table is intended to be fairly simple. Here 
are some example queries, without 'sharding', that we would perform on this 
table:

SELECT "time","readings" FROM "sensorReadings"WHERE "sensorUnitId"=5123 AND 
"sensorId"=17 AND time<=?ORDER BY time DESC LIMIT 5000
SELECT "time","readings" FROM "sensorReadings"WHERE "sensorUnitId"=5123 AND 
"sensorId"=17 AND time>=?ORDER BY time LIMIT 5000
SELECT "time","readings" FROM "sensorReadings"WHERE "sensorUnitId"=5123 AND 
"sensorId"=17 AND time<=? AND classification=?
ORDER BY time DESC LIMIT 5000
where 'classification' is secondary index that we expect to add.

In some cases, we have to revisit all values too so a complete table scan is 
needed:
SELECT "time","readings" FROM "sensorReadings"
Getting the "next" and "previous" 5000 readings is also something we do, but is 
manageable from our standpoint as we can look at the range-end timestamps that 
are returned and use those in the subsequent queries.

SELECT "time","readings" FROM "sensorReadings"WHERE "sensorUnitId"=5123 AND 
"sensorId"=17 AND time>=? AND time<=?ORDER BY time LIMIT 5000
Splitting the bulk content out of the main table is something we considered too 
but we didn't find any detail on whether that would solve our timeout problem. 
If there is a reference for using this approach, it would be of interest to us 
to avoid any assumptions on how we would approach it.

A question: Is the probability of a timeout directly linked to a longer seek 
time in reading through a partition's contents? If that is the case, splitting 
the partition keys into a separate table would be straightforward.

Regards,
Jason

  From: Jack Krupansky <jack.krupan...@gmail.com>
 To: user@cassandra.apache.org; Jason Kania <jason.ka...@ymail.com> 
 Sent: Friday, March 11, 2016 6:22 PM
 Subject: Re: Strategy for dividing wide rows beyond just adding to the 
partition key
   
Thanks for the additional information, but there is still not enough color on 
the queries and too much focus on a premature data model.
Is this 5000 readings for a single sensor of a single sensor unit, or for all 
sensors of a specified unit, or... both?
I presume you want "next" and "previous" 5000 readings as well as first and 
last, but... you will have to confirm that.
One technique is to store the bulk of your raw sensor data in a separate table 
and then simply store the PK of that data in your time series. That way you can 
have a much wider row of time series (number of rows) without hitting a bulk 
size issue for the partition. But... I don't want to jump to solutions until we 
have a firmer handle on the query side of the fence.
-- Jack Krupansky
On Fri, Mar 11, 2016 at 5:37 PM, Jason Kania <jason.ka...@ymail.com> wrote:

Jack,
Thanks for the response.
We are targeting our database design to 1 sensor units and each sensor unit 
has 32 sensors. We are seeing about 700 events per day per sensor, each 
providing about 2K of data. Based on keeping each partition to about 10 Mb 
(based on readings we saw on performance), we chose to break our partitions on 
a weekly basis. This is possibly finer than we need as we were seeing timeouts 
only once a single partition was about 150Mb in size

When pulling in data, we will typically need to pull 1 to 4 months of data for 
our analysis and will use only the sensorUnitId and sensorId to uniquely 
identify the data source with the timeShard value used to break up our 
partitions. We have handling to sequentially scan based on our "timeShard" 
value, but don't have a good handle on the determination of the "timeShard" 
portion of the partition key at read time. The data starts coming in when a 
subscriber

Re: Strategy for dividing wide rows beyond just adding to the partition key

2016-03-12 Thread Jason Kania
Hi Carlos,
Thanks for the suggestions.
We are having partition size issues and that was why we started to do custom 
sharding/partition division based on time. As you mentioned, we are having 
problems with identification. It's the identification of the shard range that we
need to understand, and our data doesn't necessarily run until the current time.
My worry with storing that last shard id in another table is that we would 
update the same row in that table all the time creating tombstones.
It is good to know that returning empty partitions is not that costly as that 
is a concern when we don't know where to start and end.
Thanks,
Jason


  From: Carlos Alonso <i...@mrcalonso.com>
 To: "user@cassandra.apache.org" <user@cassandra.apache.org> 
 Sent: Friday, March 11, 2016 7:24 PM
 Subject: Re: Strategy for dividing wide rows beyond just adding to the 
partition key
  
Hi Jason,
If I understand correctly you have no problems with the size of your partitions 
or transactional queries but with the 'identification' of them when having to 
do analytical queries.
I'd then suggest two options:

1. Keep using Cassandra and store the first 'bucket' of each sensor in a
separate table to use as the starting point of your full scan queries. Then
issue async queries incrementing the bucket until today (logical end of the
data). Cassandra is very efficient at returning empty partitions, so querying
on empty buckets is normally fine.

2. Periodically offload your 'historic' data to another storage more
appropriate for analytics (Parquet + S3) and query it using Spark.
Hope it helps
On Saturday, 12 March 2016, Jack Krupansky <jack.krupan...@gmail.com> wrote:

Thanks for the additional information, but there is still not enough color on 
the queries and too much focus on a premature data model.
Is this 5000 readings for a single sensor of a single sensor unit, or for all 
sensors of a specified unit, or... both?
I presume you want "next" and "previous" 5000 readings as well as first and 
last, but... you will have to confirm that.
One technique is to store the bulk of your raw sensor data in a separate table 
and then simply store the PK of that data in your time series. That way you can 
have a much wider row of time series (number of rows) without hitting a bulk 
size issue for the partition. But... I don't want to jump to solutions until we 
have a firmer handle on the query side of the fence.
-- Jack Krupansky
On Fri, Mar 11, 2016 at 5:37 PM, Jason Kania <jason.ka...@ymail.com> wrote:

Jack,
Thanks for the response.
We are targeting our database design to 1 sensor units and each sensor unit 
has 32 sensors. We are seeing about 700 events per day per sensor, each 
providing about 2K of data. Based on keeping each partition to about 10 Mb 
(based on readings we saw on performance), we chose to break our partitions on 
a weekly basis. This is possibly finer than we need as we were seeing timeouts 
only once a single partition was about 150Mb in size

When pulling in data, we will typically need to pull 1 to 4 months of data for 
our analysis and will use only the sensorUnitId and sensorId to uniquely 
identify the data source with the timeShard value used to break up our 
partitions. We have handling to sequentially scan based on our "timeShard" 
value, but don't have a good handle on the determination of the "timeShard" 
portion of the partition key at read time. The data starts coming in when a 
subscriber starts using our system and finishes when they discontinue service 
or put the service on hold temporarily.

When I talk about hotspots, it isn't the time series data that is the concern, 
it is with respect to storing the maximum and minimum timeShard values in 
another table for subsequent lookup or the cost of running the current 
implementation of SELECT DISTINCT. We need to run queries such as getting the 
first or last 5000 sensor readings when we don't know the time frame at which 
they occurred so cannot directly supply the timeShard portion of our partition 
key.

I appreciate your input,
Thanks,
Jason

  From: Jack Krupansky <jack.krupan...@gmail.com>
 To: "user@cassandra.apache.org" <user@cassandra.apache.org> 
 Sent: Friday, March 11, 2016 4:45 PM
 Subject: Re: Strategy for dividing wide rows beyond just adding to the 
partition key
   
I'll stay away from advising on a specific schema per se, but I'll stick to the 
advice that you need to make sure that your queries are depending solely on the 
columns of the primary key or relatively short slices/scans, rather than run 
the risk of very long scans or having to process multiple partitions for a 
single query. That's canned to some extent, but still essential.
Of course we generally wish to avoid hotspots, but with time series they are 
unavoidable. I mean, sure you could place successive events at separate 
partitions, but then you can't do any kin

Re: Strategy for dividing wide rows beyond just adding to the partition key

2016-03-11 Thread Jack Krupansky
Thanks, that level of query detail gives us a better picture to focus on. I'll
think through this some more over the weekend.

Also, these queries focus on raw, bulk retrieval of sensor data readings,
but do you have reading-based queries, such as range of an actual sensor
reading?

-- Jack Krupansky

On Fri, Mar 11, 2016 at 7:08 PM, Jason Kania <jason.ka...@ymail.com> wrote:

> The 5000 readings mentioned would be against a single sensor on a single
> sensor unit.
>
> The scope of the queries on this table is intended to be fairly simple.
> Here are some example queries, without 'sharding', that we would perform on
> this table:
>
> SELECT "time","readings" FROM "sensorReadings"
> WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time<=?
> ORDER BY time DESC LIMIT 5000
>
> SELECT "time","readings" FROM "sensorReadings"
> WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time>=?
> ORDER BY time LIMIT 5000
>
> SELECT "time","readings" FROM "sensorReadings"
> WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time<=? AND
> classification=?
> ORDER BY time DESC LIMIT 5000
>
> where 'classification' is secondary index that we expect to add.
>
> In some cases, we have to revisit all values too so a complete table scan
> is needed:
>
> SELECT "time","readings" FROM "sensorReadings"
>
> Getting the "next" and "previous" 5000 readings is also something we do,
> but is manageable from our standpoint as we can look at the range-end
> timestamps that are returned and use those in the subsequent queries.
>
> SELECT "time","readings" FROM "sensorReadings"
> WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time>=? AND time<=?
> ORDER BY time LIMIT 5000
>
> Splitting the bulk content out of the main table is something we
> considered too but we didn't find any detail on whether that would solve
> our timeout problem. If there is a reference for using this approach, it
> would be of interest to us to avoid any assumptions on how we would
> approach it.
>
> A question: Is the probability of a timeout directly linked to a longer
> seek time in reading through a partition's contents? If that is the case,
> splitting the partition keys into a separate table would be straightforward.
>
> Regards,
>
> Jason
>
> --
> *From:* Jack Krupansky <jack.krupan...@gmail.com>
> *To:* user@cassandra.apache.org; Jason Kania <jason.ka...@ymail.com>
> *Sent:* Friday, March 11, 2016 6:22 PM
>
> *Subject:* Re: Strategy for dividing wide rows beyond just adding to the
> partition key
>
> Thanks for the additional information, but there is still not enough color
> on the queries and too much focus on a premature data model.
>
> Is this 5000 readings for a single sensor of a single sensor unit, or for
> all sensors of a specified unit, or... both?
>
> I presume you want "next" and "previous" 5000 readings as well as first
> and last, but... you will have to confirm that.
>
> One technique is to store the bulk of your raw sensor data in a separate
> table and then simply store the PK of that data in your time series. That
> way you can have a much wider row of time series (number of rows) without
> hitting a bulk size issue for the partition. But... I don't want to jump to
> solutions until we have a firmer handle on the query side of the fence.
>
> -- Jack Krupansky
>
> On Fri, Mar 11, 2016 at 5:37 PM, Jason Kania <jason.ka...@ymail.com>
> wrote:
>
> Jack,
>
> Thanks for the response.
>
> We are targeting our database design to 1 sensor units and each sensor
> unit has 32 sensors. We are seeing about 700 events per day per sensor,
> each providing about 2K of data. Based on keeping each partition to about
> 10 Mb (based on readings we saw on performance), we chose to break our
> partitions on a weekly basis. This is possibly finer than we need as we
> were seeing timeouts only once a single partition was about 150Mb in size
>
> When pulling in data, we will typically need to pull 1 to 4 months of data
> for our analysis and will use only the sensorUnitId and sensorId to
> uniquely identify the data source with the timeShard value used to break up
> our partitions. We have handling to sequentially scan based on our
> "timeShard" value, but don't have a good handle on the determination of the
> "timeShard" portion of the partition key at read time. The data starts
> coming in when a subscriber starts using our system a

Re: Strategy for dividing wide rows beyond just adding to the partition key

2016-03-11 Thread Carlos Alonso
Hi Jason,

If I understand correctly you have no problems with the size of your
partitions or transactional queries but with the 'identification' of them
when having to do analytical queries.

I'd then suggest two options:
1. Keep using Cassandra and store the first 'bucket' of each sensor in a
separate table to use as the starting point of your full scan queries. Then
issue async queries incrementing the bucket until today (logical end of the
data). Cassandra is very efficient at returning empty partitions, so
querying on empty buckets is normally fine.

2. Periodically offload your 'historic' data to another storage more
appropriate for analytics (Parquet + S3) and query it using Spark.

Hope it helps

On Saturday, 12 March 2016, Jack Krupansky <jack.krupan...@gmail.com> wrote:

> Thanks for the additional information, but there is still not enough color
> on the queries and too much focus on a premature data model.
>
> Is this 5000 readings for a single sensor of a single sensor unit, or for
> all sensors of a specified unit, or... both?
>
> I presume you want "next" and "previous" 5000 readings as well as first
> and last, but... you will have to confirm that.
>
> One technique is to store the bulk of your raw sensor data in a separate
> table and then simply store the PK of that data in your time series. That
> way you can have a much wider row of time series (number of rows) without
> hitting a bulk size issue for the partition. But... I don't want to jump to
> solutions until we have a firmer handle on the query side of the fence.
>
> -- Jack Krupansky
>
> On Fri, Mar 11, 2016 at 5:37 PM, Jason Kania <jason.ka...@ymail.com
> <javascript:_e(%7B%7D,'cvml','jason.ka...@ymail.com');>> wrote:
>
>> Jack,
>>
>> Thanks for the response.
>>
>> We are targeting our database design to 1 sensor units and each
>> sensor unit has 32 sensors. We are seeing about 700 events per day per
>> sensor, each providing about 2K of data. Based on keeping each partition to
>> about 10 Mb (based on readings we saw on performance), we chose to break
>> our partitions on a weekly basis. This is possibly finer than we need as we
>> were seeing timeouts only once a single partition was about 150Mb in size
>>
>> When pulling in data, we will typically need to pull 1 to 4 months of
>> data for our analysis and will use only the sensorUnitId and sensorId to
>> uniquely identify the data source with the timeShard value used to break up
>> our partitions. We have handling to sequentially scan based on our
>> "timeShard" value, but don't have a good handle on the determination of the
>> "timeShard" portion of the partition key at read time. The data starts
>> coming in when a subscriber starts using our system and finishes when they
>> discontinue service or put the service on hold temporarily.
>>
>> When I talk about hotspots, it isn't the time series data that is the
>> concern, it is with respect to storing the maximum and minimum timeShard
>> values in another table for subsequent lookup or the cost of running the
>> current implementation of SELECT DISTINCT. We need to run queries such as
>> getting the first or last 5000 sensor readings when we don't know the time
>> frame at which they occurred so cannot directly supply the timeShard
>> portion of our partition key.
>>
>> I appreciate your input,
>>
>> Thanks,
>>
>> Jason
>>
>> --
>> *From:* Jack Krupansky <jack.krupan...@gmail.com
>> <javascript:_e(%7B%7D,'cvml','jack.krupan...@gmail.com');>>
>> *To:* "user@cassandra.apache.org
>> <javascript:_e(%7B%7D,'cvml','user@cassandra.apache.org');>" <
>> user@cassandra.apache.org
>> <javascript:_e(%7B%7D,'cvml','user@cassandra.apache.org');>>
>> *Sent:* Friday, March 11, 2016 4:45 PM
>>
>> *Subject:* Re: Strategy for dividing wide rows beyond just adding to the
>> partition key
>>
>> I'll stay away from advising on a specific schema per se, but I'll stick
>> to the advice that you need to make sure that your queries are depending
>> solely on the columns of the primary key or relatively short slices/scans,
>> rather than run the risk of very long scans or having to process multiple
>> partitions for a single query. That's canned to some extent, but still
>> essential.
>>
>> Of course we generally wish to avoid hotspots, but with time series they
>> are unavoidable. I mean, sure you could place successive events at separate
>> partitions, but then you can't do any kind of scanning/slicing.
>>
>> But, eve

Re: Strategy for dividing wide rows beyond just adding to the partition key

2016-03-11 Thread Jason Kania
The 5000 readings mentioned would be against a single sensor on a single sensor 
unit.

The scope of the queries on this table is intended to be fairly simple. Here 
are some example queries, without 'sharding', that we would perform on this 
table:

SELECT "time","readings" FROM "sensorReadings"WHERE "sensorUnitId"=5123 AND 
"sensorId"=17 AND time<=?ORDER BY time DESC LIMIT 5000
SELECT "time","readings" FROM "sensorReadings"WHERE "sensorUnitId"=5123 AND 
"sensorId"=17 AND time>=?ORDER BY time LIMIT 5000
SELECT "time","readings" FROM "sensorReadings"WHERE "sensorUnitId"=5123 AND 
"sensorId"=17 AND time<=? AND classification=?
ORDER BY time DESC LIMIT 5000
where 'classification' is secondary index that we expect to add.

In some cases, we have to revisit all values too so a complete table scan is 
needed:
SELECT "time","readings" FROM "sensorReadings"
Getting the "next" and "previous" 5000 readings is also something we do, but is 
manageable from our standpoint as we can look at the range-end timestamps that 
are returned and use those in the subsequent queries.

SELECT "time","readings" FROM "sensorReadings"WHERE "sensorUnitId"=5123 AND 
"sensorId"=17 AND time>=? AND time<=?ORDER BY time LIMIT 5000
Splitting the bulk content out of the main table is something we considered too 
but we didn't find any detail on whether that would solve our timeout problem. 
If there is a reference for using this approach, it would be of interest to us 
to avoid any assumptions on how we would approach it.

A question: Is the probability of a timeout directly linked to a longer seek 
time in reading through a partition's contents? If that is the case, splitting 
the partition keys into a separate table would be straightforward.

Regards,
Jason

  From: Jack Krupansky <jack.krupan...@gmail.com>
 To: user@cassandra.apache.org; Jason Kania <jason.ka...@ymail.com> 
 Sent: Friday, March 11, 2016 6:22 PM
 Subject: Re: Strategy for dividing wide rows beyond just adding to the 
partition key
   
Thanks for the additional information, but there is still not enough color on 
the queries and too much focus on a premature data model.
Is this 5000 readings for a single sensor of a single sensor unit, or for all 
sensors of a specified unit, or... both?
I presume you want "next" and "previous" 5000 readings as well as first and 
last, but... you will have to confirm that.
One technique is to store the bulk of your raw sensor data in a separate table 
and then simply store the PK of that data in your time series. That way you can 
have a much wider row of time series (number of rows) without hitting a bulk 
size issue for the partition. But... I don't want to jump to solutions until we 
have a firmer handle on the query side of the fence.
-- Jack Krupansky
On Fri, Mar 11, 2016 at 5:37 PM, Jason Kania <jason.ka...@ymail.com> wrote:

Jack,
Thanks for the response.
We are targeting our database design to 1 sensor units and each sensor unit 
has 32 sensors. We are seeing about 700 events per day per sensor, each 
providing about 2K of data. Based on keeping each partition to about 10 Mb 
(based on readings we saw on performance), we chose to break our partitions on 
a weekly basis. This is possibly finer than we need as we were seeing timeouts 
only once a single partition was about 150Mb in size

When pulling in data, we will typically need to pull 1 to 4 months of data for 
our analysis and will use only the sensorUnitId and sensorId to uniquely 
identify the data source with the timeShard value used to break up our 
partitions. We have handling to sequentially scan based on our "timeShard" 
value, but don't have a good handle on the determination of the "timeShard" 
portion of the partition key at read time. The data starts coming in when a 
subscriber starts using our system and finishes when they discontinue service 
or put the service on hold temporarily.

When I talk about hotspots, it isn't the time series data that is the concern, 
it is with respect to storing the maximum and minimum timeShard values in 
another table for subsequent lookup or the cost of running the current 
implementation of SELECT DISTINCT. We need to run queries such as getting the 
first or last 5000 sensor readings when we don't know the time frame at which 
they occurred so cannot directly supply the timeShard portion of our partition 
key.

I appreciate your input,
Thanks,
Jason

  From: Jack Krupansky <jack.krupan...@gmail.com>
 To: "user@cassandra.apache.org" <user@cassandra.apache.org> 
 Sent: Friday, March 11, 2016 4:45 PM
 Subject: Re: Strategy for dividing wide rows beyond just adding to the 
partition

Re: Strategy for dividing wide rows beyond just adding to the partition key

2016-03-11 Thread Jack Krupansky
Thanks for the additional information, but there is still not enough color
on the queries and too much focus on a premature data model.

Is this 5000 readings for a single sensor of a single sensor unit, or for
all sensors of a specified unit, or... both?

I presume you want "next" and "previous" 5000 readings as well as first and
last, but... you will have to confirm that.

One technique is to store the bulk of your raw sensor data in a separate
table and then simply store the PK of that data in your time series. That
way you can have a much wider row of time series (number of rows) without
hitting a bulk size issue for the partition. But... I don't want to jump to
solutions until we have a firmer handle on the query side of the fence.
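
A minimal sketch of that split, keeping Jason's column names but with assumed
table names and an assumed timeuuid link between the two tables (not something
given in the thread):

CREATE TABLE sensorReadingIndex (
    sensorUnitId int,
    sensorId int,
    timeShard int,
    time timestamp,
    readingId timeuuid,
    PRIMARY KEY ((sensorUnitId, sensorId, timeShard), time)
);

CREATE TABLE sensorReadingData (
    readingId timeuuid,
    readings blob,
    PRIMARY KEY (readingId)
);

The time-series table stays narrow (a timeuuid per row instead of a ~2K blob), so
each timeShard partition can cover a much longer period before reaching the size
that was causing timeouts; the blobs are then fetched by readingId only for the
rows actually needed.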

-- Jack Krupansky

On Fri, Mar 11, 2016 at 5:37 PM, Jason Kania <jason.ka...@ymail.com> wrote:

> Jack,
>
> Thanks for the response.
>
> We are targeting our database design to 1 sensor units and each sensor
> unit has 32 sensors. We are seeing about 700 events per day per sensor,
> each providing about 2K of data. Based on keeping each partition to about
> 10 Mb (based on readings we saw on performance), we chose to break our
> partitions on a weekly basis. This is possibly finer than we need as we
> were seeing timeouts only once a single partition was about 150Mb in size
>
> When pulling in data, we will typically need to pull 1 to 4 months of data
> for our analysis and will use only the sensorUnitId and sensorId to
> uniquely identify the data source with the timeShard value used to break up
> our partitions. We have handling to sequentially scan based on our
> "timeShard" value, but don't have a good handle on the determination of the
> "timeShard" portion of the partition key at read time. The data starts
> coming in when a subscriber starts using our system and finishes when they
> discontinue service or put the service on hold temporarily.
>
> When I talk about hotspots, it isn't the time series data that is the
> concern, it is with respect to storing the maximum and minimum timeShard
> values in another table for subsequent lookup or the cost of running the
> current implementation of SELECT DISTINCT. We need to run queries such as
> getting the first or last 5000 sensor readings when we don't know the time
> frame at which they occurred so cannot directly supply the timeShard
> portion of our partition key.
>
> I appreciate your input,
>
> Thanks,
>
> Jason
>
> --
> *From:* Jack Krupansky <jack.krupan...@gmail.com>
> *To:* "user@cassandra.apache.org" <user@cassandra.apache.org>
> *Sent:* Friday, March 11, 2016 4:45 PM
>
> *Subject:* Re: Strategy for dividing wide rows beyond just adding to the
> partition key
>
> I'll stay away from advising on a specific schema per se, but I'll stick
> to the advice that you need to make sure that your queries are depending
> solely on the columns of the primary key or relatively short slices/scans,
> rather than run the risk of very long scans or having to process multiple
> partitions for a single query. That's canned to some extent, but still
> essential.
>
> Of course we generally wish to avoid hotspots, but with time series they
> are unavoidable. I mean, sure you could place successive events at separate
> partitions, but then you can't do any kind of scanning/slicing.
>
> But, events for separate sensors are not true hotspots in the traditional
> sense - unless you have only a single sensor/unit.
>
> After considering your queries, the next step is to consider the
> cardinality of your data - how many sensors, how many units, rate of
> events, etc. That will feedback into queries as well, such as how big a
> slice or scan might be, as well as sizing of partitions.
>
> So, how many sensor units do you expect, how many sensors per unit, and
> expected rate of events per sensor?
>
> Try not to jump too quickly to specific solutions - there really is a
> method to understanding all of this other stuff upfront.
>
> -- Jack Krupansky
>
> On Thu, Mar 10, 2016 at 12:39 PM, Jason Kania <jason.ka...@ymail.com>
> wrote:
>
> Jack,
>
> Thanks for the response. I don't think I provided enough information and
> used the wrong terminology, as your response is more the canned advice in
> response to Cassandra antipatterns.
>
> To make this clearer, this is what we are doing:
>
> create table sensorReadings (
> sensorUnitId int,
> sensorId int,
> time timestamp,
> timeShard int,
> readings blob,
> primary key((sensorUnitId, sensorId, timeShard), time));
>
> where timeShard is a combination of year and week of year
>
> For known time range based queries, this 

Re: Strategy for dividing wide rows beyond just adding to the partition key

2016-03-11 Thread Jason Kania
Jack,
Thanks for the response.
We are targeting our database design to 1 sensor units and each sensor unit 
has 32 sensors. We are seeing about 700 events per day per sensor, each 
providing about 2K of data. Based on keeping each partition to about 10 Mb 
(based on readings we saw on performance), we chose to break our partitions on 
a weekly basis. This is possibly finer than we need as we were seeing timeouts 
only once a single partition was about 150Mb in size

When pulling in data, we will typically need to pull 1 to 4 months of data for 
our analysis and will use only the sensorUnitId and sensorId to uniquely 
identify the data source with the timeShard value used to break up our 
partitions. We have handling to sequentially scan based on our "timeShard" 
value, but don't have a good handle on the determination of the "timeShard" 
portion of the partition key at read time. The data starts coming in when a 
subscriber starts using our system and finishes when they discontinue service 
or put the service on hold temporarily.

When I talk about hotspots, it isn't the time series data that is the concern, 
it is with respect to storing the maximum and minimum timeShard values in 
another table for subsequent lookup or the cost of running the current 
implementation of SELECT DISTINCT. We need to run queries such as getting the 
first or last 5000 sensor readings when we don't know the time frame at which 
they occurred so cannot directly supply the timeShard portion of our partition 
key.

I appreciate your input,
Thanks,
Jason

  From: Jack Krupansky <jack.krupan...@gmail.com>
 To: "user@cassandra.apache.org" <user@cassandra.apache.org> 
 Sent: Friday, March 11, 2016 4:45 PM
 Subject: Re: Strategy for dividing wide rows beyond just adding to the 
partition key
   
I'll stay away from advising on a specific schema per se, but I'll stick to the 
advice that you need to make sure that your queries are depending solely on the 
columns of the primary key or relatively short slices/scans, rather than run 
the risk of very long scans or having to process multiple partitions for a 
single query. That's canned to some extent, but still essential.
Of course we generally wish to avoid hotspots, but with time series they are 
unavoidable. I mean, sure you could place successive events at separate 
partitions, but then you can't do any kind of scanning/slicing.
But, events for separate sensors are not true hotspots in the traditional sense 
- unless you have only a single sensor/unit.
After considering your queries, the next step is to consider the cardinality of 
your data - how many sensors, how many units, rate of events, etc. That will 
feedback into queries as well, such as how big a slice or scan might be, as 
well as sizing of partitions.
So, how many sensor units do you expect, how many sensors per unit, and 
expected rate of events per sensor?
Try not to jump too quickly to specific solutions - there really is a method to 
understanding all of this other stuff upfront.
-- Jack Krupansky
On Thu, Mar 10, 2016 at 12:39 PM, Jason Kania <jason.ka...@ymail.com> wrote:

Jack,
Thanks for the response. I don't think I provided enough information and used 
the wrong terminology, as your response is more the canned advice in response to
Cassandra antipatterns.
To make this clearer, this is what we are doing:
create table sensorReadings (
sensorUnitId int,
sensorId int,
time timestamp,
timeShard int,
readings blob,
primary key((sensorUnitId, sensorId, timeShard), time));
where timeShard is a combination of year and week of year
For known time range based queries, this works great. However, the specific 
problem is in knowing the maximum and minimum timeShard values when we want to 
select the entire range of data. Our understanding is that if we update another 
related table with the maximum and minimum timeShard value for a given 
sensorUnitId and sensorId combination, we will create a hotspot and lots of 
tombstones. If we SELECT DISTINCT, we get a huge list of partition keys for the 
table because we cannot reduce the scope with a where clause.

If there is a recommended pattern that solves this, we haven't come across it.

I hope this makes the problem clearer.
Thanks,
Jason

  From: Jack Krupansky <jack.krupan...@gmail.com>
 To: user@cassandra.apache.org; Jason Kania <jason.ka...@ymail.com> 
 Sent: Thursday, March 10, 2016 10:42 AM
 Subject: Re: Strategy for dividing wide rows beyond just adding to the 
partition key
   
There is an effort underway to support wider 
rows: https://issues.apache.org/jira/browse/CASSANDRA-9754

This won't help you now though. Even with that improvement you still may need a 
more optimal data model since large-scale scanning/filtering is always a very 
bad idea with Cassandra.
The data modeling methodology for Cassandra dictates that queries drive the 
data model and that each form of query requires a separate table ("query 
table.") Materialized views can automate that process for a lot of cases, but in 
any case it does sound as if some of your queries do require additional tables.
Re: Strategy for dividing wide rows beyond just adding to the partition key

2016-03-11 Thread Jack Krupansky
I'll stay away from advising on a specific schema per se, but I'll stick to
the advice that you need to make sure that your queries are depending
solely on the columns of the primary key or relatively short slices/scans,
rather than run the risk of very long scans or having to process multiple
partitions for a single query. That's canned to some extent, but still
essential.
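
To make that concrete with the sensorReadings table from this thread (values
hypothetical): a query anchored on the full partition key plus a clustering
range stays inside one partition, while anything else degenerates into a scan.

-- bounded slice inside a single partition (cheap):
SELECT time, readings FROM sensorReadings
WHERE sensorUnitId = 1 AND sensorId = 2 AND timeShard = 201610
AND time >= '2016-03-07' AND time < '2016-03-14';

-- not anchored by the partition key: at best a full scan of every partition,
-- and depending on version Cassandra refuses it without ALLOW FILTERING:
SELECT time, readings FROM sensorReadings
WHERE time >= '2016-03-07' ALLOW FILTERING;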

Of course we generally wish to avoid hotspots, but with time series they
are unavoidable. I mean, sure you could place successive events at separate
partitions, but then you can't do any kind of scanning/slicing.

But, events for separate sensors are not true hotspots in the traditional
sense - unless you have only a single sensor/unit.

After considering your queries, the next step is to consider the
cardinality of your data - how many sensors, how many units, rate of
events, etc. That will feed back into queries as well, such as how big a
slice or scan might be, as well as sizing of partitions.

So, how many sensor units do you expect, how many sensors per unit, and
expected rate of events per sensor?
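
As a purely illustrative back-of-the-envelope (numbers assumed, not taken from
the thread), those answers translate directly into partition size:

-- 1 reading/minute/sensor, ~10 KB per reading, week-of-year shard:
--   60 * 24 * 7 = 10,080 readings per shard
--   10,080 * 10 KB ≈ 100 MB per partition
-- which is already around the point where very wide partitions start to hurt.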

Try not to jump too quickly to specific solutions - there really is a
method to understanding all of this other stuff upfront.

-- Jack Krupansky

On Thu, Mar 10, 2016 at 12:39 PM, Jason Kania <jason.ka...@ymail.com> wrote:

> Jack,
>
> Thanks for the response. I don't think I provided enough information and
> used the wrong terminology as your response is more the canned advice is
> response to Cassandra antipatterns.
>
> To make this clearer, this is what we are doing:
>
> create table sensorReadings (
> sensorUnitId int,
> sensorId int,
> time timestamp,
> timeShard int,
> readings blob,
> primary key((sensorUnitId, sensorId, timeShard), time);
>
> where timeShard is a combination of year and week of year
>
> For known time range based queries, this works great. However, the
> specific problem is in knowing the maximum and minimum timeShard values
> when we want to select the entire range of data. Our understanding is that
> if we update another related table with the maximum and minimum timeShard
> value for a given sensorUnitId and sensorId combination, we will create a
> hotspot and lots of tombstones. If we SELECT DISTINCT, we get a huge list
> of partition keys for the table because we cannot reduce the scope with a
> where clause.
>
> If there is a recommended pattern that solves this, we haven't come across
> it.
>
> I hope makes the problem clearer.
>
> Thanks,
>
> Jason
>
> --
> *From:* Jack Krupansky <jack.krupan...@gmail.com>
> *To:* user@cassandra.apache.org; Jason Kania <jason.ka...@ymail.com>
> *Sent:* Thursday, March 10, 2016 10:42 AM
> *Subject:* Re: Strategy for dividing wide rows beyond just adding to the
> partition key
>
> There is an effort underway to support wider rows:
> https://issues.apache.org/jira/browse/CASSANDRA-9754
>
> This won't help you now though. Even with that improvement you still may
> need a more optimal data model since large-scale scanning/filtering is
> always a very bad idea with Cassandra.
>
> The data modeling methodology for Cassandra dictates that queries drive
> the data model and that each form of query requires a separate table
> ("query table.") Materialized view can automate that process for a lot of
> cases, but in any case it does sound as if some of your queries do require
> additional tables.
>
> As a general proposition, Cassandra should not be used for heavy filtering
> - query tables with the filtering criteria baked into the PK is the way to
> go.
>
>
> -- Jack Krupansky
>
> On Thu, Mar 10, 2016 at 8:54 AM, Jason Kania <jason.ka...@ymail.com>
> wrote:
>
> Hi,
>
> We have sensor input that creates very wide rows and operations on these
> rows have started to timeout regulary. We have been trying to find a
> solution to dividing wide rows but keep hitting limitations that move the
> problem around instead of solving it.
>
> We have a partition key consisting of a sensorUnitId and a sensorId and
> use a time field to access each column in the row. We tried adding a time
> based entry, timeShardId, to the partition key that consists of the year
> and week of year during which the reading was taken. This works for a
> number of queries but for scanning all the readings against a particular
> sensorUnitId and sensorId combination, we seem to be stuck.
>
> We won't know the range of valid values of the timeShardId for a given
> sensorUnitId and sensorId combination so would have to write to an
> additional table to track the valid timeShardId. We suspect this would
> create tombstone accumulation problems given the number of updates required
> to the same row so haven't tried this option.

Re: Strategy for dividing wide rows beyond just adding to the partition key

2016-03-10 Thread Jonathan Haddad
Oops sorry, you wrote below that the shard is what I was suggesting.  I
didn't fully understand the problem you had.  I'll think about it a little
bit and come up w/ something.

On Thu, Mar 10, 2016 at 9:47 AM Jonathan Haddad <j...@jonhaddad.com> wrote:

> My advice was to use the date that the reading was recorded as part of the
> Partition key instead of some arbitrary shard id.  Then you don't have to
> look anything up in a different table.
>
>
>
> create table sensorReadings (
> sensorUnitId int,
> sensorId int,
> date_recorded date,
> time timestamp,
> timeShard int,
> readings blob,
> primary key((sensorUnitId, sensorId, date_recorded), time);
>
>
> On Thu, Mar 10, 2016 at 9:29 AM Jason Kania <jason.ka...@ymail.com> wrote:
>
>> Hi Jonathan,
>>
>> Thanks for the response. To make this clearer, this is what we are doing:
>>
>> create table sensorReadings (
>> sensorUnitId int,
>> sensorId int,
>> time timestamp,
>> timeShard int,
>> readings blob,
>> primary key((sensorUnitId, sensorId, timeShard), time);
>>
>> where timeShard is a combination of year and week of year
>>
>> This works exactly as you mentioned when we know what time range we are
>> querying.
>>
>> The problem is that for those cases where we want to run through all the
>> readings for all timestamps, we don't know the first and last timeShard
>> value to use to constrain the query or iterate over each shard. Our
>> understanding is that updating another table with the maximum or minimum
>> timeShard values on every write to the above table would mean pounding a
>> single row with updates and running SELECT DISTINCT pulls all partition
>> keys.
>>
>> Hopefully this is clearer.
>>
>> Again, any suggestions would be appreciated.
>>
>> Thanks,
>>
>> Jason
>>
>> --
>> *From:* Jonathan Haddad <j...@jonhaddad.com>
>> *To:* user@cassandra.apache.org; Jason Kania <jason.ka...@ymail.com>
>> *Sent:* Thursday, March 10, 2016 11:21 AM
>> *Subject:* Re: Strategy for dividing wide rows beyond just adding to the
>> partition key
>>
>> Have you considered making the date (or week, or whatever, some time
>> component) part of your partition key?
>>
>> something like:
>>
>> create table sensordata (
>> sensor_id int,
>> day date,
>> ts datetime,
>> reading int,
>> primary key((sensor_id, day), ts);
>>
>> Then if you know you need data by a particular date range, just issue
>> multiple async queries for each day you need.
>>
>> On Thu, Mar 10, 2016 at 5:57 AM Jason Kania <jason.ka...@ymail.com>
>> wrote:
>>
>> Hi,
>>
>> We have sensor input that creates very wide rows and operations on these
>> rows have started to timeout regulary. We have been trying to find a
>> solution to dividing wide rows but keep hitting limitations that move the
>> problem around instead of solving it.
>>
>> We have a partition key consisting of a sensorUnitId and a sensorId and
>> use a time field to access each column in the row. We tried adding a time
>> based entry, timeShardId, to the partition key that consists of the year
>> and week of year during which the reading was taken. This works for a
>> number of queries but for scanning all the readings against a particular
>> sensorUnitId and sensorId combination, we seem to be stuck.
>>
>> We won't know the range of valid values of the timeShardId for a given
>> sensorUnitId and sensorId combination so would have to write to an
>> additional table to track the valid timeShardId. We suspect this would
>> create tombstone accumulation problems given the number of updates required
>> to the same row so haven't tried this option.
>>
>> Alternatively, we hit a different bottleneck in the form of SELECT
>> DISTINCT in trying to directly access the partition keys. Since SELECT
>> DISTINCT does not allow for a where clause to filter on the partition key
>> values, we have to filter several hundred thousand partition keys just to
>> find those related to the relevant sensorUnitId and sensorId. This problem
>> will only grow worse for us.
>>
>> Are there any other approaches that can be suggested? We have been
>> looking around, but haven't found any references beyond the initial
>> suggestion to add some sort of shard id to the partition key to handle wide
>> rows.
>>
>> Thanks,
>>
>> Jason
>>
>>
>>
>>


Re: Strategy for dividing wide rows beyond just adding to the partition key

2016-03-10 Thread Jonathan Haddad
My advice was to use the date that the reading was recorded as part of the
Partition key instead of some arbitrary shard id.  Then you don't have to
look anything up in a different table.


create table sensorReadings (
sensorUnitId int,
sensorId int,
date_recorded date,
time timestamp,
timeShard int,
readings blob,
primary key((sensorUnitId, sensorId, date_recorded), time));


On Thu, Mar 10, 2016 at 9:29 AM Jason Kania <jason.ka...@ymail.com> wrote:

> Hi Jonathan,
>
> Thanks for the response. To make this clearer, this is what we are doing:
>
> create table sensorReadings (
> sensorUnitId int,
> sensorId int,
> time timestamp,
> timeShard int,
> readings blob,
> primary key((sensorUnitId, sensorId, timeShard), time);
>
> where timeShard is a combination of year and week of year
>
> This works exactly as you mentioned when we know what time range we are
> querying.
>
> The problem is that for those cases where we want to run through all the
> readings for all timestamps, we don't know the first and last timeShard
> value to use to constrain the query or iterate over each shard. Our
> understanding is that updating another table with the maximum or minimum
> timeShard values on every write to the above table would mean pounding a
> single row with updates and running SELECT DISTINCT pulls all partition
> keys.
>
> Hopefully this is clearer.
>
> Again, any suggestions would be appreciated.
>
> Thanks,
>
> Jason
>
> --
> *From:* Jonathan Haddad <j...@jonhaddad.com>
> *To:* user@cassandra.apache.org; Jason Kania <jason.ka...@ymail.com>
> *Sent:* Thursday, March 10, 2016 11:21 AM
> *Subject:* Re: Strategy for dividing wide rows beyond just adding to the
> partition key
>
> Have you considered making the date (or week, or whatever, some time
> component) part of your partition key?
>
> something like:
>
> create table sensordata (
> sensor_id int,
> day date,
> ts datetime,
> reading int,
> primary key((sensor_id, day), ts);
>
> Then if you know you need data by a particular date range, just issue
> multiple async queries for each day you need.
>
> On Thu, Mar 10, 2016 at 5:57 AM Jason Kania <jason.ka...@ymail.com> wrote:
>
> Hi,
>
> We have sensor input that creates very wide rows and operations on these
> rows have started to timeout regulary. We have been trying to find a
> solution to dividing wide rows but keep hitting limitations that move the
> problem around instead of solving it.
>
> We have a partition key consisting of a sensorUnitId and a sensorId and
> use a time field to access each column in the row. We tried adding a time
> based entry, timeShardId, to the partition key that consists of the year
> and week of year during which the reading was taken. This works for a
> number of queries but for scanning all the readings against a particular
> sensorUnitId and sensorId combination, we seem to be stuck.
>
> We won't know the range of valid values of the timeShardId for a given
> sensorUnitId and sensorId combination so would have to write to an
> additional table to track the valid timeShardId. We suspect this would
> create tombstone accumulation problems given the number of updates required
> to the same row so haven't tried this option.
>
> Alternatively, we hit a different bottleneck in the form of SELECT
> DISTINCT in trying to directly access the partition keys. Since SELECT
> DISTINCT does not allow for a where clause to filter on the partition key
> values, we have to filter several hundred thousand partition keys just to
> find those related to the relevant sensorUnitId and sensorId. This problem
> will only grow worse for us.
>
> Are there any other approaches that can be suggested? We have been looking
> around, but haven't found any references beyond the initial suggestion to
> add some sort of shard id to the partition key to handle wide rows.
>
> Thanks,
>
> Jason
>
>
>
>


Re: Strategy for dividing wide rows beyond just adding to the partition key

2016-03-10 Thread Jason Kania
Jack,
Thanks for the response. I don't think I provided enough information and used 
the wrong terminology, as your response is more the canned advice in response to 
Cassandra antipatterns.
To make this clearer, this is what we are doing:
create table sensorReadings (
sensorUnitId int,
sensorId int,
time timestamp,
timeShard int,
readings blob,
primary key((sensorUnitId, sensorId, timeShard), time));
where timeShard is a combination of year and week of year
For known time range based queries, this works great. However, the specific 
problem is in knowing the maximum and minimum timeShard values when we want to 
select the entire range of data. Our understanding is that if we update another 
related table with the maximum and minimum timeShard value for a given 
sensorUnitId and sensorId combination, we will create a hotspot and lots of 
tombstones. If we SELECT DISTINCT, we get a huge list of partition keys for the 
table because we cannot reduce the scope with a where clause.

If there is a recommended pattern that solves this, we haven't come across it.

I hope this makes the problem clearer.
Thanks,
Jason

  From: Jack Krupansky <jack.krupan...@gmail.com>
 To: user@cassandra.apache.org; Jason Kania <jason.ka...@ymail.com> 
 Sent: Thursday, March 10, 2016 10:42 AM
 Subject: Re: Strategy for dividing wide rows beyond just adding to the 
partition key
   
There is an effort underway to support wider 
rows: https://issues.apache.org/jira/browse/CASSANDRA-9754

This won't help you now though. Even with that improvement you still may need a 
more optimal data model since large-scale scanning/filtering is always a very 
bad idea with Cassandra.
The data modeling methodology for Cassandra dictates that queries drive the 
data model and that each form of query requires a separate table ("query 
table.") Materialized view can automate that process for a lot of cases, but in 
any case it does sound as if some of your queries do require additional tables.
As a general proposition, Cassandra should not be used for heavy filtering - 
query tables with the filtering criteria baked into the PK is the way to go.

-- Jack Krupansky
On Thu, Mar 10, 2016 at 8:54 AM, Jason Kania <jason.ka...@ymail.com> wrote:

Hi,
We have sensor input that creates very wide rows and operations on these rows 
have started to timeout regulary. We have been trying to find a solution to 
dividing wide rows but keep hitting limitations that move the problem around 
instead of solving it.
We have a partition key consisting of a sensorUnitId and a sensorId and use a 
time field to access each column in the row. We tried adding a time based 
entry, timeShardId, to the partition key that consists of the year and week of 
year during which the reading was taken. This works for a number of queries but 
for scanning all the readings against a particular sensorUnitId and sensorId 
combination, we seem to be stuck.
We won't know the range of valid values of the timeShardId for a given 
sensorUnitId and sensorId combination so would have to write to an additional 
table to track the valid timeShardId. We suspect this would create tombstone 
accumulation problems given the number of updates required to the same row so 
haven't tried this option.

Alternatively, we hit a different bottleneck in the form of SELECT DISTINCT in 
trying to directly access the partition keys. Since SELECT DISTINCT does not 
allow for a where clause to filter on the partition key values, we have to 
filter several hundred thousand partition keys just to find those related to 
the relevant sensorUnitId and sensorId. This problem will only grow worse for 
us.

Are there any other approaches that can be suggested? We have been looking 
around, but haven't found any references beyond the initial suggestion to add 
some sort of shard id to the partition key to handle wide rows.
Thanks,
Jason




   

Re: Strategy for dividing wide rows beyond just adding to the partition key

2016-03-10 Thread Jason Kania
Hi Jonathan,

Thanks for the response. To make this clearer, this is what we are doing:
create table sensorReadings (
sensorUnitId int,
sensorId int,
time timestamp,
timeShard int,
readings blob,
primary key((sensorUnitId, sensorId, timeShard), time));
where timeShard is a combination of year and week of year
This works exactly as you mentioned when we know what time range we are 
querying.

The problem is that for those cases where we want to run through all the 
readings for all timestamps, we don't know the first and last timeShard value 
to use to constrain the query or iterate over each shard. Our understanding is 
that updating another table with the maximum or minimum timeShard values on 
every write to the above table would mean pounding a single row with updates 
and running SELECT DISTINCT pulls all partition keys.

Hopefully this is clearer.
Again, any suggestions would be appreciated.

Thanks,
Jason

  From: Jonathan Haddad <j...@jonhaddad.com>
 To: user@cassandra.apache.org; Jason Kania <jason.ka...@ymail.com> 
 Sent: Thursday, March 10, 2016 11:21 AM
 Subject: Re: Strategy for dividing wide rows beyond just adding to the 
partition key
   
Have you considered making the date (or week, or whatever, some time component) 
part of your partition key?
something like:
create table sensordata (
sensor_id int,
day date,
ts timestamp,
reading int,
primary key((sensor_id, day), ts));
Then if you know you need data by a particular date range, just issue multiple 
async queries for each day you need.
On Thu, Mar 10, 2016 at 5:57 AM Jason Kania <jason.ka...@ymail.com> wrote:

Hi,
We have sensor input that creates very wide rows and operations on these rows 
have started to timeout regulary. We have been trying to find a solution to 
dividing wide rows but keep hitting limitations that move the problem around 
instead of solving it.
We have a partition key consisting of a sensorUnitId and a sensorId and use a 
time field to access each column in the row. We tried adding a time based 
entry, timeShardId, to the partition key that consists of the year and week of 
year during which the reading was taken. This works for a number of queries but 
for scanning all the readings against a particular sensorUnitId and sensorId 
combination, we seem to be stuck.
We won't know the range of valid values of the timeShardId for a given 
sensorUnitId and sensorId combination so would have to write to an additional 
table to track the valid timeShardId. We suspect this would create tombstone 
accumulation problems given the number of updates required to the same row so 
haven't tried this option.

Alternatively, we hit a different bottleneck in the form of SELECT DISTINCT in 
trying to directly access the partition keys. Since SELECT DISTINCT does not 
allow for a where clause to filter on the partition key values, we have to 
filter several hundred thousand partition keys just to find those related to 
the relevant sensorUnitId and sensorId. This problem will only grow worse for 
us.

Are there any other approaches that can be suggested? We have been looking 
around, but haven't found any references beyond the initial suggestion to add 
some sort of shard id to the partition key to handle wide rows.
Thanks,
Jason



   

Re: Strategy for dividing wide rows beyond just adding to the partition key

2016-03-10 Thread Jonathan Haddad
Have you considered making the date (or week, or whatever, some time
component) part of your partition key?

something like:

create table sensordata (
sensor_id int,
day date,
ts timestamp,
reading int,
primary key((sensor_id, day), ts));

Then if you know you need data by a particular date range, just issue
multiple async queries for each day you need.
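
For example, against the sensordata sketch above (dates hypothetical), the
client issues one small partition-local query per day and runs them
concurrently:

SELECT ts, reading FROM sensordata WHERE sensor_id = 1 AND day = '2016-03-08';
SELECT ts, reading FROM sensordata WHERE sensor_id = 1 AND day = '2016-03-09';
SELECT ts, reading FROM sensordata WHERE sensor_id = 1 AND day = '2016-03-10';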

On Thu, Mar 10, 2016 at 5:57 AM Jason Kania <jason.ka...@ymail.com> wrote:

> Hi,
>
> We have sensor input that creates very wide rows and operations on these
> rows have started to timeout regulary. We have been trying to find a
> solution to dividing wide rows but keep hitting limitations that move the
> problem around instead of solving it.
>
> We have a partition key consisting of a sensorUnitId and a sensorId and
> use a time field to access each column in the row. We tried adding a time
> based entry, timeShardId, to the partition key that consists of the year
> and week of year during which the reading was taken. This works for a
> number of queries but for scanning all the readings against a particular
> sensorUnitId and sensorId combination, we seem to be stuck.
>
> We won't know the range of valid values of the timeShardId for a given
> sensorUnitId and sensorId combination so would have to write to an
> additional table to track the valid timeShardId. We suspect this would
> create tombstone accumulation problems given the number of updates required
> to the same row so haven't tried this option.
>
> Alternatively, we hit a different bottleneck in the form of SELECT
> DISTINCT in trying to directly access the partition keys. Since SELECT
> DISTINCT does not allow for a where clause to filter on the partition key
> values, we have to filter several hundred thousand partition keys just to
> find those related to the relevant sensorUnitId and sensorId. This problem
> will only grow worse for us.
>
> Are there any other approaches that can be suggested? We have been looking
> around, but haven't found any references beyond the initial suggestion to
> add some sort of shard id to the partition key to handle wide rows.
>
> Thanks,
>
> Jason
>


Re: Strategy for dividing wide rows beyond just adding to the partition key

2016-03-10 Thread Jack Krupansky
There is an effort underway to support wider rows:
https://issues.apache.org/jira/browse/CASSANDRA-9754

This won't help you now though. Even with that improvement you still may
need a more optimal data model since large-scale scanning/filtering is
always a very bad idea with Cassandra.

The data modeling methodology for Cassandra dictates that queries drive the
data model and that each form of query requires a separate table ("query
table.") Materialized view can automate that process for a lot of cases,
but in any case it does sound as if some of your queries do require
additional tables.

As a general proposition, Cassandra should not be used for heavy filtering
- query tables with the filtering criteria baked into the PK is the way to
go.


-- Jack Krupansky
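
A minimal illustration of the "query table" idea in this thread's terms (table
and values hypothetical): each access path gets its own table with the lookup
criteria baked into the key, so no filtering is needed at read time.

CREATE TABLE sensorsByUnit (
    sensorUnitId int,
    sensorId int,
    PRIMARY KEY (sensorUnitId, sensorId)
);

-- "which sensors does unit 1 have?" is then a single-partition read:
SELECT sensorId FROM sensorsByUnit WHERE sensorUnitId = 1;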

On Thu, Mar 10, 2016 at 8:54 AM, Jason Kania <jason.ka...@ymail.com> wrote:

> Hi,
>
> We have sensor input that creates very wide rows and operations on these
> rows have started to timeout regulary. We have been trying to find a
> solution to dividing wide rows but keep hitting limitations that move the
> problem around instead of solving it.
>
> We have a partition key consisting of a sensorUnitId and a sensorId and
> use a time field to access each column in the row. We tried adding a time
> based entry, timeShardId, to the partition key that consists of the year
> and week of year during which the reading was taken. This works for a
> number of queries but for scanning all the readings against a particular
> sensorUnitId and sensorId combination, we seem to be stuck.
>
> We won't know the range of valid values of the timeShardId for a given
> sensorUnitId and sensorId combination so would have to write to an
> additional table to track the valid timeShardId. We suspect this would
> create tombstone accumulation problems given the number of updates required
> to the same row so haven't tried this option.
>
> Alternatively, we hit a different bottleneck in the form of SELECT
> DISTINCT in trying to directly access the partition keys. Since SELECT
> DISTINCT does not allow for a where clause to filter on the partition key
> values, we have to filter several hundred thousand partition keys just to
> find those related to the relevant sensorUnitId and sensorId. This problem
> will only grow worse for us.
>
> Are there any other approaches that can be suggested? We have been looking
> around, but haven't found any references beyond the initial suggestion to
> add some sort of shard id to the partition key to handle wide rows.
>
> Thanks,
>
> Jason
>


Strategy for dividing wide rows beyond just adding to the partition key

2016-03-10 Thread Jason Kania
Hi,
We have sensor input that creates very wide rows and operations on these rows 
have started to time out regularly. We have been trying to find a solution to 
dividing wide rows but keep hitting limitations that move the problem around 
instead of solving it.
We have a partition key consisting of a sensorUnitId and a sensorId and use a 
time field to access each column in the row. We tried adding a time based 
entry, timeShardId, to the partition key that consists of the year and week of 
year during which the reading was taken. This works for a number of queries but 
for scanning all the readings against a particular sensorUnitId and sensorId 
combination, we seem to be stuck.
We won't know the range of valid values of the timeShardId for a given 
sensorUnitId and sensorId combination so would have to write to an additional 
table to track the valid timeShardId. We suspect this would create tombstone 
accumulation problems given the number of updates required to the same row so 
haven't tried this option.

Alternatively, we hit a different bottleneck in the form of SELECT DISTINCT in 
trying to directly access the partition keys. Since SELECT DISTINCT does not 
allow for a where clause to filter on the partition key values, we have to 
filter several hundred thousand partition keys just to find those related to 
the relevant sensorUnitId and sensorId. This problem will only grow worse for 
us.

Are there any other approaches that can be suggested? We have been looking 
around, but haven't found any references beyond the initial suggestion to add 
some sort of shard id to the partition key to handle wide rows.
Thanks,
Jason


Re: Wide rows best practices and GC impact

2014-12-04 Thread Jabbar Azam
Hello,

I saw this earlier yesterday but didn't want to reply because I didn't know
what the cause was.

Basically I was using wide rows with Cassandra 1.x and was inserting data
constantly. After about 18 hours the JVM would crash with a dump file. For
some reason I removed the compaction throttling and the problem
disappeared. I've never really found out what the root cause was.


On Thu Dec 04 2014 at 2:49:57 AM Gianluca Borello gianl...@draios.com
wrote:

 Thanks Robert, I really appreciate your help!

 I'm still unsure why Cassandra 2.1 seem to perform much better in that
 same scenario (even setting the same values of compaction threshold and
 number of compactors), but I guess we'll revise when we'll decide to
 upgrade 2.1 in production.

 On Dec 3, 2014 6:33 PM, Robert Coli rc...@eventbrite.com wrote:
 
  On Tue, Dec 2, 2014 at 5:01 PM, Gianluca Borello gianl...@draios.com
 wrote:
 
  We mainly store time series-like data, where each data point is a
 binary blob of 5-20KB. We use wide rows, and try to put in the same row all
 the data that we usually need in a single query (but not more than that).
 As a result, our application logic is very simple (since we have to do just
 one query to read the data on average) and read/write response times are
 very satisfactory. This is a cfhistograms and a cfstats of our heaviest CF:
 
 
  100mb is not HYOOOGE but is around the size where large rows can cause
 heap pressure.
 
  You seem to be unclear on the implications of pending compactions,
 however.
 
  Briefly, pending compactions indicate that you have more SSTables than
 you should. As compaction both merges row versions and reduces the number
 of SSTables, a high number of pending compactions causes problems
 associated with both having too many row versions (fragmentation) and a
 large number of SSTables (per-SSTable heap/memory (depending on version)
 overhead like bloom filters and index samples). In your case, it seems the
 problem is probably just the compaction throttle being too low.
 
  My conjecture is that, given your normal data size and read/write
 workload, you are relatively close to GC pre-fail when compaction is
 working. When it stops working, you relatively quickly get into a state
 where you exhaust heap because you have too many SSTables.
 
  =Rob
  http://twitter.com/rcolidba
  PS - Given 30GB of RAM on the machine, you could consider investigating
 large-heap configurations, rbranson from Instagram has some slides out
 there on the topic. What you pay is longer stop the world GCs, IOW latency
 if you happen to be talking to a replica node when it pauses.
 



Re: Wide rows best practices and GC impact

2014-12-03 Thread Robert Coli
On Tue, Dec 2, 2014 at 5:01 PM, Gianluca Borello gianl...@draios.com
wrote:

 We mainly store time series-like data, where each data point is a binary
 blob of 5-20KB. We use wide rows, and try to put in the same row all the
 data that we usually need in a single query (but not more than that). As a
 result, our application logic is very simple (since we have to do just one
 query to read the data on average) and read/write response times are very
 satisfactory. This is a cfhistograms and a cfstats of our heaviest CF:


100mb is not HYOOOGE but is around the size where large rows can cause heap
pressure.

You seem to be unclear on the implications of pending compactions, however.

Briefly, pending compactions indicate that you have more SSTables than you
should. As compaction both merges row versions and reduces the number of
SSTables, a high number of pending compactions causes problems associated
with both having too many row versions (fragmentation) and a large number
of SSTables (per-SSTable heap/memory (depending on version) overhead like
bloom filters and index samples). In your case, it seems the problem is
probably just the compaction throttle being too low.

My conjecture is that, given your normal data size and read/write workload,
you are relatively close to GC pre-fail when compaction is working. When
it stops working, you relatively quickly get into a state where you exhaust
heap because you have too many SSTables.

=Rob
http://twitter.com/rcolidba
PS - Given 30GB of RAM on the machine, you could consider investigating
large-heap configurations, rbranson from Instagram has some slides out
there on the topic. What you pay is longer stop the world GCs, IOW latency
if you happen to be talking to a replica node when it pauses.


Re: Wide rows best practices and GC impact

2014-12-03 Thread Gianluca Borello
Thanks Robert, I really appreciate your help!

I'm still unsure why Cassandra 2.1 seems to perform much better in that same
scenario (even with the same values for compaction threshold and number of
compactors), but I guess we'll revisit this when we decide to upgrade to 2.1
in production.

On Dec 3, 2014 6:33 PM, Robert Coli rc...@eventbrite.com wrote:

 On Tue, Dec 2, 2014 at 5:01 PM, Gianluca Borello gianl...@draios.com
wrote:

 We mainly store time series-like data, where each data point is a binary
blob of 5-20KB. We use wide rows, and try to put in the same row all the
data that we usually need in a single query (but not more than that). As a
result, our application logic is very simple (since we have to do just one
query to read the data on average) and read/write response times are very
satisfactory. This is a cfhistograms and a cfstats of our heaviest CF:


 100mb is not HYOOOGE but is around the size where large rows can cause
heap pressure.

 You seem to be unclear on the implications of pending compactions,
however.

 Briefly, pending compactions indicate that you have more SSTables than
you should. As compaction both merges row versions and reduces the number
of SSTables, a high number of pending compactions causes problems
associated with both having too many row versions (fragmentation) and a
large number of SSTables (per-SSTable heap/memory (depending on version)
overhead like bloom filters and index samples). In your case, it seems the
problem is probably just the compaction throttle being too low.

 My conjecture is that, given your normal data size and read/write
workload, you are relatively close to GC pre-fail when compaction is
working. When it stops working, you relatively quickly get into a state
where you exhaust heap because you have too many SSTables.

 =Rob
 http://twitter.com/rcolidba
 PS - Given 30GB of RAM on the machine, you could consider investigating
large-heap configurations, rbranson from Instagram has some slides out
there on the topic. What you pay is longer stop the world GCs, IOW latency
if you happen to be talking to a replica node when it pauses.



Wide rows best practices and GC impact

2014-12-02 Thread Gianluca Borello
Hi,

We have a cluster (2.0.11) of 6 nodes (RF=3), c3.4xlarge instances, about
50 column families. Cassandra heap takes 8GB out of the 30GB of every
instance.

We mainly store time series-like data, where each data point is a binary
blob of 5-20KB. We use wide rows, and try to put in the same row all the
data that we usually need in a single query (but not more than that). As a
result, our application logic is very simple (since we have to do just one
query to read the data on average) and read/write response times are very
satisfactory. This is a cfhistograms and a cfstats of our heaviest CF:

SSTables per Read
1 sstables: 3198856
2 sstables: 45

Write Latency (microseconds)
  4 us: 37
  5 us: 1247
  6 us: 9987
  7 us: 31442
  8 us: 66121
 10 us: 400503
 12 us: 1158329
 14 us: 2873934
 17 us: 11843616
 20 us: 24464275
 24 us: 30574717
 29 us: 24351624
 35 us: 16788801
 42 us: 3935374
 50 us: 797781
 60 us: 272160
 72 us: 121819
 86 us: 64641
103 us: 41085
124 us: 33618
149 us: 199463
179 us: 255445
215 us: 38238
258 us: 12300
310 us: 5307
372 us: 3180
446 us: 2443
535 us: 1773
642 us: 1314
770 us: 991
924 us: 748
   1109 us: 606
   1331 us: 465
   1597 us: 433
   1916 us: 453
   2299 us: 484
   2759 us: 983
   3311 us: 976
   3973 us: 338
   4768 us: 312
   5722 us: 237
   6866 us: 198
   8239 us: 163
   9887 us: 138
  11864 us: 115
  14237 us: 231
  17084 us: 550
  20501 us: 603
  24601 us: 635
  29521 us: 875
  35425 us: 731
  42510 us: 497
  51012 us: 476
  61214 us: 347
  73457 us: 331
  88148 us: 273
 105778 us: 143
 126934 us: 92
 152321 us: 47
 182785 us: 16
 219342 us: 5
 263210 us: 2
 315852 us: 2
 379022 us: 1
 454826 us: 1
 545791 us: 1
 654949 us: 0
 785939 us: 0
 943127 us: 1
1131752 us: 1

Read Latency (microseconds)
 20 us: 1
 24 us: 9
 29 us: 18
 35 us: 96
 42 us: 6989
 50 us: 113305
 60 us: 552348
 72 us: 772329
 86 us: 654019
103 us: 578404
124 us: 300364
149 us: 111522
179 us: 37385
215 us: 18353
258 us: 10733
310 us: 7915
372 us: 9406
446 us: 7645
535 us: 2773
642 us: 1323
770 us: 1351
924 us: 953
   1109 us: 857
   1331 us: 1122
   1597 us: 800
   1916 us: 806
   2299 us: 686
   2759 us: 581
   3311 us: 671
   3973 us: 318
   4768 us: 318
   5722 us: 226
   6866 us: 164
   8239 us: 161
   9887 us: 134
  11864 us: 125
  14237 us: 184
  17084 us: 285
  20501 us: 315
  24601 us: 378
  29521 us: 431
  35425 us: 468
  42510 us: 469
  51012 us: 466
  61214 us: 407
  73457 us: 337
  88148 us: 297
 105778 us: 242
 126934 us: 135
 152321 us: 109
 182785 us: 57
 219342 us: 41
 263210 us: 28
 315852 us: 16
 379022 us: 12
 454826 us: 6
 545791 us: 6
 654949 us: 0
 785939 us: 0
 943127 us: 0
1131752 us: 2

Partition Size (bytes)
3311 bytes: 1
3973 bytes: 2
4768 bytes: 0
5722 bytes: 2
6866 bytes: 0
8239 bytes: 0
9887 bytes: 2
   11864 bytes: 1
   14237 bytes: 0
   17084 bytes: 0
   20501 bytes: 0
   24601 bytes: 0
   29521 bytes: 3
   35425 bytes: 0
   42510 bytes: 1
   51012 bytes: 1
   61214 bytes: 1
   73457 bytes: 3
   88148 bytes: 1
  105778 bytes: 5
  126934 bytes: 2
  152321 bytes: 4
  182785 bytes: 65
  219342 bytes: 165
  263210 bytes: 268
  315852 bytes: 201
  379022 bytes: 30
  454826 bytes: 248
  545791 bytes: 16
  654949 bytes: 41
  785939 bytes: 259
  943127 bytes: 547
 1131752 bytes: 243
 1358102 bytes: 176
 1629722 bytes: 59
 1955666 bytes: 37
 2346799 bytes: 41
 2816159 bytes: 78
 3379391 bytes: 243
 4055269 bytes: 122
 4866323 bytes: 209
 5839588 bytes: 220
 7007506 bytes: 266
 8409007 bytes: 77
10090808 bytes: 103
12108970 bytes: 1
14530764 bytes: 2
17436917 bytes: 7
20924300 bytes: 410
25109160 bytes: 76

Cell Count per Partition
3 cells: 5
4 cells: 0
5 cells: 0
6 cells: 2
7 cells: 0
8 cells: 0
   10 cells: 2
   12 cells: 1
   14 cells: 0
   17 cells: 0
   20 cells: 1
   24 cells: 3
   29 cells: 1
   35 cells: 1
   42 cells: 0
   50 cells: 0
   60 cells: 3
   72 cells: 0
   86 cells: 1
  103 cells: 0
  124 cells: 11
  149 cells: 3
  179 cells: 4
  215 cells: 10
  258 cells: 13
  310 cells: 2181
  372 cells: 2
  446 cells: 2
  535 cells: 2
  642 cells: 4
  770 cells: 7
  924 cells: 488
 1109 cells: 3
 1331 cells: 24
 1597 cells: 143
 1916 cells: 332
 2299 cells: 2
 2759 cells: 5
 3311 cells: 483
 3973 cells: 0
 4768 cells: 2
 5722 cells: 1
 6866 cells: 1
 8239 cells: 0
 9887 cells: 2
11864 cells: 244
14237 cells: 1
17084 cells: 248
20501 cells: 1
24601 cells: 1
29521 cells: 1
35425 cells: 2
42510 cells: 1
51012 cells: 2
61214 cells: 237


Read Count: 3202919
Read Latency: 0.16807454013042478 ms.
Write Count: 118568574
Write Latency: 0.026566498615391967 ms.
Pending Tasks: 0
  Table: protobuf_by_agent1
  SSTable count: 49
  SSTables in each level: [1, 11/10, 37, 0, 0, 0, 0, 0, 0]
  Space used (live), bytes: 6934395462

how wide can wide rows get?

2014-11-13 Thread Adaryl Bob Wakefield, MBA
I’m struggling with this wide row business. Is there an upward limit on the 
number of columns you can have?

Adaryl Bob Wakefield, MBA
Principal
Mass Street Analytics
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData 

Re: how wide can wide rows get?

2014-11-13 Thread Hannu Kröger
The theoretical limit is maybe 2 billion but recommended max is around 10-20 
thousand. 

Br,
Hannu
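
A minimal sketch of how that recommendation is usually applied in practice
(names hypothetical): add a bucket to the partition key so no single partition
grows far past that many cells.

CREATE TABLE events_by_entity (
    entity_id text,
    bucket int,              -- e.g. day number, or running_count / 10000
    event_time timestamp,
    payload blob,
    PRIMARY KEY ((entity_id, bucket), event_time)
);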

 On 14.11.2014, at 8.10, Adaryl Bob Wakefield, MBA 
 adaryl.wakefi...@hotmail.com wrote:
 
 I’m struggling with this wide row business. Is there an upward limit on the 
 number of columns you can have?
  
 Adaryl Bob Wakefield, MBA
 Principal
 Mass Street Analytics
 913.938.6685
 www.linkedin.com/in/bobwakefieldmba
 Twitter: @BobLovesData


Re: how wide can wide rows get?

2014-11-13 Thread Joe Ramsey
You can have up to 2 billion columns but there are some considerations.  

This article might be of some help.

http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2/#.VGWdT4enCS0




 On Nov 14, 2014, at 1:10 AM, Adaryl Bob Wakefield, MBA 
 adaryl.wakefi...@hotmail.com wrote:
 
 I’m struggling with this wide row business. Is there an upward limit on the 
 number of columns you can have?
  
 Adaryl Bob Wakefield, MBA
 Principal
 Mass Street Analytics
 913.938.6685
 www.linkedin.com/in/bobwakefieldmba
 Twitter: @BobLovesData



Re[2]: how wide can wide rows get?

2014-11-13 Thread Plotnik, Alexey
We have 380k of them in some of our rows and it's ok.

-- Original Message --
From: Hannu Kröger hkro...@gmail.com
To: user@cassandra.apache.org
Sent: 14.11.2014 16:13:49
Subject: Re: how wide can wide rows get?

The theoretical limit is maybe 2 billion but recommended max is around 10-20 
thousand.

Br,
Hannu

On 14.11.2014, at 8.10, Adaryl Bob Wakefield, MBA 
adaryl.wakefi...@hotmail.com wrote:

I’m struggling with this wide row business. Is there an upward limit on the 
number of columns you can have?

Adaryl Bob Wakefield, MBA
Principal
Mass Street Analytics
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData


Re: Re[2]: how wide can wide rows get?

2014-11-13 Thread Takenori Sato
We have up to a few hundreds of millions of columns in a super wide row.

There are two major issues you should care about.

1. the wider the row is, the more memory pressure you get for every slice
query
2. repair is row based, which means a huge row could be transferred at
every repair

1 is not a big issue if you don't have many concurrent slice requests.
Having more cores is a good investment to reduce memory pressure.

2  could cause very high memory pressure as well as poorer disk utilization.
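
For point 1, one way to keep each slice bounded is to page through a partition
by clustering key instead of reading it in one shot; a sketch against the
hypothetical events_by_entity table above (values made up):

SELECT event_time, payload FROM events_by_entity
WHERE entity_id = 'unit-7' AND bucket = 16387
LIMIT 5000;

-- next page starts after the last event_time seen:
SELECT event_time, payload FROM events_by_entity
WHERE entity_id = 'unit-7' AND bucket = 16387
AND event_time > '2014-11-13 10:22:31+0000'
LIMIT 5000;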


On Fri, Nov 14, 2014 at 3:21 PM, Plotnik, Alexey aplot...@rhonda.ru wrote:

  We have 380k of them in some of our rows and it's ok.

 -- Original Message --
 From: Hannu Kröger hkro...@gmail.com
 To: user@cassandra.apache.org user@cassandra.apache.org
 Sent: 14.11.2014 16:13:49
 Subject: Re: how wide can wide rows get?


 The theoretical limit is maybe 2 billion but recommended max is around
 10-20 thousand.

 Br,
 Hannu

 On 14.11.2014, at 8.10, Adaryl Bob Wakefield, MBA 
 adaryl.wakefi...@hotmail.com wrote:

   I’m struggling with this wide row business. Is there an upward limit on
 the number of columns you can have?

 Adaryl Bob Wakefield, MBA
 Principal
 Mass Street Analytics
 913.938.6685
 www.linkedin.com/in/bobwakefieldmba
 Twitter: @BobLovesData




Wide Rows - Data Model Design

2014-09-19 Thread Check Peck
I am trying to use wide rows concept in my data modelling design for
Cassandra. We are using Cassandra 2.0.6.

CREATE TABLE test_data (
  test_id int,
  client_name text,
  record_data text,
  creation_date timestamp,
  last_modified_date timestamp,
  PRIMARY KEY (test_id, client_name, record_data)
)

So I came up with the above table design. Does my above table fall under the
category of wide rows in Cassandra or not?

And is there any problem if I have three columns in my PRIMARY KEY? I guess
the PARTITION KEY will be test_id, right? And what about the other two?

In this table, we can have multiple record_data for same client_name.

Query Pattern will be -

select client_name, record_data from test_data where test_id = 1;


Re: Wide Rows - Data Model Design

2014-09-19 Thread Jonathan Lacefield
Hello,

  Yes, this is a wide row table design.  The first col is your Partition
Key.  The remaining 2 cols are clustering cols.  You will receive ordered
result sets based on client_name, record_data when running that query.

Jonathan
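
A short sketch of what that means for the queries (values hypothetical): rows
within the test_id = 1 partition come back ordered by client_name, then
record_data, and the first clustering column can also narrow the slice.

SELECT client_name, record_data FROM test_data WHERE test_id = 1;

SELECT record_data FROM test_data
WHERE test_id = 1 AND client_name = 'acme';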


Jonathan Lacefield

Solution Architect | (404) 822 3487 | jlacefi...@datastax.com


On Fri, Sep 19, 2014 at 10:41 AM, Check Peck comptechge...@gmail.com
wrote:

 I am trying to use wide rows concept in my data modelling design for
 Cassandra. We are using Cassandra 2.0.6.

 CREATE TABLE test_data (
   test_id int,
   client_name text,
   record_data text,
   creation_date timestamp,
   last_modified_date timestamp,
   PRIMARY KEY (test_id, client_name, record_data)
 )

 So I came up with above table design. Does my above table falls under the
 category of wide rows in Cassandra or not?

 And is there any problem If I have three columns in my  PRIMARY KEY? I
 guess PARTITION KEY will be test_id right? And what about other two?

 In this table, we can have multiple record_data for same client_name.

 Query Pattern will be -

 select client_name, record_data from test_data where test_id = 1;



Re: Wide Rows - Data Model Design

2014-09-19 Thread DuyHai Doan
Does my above table falls under the category of wide rows in Cassandra or
not? -- It depends on the cardinality. For each distinct test_id, how
many combinations of client_name/record_data do you have ?

 By the way, why do you put the record_data as part of primary key ?

In your table partition key = test_id, client_name = first clustering
column, record_data = second clustering column


On Fri, Sep 19, 2014 at 5:41 PM, Check Peck comptechge...@gmail.com wrote:

 I am trying to use wide rows concept in my data modelling design for
 Cassandra. We are using Cassandra 2.0.6.

 CREATE TABLE test_data (
   test_id int,
   client_name text,
   record_data text,
   creation_date timestamp,
   last_modified_date timestamp,
   PRIMARY KEY (test_id, client_name, record_data)
 )

 So I came up with above table design. Does my above table falls under the
 category of wide rows in Cassandra or not?

 And is there any problem If I have three columns in my  PRIMARY KEY? I
 guess PARTITION KEY will be test_id right? And what about other two?

 In this table, we can have multiple record_data for same client_name.

 Query Pattern will be -

 select client_name, record_data from test_data where test_id = 1;



Re: Wide Rows - Data Model Design

2014-09-19 Thread Check Peck
@DuyHai - I have put that because of this condition -

In this table, we can have multiple record_data for same client_name.

It can be multiple combinations of client_name and record_data for each
distinct test_id.


On Fri, Sep 19, 2014 at 8:48 AM, DuyHai Doan doanduy...@gmail.com wrote:

 Does my above table falls under the category of wide rows in Cassandra
 or not? -- It depends on the cardinality. For each distinct test_id, how
 many combinations of client_name/record_data do you have ?

  By the way, why do you put the record_data as part of primary key ?

 In your table partiton key = test_id, client_name = first clustering
 column, record_data = second clustering column


 On Fri, Sep 19, 2014 at 5:41 PM, Check Peck comptechge...@gmail.com
 wrote:

 I am trying to use wide rows concept in my data modelling design for
 Cassandra. We are using Cassandra 2.0.6.

 CREATE TABLE test_data (
   test_id int,
   client_name text,
   record_data text,
   creation_date timestamp,
   last_modified_date timestamp,
   PRIMARY KEY (test_id, client_name, record_data)
 )

 So I came up with above table design. Does my above table falls under the
 category of wide rows in Cassandra or not?

 And is there any problem If I have three columns in my  PRIMARY KEY? I
 guess PARTITION KEY will be test_id right? And what about other two?

 In this table, we can have multiple record_data for same client_name.

 Query Pattern will be -

 select client_name, record_data from test_data where test_id = 1;





Re: Wide Rows - Data Model Design

2014-09-19 Thread DuyHai Doan
Ahh yes, sorry, I read too fast, missed it.

On Fri, Sep 19, 2014 at 5:54 PM, Check Peck comptechge...@gmail.com wrote:

 @DuyHai - I have put that because of this condition -

 In this table, we can have multiple record_data for same client_name.

 It can be multiple combinations of client_name and record_data for each
 distinct test_id.


 On Fri, Sep 19, 2014 at 8:48 AM, DuyHai Doan doanduy...@gmail.com wrote:

 Does my above table falls under the category of wide rows in Cassandra
 or not? -- It depends on the cardinality. For each distinct test_id, how
 many combinations of client_name/record_data do you have ?

  By the way, why do you put the record_data as part of primary key ?

 In your table partiton key = test_id, client_name = first clustering
 column, record_data = second clustering column


 On Fri, Sep 19, 2014 at 5:41 PM, Check Peck comptechge...@gmail.com
 wrote:

 I am trying to use wide rows concept in my data modelling design for
 Cassandra. We are using Cassandra 2.0.6.

 CREATE TABLE test_data (
   test_id int,
   client_name text,
   record_data text,
   creation_date timestamp,
   last_modified_date timestamp,
   PRIMARY KEY (test_id, client_name, record_data)
 )

 So I came up with above table design. Does my above table falls under
 the category of wide rows in Cassandra or not?

 And is there any problem If I have three columns in my  PRIMARY KEY? I
 guess PARTITION KEY will be test_id right? And what about other two?

 In this table, we can have multiple record_data for same client_name.

 Query Pattern will be -

 select client_name, record_data from test_data where test_id = 1;






Re: CQL 3 and wide rows

2014-05-20 Thread Aaron Morton
In a CQL 3 table the only **column** names are the ones defined in the table, 
in the example below there are three column names. 


 CREATE TABLE keyspace.widerow (
 row_key text,
 wide_row_column text,
 data_column text,
 PRIMARY KEY (row_key, wide_row_column));
 
 Check out, for example, 
 http://www.datastax.com/dev/blog/schema-in-cassandra-1-1.​

Internally there may be more **cells** (as we now call the internal columns). 
In the example above each value for row_key will create a single partition (as 
we now call internal storage engine rows). In each of those partitions there 
will be cells for each CQL 3 row that has the same row_key, those cells will 
use a Composite for the name. The first part of the composite will be the value 
of the wide_row_column and the second will be the literal name of the non 
primary key columns. 
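
A rough illustration of that mapping for the widerow table above (values
hypothetical; shown in the storage-engine view of that era):

INSERT INTO keyspace.widerow (row_key, wide_row_column, data_column)
VALUES ('key-1', 'evt-0001', 'payload');

-- inside the 'key-1' partition this becomes cells with composite names,
-- roughly:
--   ('evt-0001', '')            -> CQL row marker
--   ('evt-0001', 'data_column') -> 'payload'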

IMHO Wide partitions (storage engine rows) are more prevalent in CQL3 than 
thrift models. 

 But still - I do not see Iteration, so it looks to me that CQL 3 is limited 
 when compared to CLI/Hector.
Nowadays you can do pretty much everything you could in the CLI. Provide an 
example and we may be able to help. 

Cheers
Aaron

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder  Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 20/05/2014, at 8:18 am, Maciej Miklas mac.mik...@gmail.com wrote:

 Hi James,
 
 Clustering is based on rows. I think that you meant not clustering columns, 
 but compound columns. Still all columns belong to single table and are stored 
 within single folder on one computer. And it looks to me (but I’am not sure) 
 that CQL 3 driver loads all column names into memory - which is confusing to 
 me. From one side we have wide row, but we load whole into ram…..
 
 My understanding of wide row is a row that supports millions of columns, or 
 similar things like map or set. In CLI you would generate column names (or 
 use compound columns) to simulate set or map,  in CQL 3 you would use some 
 static names plus Map or Set structures, or you could still alter table and 
 have large number of columns. But still - I do not see Iteration, so it looks 
 to me that CQL 3 is limited when compared to CLI/Hector.
 
 
 Regards,
 Maciej
 
 On 19 May 2014, at 17:30, James Campbell ja...@breachintelligence.com wrote:
 
 Maciej,
 
 In CQL3 wide rows are expected to be created using clustering columns.  So 
 while the schema will have a relatively smaller number of named columns, the 
 effect is a wide row.  For example:
 
 CREATE TABLE keyspace.widerow (
 row_key text,
 wide_row_column text,
 data_column text,
 PRIMARY KEY (row_key, wide_row_column));
 
 Check out, for example, 
 http://www.datastax.com/dev/blog/schema-in-cassandra-1-1.​
 
 James
 From: Maciej Miklas mac.mik...@gmail.com
 Sent: Monday, May 19, 2014 11:20 AM
 To: user@cassandra.apache.org
 Subject: CQL 3 and wide rows
  
 Hi *,
 
 I’ve checked DataStax driver code for CQL 3, and it looks like the column 
 names for particular table are fully loaded into memory, it this true?
 
 Cassandra should support wide rows, meaning tables with millions of columns. 
 Knowing that, I would expect kind of iterator for column names. Am I missing 
 something here? 
 
 
 Regards,
 Maciej Miklas
 



Re: CQL 3 and wide rows

2014-05-20 Thread Jack Krupansky
To keep the terminology clear, your “row_key” is actually the “partition key”, 
and “wide_row_column” is actually a “clustering column”, and the combination of 
your row_key and wide_row_column is a “compound primary key”.

-- Jack Krupansky
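
Annotating the table from this thread with that terminology:

CREATE TABLE keyspace.widerow (
    row_key text,          -- partition key
    wide_row_column text,  -- clustering column
    data_column text,
    PRIMARY KEY (row_key, wide_row_column)  -- compound primary key
);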

From: Aaron Morton 
Sent: Tuesday, May 20, 2014 3:06 AM
To: Cassandra User 
Subject: Re: CQL 3 and wide rows

In a CQL 3 table the only **column** names are the ones defined in the table, 
in the example below there are three column names.  


CREATE TABLE keyspace.widerow (
row_key text,
wide_row_column text,
data_column text,
PRIMARY KEY (row_key, wide_row_column));


Check out, for example, 
http://www.datastax.com/dev/blog/schema-in-cassandra-1-1.​

Internally there may be more **cells** ( as we now call the internal columns). 
In the example above each value for row_key will create a single partition (as 
we now call internal storage engine rows). In each of those partitions there 
will be cells for each CQL 3 row that has the same row_key, those cells will 
use a Composite for the name. The first part of the composite will be the value 
of the wide_row_column and the second will be the literal name of the non 
primary key columns. 

IMHO Wide partitions (storage engine rows) are more prevalent in CQL3 than 
thrift models. 

  But still - I do not see Iteration, so it looks to me that CQL 3 is limited 
when compared to CLI/Hector.
Now days you can do pretty much everything you can in cli. Provide an example 
and we may be able to help. 

Cheers
Aaron

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder  Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 20/05/2014, at 8:18 am, Maciej Miklas mac.mik...@gmail.com wrote:


  Hi James, 

  Clustering is based on rows. I think that you meant not clustering columns, 
but compound columns. Still all columns belong to single table and are stored 
within single folder on one computer. And it looks to me (but I’am not sure) 
that CQL 3 driver loads all column names into memory - which is confusing to 
me. From one side we have wide row, but we load whole into ram…..

  My understanding of wide row is a row that supports millions of columns, or 
similar things like map or set. In CLI you would generate column names (or use 
compound columns) to simulate set or map,  in CQL 3 you would use some static 
names plus Map or Set structures, or you could still alter table and have large 
number of columns. But still - I do not see Iteration, so it looks to me that 
CQL 3 is limited when compared to CLI/Hector.


  Regards,
  Maciej

  On 19 May 2014, at 17:30, James Campbell ja...@breachintelligence.com wrote:


Maciej,


In CQL3 wide rows are expected to be created using clustering columns.  
So while the schema will have a relatively smaller number of named columns, the 
effect is a wide row.  For example:


CREATE TABLE keyspace.widerow (
row_key text,
wide_row_column text,
data_column text,
PRIMARY KEY (row_key, wide_row_column));


Check out, for example, 
http://www.datastax.com/dev/blog/schema-in-cassandra-1-1.​


James




From: Maciej Miklas mac.mik...@gmail.com
Sent: Monday, May 19, 2014 11:20 AM
To: user@cassandra.apache.org
Subject: CQL 3 and wide rows 

Hi *, 

I’ve checked DataStax driver code for CQL 3, and it looks like the column 
names for particular table are fully loaded into memory, it this true?

Cassandra should support wide rows, meaning tables with millions of 
columns. Knowing that, I would expect kind of iterator for column names. Am I 
missing something here? 


Regards,
Maciej Miklas



Re: CQL 3 and wide rows

2014-05-20 Thread Maciej Miklas
yes :)

On 20 May 2014, at 14:24, Jack Krupansky j...@basetechnology.com wrote:

 To keep the terminology clear, your “row_key” is actually the “partition 
 key”, and “wide_row_column” is actually a “clustering column”, and the 
 combination of your row_key and wide_row_column is a “compound primary key”.
  
 -- Jack Krupansky
  
 From: Aaron Morton
 Sent: Tuesday, May 20, 2014 3:06 AM
 To: Cassandra User
 Subject: Re: CQL 3 and wide rows
  
 In a CQL 3 table the only **column** names are the ones defined in the table, 
 in the example below there are three column names. 
  
  
 CREATE TABLE keyspace.widerow (
 row_key text,
 wide_row_column text,
 data_column text,
 PRIMARY KEY (row_key, wide_row_column));
  
 Check out, for example, 
 http://www.datastax.com/dev/blog/schema-in-cassandra-1-1.​
  
 Internally there may be more **cells** ( as we now call the internal 
 columns). In the example above each value for row_key will create a single 
 partition (as we now call internal storage engine rows). In each of those 
 partitions there will be cells for each CQL 3 row that has the same row_key, 
 those cells will use a Composite for the name. The first part of the 
 composite will be the value of the wide_row_column and the second will be the 
 literal name of the non primary key columns.
  
 IMHO Wide partitions (storage engine rows) are more prevalent in CQL3 than 
 thrift models.
  
 But still - I do not see Iteration, so it looks to me that CQL 3 is limited 
 when compared to CLI/Hector.
 Now days you can do pretty much everything you can in cli. Provide an example 
 and we may be able to help.
  
 Cheers
 Aaron
  
 -
 Aaron Morton
 New Zealand
 @aaronmorton
  
 Co-Founder  Principal Consultant
 Apache Cassandra Consulting
 http://www.thelastpickle.com
  
 On 20/05/2014, at 8:18 am, Maciej Miklas mac.mik...@gmail.com wrote:
 
 Hi James,
  
 Clustering is based on rows. I think that you meant not clustering columns, 
 but compound columns. Still all columns belong to single table and are 
 stored within single folder on one computer. And it looks to me (but I’am 
 not sure) that CQL 3 driver loads all column names into memory - which is 
 confusing to me. From one side we have wide row, but we load whole into 
 ram…..
  
 My understanding of wide row is a row that supports millions of columns, or 
 similar things like map or set. In CLI you would generate column names (or 
 use compound columns) to simulate set or map,  in CQL 3 you would use some 
 static names plus Map or Set structures, or you could still alter table and 
 have large number of columns. But still - I do not see Iteration, so it 
 looks to me that CQL 3 is limited when compared to CLI/Hector.
  
  
 Regards,
 Maciej
  
 On 19 May 2014, at 17:30, James Campbell ja...@breachintelligence.com 
 wrote:
 
 Maciej,
  
 In CQL3 wide rows are expected to be created using clustering columns.  
 So while the schema will have a relatively smaller number of named columns, 
 the effect is a wide row.  For example:
  
 CREATE TABLE keyspace.widerow (
 row_key text,
 wide_row_column text,
 data_column text,
 PRIMARY KEY (row_key, wide_row_column));
  
 Check out, for example, 
 http://www.datastax.com/dev/blog/schema-in-cassandra-1-1.​
  
 James
 From: Maciej Miklas mac.mik...@gmail.com
 Sent: Monday, May 19, 2014 11:20 AM
 To: user@cassandra.apache.org
 Subject: CQL 3 and wide rows
  
 Hi *,
  
 I’ve checked DataStax driver code for CQL 3, and it looks like the column 
 names for particular table are fully loaded into memory, it this true?
  
 Cassandra should support wide rows, meaning tables with millions of 
 columns. Knowing that, I would expect kind of iterator for column names. Am 
 I missing something here?
  
  
 Regards,
 Maciej Miklas
 
  
 
  



Re: CQL 3 and wide rows

2014-05-20 Thread Maciej Miklas
Hi Aaron,

Thanks for the answer!


Let's consider such CLI code:

for (int i = 0; i < 10_000_000; i++) {
  set[‘rowKey1’][‘myCol::i’] = UUID.randomUUID();
}


The code above will create a single row that contains 10^7 columns sorted by 
‘i’. This will work fine, and this is the wide row to my understanding - a row 
that holds many columns AND I can read only some part of it with the right slice 
query. On the other hand, I can iterate over all columns without latencies 
because the data is stored on a single node. I’ve been using similar structures 
as a replacement for secondary indexes - it’s a well-known pattern.

How would I model it in CQL 3?

1) I could create a Map, but Maps are fully loaded into memory, and a Map 
containing 10^6 elements is definitely a problem. Plus it’s a big waste of RAM 
if you consider that I only need to read a small subset.

2) I could alter the table for each new column, which would create a structure 
similar to the one from my CLI example. But it looks to me that all column names 
are loaded into RAM, which is still a large limitation. I hope that I am wrong 
here - I am not sure.

3) I could redesign my model and divide the data into many rows, but why would I 
do that if I can use wide rows?

My idea of a wide row is a row that can hold a large number of key-value pairs 
(in any form), where I can filter on those keys to efficiently load only the 
part I currently need.


Regards,
Maciej 


On 20 May 2014, at 09:06, Aaron Morton aa...@thelastpickle.com wrote:

 In a CQL 3 table the only **column** names are the ones defined in the table, 
 in the example below there are three column names. 
 
 
 CREATE TABLE keyspace.widerow (
 row_key text,
 wide_row_column text,
 data_column text,
 PRIMARY KEY (row_key, wide_row_column));
 
 Check out, for example, 
 http://www.datastax.com/dev/blog/schema-in-cassandra-1-1.​
 
 Internally there may be more **cells** ( as we now call the internal 
 columns). In the example above each value for row_key will create a single 
 partition (as we now call internal storage engine rows). In each of those 
 partitions there will be cells for each CQL 3 row that has the same row_key, 
 those cells will use a Composite for the name. The first part of the 
 composite will be the value of the wide_row_column and the second will be the 
 literal name of the non primary key columns. 
 
 IMHO Wide partitions (storage engine rows) are more prevalent in CQL3 than 
 thrift models. 
 
 But still - I do not see Iteration, so it looks to me that CQL 3 is limited 
 when compared to CLI/Hector.
 Now days you can do pretty much everything you can in cli. Provide an example 
 and we may be able to help. 
 
 Cheers
 Aaron
 
 -
 Aaron Morton
 New Zealand
 @aaronmorton
 
 Co-Founder  Principal Consultant
 Apache Cassandra Consulting
 http://www.thelastpickle.com
 
 On 20/05/2014, at 8:18 am, Maciej Miklas mac.mik...@gmail.com wrote:
 
 Hi James,
 
 Clustering is based on rows. I think that you meant not clustering columns, 
 but compound columns. Still all columns belong to single table and are 
 stored within single folder on one computer. And it looks to me (but I’am 
 not sure) that CQL 3 driver loads all column names into memory - which is 
 confusing to me. From one side we have wide row, but we load whole into 
 ram…..
 
 My understanding of wide row is a row that supports millions of columns, or 
 similar things like map or set. In CLI you would generate column names (or 
 use compound columns) to simulate set or map,  in CQL 3 you would use some 
 static names plus Map or Set structures, or you could still alter table and 
 have large number of columns. But still - I do not see Iteration, so it 
 looks to me that CQL 3 is limited when compared to CLI/Hector.
 
 
 Regards,
 Maciej
 
 On 19 May 2014, at 17:30, James Campbell ja...@breachintelligence.com 
 wrote:
 
 Maciej,
 
 In CQL3 wide rows are expected to be created using clustering columns.  
 So while the schema will have a relatively smaller number of named columns, 
 the effect is a wide row.  For example:
 
 CREATE TABLE keyspace.widerow (
 row_key text,
 wide_row_column text,
 data_column text,
 PRIMARY KEY (row_key, wide_row_column));
 
 Check out, for example, 
 http://www.datastax.com/dev/blog/schema-in-cassandra-1-1.​
 
 James
 From: Maciej Miklas mac.mik...@gmail.com
 Sent: Monday, May 19, 2014 11:20 AM
 To: user@cassandra.apache.org
 Subject: CQL 3 and wide rows
  
 Hi *,
 
 I’ve checked DataStax driver code for CQL 3, and it looks like the column 
 names for particular table are fully loaded into memory, it this true?
 
 Cassandra should support wide rows, meaning tables with millions of 
 columns. Knowing that, I would expect kind of iterator for column names. Am 
 I missing something here? 
 
 
 Regards,
 Maciej Miklas
 
 



Re: CQL 3 and wide rows

2014-05-20 Thread Nate McCall
Something like this might work:


cqlsh:my_keyspace> CREATE TABLE my_widerow (
 ...   id text,
 ...   my_col timeuuid,
 ...   PRIMARY KEY (id, my_col)
 ... ) WITH caching='KEYS_ONLY' AND
 ...   compaction={'class': 'LeveledCompactionStrategy'};
cqlsh:my_keyspace> insert into my_widerow (id, my_col) values
('some_key_1',now());
cqlsh:my_keyspace> insert into my_widerow (id, my_col) values
('some_key_1',now());
cqlsh:my_keyspace> insert into my_widerow (id, my_col) values
('some_key_1',now());
cqlsh:my_keyspace> insert into my_widerow (id, my_col) values
('some_key_1',now());
cqlsh:my_keyspace> insert into my_widerow (id, my_col) values
('some_key_1',now());
cqlsh:my_keyspace> insert into my_widerow (id, my_col) values
('some_key_1',now());
cqlsh:my_keyspace> insert into my_widerow (id, my_col) values
('some_key_1',now());
cqlsh:my_keyspace> insert into my_widerow (id, my_col) values
('some_key_1',now());
cqlsh:my_keyspace> insert into my_widerow (id, my_col) values
('some_key_1',now());
cqlsh:my_keyspace> insert into my_widerow (id, my_col) values
('some_key_1',now());
cqlsh:my_keyspace> select * from my_widerow;

 id | my_col
+--
 some_key_1 | 7266d240-e030-11e3-a50d-8b2f9bfbfa10
 some_key_1 | 73ba0630-e030-11e3-a50d-8b2f9bfbfa10
 some_key_1 | 74404d30-e030-11e3-a50d-8b2f9bfbfa10
 some_key_1 | 74defe30-e030-11e3-a50d-8b2f9bfbfa10
 some_key_1 | 75569f30-e030-11e3-a50d-8b2f9bfbfa10
 some_key_1 | 75bf9a30-e030-11e3-a50d-8b2f9bfbfa10
 some_key_1 | 76227ab0-e030-11e3-a50d-8b2f9bfbfa10
 some_key_1 | 76cfd1b0-e030-11e3-a50d-8b2f9bfbfa10
 some_key_1 | 777364b0-e030-11e3-a50d-8b2f9bfbfa10
 some_key_1 | 7aa061b0-e030-11e3-a50d-8b2f9bfbfa10

cqlsh:my_keyspace> select * from my_widerow where id = 'some_key_1' and
my_col > 73ba0630-e030-11e3-a50d-8b2f9bfbfa10;

 id | my_col
+--
 some_key_1 | 74404d30-e030-11e3-a50d-8b2f9bfbfa10
 some_key_1 | 74defe30-e030-11e3-a50d-8b2f9bfbfa10
 some_key_1 | 75569f30-e030-11e3-a50d-8b2f9bfbfa10
 some_key_1 | 75bf9a30-e030-11e3-a50d-8b2f9bfbfa10
 some_key_1 | 76227ab0-e030-11e3-a50d-8b2f9bfbfa10
 some_key_1 | 76cfd1b0-e030-11e3-a50d-8b2f9bfbfa10
 some_key_1 | 777364b0-e030-11e3-a50d-8b2f9bfbfa10
 some_key_1 | 7aa061b0-e030-11e3-a50d-8b2f9bfbfa10

cqlsh:my_keyspace> select * from my_widerow where id = 'some_key_1' and
my_col > 73ba0630-e030-11e3-a50d-8b2f9bfbfa10 and my_col <
76227ab0-e030-11e3-a50d-8b2f9bfbfa10;

 id | my_col
+--
 some_key_1 | 74404d30-e030-11e3-a50d-8b2f9bfbfa10
 some_key_1 | 74defe30-e030-11e3-a50d-8b2f9bfbfa10
 some_key_1 | 75569f30-e030-11e3-a50d-8b2f9bfbfa10
 some_key_1 | 75bf9a30-e030-11e3-a50d-8b2f9bfbfa10



These queries would all work fine from the DS Java Driver. Note that only
the cells that are needed are pulled into memory:


./bin/nodetool cfstats my_keyspace my_widerow
   ...
   Column Family: my_widerow
   ...
   Average live cells per slice (last five minutes): 6.0
   ...


This shows that we are slicing across 6 rows on average for the last couple
of select statements.
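
If you are driving this from the Java Driver rather than cqlsh, a rough,
untested sketch of the same kind of slicing (the contact point, keyspace and
page size are placeholders, and it assumes the DataStax Java Driver 2.x) might
look like this:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

import java.util.UUID;

public class WideRowSlicer {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_keyspace");

        final int pageSize = 100;
        UUID lastSeen = null;
        while (true) {
            // Each query pulls at most pageSize cells from the partition,
            // restarting the slice after the last clustering value seen.
            String cql = "SELECT id, my_col FROM my_widerow WHERE id = 'some_key_1'"
                    + (lastSeen == null ? "" : " AND my_col > " + lastSeen)
                    + " LIMIT " + pageSize;
            ResultSet rs = session.execute(cql);
            int count = 0;
            for (Row row : rs) {
                lastSeen = row.getUUID("my_col");
                count++;              // process the row here
            }
            if (count < pageSize) {
                break;                // partition exhausted
            }
        }
        cluster.close();
    }
}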

Hope that helps.



-- 
-
Nate McCall
Austin, TX
@zznate

Co-Founder  Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: CQL 3 and wide rows

2014-05-20 Thread Maciej Miklas
Thank you Nate - now I understand it! This is a real improvement when compared 
to the CLI :)

Regards,
Maciej


On 20 May 2014, at 17:16, Nate McCall n...@thelastpickle.com wrote:

 Something like this might work:
 
 
 cqlsh:my_keyspace CREATE TABLE my_widerow (
  ...   id text,
  ...   my_col timeuuid,
  ...   PRIMARY KEY (id, my_col)
  ... ) WITH caching='KEYS_ONLY' AND
  ...   compaction={'class': 'LeveledCompactionStrategy'};
 cqlsh:my_keyspace insert into my_widerow (id, my_col) values 
 ('some_key_1',now());
 cqlsh:my_keyspace insert into my_widerow (id, my_col) values 
 ('some_key_1',now());
 cqlsh:my_keyspace insert into my_widerow (id, my_col) values 
 ('some_key_1',now());
 cqlsh:my_keyspace insert into my_widerow (id, my_col) values 
 ('some_key_1',now());
 cqlsh:my_keyspace insert into my_widerow (id, my_col) values 
 ('some_key_1',now());
 cqlsh:my_keyspace insert into my_widerow (id, my_col) values 
 ('some_key_1',now());
 cqlsh:my_keyspace insert into my_widerow (id, my_col) values 
 ('some_key_1',now());
 cqlsh:my_keyspace insert into my_widerow (id, my_col) values 
 ('some_key_1',now());
 cqlsh:my_keyspace insert into my_widerow (id, my_col) values 
 ('some_key_1',now());
 cqlsh:my_keyspace insert into my_widerow (id, my_col) values 
 ('some_key_1',now());
 cqlsh:my_keyspace select * from my_widerow;
 
  id | my_col
 +--
  some_key_1 | 7266d240-e030-11e3-a50d-8b2f9bfbfa10
  some_key_1 | 73ba0630-e030-11e3-a50d-8b2f9bfbfa10
  some_key_1 | 74404d30-e030-11e3-a50d-8b2f9bfbfa10
  some_key_1 | 74defe30-e030-11e3-a50d-8b2f9bfbfa10
  some_key_1 | 75569f30-e030-11e3-a50d-8b2f9bfbfa10
  some_key_1 | 75bf9a30-e030-11e3-a50d-8b2f9bfbfa10
  some_key_1 | 76227ab0-e030-11e3-a50d-8b2f9bfbfa10
  some_key_1 | 76cfd1b0-e030-11e3-a50d-8b2f9bfbfa10
  some_key_1 | 777364b0-e030-11e3-a50d-8b2f9bfbfa10
  some_key_1 | 7aa061b0-e030-11e3-a50d-8b2f9bfbfa10
 
 cqlsh:my_keyspace select * from my_widerow where id = 'some_key_1' and 
 my_col > 73ba0630-e030-11e3-a50d-8b2f9bfbfa10;
 
  id | my_col
 +--
  some_key_1 | 74404d30-e030-11e3-a50d-8b2f9bfbfa10
  some_key_1 | 74defe30-e030-11e3-a50d-8b2f9bfbfa10
  some_key_1 | 75569f30-e030-11e3-a50d-8b2f9bfbfa10
  some_key_1 | 75bf9a30-e030-11e3-a50d-8b2f9bfbfa10
  some_key_1 | 76227ab0-e030-11e3-a50d-8b2f9bfbfa10
  some_key_1 | 76cfd1b0-e030-11e3-a50d-8b2f9bfbfa10
  some_key_1 | 777364b0-e030-11e3-a50d-8b2f9bfbfa10
  some_key_1 | 7aa061b0-e030-11e3-a50d-8b2f9bfbfa10
 
 cqlsh:my_keyspace select * from my_widerow where id = 'some_key_1' and 
 my_col > 73ba0630-e030-11e3-a50d-8b2f9bfbfa10 and my_col < 
 76227ab0-e030-11e3-a50d-8b2f9bfbfa10;
 
  id | my_col
 +--
  some_key_1 | 74404d30-e030-11e3-a50d-8b2f9bfbfa10
  some_key_1 | 74defe30-e030-11e3-a50d-8b2f9bfbfa10
  some_key_1 | 75569f30-e030-11e3-a50d-8b2f9bfbfa10
  some_key_1 | 75bf9a30-e030-11e3-a50d-8b2f9bfbfa10
 
 
 
 These queries would all work fine from the DS Java Driver. Note that only the 
 cells that are needed are pulled into memory:
 
 
 ./bin/nodetool cfstats my_keyspace my_widerow
...
Column Family: my_widerow
...
Average live cells per slice (last five minutes): 6.0
...
 
 
 This shows that we are slicing across 6 rows on average for the last couple 
 of select statements. 
 
 Hope that helps.
 
 
 
 -- 
 -
 Nate McCall
 Austin, TX
 @zznate
 
 Co-Founder  Sr. Technical Consultant
 Apache Cassandra Consulting
 http://www.thelastpickle.com



CQL 3 and wide rows

2014-05-19 Thread Maciej Miklas
Hi *,

I’ve checked the DataStax driver code for CQL 3, and it looks like the column
names for a particular table are fully loaded into memory; is this true?

Cassandra should support wide rows, meaning tables with millions of
columns. Knowing that, I would expect some kind of iterator for column names. Am
I missing something here?


Regards,
Maciej Miklas


RE: CQL 3 and wide rows

2014-05-19 Thread James Campbell
Maciej,


In CQL3 wide rows are expected to be created using clustering columns.  So 
while the schema will have a relatively small number of named columns, the 
effect is a wide row.  For example:


CREATE TABLE keyspace.widerow (

row_key text,

wide_row_column text,

data_column text,

PRIMARY KEY (row_key, wide_row_column));


Check out, for example, 
http://www.datastax.com/dev/blog/schema-in-cassandra-1-1.


James


From: Maciej Miklas mac.mik...@gmail.com
Sent: Monday, May 19, 2014 11:20 AM
To: user@cassandra.apache.org
Subject: CQL 3 and wide rows

Hi *,

I've checked DataStax driver code for CQL 3, and it looks like the column names 
for particular table are fully loaded into memory, it this true?

Cassandra should support wide rows, meaning tables with millions of columns. 
Knowing that, I would expect kind of iterator for column names. Am I missing 
something here?


Regards,
Maciej Miklas


Re: CQL 3 and wide rows

2014-05-19 Thread Jack Krupansky
You might want to review this blog post on supporting dynamic columns in CQL3, 
which points out that “the way to model dynamic cells in CQL is with a compound 
primary key.”

See:
http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows

-- Jack Krupansky

From: Maciej Miklas 
Sent: Monday, May 19, 2014 11:20 AM
To: user@cassandra.apache.org 
Subject: CQL 3 and wide rows

Hi *, 

I’ve checked DataStax driver code for CQL 3, and it looks like the column names 
for particular table are fully loaded into memory, it this true?

Cassandra should support wide rows, meaning tables with millions of columns. 
Knowing that, I would expect kind of iterator for column names. Am I missing 
something here? 


Regards,
Maciej Miklas

Re: CQL 3 and wide rows

2014-05-19 Thread Maciej Miklas
Hello Jack,

You have given a perfect example of a wide row. Each reading from a sensor 
creates a new column within a row. It was also possible with Hector/CLI to have 
millions of columns within a single row. According to this page 
http://wiki.apache.org/cassandra/CassandraLimitations a single row can have 2 
billion columns.

How does this relate to CQL 3 and tables? 

I still do not understand it because:
- it looks like the driver loads all column names into memory; it looks to me 
that the 2 billion column limitation from the CLI is not valid anymore
- Map and Set values do not support iteration


Regards,
Maciej


On 19 May 2014, at 17:31, Jack Krupansky j...@basetechnology.com wrote:

 You might want to review this blog post on supporting dynamic columns in 
 CQL3, which points out that “the way to model dynamic cells in CQL is with a 
 compound primary key.”
  
 See:
 http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows
  
 -- Jack Krupansky
  
 From: Maciej Miklas
 Sent: Monday, May 19, 2014 11:20 AM
 To: user@cassandra.apache.org
 Subject: CQL 3 and wide rows
  
 Hi *,
  
 I’ve checked DataStax driver code for CQL 3, and it looks like the column 
 names for particular table are fully loaded into memory, it this true?
  
 Cassandra should support wide rows, meaning tables with millions of columns. 
 Knowing that, I would expect kind of iterator for column names. Am I missing 
 something here?
  
  
 Regards,
 Maciej Miklas



Re: CQL 3 and wide rows

2014-05-19 Thread Maciej Miklas
Hi James,

Clustering is based on rows. I think that you meant not clustering columns, but 
compound columns. Still, all columns belong to a single table and are stored 
within a single folder on one computer. And it looks to me (but I'm not sure) 
that the CQL 3 driver loads all column names into memory - which is confusing to 
me. On one side we have a wide row, but we load the whole thing into RAM...

My understanding of a wide row is a row that supports millions of columns, or 
similar things like a map or set. In the CLI you would generate column names (or 
use compound columns) to simulate a set or map; in CQL 3 you would use some 
static names plus Map or Set structures, or you could still alter the table and 
have a large number of columns. But still - I do not see iteration, so it looks 
to me that CQL 3 is limited when compared to CLI/Hector.


Regards,
Maciej

On 19 May 2014, at 17:30, James Campbell ja...@breachintelligence.com wrote:

 Maciej,
 
 In CQL3 wide rows are expected to be created using clustering columns.  So 
 while the schema will have a relatively smaller number of named columns, the 
 effect is a wide row.  For example:
 
 CREATE TABLE keyspace.widerow (
 row_key text,
 wide_row_column text,
 data_column text,
 PRIMARY KEY (row_key, wide_row_column));
 
 Check out, for example, 
 http://www.datastax.com/dev/blog/schema-in-cassandra-1-1.​
 
 James
 From: Maciej Miklas mac.mik...@gmail.com
 Sent: Monday, May 19, 2014 11:20 AM
 To: user@cassandra.apache.org
 Subject: CQL 3 and wide rows
  
 Hi *,
 
 I’ve checked DataStax driver code for CQL 3, and it looks like the column 
 names for particular table are fully loaded into memory, it this true?
 
 Cassandra should support wide rows, meaning tables with millions of columns. 
 Knowing that, I would expect kind of iterator for column names. Am I missing 
 something here? 
 
 
 Regards,
 Maciej Miklas



CqlPagingInputFormat: paging through wide rows

2014-04-16 Thread Paolo Estrella
Hello,

I've just upgraded to Cassandra 1.2.16. I've also started using the
CqlPagingInputFormat within my map/reduce tasks.

I have a question with regard to using CqlPagingInputFormat for paging
through wide rows. I don't see a way to input more than one column at a
time into my Mapper.

I suppose a good way to explain is by comparing the CqlPagingInputFormat with 
the ColumnFamilyInputFormat which I previously used.

My mapper when using CFIF looks like this (just the relevant bits):

@Override
protected void map(ByteBuffer key, SortedMap<ByteBuffer, IColumn> columns,
        Context context) throws IOException, InterruptedException {
    for (IColumn column : columns.values()) {
        String value = ByteBufferUtil.string(column.value());
        /* do interesting stuff with each column value */
    }
}

My mapper when using CPIF looks like this (again, just the relevant bits):

@Override
protected void map(Map<String, ByteBuffer> key, Map<String, ByteBuffer> columns,
        Context context) throws IOException, InterruptedException {
    UUID name = UUIDSerializer.get().fromByteBuffer(columns.get("column1"));
    String value = ByteBufferUtil.string(columns.get("value"));
    /* do something interesting with the value */
}

In the case of CqlPagingInputFormat, the mapper receives each column (in
the wide row) one by one. Is there a way to receive a larger batch of
columns similar to using ColumnFamilyInputFormat with a column slice
predicate? Perhaps I need to specify a WHERE clause when using CPIF?

Does it even matter that my mappers are receiving only one column at a
time? I did notice that my map tasks take a significantly longer time
completing when using CqlPagingInputFormat (4x mappers receiving about 3
million input records each) than when using ColumnFamilyInputFormat with a
large column slice predicate.


Thanks in advance.

Regards,
Paolo


how wide to make wide rows in practice?

2013-12-18 Thread Lee Mighdoll
I think the recommendation once upon a time was to keep wide storage engine
internal rows from growing too large.  e.g. for time series, it was
recommended to partition samples by day or by hour to keep the size
manageable.

What's the current cassandra 2.0 advice on sizing for wide storage engine
rows?  Can we drop the added complexity of managing day/hour partitioning
for time series stores?

And what do you watch out for if the storage engine rows are a bit
uncomfortably large?  Do extra large rows slow the read path at all?  Or
something subtle like added latency from GC pressure at compaction time?

Cheers,
Lee


Re: how wide to make wide rows in practice?

2013-12-18 Thread Robert Coli
On Wed, Dec 18, 2013 at 9:26 AM, Lee Mighdoll l...@underneath.ca wrote:

 What's the current cassandra 2.0 advice on sizing for wide storage engine
 rows?  Can we drop the added complexity of managing day/hour partitioning
 for time series stores?


A few hundred megs at very most is generally
recommended. in_memory_compaction_limit_in_mb still defaults to 64mb, so
rows greater than this size are compacted on disk...

Cassandra 2.0 and CQL3 storage don't meaningfully change underlying storage
assumptions. It just packs an abstraction layer on top. Cassandra 2.1 moves
some of that abstraction down into storage, but most fundamental
assumptions will still remain the same.

https://issues.apache.org/jira/browse/CASSANDRA-5417

=Rob


Re: how wide to make wide rows in practice?

2013-12-18 Thread Lee Mighdoll
Hi Rob, thanks for the refresher, and the issue link (fixed today too -
thanks Sylvain!).

Cheers,
Lee


On Wed, Dec 18, 2013 at 10:47 AM, Robert Coli rc...@eventbrite.com wrote:

 On Wed, Dec 18, 2013 at 9:26 AM, Lee Mighdoll l...@underneath.ca wrote:

 What's the current cassandra 2.0 advice on sizing for wide storage engine
 rows?  Can we drop the added complexity of managing day/hour partitioning
 for time series stores?


 A few hundred megs at very most is generally
 recommended. in_memory_compaction_limit_in_mb still defaults to 64mb, so
 rows greater than this size are compacted on disk...

 Cassandra 2.0 and CQL3 storage don't meaningfully change underlying
 storage assumptions. It just packs an abstraction layer on top. Cassandra
 2.1 moves some of that abstraction down into storage, but most fundamental
 assumptions will still remain the same.

 https://issues.apache.org/jira/browse/CASSANDRA-5417

 =Rob



Re: Wide rows (time series data) and ORM

2013-10-23 Thread Vivek Mishra
Can Kundera work with wide rows in an ORM manner?

What specifically are you looking for? A composite column based implementation
can be built using Kundera.
With recent CQL3 developments, Kundera supports most of these. I think the POJO
needs to be aware of the number of fields that need to be persisted (same as CQL3).

-Vivek


On Wed, Oct 23, 2013 at 12:48 AM, Les Hartzman lhartz...@gmail.com wrote:

 As I'm becoming more familiar with Cassandra I'm still trying to shift my
 thinking from relational to NoSQL.

 Can Kundera work with wide rows in an ORM manner? In other words, can you
 actually design a POJO that fits the standard recipe for JPA usage? Would
 the queries return collections of the POJO to handle wide row data?

 I had considered using Spring and JPA for Cassandra, but it appears that
 other than basic configuration issues for Cassandra, to use Spring and JPA
 on a Cassandra database seems like an effort in futility if Cassandra is
 used as a NoSQL database instead of mimicking an RDBMS solution.

 If anyone can shed any light on this, I'd appreciate it.

 Thanks.

 Les




Re: Wide rows (time series data) and ORM

2013-10-23 Thread Hiller, Dean
PlayOrm supports different types of wide rows, like an embedded list in the object, 
etc.  There is a list of NoSQL patterns mixed with PlayORM patterns on 
this page:

http://buffalosw.com/wiki/patterns-page/

From: Les Hartzman lhartz...@gmail.com
Reply-To: user@cassandra.apache.org
Date: Tuesday, October 22, 2013 1:18 PM
To: user@cassandra.apache.org
Subject: Wide rows (time series data) and ORM

As I'm becoming more familiar with Cassandra I'm still trying to shift my 
thinking from relational to NoSQL.

Can Kundera work with wide rows in an ORM manner? In other words, can you 
actually design a POJO that fits the standard recipe for JPA usage? Would the 
queries return collections of the POJO to handle wide row data?

I had considered using Spring and JPA for Cassandra, but it appears that other 
than basic configuration issues for Cassandra, to use Spring and JPA on a 
Cassandra database seems like an effort in futility if Cassandra is used as a 
NoSQL database instead of mimicking an RDBMS solution.

If anyone can shed any light on this, I'd appreciate it.

Thanks.

Les



Re: Wide rows (time series data) and ORM

2013-10-23 Thread Les Hartzman
Hi Vivek,

What I'm looking for are a couple of things as I'm gaining an understanding
of Cassandra. With wide rows and time series data, how do you (or can you)
handle this data in an ORM manner? Now I understand that with CQL3, doing a
select * from time_series_data will return the data as multiple rows. So
does handling this data equal the way you would deal with any mapping of
objects to results in a relational manner? Would you still use a JPA
approach or is there a Cassandra/CQL3-specific way of interacting with the
database?

I expect to use a compound key for partitioning/clustering. For example I'm
planning on creating a table as follows:
  CREATE TABLE sensor_data (
    sensor_id text,
    date text,
    data_time_stamp timestamp,
    reading int,
    PRIMARY KEY ((sensor_id, date), data_time_stamp)
  );
The 'date' field will be day-specific so that for each day there will be a
new row created.

So will I be able to define a POJO, SensorData, with the fields shown above
and basically process each 'row' returned by CQL as another SensorData
object?

Thanks.

Les



On Wed, Oct 23, 2013 at 1:22 AM, Vivek Mishra mishra.v...@gmail.com wrote:

 Can Kundera work with wide rows in an ORM manner?

 What specifically you looking for? Composite column based implementation
 can be built using Kundera.
 With Recent CQL3 developments, Kundera supports most of these. I think
 POJO needs to be aware of number of fields needs to be persisted(Same as
 CQL3)

 -Vivek


 On Wed, Oct 23, 2013 at 12:48 AM, Les Hartzman lhartz...@gmail.comwrote:

 As I'm becoming more familiar with Cassandra I'm still trying to shift my
 thinking from relational to NoSQL.

 Can Kundera work with wide rows in an ORM manner? In other words, can you
 actually design a POJO that fits the standard recipe for JPA usage? Would
 the queries return collections of the POJO to handle wide row data?

 I had considered using Spring and JPA for Cassandra, but it appears that
 other than basic configuration issues for Cassandra, to use Spring and JPA
 on a Cassandra database seems like an effort in futility if Cassandra is
 used as a NoSQL database instead of mimicking an RDBMS solution.

 If anyone can shed any light on this, I'd appreciate it.

 Thanks.

 Les





Re: Wide rows (time series data) and ORM

2013-10-23 Thread Les Hartzman
Thanks Dean. I'll check that page out.

Les


On Wed, Oct 23, 2013 at 7:52 AM, Hiller, Dean dean.hil...@nrel.gov wrote:

 PlayOrm supports different types of wide rows like embedded list in the
 object, etc. etc.  There is a list of nosql patterns mixed with playorm
 patterns on this page

 http://buffalosw.com/wiki/patterns-page/

 From: Les Hartzman lhartz...@gmail.com
 Reply-To: user@cassandra.apache.org
 Date: Tuesday, October 22, 2013 1:18 PM
 To: user@cassandra.apache.org
 Subject: Wide rows (time series data) and ORM

 As I'm becoming more familiar with Cassandra I'm still trying to shift my
 thinking from relational to NoSQL.

 Can Kundera work with wide rows in an ORM manner? In other words, can you
 actually design a POJO that fits the standard recipe for JPA usage? Would
 the queries return collections of the POJO to handle wide row data?

 I had considered using Spring and JPA for Cassandra, but it appears that
 other than basic configuration issues for Cassandra, to use Spring and JPA
 on a Cassandra database seems like an effort in futility if Cassandra is
 used as a NoSQL database instead of mimicking an RDBMS solution.

 If anyone can shed any light on this, I'd appreciate it.

 Thanks.

 Les




Re: Wide rows (time series data) and ORM

2013-10-23 Thread Hiller, Dean
Another idea is the open source Energy Databus project, which does time series 
data and is actually based on PlayORM (ORM is a bad name since it is more NoSQL 
patterns and not really relational).

http://www.nrel.gov/analysis/databus/

That Energy Databus project is mainly time series data with some meta data.  I 
think NREL may be holding an Energy Databus summit soon (though again it is 
100% time series data and they need to rename it to just Databus which has been 
talked about at NREL).

Dean

From: Les Hartzman lhartz...@gmail.com
Reply-To: user@cassandra.apache.org
Date: Wednesday, October 23, 2013 11:12 AM
To: user@cassandra.apache.org
Subject: Re: Wide rows (time series data) and ORM

Thanks Dean. I'll check that page out.

Les


On Wed, Oct 23, 2013 at 7:52 AM, Hiller, Dean dean.hil...@nrel.gov wrote:
PlayOrm supports different types of wide rows like embedded list in the object, 
etc. etc.  There is a list of nosql patterns mixed with playorm patterns on 
this page

http://buffalosw.com/wiki/patterns-page/

From: Les Hartzman lhartz...@gmail.com
Reply-To: user@cassandra.apache.org
Date: Tuesday, October 22, 2013 1:18 PM
To: user@cassandra.apache.org
Subject: Wide rows (time series data) and ORM

As I'm becoming more familiar with Cassandra I'm still trying to shift my 
thinking from relational to NoSQL.

Can Kundera work with wide rows in an ORM manner? In other words, can you 
actually design a POJO that fits the standard recipe for JPA usage? Would the 
queries return collections of the POJO to handle wide row data?

I had considered using Spring and JPA for Cassandra, but it appears that other 
than basic configuration issues for Cassandra, to use Spring and JPA on a 
Cassandra database seems like an effort in futility if Cassandra is used as a 
NoSQL database instead of mimicking an RDBMS solution.

If anyone can shed any light on this, I'd appreciate it.

Thanks.

Les




Re: Wide rows (time series data) and ORM

2013-10-23 Thread Vivek Mishra
Hi,
CREATE TABLE sensor_data (
  sensor_id text,
  date text,
  data_time_stamp timestamp,
  reading int,
  PRIMARY KEY ((sensor_id, date), data_time_stamp)
);

Yes, you can create a POJO for this and map each such row exactly to a POJO
object.

Please have a look at:
https://github.com/impetus-opensource/Kundera/wiki/Using-Compound-keys-with-Kundera

There are users who have built production systems using Kundera; please refer to:
https://github.com/impetus-opensource/Kundera/wiki/Kundera-in-Production-Deployments
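
As a rough, untested sketch (plain JPA annotations as described on the
compound-key wiki page above; class, field and type choices here are
placeholders, so check the wiki for the exact Kundera mapping details), it
could look something like this:

import java.util.Date;
import javax.persistence.Column;
import javax.persistence.Embeddable;
import javax.persistence.EmbeddedId;
import javax.persistence.Entity;
import javax.persistence.Table;

// Placeholder sketch: the partition columns (sensor_id, date) and the
// clustering column (data_time_stamp) are folded into an embedded key,
// mirroring PRIMARY KEY ((sensor_id, date), data_time_stamp).
@Embeddable
class SensorDataKey {
    @Column(name = "sensor_id")
    private String sensorId;

    @Column(name = "date")
    private String date;

    @Column(name = "data_time_stamp")
    private Date dataTimeStamp;
    // getters and setters omitted for brevity
}

@Entity
@Table(name = "sensor_data")
class SensorData {
    @EmbeddedId
    private SensorDataKey key;

    @Column(name = "reading")
    private int reading;
    // getters and setters omitted for brevity
}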


I am working as a core committer on Kundera; please do let me know if you
have any questions.

Sincerely,
-Vivek



On Wed, Oct 23, 2013 at 10:41 PM, Les Hartzman lhartz...@gmail.com wrote:

 Hi Vivek,

 What I'm looking for are a couple of things as I'm gaining an
 understanding of Cassandra. With wide rows and time series data, how do you
 (or can you) handle this data in an ORM manner? Now I understand that with
 CQL3, doing a select * from time_series_data will return the data as
 multiple rows. So does handling this data equal the way you would deal with
 any mapping of objects to results in a relational manner? Would you still
 use a JPA approach or is there a Cassandra/CQL3-specific way of interacting
 with the database?

 I expect to use a compound key for partitioning/clustering. For example
 I'm planning on creating a table as follows:
   CREATE TABLE sensor_data (
 sensor_id   text,
 date   text,
 data_time_stamptimestamp,
 reading  int,
 PRIMARY KEY ( (sensor_id, date),
 data_time_stamp) );
 The 'date' field will be day-specific so that for each day there will be a
 new row created.

 So will I be able to define a POJO, SensorData, with the fields show above
 and basically process each 'row' returned by CQL as another SensorData
 object?

 Thanks.

 Les



 On Wed, Oct 23, 2013 at 1:22 AM, Vivek Mishra mishra.v...@gmail.comwrote:

 Can Kundera work with wide rows in an ORM manner?

 What specifically you looking for? Composite column based implementation
 can be built using Kundera.
 With Recent CQL3 developments, Kundera supports most of these. I think
 POJO needs to be aware of number of fields needs to be persisted(Same as
 CQL3)

 -Vivek


 On Wed, Oct 23, 2013 at 12:48 AM, Les Hartzman lhartz...@gmail.comwrote:

 As I'm becoming more familiar with Cassandra I'm still trying to shift
 my thinking from relational to NoSQL.

 Can Kundera work with wide rows in an ORM manner? In other words, can
 you actually design a POJO that fits the standard recipe for JPA usage?
 Would the queries return collections of the POJO to handle wide row data?

 I had considered using Spring and JPA for Cassandra, but it appears that
 other than basic configuration issues for Cassandra, to use Spring and JPA
 on a Cassandra database seems like an effort in futility if Cassandra is
 used as a NoSQL database instead of mimicking an RDBMS solution.

 If anyone can shed any light on this, I'd appreciate it.

 Thanks.

 Les






Re: Wide rows (time series data) and ORM

2013-10-23 Thread Les Hartzman
Thanks Vivek. I'll look over those links tonight.



On Wed, Oct 23, 2013 at 4:20 PM, Vivek Mishra mishra.v...@gmail.com wrote:

 Hi,
 CREATE TABLE sensor_data (
 sensor_id   text,
 date   text,
 data_time_stamptimestamp,
 reading  int,
 PRIMARY KEY ( (sensor_id, date),
 data_time_stamp) );

 Yes, you can create a POJO for this and map exactly with one row as a POJO
 object.

 Please have a look at:

 https://github.com/impetus-opensource/Kundera/wiki/Using-Compound-keys-with-Kundera

 There are users built production system using Kundera, please refer :

 https://github.com/impetus-opensource/Kundera/wiki/Kundera-in-Production-Deployments


 I am working as a core commitor in Kundera, please do let me know if you
 have any query.

 Sincerely,
 -Vivek



 On Wed, Oct 23, 2013 at 10:41 PM, Les Hartzman lhartz...@gmail.comwrote:

 Hi Vivek,

 What I'm looking for are a couple of things as I'm gaining an
 understanding of Cassandra. With wide rows and time series data, how do you
 (or can you) handle this data in an ORM manner? Now I understand that with
 CQL3, doing a select * from time_series_data will return the data as
 multiple rows. So does handling this data equal the way you would deal with
 any mapping of objects to results in a relational manner? Would you still
 use a JPA approach or is there a Cassandra/CQL3-specific way of interacting
 with the database?

 I expect to use a compound key for partitioning/clustering. For example
 I'm planning on creating a table as follows:
   CREATE TABLE sensor_data (
 sensor_id   text,
 date   text,
 data_time_stamptimestamp,
 reading  int,
 PRIMARY KEY ( (sensor_id, date),
 data_time_stamp) );
 The 'date' field will be day-specific so that for each day there will be
 a new row created.

 So will I be able to define a POJO, SensorData, with the fields show
 above and basically process each 'row' returned by CQL as another
 SensorData object?

 Thanks.

 Les



 On Wed, Oct 23, 2013 at 1:22 AM, Vivek Mishra mishra.v...@gmail.comwrote:

 Can Kundera work with wide rows in an ORM manner?

 What specifically you looking for? Composite column based implementation
 can be built using Kundera.
 With Recent CQL3 developments, Kundera supports most of these. I think
 POJO needs to be aware of number of fields needs to be persisted(Same as
 CQL3)

 -Vivek


 On Wed, Oct 23, 2013 at 12:48 AM, Les Hartzman lhartz...@gmail.comwrote:

 As I'm becoming more familiar with Cassandra I'm still trying to shift
 my thinking from relational to NoSQL.

 Can Kundera work with wide rows in an ORM manner? In other words, can
 you actually design a POJO that fits the standard recipe for JPA usage?
 Would the queries return collections of the POJO to handle wide row data?

 I had considered using Spring and JPA for Cassandra, but it appears
 that other than basic configuration issues for Cassandra, to use Spring and
 JPA on a Cassandra database seems like an effort in futility if Cassandra
 is used as a NoSQL database instead of mimicking an RDBMS solution.

 If anyone can shed any light on this, I'd appreciate it.

 Thanks.

 Les







Wide rows (time series data) and ORM

2013-10-22 Thread Les Hartzman
As I'm becoming more familiar with Cassandra I'm still trying to shift my
thinking from relational to NoSQL.

Can Kundera work with wide rows in an ORM manner? In other words, can you
actually design a POJO that fits the standard recipe for JPA usage? Would
the queries return collections of the POJO to handle wide row data?

I had considered using Spring and JPA for Cassandra, but it appears that
other than basic configuration issues for Cassandra, to use Spring and JPA
on a Cassandra database seems like an effort in futility if Cassandra is
used as a NoSQL database instead of mimicking an RDBMS solution.

If anyone can shed any light on this, I'd appreciate it.

Thanks.

Les


Re: Wide rows/composite keys clarification needed

2013-10-21 Thread Les Hartzman
So looking at Patrick McFadin's data modeling videos I now know about using
compound keys as a way of partitioning data on a by-day basis.

My other questions probably go more to the storage engine itself. How do
you refer to the columns in the wide row? What kind of names are assigned
to the columns?

Les
On Oct 20, 2013 9:34 PM, Les Hartzman lhartz...@gmail.com wrote:

 Please correct me if I'm not describing this correctly. But if I am
 collecting sensor data and have a table defined as follows:

  create table sensor_data (
sensor_id int,
time_stamp int,  // time to the hour granularity
voltage float,
amp float,
PRIMARY KEY (sensor_id, time_stamp) ));

 The partitioning value is the sensor_id and the rest of the PK components
 become part of the column name for the additional fields, in this case
 voltage and amp.

 What goes into determining what additional data is inserted into this row?
 The first time an insert takes place there will be one entry for all of the
 fields. Is there anything besides the sensor_id that is used to determine
 that the subsequent insertions for that sensor will go into the same row as
 opposed to starting a new row?

 Base on something I read (but can't currently find again), I thought that
 as long as all of the elements of the PK remain the same (same sensor_id
 and still within the same hour as the first reading), that the next
 insertion would be tacked onto the end of the first row. Is this correct?

 For subsequent entries into the same row for additional voltage/amp
 readings, what are the names of the columns for these readings? My
 understanding is that the column name becomes a concatenation of the
 non-row key field names plus the data field names.So if the first go-around
 you have time_stamp:voltage and time_stamp:amp, what do the
 subsequent column names become?

 Thanks.

 Les




Re: Wide rows/composite keys clarification needed

2013-10-21 Thread Jon Haddad
If you're working with CQL, you don't need to worry about the column names; 
that's handled for you.

If you specify multiple keys as part of the primary key, the keys after the 
partition key become clustering keys and are mapped to the column names.  So if 
you have a sensor_id / time_stamp primary key, all your sensor readings will be 
in the same row in the traditional Cassandra sense, sorted by your time_stamp.
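
For illustration, a rough, untested sketch of reading a slice of one sensor's
partition (it assumes the DataStax Java Driver 2.x and the sensor_data table
above; the contact point and keyspace name are placeholders):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class SensorReads {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_keyspace");

        // One storage row per sensor_id; results come back ordered by the
        // clustering column time_stamp, and only the requested slice is read.
        for (Row row : session.execute(
                "SELECT time_stamp, voltage, amp FROM sensor_data"
                + " WHERE sensor_id = 42 AND time_stamp >= 1382300000 LIMIT 100")) {
            System.out.printf("%d -> %.2f V, %.2f A%n",
                    row.getInt("time_stamp"), row.getFloat("voltage"), row.getFloat("amp"));
        }
        cluster.close();
    }
}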

On Oct 21, 2013, at 4:27 PM, Les Hartzman lhartz...@gmail.com wrote:

 So looking at Patrick McFadin's data modeling videos I now know about using 
 compound keys as a way of partitioning data on a by-day basis.
 
 My other questions probably go more to the storage engine itself. How do you 
 refer to the columns in the wide row? What kind of names are assigned to the 
 columns?
 
 Les
 
 On Oct 20, 2013 9:34 PM, Les Hartzman lhartz...@gmail.com wrote:
 Please correct me if I'm not describing this correctly. But if I am 
 collecting sensor data and have a table defined as follows:
 
  create table sensor_data (
sensor_id int,
time_stamp int,  // time to the hour granularity
voltage float,
amp float,
PRIMARY KEY (sensor_id, time_stamp) ));
 
 The partitioning value is the sensor_id and the rest of the PK components 
 become part of the column name for the additional fields, in this case 
 voltage and amp.
 
 What goes into determining what additional data is inserted into this row? 
 The first time an insert takes place there will be one entry for all of the 
 fields. Is there anything besides the sensor_id that is used to determine 
 that the subsequent insertions for that sensor will go into the same row as 
 opposed to starting a new row?
 
 Base on something I read (but can't currently find again), I thought that as 
 long as all of the elements of the PK remain the same (same sensor_id and 
 still within the same hour as the first reading), that the next insertion 
 would be tacked onto the end of the first row. Is this correct?
 
 For subsequent entries into the same row for additional voltage/amp readings, 
 what are the names of the columns for these readings? My understanding is 
 that the column name becomes a concatenation of the non-row key field names 
 plus the data field names.So if the first go-around you have 
 time_stamp:voltage and time_stamp:amp, what do the subsequent column 
 names become? 
 
 Thanks.
 
 Les
 



Re: Wide rows/composite keys clarification needed

2013-10-21 Thread Les Hartzman
What if you plan on using Kundera and JPQL and not CQL?

Les
On Oct 21, 2013 4:45 PM, Jon Haddad j...@jonhaddad.com wrote:

 If you're working with CQL, you don't need to worry about the column
 names, it's handled for you.

 If you specify multiple keys as part of the primary key, they become
 clustering keys and are mapped to the column names.  So if you have a
 sensor_id / time_stamp, all your sensor readings will be in the same row in
 the traditional cassandra sense, sorted by your time_stamp.

 On Oct 21, 2013, at 4:27 PM, Les Hartzman lhartz...@gmail.com wrote:

 So looking at Patrick McFadin's data modeling videos I now know about
 using compound keys as a way of partitioning data on a by-day basis.

 My other questions probably go more to the storage engine itself. How do
 you refer to the columns in the wide row? What kind of names are assigned
 to the columns?

 Les
 On Oct 20, 2013 9:34 PM, Les Hartzman lhartz...@gmail.com wrote:

 Please correct me if I'm not describing this correctly. But if I am
 collecting sensor data and have a table defined as follows:

  create table sensor_data (
sensor_id int,
time_stamp int,  // time to the hour granularity
voltage float,
amp float,
PRIMARY KEY (sensor_id, time_stamp) ));

 The partitioning value is the sensor_id and the rest of the PK components
 become part of the column name for the additional fields, in this case
 voltage and amp.

 What goes into determining what additional data is inserted into this
 row? The first time an insert takes place there will be one entry for all
 of the fields. Is there anything besides the sensor_id that is used to
 determine that the subsequent insertions for that sensor will go into the
 same row as opposed to starting a new row?

 Base on something I read (but can't currently find again), I thought that
 as long as all of the elements of the PK remain the same (same sensor_id
 and still within the same hour as the first reading), that the next
 insertion would be tacked onto the end of the first row. Is this correct?

 For subsequent entries into the same row for additional voltage/amp
 readings, what are the names of the columns for these readings? My
 understanding is that the column name becomes a concatenation of the
 non-row key field names plus the data field names.So if the first go-around
 you have time_stamp:voltage and time_stamp:amp, what do the
 subsequent column names become?

 Thanks.

 Les





Re: Wide rows/composite keys clarification needed

2013-10-21 Thread Les Hartzman
So I just saw a post about how Kundera translates all JPQL to CQL.


On Mon, Oct 21, 2013 at 4:45 PM, Jon Haddad j...@jonhaddad.com wrote:

 If you're working with CQL, you don't need to worry about the column
 names, it's handled for you.

 If you specify multiple keys as part of the primary key, they become
 clustering keys and are mapped to the column names.  So if you have a
 sensor_id / time_stamp, all your sensor readings will be in the same row in
 the traditional cassandra sense, sorted by your time_stamp.

 On Oct 21, 2013, at 4:27 PM, Les Hartzman lhartz...@gmail.com wrote:

 So looking at Patrick McFadin's data modeling videos I now know about
 using compound keys as a way of partitioning data on a by-day basis.

 My other questions probably go more to the storage engine itself. How do
 you refer to the columns in the wide row? What kind of names are assigned
 to the columns?

 Les
 On Oct 20, 2013 9:34 PM, Les Hartzman lhartz...@gmail.com wrote:

 Please correct me if I'm not describing this correctly. But if I am
 collecting sensor data and have a table defined as follows:

  create table sensor_data (
sensor_id int,
time_stamp int,  // time to the hour granularity
voltage float,
amp float,
PRIMARY KEY (sensor_id, time_stamp) ));

 The partitioning value is the sensor_id and the rest of the PK components
 become part of the column name for the additional fields, in this case
 voltage and amp.

 What goes into determining what additional data is inserted into this
 row? The first time an insert takes place there will be one entry for all
 of the fields. Is there anything besides the sensor_id that is used to
 determine that the subsequent insertions for that sensor will go into the
 same row as opposed to starting a new row?

 Base on something I read (but can't currently find again), I thought that
 as long as all of the elements of the PK remain the same (same sensor_id
 and still within the same hour as the first reading), that the next
 insertion would be tacked onto the end of the first row. Is this correct?

 For subsequent entries into the same row for additional voltage/amp
 readings, what are the names of the columns for these readings? My
 understanding is that the column name becomes a concatenation of the
 non-row key field names plus the data field names.So if the first go-around
 you have time_stamp:voltage and time_stamp:amp, what do the
 subsequent column names become?

 Thanks.

 Les





Wide rows/composite keys clarification needed

2013-10-20 Thread Les Hartzman
Please correct me if I'm not describing this correctly. But if I am
collecting sensor data and have a table defined as follows:

 create table sensor_data (
   sensor_id int,
   time_stamp int,  // time to the hour granularity
   voltage float,
   amp float,
   PRIMARY KEY (sensor_id, time_stamp));

The partitioning value is the sensor_id and the rest of the PK components
become part of the column name for the additional fields, in this case
voltage and amp.

What goes into determining what additional data is inserted into this row?
The first time an insert takes place there will be one entry for all of the
fields. Is there anything besides the sensor_id that is used to determine
that the subsequent insertions for that sensor will go into the same row as
opposed to starting a new row?

Based on something I read (but can't currently find again), I thought that
as long as all of the elements of the PK remain the same (same sensor_id
and still within the same hour as the first reading), the next
insertion would be tacked onto the end of the first row. Is this correct?

For subsequent entries into the same row for additional voltage/amp
readings, what are the names of the columns for these readings? My
understanding is that the column name becomes a concatenation of the
non-row-key field names plus the data field names. So if the first go-around
you have time_stamp:voltage and time_stamp:amp, what do the
subsequent column names become?

Thanks.

Les


Re: token(), limit and wide rows

2013-08-17 Thread Richard Low
You can do it by using two types of query: one using token() as you suggest,
the other by fixing the partition key and walking through the other parts
of the composite primary key.

For example, consider the table:

create table paging (a text, b text, c text, primary key (a, b));

I inserted ('1', '1', 'x'), ('1', '2', 'x'), ..., ('1', '5', 'x') and then
again for a='2'.  Suppose the paging size is 3; then start with

 select * from paging limit 3;

 a | b | c
---+---+---
 2 | 1 | x
 2 | 2 | x
 2 | 3 | x

Now you don't know if there are more items with a='2', so run:

 select * from paging where a = '2' and b > '3' limit 3;

 a | b | c
---+---+---
 2 | 4 | x
 2 | 5 | x

You know there aren't any more because only two results were obtained, but
you can continue with greater values of b if required.

Now move on to the next a value (in token order):

 select * from paging where token(a) > token('2') limit 3;

 a | b | c
---+---+---
 1 | 1 | x
 1 | 2 | x
 1 | 3 | x

and so on.

I don't know if there is any client library support for this, but it would
be useful.  But I think in Cassandra 2.0, CASSANDRA-4415 and CASSANDRA-4536
will solve this.
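
In the meantime, a rough, untested driver-side sketch of the two-query scheme
above (it assumes the DataStax Java Driver 2.x; the contact point, keyspace and
page size are placeholders) might look like this:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class FullTablePager {
    private static final int PAGE = 3;

    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_keyspace");

        String lastA = null;                       // last partition key seen
        while (true) {
            // Query type 1: the next batch of rows in token order.
            String cql = "SELECT a, b, c FROM paging"
                    + (lastA == null ? "" : " WHERE token(a) > token('" + lastA + "')")
                    + " LIMIT " + PAGE;
            String lastB = null;                   // last clustering value seen
            int n = 0;
            for (Row row : session.execute(cql)) {
                lastA = row.getString("a");
                lastB = row.getString("b");
                n++;                               // process the row here
            }
            if (n == 0) {
                break;                             // nothing left to scan
            }
            // Query type 2: drain the rest of the last partition seen, so a
            // wide partition is never chopped off by the LIMIT.
            while (true) {
                ResultSet rest = session.execute("SELECT a, b, c FROM paging WHERE a = '"
                        + lastA + "' AND b > '" + lastB + "' LIMIT " + PAGE);
                int m = 0;
                for (Row row : rest) {
                    lastB = row.getString("b");
                    m++;                           // process the row here
                }
                if (m < PAGE) {
                    break;
                }
            }
        }
        cluster.close();
    }
}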

Richard.

On 16 August 2013 17:16, Jonathan Rhone jonat...@shareablee.com wrote:

 Read

 http://www.datastax.com/dev/blog/cql3-table-support-in-hadoop-pig-and-hive

 And look at


 http://fossies.org/dox/apache-cassandra-1.2.8-src/CqlPagingRecordReader_8java_source.html

 - Jon


 On Fri, Aug 16, 2013 at 12:08 PM, Keith Freeman 8fo...@gmail.com wrote:

 I've run into the same problem, surprised nobody's responded to you.  Any
 time someone asks how do I page through all the rows of a table in CQL3?,
 the standard answer is token() and limit.  But as you point out, this
 method will often miss some data from wide rows.

 Maybe a Cassandra expert will chime in if we're wrong.

 Your suggestion is possible if you know how to find the previous value of
 'name' field (and are willing to filter out repeated rows), but wouldn't
 that be difficult/impossible with some keys?  So then, is there a way to do
 paging queries that get ALL of the rows, even in wide rows?



 On 08/13/2013 02:46 PM, Jan Algermissen wrote:

 HI,

 ok, so I found token() [1], and that it is an option for paging through
 randomly partitioned data.

 I take it that combining token() and LIMIT is the CQL3 idiom for paging
 (set aside the fact that one shouldn't raelly want to page and use C*)

 Now, when I page through a CF with wide rows, limitting each 'page' to,
 for example, 100 I end up in situations where not all 'sub'rows that have
 the same result for token() are returned because LIMIT chops off the result
 after 100 'sub'rows, not neccessarily at the boundary to the next wide row.

 Obvious ... but inconvenient.

 The solution would be to throw away the last token returned (because
 it's wide row could have been chopped off) and do the next query with the
 token before.

 So instead of doing

   SELECT * FROM users WHERE token(name)  
 token(last-name-of-prev-**result)
 LIMIT 100;

 I'd be doing

  SELECT * FROM users WHERE token(name) 
 token(one-befoe-the-last-name-**of-prev-result) LIMIT 100;


 Question: Is that what I have to do or is there a way to make token()
 and limit work together to return complete wide rows?


 Jan



 [1] token() and how it relates to paging is actually quite hard to grasp
 from the docs.






Re: token(), limit and wide rows

2013-08-16 Thread Keith Freeman
I've run into the same problem; surprised nobody's responded to you.  
Any time someone asks "how do I page through all the rows of a table in 
CQL3?", the standard answer is token() and limit.  But as you point out, 
this method will often miss some data from wide rows.


Maybe a Cassandra expert will chime in if we're wrong.

Your suggestion is possible if you know how to find the previous value 
of the 'name' field (and are willing to filter out repeated rows), but 
wouldn't that be difficult/impossible with some keys?  So then, is there 
a way to do paging queries that get ALL of the rows, even in wide rows?



On 08/13/2013 02:46 PM, Jan Algermissen wrote:

HI,

ok, so I found token() [1], and that it is an option for paging through 
randomly partitioned data.

I take it that combining token() and LIMIT is the CQL3 idiom for paging (set 
aside the fact that one shouldn't raelly want to page and use C*)

Now, when I page through a CF with wide rows, limiting each 'page' to, for
example, 100, I end up in situations where not all 'sub'rows that have the same
result for token() are returned, because LIMIT chops off the result after 100
'sub'rows, not necessarily at the boundary to the next wide row.

Obvious ... but inconvenient.

The solution would be to throw away the last token returned (because its wide
row could have been chopped off) and do the next query with the token before it.

So instead of doing

  SELECT * FROM users WHERE token(name) > token(last-name-of-prev-result)
LIMIT 100;

I'd be doing

 SELECT * FROM users WHERE token(name) >
token(one-before-the-last-name-of-prev-result) LIMIT 100;


Question: Is that what I have to do or is there a way to make token() and limit 
work together to return complete wide rows?


Jan



[1] token() and how it relates to paging is actually quite hard to grasp from 
the docs.




token(), limit and wide rows

2013-08-13 Thread Jan Algermissen
HI,

ok, so I found token() [1], and that it is an option for paging through 
randomly partitioned data. 

I take it that combining token() and LIMIT is the CQL3 idiom for paging (set
aside the fact that one shouldn't really want to page and use C*)

Now, when I page through a CF with wide rows, limiting each 'page' to, for
example, 100, I end up in situations where not all 'sub'rows that have the same
result for token() are returned, because LIMIT chops off the result after 100
'sub'rows, not necessarily at the boundary to the next wide row.

Obvious ... but inconvenient.

The solution would be to throw away the last token returned (because its wide
row could have been chopped off) and do the next query with the token before it.

So instead of doing

 SELECT * FROM users WHERE token(name) > token(last-name-of-prev-result)
LIMIT 100;

I'd be doing

SELECT * FROM users WHERE token(name) >
token(one-before-the-last-name-of-prev-result) LIMIT 100;


Question: Is that what I have to do or is there a way to make token() and limit 
work together to return complete wide rows?


Jan



[1] token() and how it relates to paging is actually quite hard to grasp from 
the docs.

Hadoop - using SlicePredicate with wide rows

2013-07-31 Thread Adam Masters
Hi all,

I need to limit a MapReduce job to only scan a specific range of columns.
The CF being processed contains wide rows, so I've set the 'widerow' parameter
of ConfigHelper.setInputColumnFamily() to true.

However, in the word_count example on github, the following comment exists:

// this will cause the predicate to be ignored in favor of scanning
// everything as a wide row
ConfigHelper.setInputColumnFamily(job.getConfiguration(), KEYSPACE,
        COLUMN_FAMILY, true);

This suggests that ignoring the SlicePredicate for wide rows is by design -
and this is certainly the behavior I've been witnessing. In which case, how
do I limit the columns being scanned?

N.B. I can't set the 'widerow' flag to false, as it breaks Cassandra (too
many columns are loaded at once, causing an OutOfMemory-style exception).

Many thanks,
Adam


Major compaction does not seem to free up much disk space if wide rows are used.

2013-05-16 Thread Boris Yen
Hi All,

Sorry for the wide distribution.

Our Cassandra is running 1.0.10. Recently, we have been facing a weird
situation. We have a column family containing wide rows (each row might
have a few million columns). We delete the columns on a daily basis and
we also run a major compaction on it every day to free up disk space (the
gc_grace is set to 600 seconds).

However, every time we run the major compaction, only 1 or 2 GB of disk space
is freed. We tried deleting most of the data before running the compaction,
but the result is pretty much the same.

So, we tried to check the source code. It seems that column tombstones
can only be purged when the row key is not present in other sstables. I know
the major compaction should include all sstables; however, in our use case,
columns get inserted rapidly. This makes Cassandra flush the
memtables to disk and create new sstables. The newly created sstables will
have the same keys as the sstables that are being compacted (the compaction
takes 2 or 3 hours to finish). My question is: could these newly
created sstables be the reason why most of the column tombstones are not being
purged?

p.s. We also did some other tests. We inserted data into the same CF with the
same wide-row pattern and deleted most of the data. This time we stopped
all the writes to Cassandra and did the compaction. The disk usage
decreased dramatically.

Any suggestions, or is this a known issue?

Thanks and Regards,
Boris


Re: Major compaction does not seem to free up much disk space if wide rows are used.

2013-05-16 Thread Louvet, Jacques
Boris,

We hit exactly the same issue, and you are correct: the newly created SSTables
are the reason why most of the column tombstones are not being purged.

There is an improvement in the 1.2 train where both the minimum and maximum
timestamp for a row are now stored and used during compaction to determine
whether that portion of the row can be purged.
However, this only appears to help major compaction, as the other restriction,
where all the files encompassing the deleted rows must be part of the
compaction for the row to be purged, still remains.

We have switched to column delete rather than row delete wherever practical. A
little more work on the app, but a big improvement in reads due to much more
efficient compaction.
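
In CQL3 terms the distinction looks roughly like this; the table and column
names below are hypothetical and only meant to illustrate the two delete
shapes (the cluster in question is on 1.0.10, so the actual client code will
differ):

  -- Hypothetical wide-row table: one partition per key, one CQL row per
  -- "column" in the old wide-row sense.
  -- CREATE TABLE events (key text, col_name timestamp, value text,
  --                      PRIMARY KEY (key, col_name));

  -- Row delete: a single tombstone shadows the entire partition.
  DELETE FROM events WHERE key = 'sensor-1';

  -- Column delete: tombstones only the specific column that is no longer needed.
  DELETE FROM events WHERE key = 'sensor-1' AND col_name = '2013-05-15 00:00:00';

The row-level delete writes a single partition tombstone that shadows every
column under that key, while the column-level delete targets only the cells
that actually need to go.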

Regards,
Jacques

From: Boris Yen yulin...@gmail.com
Reply-To: user@cassandra.apache.org
Date: Thursday, May 16, 2013 04:07
To: user@cassandra.apache.org, d...@cassandra.apache.org
Subject: Major compaction does not seem to free up much disk space if wide
rows are used.

Hi All,

Sorry for the wide distribution.

Our Cassandra is running 1.0.10. Recently, we have been facing a weird situation.
We have a column family containing wide rows (each row might have a few million
columns). We delete the columns on a daily basis and we also run a major
compaction on it every day to free up disk space (the gc_grace is set to 600
seconds).

However, every time we run the major compaction, only 1 or 2 GB of disk space is
freed. We tried deleting most of the data before running the compaction, but
the result is pretty much the same.

So, we tried to check the source code. It seems that column tombstones
can only be purged when the row key is not present in other sstables. I know the
major compaction should include all sstables; however, in our use case, columns
get inserted rapidly. This makes Cassandra flush the memtables to disk
and create new sstables. The newly created sstables will have the same keys as
the sstables that are being compacted (the compaction takes 2 or 3 hours to
finish). My question is: could these newly created sstables be the reason why
most of the column tombstones are not being purged?

p.s. We also did some other tests. We inserted data into the same CF with the
same wide-row pattern and deleted most of the data. This time we stopped all
the writes to Cassandra and did the compaction. The disk usage decreased
dramatically.

Any suggestions, or is this a known issue?

Thanks and Regards,
Boris


Re: Major compaction does not seem to free up much disk space if wide rows are used.

2013-05-16 Thread Edward Capriolo
This makes sense. Unless you are running a major compaction, a delete can
only be purged if the bloom filters confirm the row is not in the sstables
outside the compaction. If your rows are wide, the odds are that they are in
most/all sstables, so finally removing them is tricky.


On Thu, May 16, 2013 at 12:00 PM, Louvet, Jacques 
jacques_lou...@cable.comcast.com wrote:

  Boris,

  We hit exactly the same issue, and you are correct: the newly created
 SSTables are the reason why most of the column tombstones are not being purged.

  There is an improvement in the 1.2 train where both the minimum and maximum
 timestamp for a row are now stored and used during compaction to
 determine whether that portion of the row can be purged.
 However, this only appears to help major compaction, as the other
 restriction, where all the files encompassing the deleted rows must be part
 of the compaction for the row to be purged, still remains.

  We have switched to column delete rather than row delete wherever
 practical. A little more work on the app, but a big improvement in reads
 due to much more efficient compaction.

  Regards,
 Jacques

   From: Boris Yen yulin...@gmail.com
 Reply-To: user@cassandra.apache.org
 Date: Thursday, May 16, 2013 04:07
 To: user@cassandra.apache.org, d...@cassandra.apache.org
 Subject: Major compaction does not seem to free up much disk space if
 wide rows are used.

  Hi All,

 Sorry for the wide distribution.

  Our Cassandra is running 1.0.10. Recently, we have been facing a weird
 situation. We have a column family containing wide rows (each row might
 have a few million columns). We delete the columns on a daily basis and
 we also run a major compaction on it every day to free up disk space (the
 gc_grace is set to 600 seconds).

  However, every time we run the major compaction, only 1 or 2 GB of disk space
 is freed. We tried deleting most of the data before running the compaction,
 but the result is pretty much the same.

  So, we tried to check the source code. It seems that column
 tombstones can only be purged when the row key is not present in other sstables.
 I know the major compaction should include all sstables; however, in our
 use case, columns get inserted rapidly. This makes Cassandra flush
 the memtables to disk and create new sstables. The newly created sstables
 will have the same keys as the sstables that are being compacted (the
 compaction takes 2 or 3 hours to finish). My question is: could
 these newly created sstables be the reason why most of the
 column tombstones are not being purged?

  p.s. We also did some other tests. We inserted data into the same CF with
 the same wide-row pattern and deleted most of the data. This time we
 stopped all the writes to Cassandra and did the compaction. The disk usage
 decreased dramatically.

  Any suggestions, or is this a known issue?

  Thanks and Regards,
  Boris


