Re: Way to write to dc1 but keep data only in dc2

2016-10-03 Thread INDRANIL BASU
@Dorian, yes, I did that by mistake. I rectified it by starting a new thread.
  Thanks and regards,
  -- Indranil Basu

  From: Dorian Hoxha 
 To: user@cassandra.apache.org; INDRANIL BASU  
 Sent: Monday, 3 October 2016 11:07 PM
 Subject: Re: Way to write to dc1 but keep data only in dc2
   
@INDRANIL
Please go find your own thread and don't hijack mine.

On Mon, Oct 3, 2016 at 6:19 PM, INDRANIL BASU  wrote:

Hello All,

I am getting the below error repeatedly in the system log of C* 2.1.0

WARN  [SharedPool-Worker-64] 2016-09-27 00:43:35,835 SliceQueryFilter.java:236
- Read 0 live and 1923 tombstoned cells in test_schema.test_cf.test_cf_col1_idx
(see tombstone_warn_threshold). 5000 columns was requested, slices=[-],
delInfo={deletedAt=-9223372036854775808, localDeletion=2147483647}

After that, a NullPointerException and finally an OOM:

ERROR [CompactionExecutor:6287] 2016-09-29 22:09:13,546 CassandraDaemon.java:166 - Exception in thread Thread[CompactionExecutor:6287,1,main]
java.lang.NullPointerException: null
    at org.apache.cassandra.service.CacheService$KeyCacheSerializer.serialize(CacheService.java:475) ~[apache-cassandra-2.1.0.jar:2.1.0]
    at org.apache.cassandra.service.CacheService$KeyCacheSerializer.serialize(CacheService.java:463) ~[apache-cassandra-2.1.0.jar:2.1.0]
    at org.apache.cassandra.cache.AutoSavingCache$Writer.saveCache(AutoSavingCache.java:225) ~[apache-cassandra-2.1.0.jar:2.1.0]
    at org.apache.cassandra.db.compaction.CompactionManager$11.run(CompactionManager.java:1061) ~[apache-cassandra-2.1.0.jar:2.1.0]
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[na:1.7.0_80]
    at java.util.concurrent.FutureTask.run(Unknown Source) ~[na:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [na:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [na:1.7.0_80]
    at java.lang.Thread.run(Unknown Source) [na:1.7.0_80]
ERROR [CompactionExecutor:9712] 2016-10-01 10:09:13,871 CassandraDaemon.java:166 - Exception in thread Thread[CompactionExecutor:9712,1,main]
java.lang.NullPointerException: null
ERROR [CompactionExecutor:10070] 2016-10-01 14:09:14,154 CassandraDaemon.java:166 - Exception in thread Thread[CompactionExecutor:10070,1,main]
java.lang.NullPointerException: null
ERROR [CompactionExecutor:10413] 2016-10-01 18:09:14,265 CassandraDaemon.java:166 - Exception in thread Thread[CompactionExecutor:10413,1,main]
java.lang.NullPointerException: null
ERROR [MemtableFlushWriter:2396] 2016-10-01 20:28:27,425 CassandraDaemon.java:166 - Exception in thread Thread[MemtableFlushWriter:2396,5,main]
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method) ~[na:1.7.0_80]
    at java.lang.Thread.start(Unknown Source) ~[na:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor.addWorker(Unknown Source) ~[na:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(Unknown Source) ~[na:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[na:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[na:1.7.0_80]
    at java.lang.Thread.run(Unknown Source) ~[na:1.7.0_80]
-- IB
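
A hedged aside on the final error: "unable to create new native thread" is typically an OS-level limit on threads/processes for the cassandra user, not heap exhaustion. A quick way to check on Linux (sketch; assumes a single Cassandra process on the box):

pid=$(pgrep -f CassandraDaemon | head -1)
grep -i 'max processes' /proc/$pid/limits   # the nproc ulimit that bounds native threads
ls /proc/$pid/task | wc -l                  # number of threads currently alive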






   

Re: Replacing a dead node in a live Cassandra Cluster

2016-10-03 Thread Yabin Meng
Are you sure the cassandra.yaml file of the new node is correctly configured?
What are the seeds and listen_address settings on your new node and the
existing nodes?

Yabin
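
For reference, a minimal sketch of the documented replace procedure on a package install (the dead node's IP is a placeholder; using cassandra-env.sh as the place to add JVM options is an assumption for this install):

# on the replacement node, which must NOT list itself in -seeds and whose
# seeds must point at live, reachable nodes (otherwise startup fails with
# "Unable to gossip with any seeds")
sudo service cassandra stop
sudo rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/*
echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=<dead_node_ip>"' |
  sudo tee -a /etc/cassandra/cassandra-env.sh
sudo service cassandra start
# remove the replace_address line again once bootstrap completes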

On Fri, Sep 30, 2016 at 7:56 PM, Rajath Subramanyam 
wrote:

> Hello Cassandra-users,
>
> I was running some tests today. My end goal was to learn more about
> replacing a dead node in a live Cassandra cluster with minimal disruption
> to the existing cluster and figure out a better and faster way of doing the
> same.
>
> I am running a package installation of the following version of Cassandra.
>
> [centos@rj-cassandra-1 testcf-97896450869d11e6a84c4381bf5c5035]$ nodetool
> version
> ReleaseVersion: 2.1.12
>
> I set up a 4-node Cassandra cluster in the lab. I took one non-seed node
> (let's say node1) down by issuing 'sudo service cassandra stop'. Then,
> following instructions from this link
> ,
> I tried to replace node1 with the JMX option
> -Dcassandra.replace_address=.
> However, when I do this the bootstrap fails with the following error in the
> log:
>
> ERROR [main] 2016-09-30 23:54:17,104 CassandraDaemon.java:579 - Exception
> encountered during startup
> java.lang.RuntimeException: Unable to gossip with any seeds
> at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1337)
> ~[apache-cassandra-2.1.12.jar:2.1.12]
> at 
> org.apache.cassandra.service.StorageService.prepareReplacementInfo(StorageService.java:512)
> ~[apache-cassandra-2.1.12.jar:2.1.12]
> at 
> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:783)
> ~[apache-cassandra-2.1.12.jar:2.1.12]
> at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:721)
> ~[apache-cassandra-2.1.12.jar:2.1.12]
> at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:612)
> ~[apache-cassandra-2.1.12.jar:2.1.12]
> at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:387)
> [apache-cassandra-2.1.12.jar:2.1.12]
> at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:562)
> [apache-cassandra-2.1.12.jar:2.1.12]
> at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:651)
> [apache-cassandra-2.1.12.jar:2.1.12]
> WARN  [StorageServiceShutdownHook] 2016-09-30 23:54:17,109
> Gossiper.java:1454 - No local state or state is in silent shutdown, not
> announcing shutdown
> INFO  [StorageServiceShutdownHook] 2016-09-30 23:54:17,109
> MessagingService.java:734 - Waiting for messaging service to quiesce
> INFO  [ACCEPT-/10.7.0.232] 2016-09-30 23:54:17,110
> MessagingService.java:1018 - MessagingService has terminated the accept()
> thread
>
> How do I recover from this error message ?
>
> 
> Rajath Subramanyam
>
>


Re: Cassandra 3 node cluster with intermittent network issues on one node

2016-10-03 Thread Yabin Meng
Most likely node A has some gossip related problems. You can try purging
the gossip state on node A, as per the procedure:
https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_gossip_purge.html
.

Yabin
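
A rough sketch of that purge procedure (paths assume a package install; see the linked page for the authoritative steps):

# on node A
sudo service cassandra stop
# tell the node to rebuild its ring state from the seeds on the next start
echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"' |
  sudo tee -a /etc/cassandra/cassandra-env.sh
sudo service cassandra start
# once node A rejoins cleanly, remove the added line again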

On Mon, Oct 3, 2016 at 2:38 AM, Girish Kamarthi <
girish.kamar...@stellapps.com> wrote:

> Hi All,
>
> I want to test out a scenario where there are intermittent network issues
> on one of the nodes.
>
> I've got a Cassandra 3.7 cluster of 3 nodes with a keyspace replication
> factor of 3.
>
> All 3 nodes (node A, node B, node C) are started and in sync. When one of
> the Cassandra nodes (node A) went down, I restarted Cassandra and node A
> got back in sync with the other nodes B & C.
>
> Now my question is about when one of the nodes has intermittent network
> issues while Cassandra is still up and running. Say node A is having
> network issues; nodetool status on the other 2 nodes B & C shows that
> node A is down.
>
> *Debug.log of Node B & C:*
>
> DEBUG [GossipTasks:1] 2016-10-03 11:46:18,922 Gossiper.java:337 -
> Convicting /10.1.1.4 with status NORMAL - alive false
>
> When the network is back up on node A, nodetool status on node A shows
> that the other nodes are down.
>
> *Debug.log of Node A:*
>
> DEBUG [GossipTasks:1] 2016-10-03 11:47:23,613 Gossiper.java:337 -
> Convicting /10.1.1.5 with status NORMAL - alive false
>
> DEBUG [GossipTasks:1] 2016-10-03 11:47:23,614 Gossiper.java:337 -
> Convicting /10.1.1.6 with status NORMAL - alive false
>
>
> Below are the configuration changes I made in the cassandra.yaml files.
>
> Node 01
>
> cluster_name: 'Test Cluster'
>
> num_tokens: 256
>
> seed_provider:
>     - class_name: org.apache.cassandra.locator.SimpleSeedProvider
>       parameters:
>           - seeds: "10.1.1.4,10.1.1.5,10.1.1.6"
>
> listen_address: 10.1.1.4
>
> broadcast_address: 10.1.1.4
>
> rpc_address: 0.0.0.0
>
> broadcast_rpc_address: 10.1.1.4
>
>
> Node02
>
> cluster_name: 'Test Cluster'
>
> num_tokens: 256
>
> seed_provider:
>     - class_name: org.apache.cassandra.locator.SimpleSeedProvider
>       parameters:
>           - seeds: "10.1.1.4,10.1.1.5,10.1.1.6"
>
> listen_address: 10.1.1.5
>
> broadcast_address: 10.1.1.5
>
> rpc_address: 0.0.0.0
>
> broadcast_rpc_address: 10.1.1.5
>
>
> Node03
>
> cluster_name: 'Test Cluster'
>
> num_tokens: 256
>
> seed_provider:
>     - class_name: org.apache.cassandra.locator.SimpleSeedProvider
>       parameters:
>           - seeds: "10.1.1.4,10.1.1.5,10.1.1.6"
>
> listen_address: 10.1.1.6
>
> broadcast_address: 10.1.1.6
>
> rpc_address: 0.0.0.0
>
> broadcast_rpc_address: 10.1.1.6
>
>
> Nodetool status on node A, when its network is up, shows that the other
> nodes are down (DN).
>
> Nodetool status on the other nodes B & C shows that node A is down (DN).
>
> How does the handshaking work in this scenario?
>
> Why is node A not in sync with the other nodes when the network is up?
>
> Please give me some input on resolving this issue.
>
> Thanks & Regards,
> Girish Kumar Kamarthi
> +91-9986427891
>


Re: cassandra dump file path

2016-10-03 Thread Yabin Meng
Have you restarted Cassandra after making changes in cassandra-env.sh?

Yabin

On Mon, Oct 3, 2016 at 7:44 AM, Jean Carlo 
wrote:

> OK, I got the answer to one of my questions. The script
> /etc/init.d/cassandra sets the path for the heap dump, by default under
> cassandra_home.
>
> Now the thing I don't understand is: why are the dumps located at the path
> set by /etc/init.d/cassandra and not at the one set by the conf file
> cassandra-env.sh?
>
> Anyone any idea?
>
>
> Saludos
>
> Jean Carlo
>
> "The best way to predict the future is to invent it" Alan Kay
>
> On Mon, Oct 3, 2016 at 12:00 PM, Jean Carlo 
> wrote:
>
>>
>> Hi
>>
>> I see in the log of my node cassandra that the parameter -XX:HeapDumpPath
>> is charged two times.
>>
>> INFO  [main] 2016-10-03 04:21:29,941 CassandraDaemon.java:205 - JVM
>> Arguments: [-ea, -javaagent:/usr/share/cassandra/lib/jamm-0.3.0.jar,
>> -XX:+CMSClassUnloadingEnabled, -XX:+UseThreadPriorities,
>> -XX:ThreadPriorityPolicy=42, -Xms6G, -Xmx6G, -Xmn600M, 
>> *-XX:+HeapDumpOnOutOfMemoryError,
>> -XX:HeapDumpPath=/cassandra/dumps/cassandra-1475461287-pid34435.hprof*,
>> -Xss256k, -XX:StringTableSize=103, -XX:+UseParNewGC,
>> -XX:+UseConcMarkSweepGC, -XX:+CMSParallelRemarkEnabled,
>> -XX:SurvivorRatio=8, -XX:MaxTenuringThreshold=1,
>> -XX:CMSInitiatingOccupancyFraction=30, -XX:+UseCMSInitiatingOccupancyOnly,
>> -XX:+UseTLAB, -XX:CompileCommandFile=/etc/cassandra/hotspot_compiler,
>> -XX:CMSWaitDuration=1, -XX:+CMSParallelInitialMarkEnabled,
>> -XX:+CMSEdenChunksRecordAlways, -XX:CMSWaitDuration=1,
>> -XX:+UseCondCardMark, -XX:+PrintGCDetails, -XX:+PrintGCDateStamps,
>> -XX:+PrintGCApplicationStoppedTime, 
>> -Xloggc:/var/opt/hosting/log/cassandra/gc.log,
>> -XX:+UseGCLogFileRotation, -XX:NumberOfGCLogFiles=20,
>> -XX:GCLogFileSize=20M, -Djava.net.preferIPv4Stack=true,
>> -Dcom.sun.management.jmxremote.port=7199, 
>> -Dcom.sun.management.jmxremote.rmi.port=7199,
>> -Dcom.sun.management.jmxremote.ssl=false, 
>> -Dcom.sun.management.jmxremote.authenticate=false,
>> -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password,
>> -Djava.io.tmpdir=/var/opt/hosting/db/cassandra/tmp,
>> -javaagent:/usr/share/cassandra/lib/jolokia-jvm-1.0.6-agent.jar=port=8778,host=0.0.0.0,
>> -Dcassandra.auth_bcrypt_gensalt_log2_rounds=4,
>> -Dlogback.configurationFile=logback.xml, 
>> -Dcassandra.logdir=/var/log/cassandra,
>> -Dcassandra.storagedir=, 
>> -Dcassandra-pidfile=/var/run/cassandra/cassandra.pid,
>> *-XX:HeapDumpPath=/var/lib/cassandra/java_1475461286.hprof*,
>> -XX:ErrorFile=/var/lib/cassandra/hs_err_1475461286.log]
>>
>> This option is defined in cassandra-env.sh
>>
>> if [ "x$CASSANDRA_HEAPDUMP_DIR" != "x" ]; then
>> JVM_OPTS="$JVM_OPTS 
>> -XX:HeapDumpPath=$CASSANDRA_HEAPDUMP_DIR/cassandra-`date
>> +%s`-pid$$.hprof"
>> fi
>> and we had previously set the value of CASSANDRA_HEAPDUMP_DIR to
>>
>> */cassandra/dumps/*
>>
>> It seems that Cassandra does not honour the setting in cassandra-env.sh and
>> only takes into account the last value set for HeapDumpPath:
>>
>> */var/lib/cassandra/java_1475461286.hprof*
>>
>> This causes problems when we have to dump the heap, because Cassandra then
>> writes to a disk that is not suitable for it.
>>
>> Is *-XX:HeapDumpPath* set in another place/file that I don't know about?
>>
>> Thxs
>>
>> Jean Carlo
>>
>> "The best way to predict the future is to invent it" Alan Kay
>>
>
>
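
An aside on the duplicated flag discussed above: HotSpot honours the last occurrence of a repeated -XX option, so the -XX:HeapDumpPath appended by /etc/init.d/cassandra after cassandra-env.sh runs silently wins. A sketch of how to confirm and work around it (paths as given in this thread):

grep -n 'HeapDumpPath' /etc/cassandra/cassandra-env.sh /etc/init.d/cassandra
# fix: point the init script's value (derived from cassandra_home here) at the
# intended dump disk, or drop its duplicate flag so the CASSANDRA_HEAPDUMP_DIR
# setting from cassandra-env.sh is the last one on the command line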


Re: Way to write to dc1 but keep data only in dc2

2016-10-03 Thread Yabin Meng
Dorian, I don't think Cassandra is able to achieve what you want natively.
In short, what you want to achieve is conditional data replication.

Yabin
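
The closest native approximation (a sketch; keyspace and DC names are illustrative) is per-keyspace replication: replicate the keyspace only to dc2, and let clients keep writing through dc1 coordinators, which forward the mutations to the dc2 replicas.

cqlsh -e "CREATE KEYSPACE dc2_only
          WITH replication = {'class': 'NetworkTopologyStrategy', 'dc2': 3};"
# caveat: requests coordinated in dc1 must use a consistency level that does
# not require local replicas, e.g. ONE or QUORUM rather than LOCAL_QUORUM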



On Mon, Oct 3, 2016 at 1:37 PM, Dorian Hoxha  wrote:

> @INDRANIL
> Please go find your own thread and don't hijack mine.
>
> On Mon, Oct 3, 2016 at 6:19 PM, INDRANIL BASU 
> wrote:
>
>> Hello All,
>>
>> I am getting the below error repeatedly in the system log of C* 2.1.0
>>
>> WARN  [SharedPool-Worker-64] 2016-09-27 00:43:35,835
>> SliceQueryFilter.java:236 - Read 0 live and 1923 tombstoned cells in
>> test_schema.test_cf.test_cf_col1_idx (see tombstone_warn_threshold).
>> 5000 columns was requested, slices=[-], 
>> delInfo={deletedAt=-9223372036854775808,
>> localDeletion=2147483647}
>>
>> After that NullPointer Exception and finally OOM
>>
>> ERROR [CompactionExecutor:6287] 2016-09-29 22:09:13,546
>> CassandraDaemon.java:166 - Exception in thread
>> Thread[CompactionExecutor:6287,1,main]
>> java.lang.NullPointerException: null
>> at org.apache.cassandra.service.CacheService$KeyCacheSerializer
>> .serialize(CacheService.java:475) ~[apache-cassandra-2.1.0.jar:2.1.0]
>> at org.apache.cassandra.service.CacheService$KeyCacheSerializer
>> .serialize(CacheService.java:463) ~[apache-cassandra-2.1.0.jar:2.1.0]
>> at 
>> org.apache.cassandra.cache.AutoSavingCache$Writer.saveCache(AutoSavingCache.java:225)
>> ~[apache-cassandra-2.1.0.jar:2.1.0]
>> at 
>> org.apache.cassandra.db.compaction.CompactionManager$11.run(CompactionManager.java:1061)
>> ~[apache-cassandra-2.1.0.jar:2.1.0]
>> at java.util.concurrent.Executors$RunnableAdapter.call(Unknown
>> Source) ~[na:1.7.0_80]
>> at java.util.concurrent.FutureTask.run(Unknown Source)
>> ~[na:1.7.0_80]
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
>> Source) [na:1.7.0_80]
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
>> Source) [na:1.7.0_80]
>> at java.lang.Thread.run(Unknown Source) [na:1.7.0_80]
>> ERROR [CompactionExecutor:9712] 2016-10-01 10:09:13,871
>> CassandraDaemon.java:166 - Exception in thread
>> Thread[CompactionExecutor:9712,1,main]
>> java.lang.NullPointerException: null
>> ERROR [CompactionExecutor:10070] 2016-10-01 14:09:14,154
>> CassandraDaemon.java:166 - Exception in thread
>> Thread[CompactionExecutor:10070,1,main]
>> java.lang.NullPointerException: null
>> ERROR [CompactionExecutor:10413] 2016-10-01 18:09:14,265
>> CassandraDaemon.java:166 - Exception in thread
>> Thread[CompactionExecutor:10413,1,main]
>> java.lang.NullPointerException: null
>> ERROR [MemtableFlushWriter:2396] 2016-10-01 20:28:27,425
>> CassandraDaemon.java:166 - Exception in thread
>> Thread[MemtableFlushWriter:2396,5,main]
>> java.lang.OutOfMemoryError: unable to create new native thread
>> at java.lang.Thread.start0(Native Method) ~[na:1.7.0_80]
>> at java.lang.Thread.start(Unknown Source) ~[na:1.7.0_80]
>> at java.util.concurrent.ThreadPoolExecutor.addWorker(Unknown
>> Source) ~[na:1.7.0_80]
>> at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(Unknown
>> Source) ~[na:1.7.0_80]
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
>> Source) ~[na:1.7.0_80]
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
>> Source) ~[na:1.7.0_80]
>> at java.lang.Thread.run(Unknown Source) ~[na:1.7.0_80]
>>
>> -- IB
>>
>>
>>
>


Re: Row cache not working

2016-10-03 Thread Jeff Jirsa
That’s true for versions 2.1 and newer. However, it’s possible that the 3.0
engine rewrite introduced a bug or two that haven’t yet been found.

 

 

From: Hannu Kröger 
Reply-To: "user@cassandra.apache.org" 
Date: Monday, October 3, 2016 at 3:52 PM
To: "user@cassandra.apache.org" 
Subject: Re: Row cache not working

 

If I remember correctly, the row cache caches only the first N rows of the
partition, N being some configurable number.

 

See this link which is suggesting that:

http://www.datastax.com/dev/blog/row-caching-in-cassandra-2-1

 

Br,

Hannu

 

On 4 Oct 2016, at 1.32, Edward Capriolo  wrote:

Since the feature is off by default, the coverage might only be as deep as
the specific tests that exercise it.

 

On Mon, Oct 3, 2016 at 4:54 PM, Jeff Jirsa  wrote:

Seems like it’s probably worth opening a jira issue to track it (either to 
confirm it’s a bug, or to be able to better explain if/that it’s working as 
intended – the row cache is probably missing because trace indicates the read 
isn’t cacheable, but I suspect it should be cacheable). 





Do note, though, that setting rows_per_partition to ALL can be very very very 
dangerous if you have very wide rows in any of your tables with row cache 
enabled.

 

 

 

From: Abhinav Solan 
Reply-To: "user@cassandra.apache.org" 
Date: Monday, October 3, 2016 at 1:38 PM
To: "user@cassandra.apache.org" 
Subject: Re: Row cache not working

 

It's cassandra 3.0.7,  

I had to set caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}; only
then does it work, and I don't know why.

If I set 'rows_per_partition': '1' then it does not work.

 

Also wanted to ask one thing: if I set row_cache_save_period: 60, will this
cache be refreshed automatically, or is it lazy, caching a partition only
when a fetch call is made?

 

On Mon, Oct 3, 2016 at 1:31 PM Jeff Jirsa  wrote:

Which version of Cassandra are you running (I can tell it’s newer than 2.1, but 
exact version would be useful)? 

 

From: Abhinav Solan 
Reply-To: "user@cassandra.apache.org" 
Date: Monday, October 3, 2016 at 11:35 AM
To: "user@cassandra.apache.org" 
Subject: Re: Row cache not working

 

Hi, can anyone please help me with this 

 

Thanks,

Abhinav

 

On Fri, Sep 30, 2016 at 6:20 PM Abhinav Solan  wrote:

Hi Everyone, 

 

My table looks like this -

CREATE TABLE test.reads (

svc_pt_id bigint,

meas_type_id bigint,

flags bigint,

read_time timestamp,

value double,

PRIMARY KEY ((svc_pt_id, meas_type_id))

) WITH bloom_filter_fp_chance = 0.1

AND caching = {'keys': 'ALL', 'rows_per_partition': '10'}

AND comment = ''

AND compaction = {'class': 
'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}

AND compression = {'chunk_length_in_kb': '64', 'class': 
'org.apache.cassandra.io.compress.LZ4Compressor'}

AND crc_check_chance = 1.0

AND dclocal_read_repair_chance = 0.1

AND default_time_to_live = 0

AND gc_grace_seconds = 864000

AND max_index_interval = 2048

AND memtable_flush_period_in_ms = 0

AND min_index_interval = 128

AND read_repair_chance = 0.0

AND speculative_retry = '99PERCENTILE';

 

Have set up the C* nodes with

row_cache_size_in_mb: 1024

row_cache_save_period: 14400

 

and I am making this query 

select svc_pt_id, meas_type_id, read_time, value FROM 
cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id = 146;

 

with tracing on every time it says Row cache miss

 

 activity | timestamp | source | source_elapsed
-----------+-----------+--------+----------------
 Execute CQL3 query | 2016-09-30 18:15:00.446000 | 192.168.199.75 | 0
 Parsing select svc_pt_id, meas_type_id, read_time, value FROM cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id = 146; [SharedPool-Worker-1] | 2016-09-30 18:15:00.446000 | 192.168.199.75 | 111

  

Re: Row cache not working

2016-10-03 Thread Hannu Kröger
If I remember correctly, the row cache caches only the first N rows of the
partition, N being some configurable number.

See this link which is suggesting that:
http://www.datastax.com/dev/blog/row-caching-in-cassandra-2-1

Br,
Hannu
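
A sketch restating that behaviour against the table from this thread: with rows_per_partition = N, only the first N rows of a partition are cached, and (per the trace later in this digest) a read populates or hits the cache only if it starts at the head of the partition. Population is lazy, on read; row_cache_save_period only controls how often cached keys are persisted to disk for warm restarts.

cqlsh -e "ALTER TABLE test.reads WITH caching = {'keys': 'ALL', 'rows_per_partition': '10'};"
nodetool info | grep 'Row Cache'   # check entries/hits after re-running the query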

> On 4 Oct 2016, at 1.32, Edward Capriolo  wrote:
> 
> Since the feature is off by default, the coverage might only be as deep as
> the specific tests that exercise it.
> 
>> On Mon, Oct 3, 2016 at 4:54 PM, Jeff Jirsa  
>> wrote:
>> Seems like it’s probably worth opening a jira issue to track it (either to 
>> confirm it’s a bug, or to be able to better explain if/that it’s working as 
>> intended – the row cache is probably missing because trace indicates the 
>> read isn’t cacheable, but I suspect it should be cacheable).
>> 
>>  
>>  
>>  
>> 
>> 
>> Do note, though, that setting rows_per_partition to ALL can be very very 
>> very dangerous if you have very wide rows in any of your tables with row 
>> cache enabled.
>> 
>>  
>> 
>>  
>> 
>>  
>> 
>> From: Abhinav Solan 
>> Reply-To: "user@cassandra.apache.org" 
>> Date: Monday, October 3, 2016 at 1:38 PM
>> To: "user@cassandra.apache.org" 
>> Subject: Re: Row cache not working
>> 
>>  
>> 
>> It's cassandra 3.0.7, 
>> 
>> I had to set caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}; only
>> then does it work, and I don't know why.
>>
>> If I set 'rows_per_partition': '1' then it does not work.
>> 
>>  
>> 
>> Also wanted to ask one thing: if I set row_cache_save_period: 60, will this
>> cache be refreshed automatically, or is it lazy, caching a partition only
>> when a fetch call is made?
>> 
>>  
>> 
>> On Mon, Oct 3, 2016 at 1:31 PM Jeff Jirsa  wrote:
>> 
>> Which version of Cassandra are you running (I can tell it’s newer than 2.1, 
>> but exact version would be useful)?
>> 
>>  
>> 
>> From: Abhinav Solan 
>> Reply-To: "user@cassandra.apache.org" 
>> Date: Monday, October 3, 2016 at 11:35 AM
>> To: "user@cassandra.apache.org" 
>> Subject: Re: Row cache not working
>> 
>>  
>> 
>> Hi, can anyone please help me with this
>> 
>>  
>> 
>> Thanks,
>> 
>> Abhinav
>> 
>>  
>> 
>> On Fri, Sep 30, 2016 at 6:20 PM Abhinav Solan  
>> wrote:
>> 
>> Hi Everyone,
>> 
>>  
>> 
>> My table looks like this -
>> 
>> CREATE TABLE test.reads (
>> 
>> svc_pt_id bigint,
>> 
>> meas_type_id bigint,
>> 
>> flags bigint,
>> 
>> read_time timestamp,
>> 
>> value double,
>> 
>> PRIMARY KEY ((svc_pt_id, meas_type_id))
>> 
>> ) WITH bloom_filter_fp_chance = 0.1
>> 
>> AND caching = {'keys': 'ALL', 'rows_per_partition': '10'}
>> 
>> AND comment = ''
>> 
>> AND compaction = {'class': 
>> 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
>> 
>> AND compression = {'chunk_length_in_kb': '64', 'class': 
>> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>> 
>> AND crc_check_chance = 1.0
>> 
>> AND dclocal_read_repair_chance = 0.1
>> 
>> AND default_time_to_live = 0
>> 
>> AND gc_grace_seconds = 864000
>> 
>> AND max_index_interval = 2048
>> 
>> AND memtable_flush_period_in_ms = 0
>> 
>> AND min_index_interval = 128
>> 
>> AND read_repair_chance = 0.0
>> 
>> AND speculative_retry = '99PERCENTILE';
>> 
>>  
>> 
>> Have set up the C* nodes with
>> 
>> row_cache_size_in_mb: 1024
>> 
>> row_cache_save_period: 14400
>> 
>>  
>> 
>> and I am making this query 
>> 
>> select svc_pt_id, meas_type_id, read_time, value FROM 
>> cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id = 146;
>> 
>>  
>> 
>> with tracing on every time it says Row cache miss
>> 
>>  
>> 
>> activity | timestamp | source | source_elapsed
>> ----------+-----------+--------+----------------
>> Execute CQL3 query | 2016-09-30 18:15:00.446000 | 192.168.199.75 | 0
>> Parsing select svc_pt_id, meas_type_id, read_time, value FROM cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id = 146; [SharedPool-Worker-1] | 2016-09-30 18:15:00.446000 | 192.168.199.75 | 111

Re: Cassandra data model right definition

2016-10-03 Thread Benedict Elliott Smith
I did not ascribe blame.  I only empathised with their predicament;  I
don't want to listen to either of us, either!





On 3 October 2016 at 19:45, Edward Capriolo  wrote:

> You know what, don't "go low" and pin the recent un-subscriber on me.
>
> If you're so eager to deal with my pull requests, I would rather you review
> this one: https://issues.apache.org/jira/browse/CASSANDRA-10825
>
>
>
>
>
> On Mon, Oct 3, 2016 at 1:04 PM, Benedict Elliott Smith <
> bened...@apache.org> wrote:
>
>> Nobody is disputing that the docs can and should be improved to avoid
>> this misreading.  I've invited Ed to file a JIRA and/or pull request twice
>> now.
>>
>> You are of course just as welcome to do this.  Perhaps you will actually
>> do it, so we can all move on with our lives!
>>
>>
>>
>>
>> On 3 October 2016 at 17:45, Peter Lin  wrote:
>>
>>> I've met clients that read the Cassandra docs and then said in a big
>>> meeting, "it's just like a relational database, it has tables just like
>>> sqlserver/oracle."
>>>
>>> I'm not putting words in other people's mouths either, but I've heard
>>> that said enough times to want to puke. Do the docs claim Cassandra is
>>> relational? They absolutely don't make that claim, but the docs play
>>> loosey-goosey with terminology. The end result is that it confuses new
>>> users who aren't experts, or technology managers who try to make a case
>>> for Cassandra.
>>>
>>> we can make all the excuses we want, but that doesn't change the fact
>>> the docs aren't user friendly. writing great documentation is tough and
>>> most developers hate it. It's cuz we suck at it. There I said it, "we SUCK
>>> at writing user-friendly documentation". As many people have pointed out,
>>> it's not unique to Cassandra. 80% of the tech docs out there suck, starting
>>> with IBM at the top.
>>>
>>> Saying the docs suck isn't an indictment of anyone, it's just the
>>> reality of writing good documentation.
>>>
>>> On Mon, Oct 3, 2016 at 12:33 PM, Jonathan Haddad 
>>> wrote:
>>>
 Nobody is claiming Cassandra is a relational database. I'm not sure why
 that keeps coming up.
 On Mon, Oct 3, 2016 at 10:53 AM Edward Capriolo 
 wrote:

> My original point can be summed up as:
>
> Do not define Cassandra in terms of SIMILES & METAPHORS. Such words
> include "like" and "close relative".
>
> For the specifics:
>
>
> Any relational db could (and I'm sure one does!) allow for sparse
> fields as well. MySQL can be backed by rocksdb now, does that make it not 
> a
> row store?
>
>
> Lets draw some lines, a relational database is clearly defined.
>
> https://en.wikipedia.org/wiki/Edgar_F._Codd
>
> Codd's theorem, a result proven in his seminal work on the relational
> model, equates the expressive power of relational algebra and relational
> calculus (both of which, lacking recursion, are strictly less powerful
> than first-order logic).
>
> As the relational model started to become fashionable in the early
> 1980s, Codd fought a sometimes bitter campaign to prevent the term being
> misused by database vendors who had merely added a relational veneer to
> older technology. As part of this campaign, he published his 12 rules to
> define what constituted a relational database. This made his position in IBM
> increasingly difficult, so he left to form his own consulting company with
> Chris Date and others.
>
> Cassandra is not a relational database.
>
> I have attempted to illustrate that a "row store" is defined as well. I do
> not believe Cassandra is a "row store".
>
>
>
> "Just because it uses log structured storage, sparse fields, and
> semi-flexible collections doesn't disqualify it from calling it a "row
> store""
>
> What is the definition of "row store". Is it a logical construct or a
> physical one?
>
> Why isn't mongo DB a "row store"? I can drop a schema on top of mongo
> and present it as rows and columns. It seems to pass the litmus test being
> presented.
>
> https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage
>
>
>
>
>
> On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad 
> wrote:
>
> Sorry Ed, but you're really stretching here. A table in Cassandra is
> structured by a schema with the data for each row stored together in each
> data file. Just because it uses log 

Re: Row cache not working

2016-10-03 Thread Edward Capriolo
Since the feature is off by default, the coverage might only be as deep as
the specific tests that exercise it.

On Mon, Oct 3, 2016 at 4:54 PM, Jeff Jirsa 
wrote:

> Seems like it’s probably worth opening a jira issue to track it (either to
> confirm it’s a bug, or to be able to better explain if/that it’s working as
> intended – the row cache is probably missing because trace indicates the
> read isn’t cacheable, but I suspect it should be cacheable).
>
>
>
>
>
>
> Do note, though, that setting rows_per_partition to ALL can be very very
> very dangerous if you have very wide rows in any of your tables with row
> cache enabled.
>
>
>
>
>
>
>
> *From: *Abhinav Solan 
> *Reply-To: *"user@cassandra.apache.org" 
> *Date: *Monday, October 3, 2016 at 1:38 PM
> *To: *"user@cassandra.apache.org" 
> *Subject: *Re: Row cache not working
>
>
>
> It's cassandra 3.0.7,
>
> I had to set caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}; only
> then does it work, and I don't know why.
>
> If I set 'rows_per_partition': '1' then it does not work.
>
>
>
> Also wanted to ask one thing: if I set row_cache_save_period: 60, will this
> cache be refreshed automatically, or is it lazy, caching a partition only
> when a fetch call is made?
>
>
>
> On Mon, Oct 3, 2016 at 1:31 PM Jeff Jirsa 
> wrote:
>
> Which version of Cassandra are you running (I can tell it’s newer than
> 2.1, but exact version would be useful)?
>
>
>
> *From: *Abhinav Solan 
> *Reply-To: *"user@cassandra.apache.org" 
> *Date: *Monday, October 3, 2016 at 11:35 AM
> *To: *"user@cassandra.apache.org" 
> *Subject: *Re: Row cache not working
>
>
>
> Hi, can anyone please help me with this
>
>
>
> Thanks,
>
> Abhinav
>
>
>
> On Fri, Sep 30, 2016 at 6:20 PM Abhinav Solan 
> wrote:
>
> Hi Everyone,
>
>
>
> My table looks like this -
>
> CREATE TABLE test.reads (
>
> svc_pt_id bigint,
>
> meas_type_id bigint,
>
> flags bigint,
>
> read_time timestamp,
>
> value double,
>
> PRIMARY KEY ((svc_pt_id, meas_type_id))
>
> ) WITH bloom_filter_fp_chance = 0.1
>
> AND caching = {'keys': 'ALL', 'rows_per_partition': '10'}
>
> AND comment = ''
>
> AND compaction = {'class': 'org.apache.cassandra.db.compaction.
> LeveledCompactionStrategy'}
>
> AND compression = {'chunk_length_in_kb': '64', 'class': '
> org.apache.cassandra.io.compress.LZ4Compressor'}
>
> AND crc_check_chance = 1.0
>
> AND dclocal_read_repair_chance = 0.1
>
> AND default_time_to_live = 0
>
> AND gc_grace_seconds = 864000
>
> AND max_index_interval = 2048
>
> AND memtable_flush_period_in_ms = 0
>
> AND min_index_interval = 128
>
> AND read_repair_chance = 0.0
>
> AND speculative_retry = '99PERCENTILE';
>
>
>
> Have set up the C* nodes with
>
> row_cache_size_in_mb: 1024
>
> row_cache_save_period: 14400
>
>
>
> and I am making this query
>
> select svc_pt_id, meas_type_id, read_time, value FROM
> cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id =
> 146;
>
>
>
> with tracing on every time it says Row cache miss
>
>
>
> activity | timestamp | source | source_elapsed
> ----------+-----------+--------+----------------
> Execute CQL3 query | 2016-09-30 18:15:00.446000 | 192.168.199.75 | 0
> Parsing select svc_pt_id, meas_type_id, read_time, value FROM cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id = 146; [SharedPool-Worker-1] | 2016-09-30 18:15:00.446000 | 192.168.199.75 | 111
> Preparing statement [SharedPool-Worker-1] | 2016-09-30 18:15:00.446000 | 192.168.199.75 | 209
> reading data from /192.168.170.186 [SharedPool-Worker-1] | 2016-09-30 18:15:00.446001 | 192.168.199.75 | 370
> Sending READ message to /192.168.170.186 [MessagingService-Outgoing-/192.168.170.186] | 2016-09-30 18:15:00.446001 | 192.168.199.75 | 450

Re: Row cache not working

2016-10-03 Thread Jeff Jirsa
Seems like it’s probably worth opening a jira issue to track it (either to 
confirm it’s a bug, or to be able to better explain if/that it’s working as 
intended – the row cache is probably missing because trace indicates the read 
isn’t cacheable, but I suspect it should be cacheable). 



    

Do note, though, that setting rows_per_partition to ALL can be very very very 
dangerous if you have very wide rows in any of your tables with row cache 
enabled.

 

 

 

From: Abhinav Solan 
Reply-To: "user@cassandra.apache.org" 
Date: Monday, October 3, 2016 at 1:38 PM
To: "user@cassandra.apache.org" 
Subject: Re: Row cache not working

 

It's cassandra 3.0.7,  

I had to set caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}; only
then does it work, and I don't know why.

If I set 'rows_per_partition': '1' then it does not work.

 

Also wanted to ask one thing: if I set row_cache_save_period: 60, will this
cache be refreshed automatically, or is it lazy, caching a partition only
when a fetch call is made?

 

On Mon, Oct 3, 2016 at 1:31 PM Jeff Jirsa  wrote:

Which version of Cassandra are you running (I can tell it’s newer than 2.1, but 
exact version would be useful)? 

 

From: Abhinav Solan 
Reply-To: "user@cassandra.apache.org" 
Date: Monday, October 3, 2016 at 11:35 AM
To: "user@cassandra.apache.org" 
Subject: Re: Row cache not working

 

Hi, can anyone please help me with this 

 

Thanks,

Abhinav

 

On Fri, Sep 30, 2016 at 6:20 PM Abhinav Solan  wrote:

Hi Everyone, 

 

My table looks like this -

CREATE TABLE test.reads (

svc_pt_id bigint,

meas_type_id bigint,

flags bigint,

read_time timestamp,

value double,

PRIMARY KEY ((svc_pt_id, meas_type_id))

) WITH bloom_filter_fp_chance = 0.1

AND caching = {'keys': 'ALL', 'rows_per_partition': '10'}

AND comment = ''

AND compaction = {'class': 
'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}

AND compression = {'chunk_length_in_kb': '64', 'class': 
'org.apache.cassandra.io.compress.LZ4Compressor'}

AND crc_check_chance = 1.0

AND dclocal_read_repair_chance = 0.1

AND default_time_to_live = 0

AND gc_grace_seconds = 864000

AND max_index_interval = 2048

AND memtable_flush_period_in_ms = 0

AND min_index_interval = 128

AND read_repair_chance = 0.0

AND speculative_retry = '99PERCENTILE';

 

Have set up the C* nodes with

row_cache_size_in_mb: 1024

row_cache_save_period: 14400

 

and I am making this query 

select svc_pt_id, meas_type_id, read_time, value FROM 
cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id = 146;

 

with tracing on every time it says Row cache miss

 

 activity | timestamp | source | source_elapsed
-----------+-----------+--------+----------------
 Execute CQL3 query | 2016-09-30 18:15:00.446000 | 192.168.199.75 | 0
 Parsing select svc_pt_id, meas_type_id, read_time, value FROM cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id = 146; [SharedPool-Worker-1] | 2016-09-30 18:15:00.446000 | 192.168.199.75 | 111
 Preparing statement [SharedPool-Worker-1] | 2016-09-30 18:15:00.446000 | 192.168.199.75 | 209
 reading data from /192.168.170.186 [SharedPool-Worker-1] | 2016-09-30 18:15:00.446001 | 192.168.199.75 | 370
 Sending READ message to /192.168.170.186 [MessagingService-Outgoing-/192.168.170.186] | 2016-09-30 18:15:00.446001 | 192.168.199.75 | 450
 REQUEST_RESPONSE message received from /192.168.170.186 [MessagingService-Incoming-/192.168.170.186] | 2016-09-30 18:15:00.448000 | 192.168.199.75 | 2469

   

Re: Row cache not working

2016-10-03 Thread Abhinav Solan
It's cassandra 3.0.7,
I had to set caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}; only
then does it work, and I don't know why.
If I set 'rows_per_partition': '1' then it does not work.

Also wanted to ask one thing: if I set row_cache_save_period: 60, will this
cache be refreshed automatically, or is it lazy, caching a partition only
when a fetch call is made?

On Mon, Oct 3, 2016 at 1:31 PM Jeff Jirsa 
wrote:

> Which version of Cassandra are you running (I can tell it’s newer than
> 2.1, but exact version would be useful)?
>
>
>
> *From: *Abhinav Solan 
> *Reply-To: *"user@cassandra.apache.org" 
> *Date: *Monday, October 3, 2016 at 11:35 AM
> *To: *"user@cassandra.apache.org" 
> *Subject: *Re: Row cache not working
>
>
>
> Hi, can anyone please help me with this
>
>
>
> Thanks,
>
> Abhinav
>
>
>
> On Fri, Sep 30, 2016 at 6:20 PM Abhinav Solan 
> wrote:
>
> Hi Everyone,
>
>
>
> My table looks like this -
>
> CREATE TABLE test.reads (
>
> svc_pt_id bigint,
>
> meas_type_id bigint,
>
> flags bigint,
>
> read_time timestamp,
>
> value double,
>
> PRIMARY KEY ((svc_pt_id, meas_type_id))
>
> ) WITH bloom_filter_fp_chance = 0.1
>
> AND caching = {'keys': 'ALL', 'rows_per_partition': '10'}
>
> AND comment = ''
>
> AND compaction = {'class':
> 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
>
> AND compression = {'chunk_length_in_kb': '64', 'class':
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>
> AND crc_check_chance = 1.0
>
> AND dclocal_read_repair_chance = 0.1
>
> AND default_time_to_live = 0
>
> AND gc_grace_seconds = 864000
>
> AND max_index_interval = 2048
>
> AND memtable_flush_period_in_ms = 0
>
> AND min_index_interval = 128
>
> AND read_repair_chance = 0.0
>
> AND speculative_retry = '99PERCENTILE';
>
>
>
> Have set up the C* nodes with
>
> row_cache_size_in_mb: 1024
>
> row_cache_save_period: 14400
>
>
>
> and I am making this query
>
> select svc_pt_id, meas_type_id, read_time, value FROM
> cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id =
> 146;
>
>
>
> with tracing on every time it says Row cache miss
>
>
>
> activity | timestamp | source | source_elapsed
> ----------+-----------+--------+----------------
> Execute CQL3 query | 2016-09-30 18:15:00.446000 | 192.168.199.75 | 0
> Parsing select svc_pt_id, meas_type_id, read_time, value FROM cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id = 146; [SharedPool-Worker-1] | 2016-09-30 18:15:00.446000 | 192.168.199.75 | 111
> Preparing statement [SharedPool-Worker-1] | 2016-09-30 18:15:00.446000 | 192.168.199.75 | 209
> reading data from /192.168.170.186 [SharedPool-Worker-1] | 2016-09-30 18:15:00.446001 | 192.168.199.75 | 370
> Sending READ message to /192.168.170.186 [MessagingService-Outgoing-/192.168.170.186] | 2016-09-30 18:15:00.446001 | 192.168.199.75 | 450
> REQUEST_RESPONSE message received from /192.168.170.186 [MessagingService-Incoming-/192.168.170.186] | 2016-09-30 18:15:00.448000 | 192.168.199.75 | 2469
> Processing response from /192.168.170.186 [SharedPool-Worker-8] | 2016-09-30 18:15:00.448000 | 192.168.199.75 | 2609

Re: An extremely fast cassandra table full scan utility

2016-10-03 Thread Edward Capriolo
I undertook a similar effort a while ago.

https://issues.apache.org/jira/browse/CASSANDRA-7014

Other than the fact that it was closed with no comments, I can tell you
that other efforts of mine to embed things in Cassandra did not go
swimmingly, although at the time ideas like Groovy UDFs were rejected.

On Mon, Oct 3, 2016 at 4:22 PM, Bhuvan Rawal  wrote:

> Hi Jonathan,
>
> If a full scan is a regular requirement, then setting up a Spark cluster
> co-located with the Cassandra nodes makes perfect sense. But supposing it
> is a one-off requirement, say a weekly or fortnightly task, a Spark cluster
> could be an added overhead as far as capacity planning, operations, and
> maintenance are concerned.
>
> So this could be thought of as a simple substitute for a single-threaded
> scan, without the additional effort of setting up and maintaining another
> technology.
>
> Regards,
> Bhuvan
>
> On Tue, Oct 4, 2016 at 1:37 AM, siddharth verma <
> sidd.verma29.l...@gmail.com> wrote:
>
>> Hi Jon,
>> It wasn't allowed.
>> Moreover, someone who isn't familiar with Spark, and might be new to
>> map/filter/reduce operations, could also use the utility for some
>> simple operations assuming a sequential scan of the Cassandra table.
>>
>> Regards
>> Siddharth Verma
>>
>> On Tue, Oct 4, 2016 at 1:32 AM, Jonathan Haddad 
>> wrote:
>>
>>> Couldn't set it up because you couldn't get it working, or it's not allowed?
>>>
>>> On Mon, Oct 3, 2016 at 3:23 PM Siddharth Verma <
>>> verma.siddha...@snapdeal.com> wrote:
>>>
 Hi Jon,
 We couldn't set up a Spark cluster.

 For some use cases a Spark cluster was required, but for some reason we
 couldn't create one. Hence, one may use this utility to iterate through
 the entire table at very high speed.

 Had to find a workaround that would be faster than paging on the result
 set.

 Regards

 Siddharth Verma
 *Software Engineer I - CaMS*
 *M*: +91 9013689856, *T*: 011 22791596 *EXT*: 14697
 CA2125, 2nd Floor, ASF Centre-A, Jwala Mill Road,
 Udyog Vihar Phase - IV, Gurgaon-122016, INDIA

 On Tue, Oct 4, 2016 at 12:41 AM, Jonathan Haddad 
 wrote:

 It almost sounds like you're duplicating all the work of both spark and
 the connector. May I ask why you decided to not use the existing tools?

 On Mon, Oct 3, 2016 at 2:21 PM siddharth verma <
 sidd.verma29.l...@gmail.com> wrote:

 Hi DuyHai,
 Thanks for your reply.
 A few more features are planned for the next one (if there is one), like
 a custom policy that is aware of which nodes replicate each token range,
 finer-grained token ranges (for more speedup),
 and a few more.

 I think, as for fine-graining a token range:
 if one token range is split further into, say, 2-3 parts divided among
 threads, this would exploit the possible parallelism on a large scaled-out
 cluster.

 And, as you mentioned in the JIRA, streaming of requests would be of
 huge help with further splitting the range.

 Thanks once again for your valuable comments. :-)

 Regards,
 Siddharth Verma



>>
>


Re: Row cache not working

2016-10-03 Thread Jeff Jirsa
Which version of Cassandra are you running (I can tell it’s newer than 2.1, but 
exact version would be useful)? 


 

From: Abhinav Solan 
Reply-To: "user@cassandra.apache.org" 
Date: Monday, October 3, 2016 at 11:35 AM
To: "user@cassandra.apache.org" 
Subject: Re: Row cache not working

 

Hi, can anyone please help me with this 

 

Thanks,

Abhinav

 

On Fri, Sep 30, 2016 at 6:20 PM Abhinav Solan  wrote:

Hi Everyone, 

 

My table looks like this -

CREATE TABLE test.reads (

svc_pt_id bigint,

meas_type_id bigint,

flags bigint,

read_time timestamp,

value double,

PRIMARY KEY ((svc_pt_id, meas_type_id))

) WITH bloom_filter_fp_chance = 0.1

AND caching = {'keys': 'ALL', 'rows_per_partition': '10'}

AND comment = ''

AND compaction = {'class': 
'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}

AND compression = {'chunk_length_in_kb': '64', 'class': 
'org.apache.cassandra.io.compress.LZ4Compressor'}

AND crc_check_chance = 1.0

AND dclocal_read_repair_chance = 0.1

AND default_time_to_live = 0

AND gc_grace_seconds = 864000

AND max_index_interval = 2048

AND memtable_flush_period_in_ms = 0

AND min_index_interval = 128

AND read_repair_chance = 0.0

AND speculative_retry = '99PERCENTILE';

 

Have set up the C* nodes with

row_cache_size_in_mb: 1024

row_cache_save_period: 14400

 

and I am making this query 

select svc_pt_id, meas_type_id, read_time, value FROM 
cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id = 146;

 

with tracing on every time it says Row cache miss

 

 activity | timestamp | source | source_elapsed
-----------+-----------+--------+----------------
 Execute CQL3 query | 2016-09-30 18:15:00.446000 | 192.168.199.75 | 0
 Parsing select svc_pt_id, meas_type_id, read_time, value FROM cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id = 146; [SharedPool-Worker-1] | 2016-09-30 18:15:00.446000 | 192.168.199.75 | 111
 Preparing statement [SharedPool-Worker-1] | 2016-09-30 18:15:00.446000 | 192.168.199.75 | 209
 reading data from /192.168.170.186 [SharedPool-Worker-1] | 2016-09-30 18:15:00.446001 | 192.168.199.75 | 370
 Sending READ message to /192.168.170.186 [MessagingService-Outgoing-/192.168.170.186] | 2016-09-30 18:15:00.446001 | 192.168.199.75 | 450
 REQUEST_RESPONSE message received from /192.168.170.186 [MessagingService-Incoming-/192.168.170.186] | 2016-09-30 18:15:00.448000 | 192.168.199.75 | 2469
 Processing response from /192.168.170.186 [SharedPool-Worker-8] | 2016-09-30 18:15:00.448000 | 192.168.199.75 | 2609
 READ message received from /192.168.199.75 [MessagingService-Incoming-/192.168.199.75] | 2016-09-30 18:15:00.449000 | 192.168.170.186 | 75
 Row cache miss [SharedPool-Worker-2] | 2016-09-30 18:15:00.449000 | 192.168.170.186 | 218
 Fetching data but not populating cache as query does not query from the start of the partition [SharedPool-Worker-2] | 2016-09-30 18:15:00.449000 | 192.168.170.186 | 246
 Executing single-partition query on cts_svc_pt_latest_int_read [SharedPool-Worker-2] | 2016-09-30 18:15:00.449000 | 192.168.170.186 | 259
 Acquiring sstable references [SharedPool-Worker-2] | 2016-09-30 18:15:00.449001 | 192.168.170.186 | 281

   

Re: Row cache not working

2016-10-03 Thread Edward Capriolo
I was thinking about this issue. I was wondering on the dev side if it
would make sense to make a utility for the unit tests that could enable
tracing and then assert that a number of steps in the trace happened.

Something like:

setup()
runQuery("SELECT * FROM X")
Assertion.assertTrace("Preparing statement").then("Row cache
hit").then("Request complete");

This would be a pretty awesome way to verify things without mock/mockito.



On Mon, Oct 3, 2016 at 2:35 PM, Abhinav Solan 
wrote:

> Hi, can anyone please help me with this
>
> Thanks,
> Abhinav
>
> On Fri, Sep 30, 2016 at 6:20 PM Abhinav Solan 
> wrote:
>
>> Hi Everyone,
>>
>> My table looks like this -
>> CREATE TABLE test.reads (
>> svc_pt_id bigint,
>> meas_type_id bigint,
>> flags bigint,
>> read_time timestamp,
>> value double,
>> PRIMARY KEY ((svc_pt_id, meas_type_id))
>> ) WITH bloom_filter_fp_chance = 0.1
>> AND caching = {'keys': 'ALL', 'rows_per_partition': '10'}
>> AND comment = ''
>> AND compaction = {'class': 'org.apache.cassandra.db.compaction.
>> LeveledCompactionStrategy'}
>> AND compression = {'chunk_length_in_kb': '64', 'class': '
>> org.apache.cassandra.io.compress.LZ4Compressor'}
>> AND crc_check_chance = 1.0
>> AND dclocal_read_repair_chance = 0.1
>> AND default_time_to_live = 0
>> AND gc_grace_seconds = 864000
>> AND max_index_interval = 2048
>> AND memtable_flush_period_in_ms = 0
>> AND min_index_interval = 128
>> AND read_repair_chance = 0.0
>> AND speculative_retry = '99PERCENTILE';
>>
>> Have set up the C* nodes with
>> row_cache_size_in_mb: 1024
>> row_cache_save_period: 14400
>>
>> and I am making this query
>> select svc_pt_id, meas_type_id, read_time, value FROM
>> cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id =
>> 146;
>>
>> with tracing on every time it says Row cache miss
>>
>> activity | timestamp | source | source_elapsed
>> ----------+-----------+--------+----------------
>> Execute CQL3 query | 2016-09-30 18:15:00.446000 | 192.168.199.75 | 0
>> Parsing select svc_pt_id, meas_type_id, read_time, value FROM cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id = 146; [SharedPool-Worker-1] | 2016-09-30 18:15:00.446000 | 192.168.199.75 | 111
>> Preparing statement [SharedPool-Worker-1] | 2016-09-30 18:15:00.446000 | 192.168.199.75 | 209
>> reading data from /192.168.170.186 [SharedPool-Worker-1] | 2016-09-30 18:15:00.446001 | 192.168.199.75 | 370
>> Sending READ message to /192.168.170.186 [MessagingService-Outgoing-/192.168.170.186] | 2016-09-30 18:15:00.446001 | 192.168.199.75 | 450
>> REQUEST_RESPONSE message received from /192.168.170.186 [MessagingService-Incoming-/192.168.170.186] | 2016-09-30 18:15:00.448000 | 192.168.199.75 | 2469
>> Processing response from /192.168.170.186 [SharedPool-Worker-8] | 2016-09-30 18:15:00.448000 | 192.168.199.75 | 2609
>> READ message received from /192.168.199.75 [MessagingService-Incoming-/192.168.199.75] | 2016-09-30 18:15:00.449000 | 192.168.170.186 | 75
>> Row cache miss [SharedPool-Worker-2] | 2016-09-30 18:15:00.449000 | 192.168.170.186 | 218
>> Fetching data but not populating cache as query does not query from the start of the partition [SharedPool-Worker-2] | 2016-09-30 18:15:00.449000 | 192.168.170.186 | 246
>> Executing single-partition query on cts_svc_pt_latest_int_read [SharedPool-Worker-2] | 2016-09-30 18:15:00.449000 | 192.168.170.186 | 259
>> Acquiring sstable references [SharedPool-Worker-2] | 2016-09-30 18:15:00.449001 | 192.168.170.186 | 281
>> Merging memtable contents [SharedPool-Worker-2] | 2016-09-30 18:15:00.449001 | 192.168.170.186 | 295
>> Merging data from sstable 8 [SharedPool-Worker-2] | 2016-09-30 18:15:00.449001 | 192.168.170.186 | 326
>> Key cache hit for sstable 8 [SharedPool-Worker-2] | 2016-09-30 18:15:00.449001 | 192.168.170.186 | 351
>> Merging data from sstable 7 [SharedPool-Worker-2] | 2016-09-30 18:15:00.449001 | 192.168.170.186 | 439
>> Key 

Re: An extremely fast cassandra table full scan utility

2016-10-03 Thread Bhuvan Rawal
Hi Jonathan,

If a full scan is a regular requirement, then setting up a Spark cluster
co-located with the Cassandra nodes makes perfect sense. But supposing it is
a one-off requirement, say a weekly or fortnightly task, a Spark cluster
could be an added overhead as far as capacity planning, operations, and
maintenance are concerned.

So this could be thought of as a simple substitute for a single-threaded
scan, without the additional effort of setting up and maintaining another
technology.

Regards,
Bhuvan
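
A minimal shell sketch of the token-range idea discussed in this thread: split the Murmur3 ring into fixed ranges and scan them concurrently. The 4-way split and the test.reads table are illustrative assumptions; a real utility would derive the ranges from the cluster's ring so each query stays replica-local.

ranges=(
  "-9223372036854775808:-4611686018427387905"
  "-4611686018427387905:-1"
  "-1:4611686018427387903"
  "4611686018427387903:9223372036854775807"
)
for r in "${ranges[@]}"; do
  lo=${r%%:*}; hi=${r#*:}
  cqlsh -e "SELECT * FROM test.reads
            WHERE token(svc_pt_id, meas_type_id) > $lo
              AND token(svc_pt_id, meas_type_id) <= $hi;" > "scan_$lo.out" &
done
wait   # all four range scans run in parallel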

On Tue, Oct 4, 2016 at 1:37 AM, siddharth verma  wrote:

> Hi Jon,
> It wasn't allowed.
> Moreover, someone who isn't familiar with Spark, and might be new to
> map/filter/reduce operations, could also use the utility for some
> simple operations assuming a sequential scan of the Cassandra table.
>
> Regards
> Siddharth Verma
>
> On Tue, Oct 4, 2016 at 1:32 AM, Jonathan Haddad  wrote:
>
>> Couldn't set it up because you couldn't get it working, or it's not allowed?
>>
>> On Mon, Oct 3, 2016 at 3:23 PM Siddharth Verma <
>> verma.siddha...@snapdeal.com> wrote:
>>
>>> Hi Jon,
>>> We couldn't set up a Spark cluster.
>>>
>>> For some use cases a Spark cluster was required, but for some reason we
>>> couldn't create one. Hence, one may use this utility to iterate through
>>> the entire table at very high speed.
>>>
>>> Had to find a workaround that would be faster than paging on the result
>>> set.
>>>
>>> Regards
>>>
>>> Siddharth Verma
>>> *Software Engineer I - CaMS*
>>> *M*: +91 9013689856, *T*: 011 22791596 *EXT*: 14697
>>> CA2125, 2nd Floor, ASF Centre-A, Jwala Mill Road,
>>> Udyog Vihar Phase - IV, Gurgaon-122016, INDIA
>>>
>>> On Tue, Oct 4, 2016 at 12:41 AM, Jonathan Haddad 
>>> wrote:
>>>
>>> It almost sounds like you're duplicating all the work of both spark and
>>> the connector. May I ask why you decided to not use the existing tools?
>>>
>>> On Mon, Oct 3, 2016 at 2:21 PM siddharth verma <
>>> sidd.verma29.l...@gmail.com> wrote:
>>>
>>> Hi DuyHai,
>>> Thanks for your reply.
>>> A few more features are planned for the next one (if there is one), like
>>> a custom policy that is aware of which nodes replicate each token range,
>>> finer-grained token ranges (for more speedup),
>>> and a few more.
>>>
>>> I think, as for fine-graining a token range:
>>> if one token range is split further into, say, 2-3 parts divided among
>>> threads, this would exploit the possible parallelism on a large scaled-out
>>> cluster.
>>>
>>> And, as you mentioned in the JIRA, streaming of requests would be of huge
>>> help with further splitting the range.
>>>
>>> Thanks once again for your valuable comments. :-)
>>>
>>> Regards,
>>> Siddharth Verma
>>>
>>>
>>>
>


Re: An extremely fast cassandra table full scan utility

2016-10-03 Thread siddharth verma
Hi Jon,
It wasn't allowed.
Moreover, someone who isn't familiar with Spark, and might be new to
map/filter/reduce operations, could also use the utility for some simple
operations assuming a sequential scan of the Cassandra table.

Regards
Siddharth Verma

On Tue, Oct 4, 2016 at 1:32 AM, Jonathan Haddad  wrote:

> Couldn't set it up because you couldn't get it working, or it's not allowed?
>
> On Mon, Oct 3, 2016 at 3:23 PM Siddharth Verma <
> verma.siddha...@snapdeal.com> wrote:
>
>> Hi Jon,
>> We couldn't set up a Spark cluster.
>>
>> For some use cases a Spark cluster was required, but for some reason we
>> couldn't create one. Hence, one may use this utility to iterate through
>> the entire table at very high speed.
>>
>> Had to find a workaround that would be faster than paging on the result set.
>>
>> Regards
>>
>> Siddharth Verma
>> *Software Engineer I - CaMS*
>> *M*: +91 9013689856, *T*: 011 22791596 *EXT*: 14697
>> CA2125, 2nd Floor, ASF Centre-A, Jwala Mill Road,
>> Udyog Vihar Phase - IV, Gurgaon-122016, INDIA
>>
>> On Tue, Oct 4, 2016 at 12:41 AM, Jonathan Haddad 
>> wrote:
>>
>> It almost sounds like you're duplicating all the work of both spark and
>> the connector. May I ask why you decided to not use the existing tools?
>>
>> On Mon, Oct 3, 2016 at 2:21 PM siddharth verma <
>> sidd.verma29.l...@gmail.com> wrote:
>>
>> Hi DuyHai,
>> Thanks for your reply.
>> A few more features are planned for the next one (if there is one), like
>> a custom policy that is aware of which nodes replicate each token range,
>> finer-grained token ranges (for more speedup),
>> and a few more.
>>
>> I think, as for fine-graining a token range:
>> if one token range is split further into, say, 2-3 parts divided among
>> threads, this would exploit the possible parallelism on a large scaled-out
>> cluster.
>>
>> And, as you mentioned in the JIRA, streaming of requests would be of huge
>> help with further splitting the range.
>>
>> Thanks once again for your valuable comments. :-)
>>
>> Regards,
>> Siddharth Verma
>>
>>
>>


Re: An extremely fast cassandra table full scan utility

2016-10-03 Thread Jonathan Haddad
Couldn't set it up because you couldn't get it working, or it's not allowed?
On Mon, Oct 3, 2016 at 3:23 PM Siddharth Verma 
wrote:

> Hi Jon,
> We couldn't set up a Spark cluster.
>
> For some use cases a Spark cluster was required, but for some reason we
> couldn't create one. Hence, one may use this utility to iterate through
> the entire table at very high speed.
>
> Had to find a workaround that would be faster than paging on the result set.
>
> Regards
>
> Siddharth Verma
> *Software Engineer I - CaMS*
> *M*: +91 9013689856, *T*: 011 22791596 *EXT*: 14697
> CA2125, 2nd Floor, ASF Centre-A, Jwala Mill Road,
> Udyog Vihar Phase - IV, Gurgaon-122016, INDIA
>
> On Tue, Oct 4, 2016 at 12:41 AM, Jonathan Haddad 
> wrote:
>
> It almost sounds like you're duplicating all the work of both spark and
> the connector. May I ask why you decided to not use the existing tools?
>
> On Mon, Oct 3, 2016 at 2:21 PM siddharth verma <
> sidd.verma29.l...@gmail.com> wrote:
>
> Hi DuyHai,
> Thanks for your reply.
> A few more features are planned for the next one (if there is one), like
> a custom policy that is aware of which nodes replicate each token range,
> finer-grained token ranges (for more speedup),
> and a few more.
>
> I think, as for fine-graining a token range:
> if one token range is split further into, say, 2-3 parts divided among
> threads, this would exploit the possible parallelism on a large scaled-out
> cluster.
>
> And, as you mentioned in the JIRA, streaming of requests would be of huge
> help with further splitting the range.
>
> Thanks once again for your valuable comments. :-)
>
> Regards,
> Siddharth Verma
>
>
>


Re: An extremely fast cassandra table full scan utility

2016-10-03 Thread Siddharth Verma
Hi Jon,
We couldn't set up a Spark cluster.
For some use cases a Spark cluster was required, but for some reason we
couldn't create one. Hence, one may use this utility to iterate through
the entire table at very high speed.

Had to find a workaround that would be faster than paging on the result set.

Regards

Siddharth Verma
*Software Engineer I - CaMS*
*M*: +91 9013689856, *T*: 011 22791596 *EXT*: 14697
CA2125, 2nd Floor, ASF Centre-A, Jwala Mill Road,
Udyog Vihar Phase - IV, Gurgaon-122016, INDIA

On Tue, Oct 4, 2016 at 12:41 AM, Jonathan Haddad  wrote:

> It almost sounds like you're duplicating all the work of both spark and
> the connector. May I ask why you decided to not use the existing tools?
>
> On Mon, Oct 3, 2016 at 2:21 PM siddharth verma <
> sidd.verma29.l...@gmail.com> wrote:
>
>> Hi DuyHai,
>> Thanks for your reply.
>> A few more features planned in the next one(if there is one) like,
>> custom policy keeping in mind the replication of token range on specific
>> nodes,
>> fine graining the token range(for more speedup),
>> and a few more.
>>
>> I think, as fine graining a token range,
>> If one token range is split further in say, 2-3 parts, divided among
>> threads, this would exploit the possible parallelism on a large scaled out
>> cluster.
>>
>> And, as you mentioned the JIRA, streaming of request, that would of huge
>> help with further splitting the range.
>>
>> Thanks once again for your valuable comments. :-)
>>
>> Regards,
>> Siddharth Verma
>>
>


Re: An extremely fast cassandra table full scan utility

2016-10-03 Thread Jonathan Haddad
It almost sounds like you're duplicating all the work of both spark and the
connector. May I ask why you decided to not use the existing tools?

On Mon, Oct 3, 2016 at 2:21 PM siddharth verma 
wrote:

> Hi DuyHai,
> Thanks for your reply.
> A few more features planned in the next one(if there is one) like,
> custom policy keeping in mind the replication of token range on specific
> nodes,
> fine graining the token range(for more speedup),
> and a few more.
>
> I think, as fine graining a token range,
> If one token range is split further in say, 2-3 parts, divided among
> threads, this would exploit the possible parallelism on a large scaled out
> cluster.
>
> And, as you mentioned the JIRA, streaming of requests, that would be of huge
> help with further splitting the range.
>
> Thanks once again for your valuable comments. :-)
>
> Regards,
> Siddharth Verma
>


Re: An extremely fast cassandra table full scan utility

2016-10-03 Thread Bhuvan Rawal
It will be interesting to have a comparison with spark here for basic use
cases.

From a naive observation, it appears that this could be slower than Spark,
as a lot of data is streamed over the network.

On the other hand, in this approach we have seen that young-gen GC takes
nearly the full CPU (possibly because a lot of data is moved on and off the
heap; the young generation has been seen to empty and fill, sometimes
multiple times a second). That should be there with Spark as well, since it
will be calling the Cassandra driver; on top of that, the Spark cluster will
be sharing the same compute resources where it does filtering and other
operations on the data. If we have an appropriately sized client machine
with enough network bandwidth, this could potentially work faster, of course
for basic scanning use cases.

Which of these assumptions seems to be more appropriate?

On Mon, Oct 3, 2016 at 11:40 PM, DuyHai Doan  wrote:

> Hello Siddharth
>
> I just had a look at the architecture diagram. The idea of using
> multiple threads, one for each token range, is great. It helps max out
> parallelism.
>
> With https://issues.apache.org/jira/browse/CASSANDRA-11521 it would be
> even faster.
>
> On Mon, Oct 3, 2016 at 7:51 PM, siddharth verma <
> sidd.verma29.l...@gmail.com> wrote:
>
>> Hi,
>> I was working on a utility which can be used for cassandra full table
>> scan, at a tremendously high velocity, cassandra fast full table scan.
>> How fast?
>> The script dumped ~ 229 million rows in 116 seconds, with a cluster of
>> size 6 nodes.
>> Data transfer rates of up to 25 MBps were observed on the Cassandra nodes.
>>
>> For some use case, a spark cluster was required, but for some reason we
>> couldn't create a Spark cluster. Hence, one may use this utility to iterate
>> through the entire table at very high speed.
>>
>> But now for any full scan, I use it freely for my adhoc java programs to
>> manipulate or aggregate cassandra data.
>>
>> You can customize the options, setting fetch size, consistency level,
>> degree of parallelism(number of threads) according to your need.
>>
>> You can visit https://github.com/siddv29/cfs to go through the code, see
>> the logic behind it, or try it in your program.
>> A sample program is also provided.
>>
>> I coded this utility in java.
>>
>> Bhuvan Rawal(bhu1ra...@gmail.com) and I worked on this concept.
>> For python you may visit his blog(http://casualreflections.
>> io/tech/cassandra/python/Multiprocess-Producer-Cassandra-Python) and
>> github(https://gist.github.com/bhuvanrawal/93c5ae6cdd020de47
>> e0981d36d2c0785)
>>
>> Looking forward to your suggestions and comments.
>>
>> P.S. Give it a try. Trust me, the iteration speed is awesome!!
>> It is a bare application, built asap. If you would like to contribute to
>> the java utility, add or build up on it, do reach out
>> sidd.verma29.li...@gmail.com
>>
>> Thanks and Regards,
>> Siddharth Verma
>> (previous email id on this mailing list : verma.siddha...@snapdeal.com)
>>
>
>


Re: Cassandra data model right definition

2016-10-03 Thread Edward Capriolo
You know what, don't "go low" and pin the recent un-subscriber on me.

If you're so eager to deal with my pull requests, I would rather you review
this one:
https://issues.apache.org/jira/browse/CASSANDRA-10825





On Mon, Oct 3, 2016 at 1:04 PM, Benedict Elliott Smith 
wrote:

> Nobody is disputing that the docs can and should be improved to avoid this
> misreading.  I've invited Ed to file a JIRA and/or pull request twice now.
>
> You are of course just as welcome to do this.  Perhaps you will actually
> do it, so we can all move on with our lives!
>
>
>
>
> On 3 October 2016 at 17:45, Peter Lin  wrote:
>
>> I've met clients that read the cassandra docs and then said in a big
>> meeting "it's just like relational database, it has tables just like
>> sqlserver/oracle."
>>
>> I'm not putting words in other people's mouth either, but I've heard that
>> said enough times to want to puke. Does the docs claim cassandra is
>> relational ? it absolutely doesn't make that claim, but the docs play
>> loosey goosey with terminology. End result is it confuses new users that
>> aren't experts, or technology managers that try to make a case for
>> cassandra.
>>
>> we can make all the excuses we want, but that doesn't change the fact the
>> docs aren't user friendly. writing great documentation is tough and most
>> developers hate it. It's cuz we suck at it. There I said it, "we SUCK at
>> writing user friendly documentation". As many people have pointed out, it's
>> not unique to Cassandra. 80% of the tech docs out there suck, starting with
>> IBM at the top.
>>
>> Saying the docs suck isn't an indictment of anyone, it's just the reality
>> of writing good documentation.
>>
>> On Mon, Oct 3, 2016 at 12:33 PM, Jonathan Haddad 
>> wrote:
>>
>>> Nobody is claiming Cassandra is relational; I'm not sure why that keeps
>>> coming up.
>>> On Mon, Oct 3, 2016 at 10:53 AM Edward Capriolo 
>>> wrote:
>>>
 My original point can be summed up as:

 Do not define cassandra in terms of SIMILES & METAPHORS. Such words include
 "like" and "close relative".

 For the specifics:


 Any relational db could (and I'm sure one does!) allow for sparse
 fields as well. MySQL can be backed by rocksdb now, does that make it not a
 row store?


 Lets draw some lines, a relational database is clearly defined.

 https://en.wikipedia.org/wiki/Edgar_F._Codd

 Codd's theorem, a result proven in his seminal work on the relational
 model, equates the expressive power of relational algebra and relational
 calculus (both of which, lacking recursion, are strictly less powerful
 than first-order logic).

 As the relational model started to become fashionable in the early
 1980s, Codd fought a sometimes bitter campaign to prevent the term being
 misused by database vendors who had merely added a relational veneer to
 older technology. As part of this campaign, he published his 12 rules
  to define what
 constituted a relational database. This made his position in IBM
 increasingly difficult, so he left to form his own consulting company with
 Chris Date and others.

 Cassandra is not a relational database.

 I have attempted to illustrate that a "row store" is defined as
 well. I do not believe Cassandra is a "row store".



 "Just because it uses log structured storage, sparse fields, and
 semi-flexible collections doesn't disqualify it from calling it a "row
 store""

 What is the definition of "row store". Is it a logical construct or a
 physical one?

 Why isn't mongo DB a "row store"? I can drop a schema on top of mongo
 and present it as rows and columns. It seems to pass the litmus test being
 presented.

 https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage





 On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad 
 wrote:

 Sorry Ed, but you're really stretching here. A table in Cassandra is
 structured by a schema with the data for each row stored together in each
 data file. Just because it uses log structured storage, sparse fields, and
 semi-flexible collections doesn't disqualify it from calling it a "row
 store"

 Postgres added flexible storage through hstore, I don't hear anyone
 arguing that it needs to be renamed.

 Any relational db could (and I'm sure one does!) allow for sparse
 fields as 

Re: Row cache not working

2016-10-03 Thread Abhinav Solan
Hi, can anyone please help me with this?

Thanks,
Abhinav

On Fri, Sep 30, 2016 at 6:20 PM Abhinav Solan 
wrote:

> Hi Everyone,
>
> My table looks like this -
> CREATE TABLE test.reads (
> svc_pt_id bigint,
> meas_type_id bigint,
> flags bigint,
> read_time timestamp,
> value double,
> PRIMARY KEY ((svc_pt_id, meas_type_id))
> ) WITH bloom_filter_fp_chance = 0.1
> AND caching = {'keys': 'ALL', 'rows_per_partition': '10'}
> AND comment = ''
> AND compaction = {'class':
> 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
> AND compression = {'chunk_length_in_kb': '64', 'class':
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND crc_check_chance = 1.0
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99PERCENTILE';
>
> Have set up the C* nodes with
> row_cache_size_in_mb: 1024
> row_cache_save_period: 14400
>
> and I am making this query
> select svc_pt_id, meas_type_id, read_time, value FROM
> cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id =
> 146;
>
> with tracing on every time it says Row cache miss
>
> activity | timestamp | source | source_elapsed
> -----------------------------------------------------------------------
> Execute CQL3 query | 2016-09-30 18:15:00.446000 | 192.168.199.75 | 0
> Parsing select svc_pt_id, meas_type_id, read_time, value FROM cts_svc_pt_latest_int_read where svc_pt_id = -9941235 and meas_type_id = 146; [SharedPool-Worker-1] | 2016-09-30 18:15:00.446000 | 192.168.199.75 | 111
> Preparing statement [SharedPool-Worker-1] | 2016-09-30 18:15:00.446000 | 192.168.199.75 | 209
> reading data from /192.168.170.186 [SharedPool-Worker-1] | 2016-09-30 18:15:00.446001 | 192.168.199.75 | 370
> Sending READ message to /192.168.170.186 [MessagingService-Outgoing-/192.168.170.186] | 2016-09-30 18:15:00.446001 | 192.168.199.75 | 450
> REQUEST_RESPONSE message received from /192.168.170.186 [MessagingService-Incoming-/192.168.170.186] | 2016-09-30 18:15:00.448000 | 192.168.199.75 | 2469
> Processing response from /192.168.170.186 [SharedPool-Worker-8] | 2016-09-30 18:15:00.448000 | 192.168.199.75 | 2609
> READ message received from /192.168.199.75 [MessagingService-Incoming-/192.168.199.75] | 2016-09-30 18:15:00.449000 | 192.168.170.186 | 75
> Row cache miss [SharedPool-Worker-2] | 2016-09-30 18:15:00.449000 | 192.168.170.186 | 218
> Fetching data but not populating cache as query does not query from the start of the partition [SharedPool-Worker-2] | 2016-09-30 18:15:00.449000 | 192.168.170.186 | 246
> Executing single-partition query on cts_svc_pt_latest_int_read [SharedPool-Worker-2] | 2016-09-30 18:15:00.449000 | 192.168.170.186 | 259
> Acquiring sstable references [SharedPool-Worker-2] | 2016-09-30 18:15:00.449001 | 192.168.170.186 | 281
> Merging memtable contents [SharedPool-Worker-2] | 2016-09-30 18:15:00.449001 | 192.168.170.186 | 295
> Merging data from sstable 8 [SharedPool-Worker-2] | 2016-09-30 18:15:00.449001 | 192.168.170.186 | 326
> Key cache hit for sstable 8 [SharedPool-Worker-2] | 2016-09-30 18:15:00.449001 | 192.168.170.186 | 351
> Merging data from sstable 7 [SharedPool-Worker-2] | 2016-09-30 18:15:00.449001 | 192.168.170.186 | 439
> Key cache hit for sstable 7 [SharedPool-Worker-2] | 2016-09-30 18:15:00.449001 | 192.168.170.186 | 468
> Read 1 live and 0 tombstone cells [SharedPool-Worker-2] | 2016-09-30 18:15:00.449001 | 192.168.170.186 | 615
> Enqueuing response to /192.168.199.75 [SharedPool-Worker-2] | 2016-09-30 18:15:00.449002 | 192.168.170.186 | 766
> Sending REQUEST_RESPONSE message to /192.168.199.75 [MessagingService-Outgoing-/192.168.199.75] | 2016-09-30 18:15:00.449002 | 192.168.170.186 | 897
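
The key line in that trace is "Fetching data but not populating cache as
query does not query from the start of the partition": the row cache is only
populated by reads that begin at the head of the partition. A minimal cqlsh
sketch to re-test with a full-row read (whether dropping the column
projection is enough to make this a head-of-partition read is an assumption
to verify against the new trace output; table name as per the CREATE TABLE
above):

TRACING ON;
-- Full-row read of the same partition; compare the new trace output
-- against the "Row cache miss" trace above.
SELECT * FROM test.reads WHERE svc_pt_id = -9941235 AND meas_type_id = 146;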

Re: An extremely fast cassandra table full scan utility

2016-10-03 Thread siddharth verma
Hi DuyHai,
Thanks for your reply.
A few more features are planned in the next one (if there is one), like:
a custom policy keeping in mind the replication of each token range on
specific nodes,
fine-graining the token ranges (for more speedup),
and a few more.

I think, for fine-graining a token range: if one token range is split
further into, say, 2-3 parts divided among threads, this would exploit the
possible parallelism on a large scaled-out cluster.

And, as you mentioned the JIRA, streaming of requests would be of huge
help with further splitting the range.

Thanks once again for your valuable comments. :-)

Regards,
Siddharth Verma
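
A minimal sketch of that fine-graining idea against the DataStax Java driver
3.x (TokenRange.splitEvenly() and unwrap() are driver calls; the contact
point, keyspace, table, thread count, and split factor below are illustrative
assumptions, not taken from the cfs code):

import com.datastax.driver.core.*;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class FineGrainedScan {
    public static void main(String[] args) throws InterruptedException {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();
        PreparedStatement ps = session.prepare(
            "SELECT id FROM my_ks.my_table WHERE token(id) > ? AND token(id) <= ?");
        ExecutorService pool = Executors.newFixedThreadPool(16);
        for (TokenRange range : cluster.getMetadata().getTokenRanges()) {
            // Fine-grain: split each primary range into smaller sub-ranges.
            for (TokenRange split : range.splitEvenly(3)) {
                // unwrap() handles the range that wraps around the ring.
                for (TokenRange r : split.unwrap()) {
                    pool.submit(() -> {
                        BoundStatement bs = ps.bind()
                            .setToken(0, r.getStart())
                            .setToken(1, r.getEnd());
                        bs.setFetchSize(5000);
                        for (Row row : session.execute(bs)) {
                            // consume the row here
                        }
                    });
                }
            }
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        cluster.close();
    }
}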


Re: An extremely fast cassandra table full scan utility

2016-10-03 Thread DuyHai Doan
Hello Siddharth

I just had a look at the architecture diagram. The idea of using
multiple threads, one for each token range, is great. It helps max out
parallelism.

With https://issues.apache.org/jira/browse/CASSANDRA-11521 it would be even
faster.

On Mon, Oct 3, 2016 at 7:51 PM, siddharth verma  wrote:

> Hi,
> I was working on a utility which can be used for cassandra full table
> scan, at a tremendously high velocity, cassandra fast full table scan.
> How fast?
> The script dumped ~ 229 million rows in 116 seconds, with a cluster of
> size 6 nodes.
> Data transfer rates of up to 25 MBps were observed on the Cassandra nodes.
>
> For some use case, a spark cluster was required, but for some reason we
> couldn't create a Spark cluster. Hence, one may use this utility to iterate
> through the entire table at very high speed.
>
> But now for any full scan, I use it freely for my adhoc java programs to
> manipulate or aggregate cassandra data.
>
> You can customize the options, setting fetch size, consistency level,
> degree of parallelism(number of threads) according to your need.
>
> You can visit https://github.com/siddv29/cfs to go through the code, see
> the logic behind it, or try it in your program.
> A sample program is also provided.
>
> I coded this utility in java.
>
> Bhuvan Rawal(bhu1ra...@gmail.com) and I worked on this concept.
> For python you may visit his blog(http://casualreflections.
> io/tech/cassandra/python/Multiprocess-Producer-Cassandra-Python) and
> github(https://gist.github.com/bhuvanrawal/93c5ae6cdd020de47e0981d36d2c07
> 85)
>
> Looking forward to your suggestions and comments.
>
> P.S. Give it a try. Trust me, the iteration speed is awesome!!
> It is a bare application, built asap. If you would like to contribute to
> the java utility, add or build up on it, do reach out
> sidd.verma29.li...@gmail.com
>
> Thanks and Regards,
> Siddharth Verma
> (previous email id on this mailing list : verma.siddha...@snapdeal.com)
>


An extremely fast cassandra table full scan utility

2016-10-03 Thread siddharth verma
Hi,
I was working on a utility which can be used for Cassandra full table scans
at a tremendously high velocity: a Cassandra fast full table scan.
How fast?
The script dumped ~ 229 million rows in 116 seconds, with a cluster of size
6 nodes.
Data transfer rates of up to 25 MBps were observed on the Cassandra nodes.

For some use case, a spark cluster was required, but for some reason we
couldn't create a Spark cluster. Hence, one may use this utility to iterate
through the entire table at very high speed.

But now, for any full scan, I use it freely in my ad hoc Java programs to
manipulate or aggregate cassandra data.

You can customize the options, setting fetch size, consistency level,
degree of parallelism (number of threads) according to your needs.

You can visit https://github.com/siddv29/cfs to go through the code, see
the logic behind it, or try it in your program.
A sample program is also provided.

I coded this utility in java.

Bhuvan Rawal(bhu1ra...@gmail.com) and I worked on this concept.
For python you may visit his blog(
http://casualreflections.io/tech/cassandra/python/Multiprocess-Producer-Cassandra-Python)
and github(
https://gist.github.com/bhuvanrawal/93c5ae6cdd020de47e0981d36d2c0785)

Looking forward to your suggestions and comments.

P.S. Give it a try. Trust me, the iteration speed is awesome!!
It is a bare-bones application, built quickly. If you would like to contribute to
the java utility, add or build up on it, do reach out
sidd.verma29.li...@gmail.com

Thanks and Regards,
Siddharth Verma
(previous email id on this mailing list : verma.siddha...@snapdeal.com)
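
For a sense of what those knobs map to underneath, a minimal sketch against
the plain DataStax Java driver 3.x (names illustrative; this is not the cfs
API itself):

// Assumes an open driver Session named "session"; my_ks.my_table is a
// placeholder table.
Statement stmt = new SimpleStatement("SELECT * FROM my_ks.my_table")
    .setConsistencyLevel(ConsistencyLevel.LOCAL_ONE) // consistency level
    .setFetchSize(10000);                            // rows fetched per page
ResultSet rs = session.execute(stmt);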


Re: Way to write to dc1 but keep data only in dc2

2016-10-03 Thread Dorian Hoxha
@INDRANIL
Please go find your own thread and don't hijack mine.

On Mon, Oct 3, 2016 at 6:19 PM, INDRANIL BASU  wrote:

> Hello All,
>
> I am getting the below error repeatedly in the system log of C* 2.1.0
>
> WARN  [SharedPool-Worker-64] 2016-09-27 00:43:35,835
> SliceQueryFilter.java:236 - Read 0 live and 1923 tombstoned cells in
> test_schema.test_cf.test_cf_col1_idx (see tombstone_warn_threshold). 5000
> columns was requested, slices=[-], delInfo={deletedAt=-9223372036854775808,
> localDeletion=2147483647}
>
> After that, a NullPointerException and finally an OOM:
>
> ERROR [CompactionExecutor:6287] 2016-09-29 22:09:13,546
> CassandraDaemon.java:166 - Exception in thread Thread[CompactionExecutor:
> 6287,1,main]
> java.lang.NullPointerException: null
> at org.apache.cassandra.service.CacheService$
> KeyCacheSerializer.serialize(CacheService.java:475)
> ~[apache-cassandra-2.1.0.jar:2.1.0]
> at org.apache.cassandra.service.CacheService$
> KeyCacheSerializer.serialize(CacheService.java:463)
> ~[apache-cassandra-2.1.0.jar:2.1.0]
> at org.apache.cassandra.cache.AutoSavingCache$Writer.
> saveCache(AutoSavingCache.java:225) ~[apache-cassandra-2.1.0.jar:2.1.0]
> at org.apache.cassandra.db.compaction.CompactionManager$
> 11.run(CompactionManager.java:1061) ~[apache-cassandra-2.1.0.jar:2.1.0]
> at java.util.concurrent.Executors$RunnableAdapter.call(Unknown
> Source) ~[na:1.7.0_80]
> at java.util.concurrent.FutureTask.run(Unknown Source)
> ~[na:1.7.0_80]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
> Source) [na:1.7.0_80]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
> Source) [na:1.7.0_80]
> at java.lang.Thread.run(Unknown Source) [na:1.7.0_80]
> ERROR [CompactionExecutor:9712] 2016-10-01 10:09:13,871
> CassandraDaemon.java:166 - Exception in thread Thread[CompactionExecutor:
> 9712,1,main]
> java.lang.NullPointerException: null
> ERROR [CompactionExecutor:10070] 2016-10-01 14:09:14,154
> CassandraDaemon.java:166 - Exception in thread Thread[CompactionExecutor:
> 10070,1,main]
> java.lang.NullPointerException: null
> ERROR [CompactionExecutor:10413] 2016-10-01 18:09:14,265
> CassandraDaemon.java:166 - Exception in thread Thread[CompactionExecutor:
> 10413,1,main]
> java.lang.NullPointerException: null
> ERROR [MemtableFlushWriter:2396] 2016-10-01 20:28:27,425
> CassandraDaemon.java:166 - Exception in thread Thread[MemtableFlushWriter:
> 2396,5,main]
> java.lang.OutOfMemoryError: unable to create new native thread
> at java.lang.Thread.start0(Native Method) ~[na:1.7.0_80]
> at java.lang.Thread.start(Unknown Source) ~[na:1.7.0_80]
> at java.util.concurrent.ThreadPoolExecutor.addWorker(Unknown
> Source) ~[na:1.7.0_80]
> at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(Unknown
> Source) ~[na:1.7.0_80]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
> Source) ~[na:1.7.0_80]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
> Source) ~[na:1.7.0_80]
> at java.lang.Thread.run(Unknown Source) ~[na:1.7.0_80]
>
> -- IB
>
>
>


Tombstoned error and then OOM

2016-10-03 Thread INDRANIL BASU
Hello All,



I am getting the below error repeatedly in the system log of C* 2.1.0

WARN  [SharedPool-Worker-64] 2016-09-27 00:43:35,835 SliceQueryFilter.java:236 
- Read 0 live and 1923 tombstoned cells in test_schema.test_cf.test_cf_col1_idx 
(see tombstone_warn_threshold). 5000 columns was requested, slices=[-], 
delInfo={deletedAt=-9223372036854775808, localDeletion=2147483647}
After that, a NullPointerException and finally an OOM:
ERROR [CompactionExecutor:6287] 2016-09-29 22:09:13,546 
CassandraDaemon.java:166 - Exception in thread 
Thread[CompactionExecutor:6287,1,main]
java.lang.NullPointerException: null
    at 
org.apache.cassandra.service.CacheService$KeyCacheSerializer.serialize(CacheService.java:475)
 ~[apache-cassandra-2.1.0.jar:2.1.0]
    at 
org.apache.cassandra.service.CacheService$KeyCacheSerializer.serialize(CacheService.java:463)
 ~[apache-cassandra-2.1.0.jar:2.1.0]
    at 
org.apache.cassandra.cache.AutoSavingCache$Writer.saveCache(AutoSavingCache.java:225)
 ~[apache-cassandra-2.1.0.jar:2.1.0]
    at 
org.apache.cassandra.db.compaction.CompactionManager$11.run(CompactionManager.java:1061)
 ~[apache-cassandra-2.1.0.jar:2.1.0]
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) 
~[na:1.7.0_80]
    at java.util.concurrent.FutureTask.run(Unknown Source) ~[na:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) 
[na:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 
[na:1.7.0_80]
    at java.lang.Thread.run(Unknown Source) [na:1.7.0_80]
ERROR [CompactionExecutor:9712] 2016-10-01 10:09:13,871 
CassandraDaemon.java:166 - Exception in thread 
Thread[CompactionExecutor:9712,1,main]
java.lang.NullPointerException: null
ERROR [CompactionExecutor:10070] 2016-10-01 14:09:14,154 
CassandraDaemon.java:166 - Exception in thread 
Thread[CompactionExecutor:10070,1,main]
java.lang.NullPointerException: null
ERROR [CompactionExecutor:10413] 2016-10-01 18:09:14,265 
CassandraDaemon.java:166 - Exception in thread 
Thread[CompactionExecutor:10413,1,main]
java.lang.NullPointerException: null
ERROR [MemtableFlushWriter:2396] 2016-10-01 20:28:27,425 
CassandraDaemon.java:166 - Exception in thread 
Thread[MemtableFlushWriter:2396,5,main]
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method) ~[na:1.7.0_80]
    at java.lang.Thread.start(Unknown Source) ~[na:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor.addWorker(Unknown Source) 
~[na:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(Unknown 
Source) ~[na:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) 
~[na:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 
~[na:1.7.0_80]
    at java.lang.Thread.run(Unknown Source) ~[na:1.7.0_80]
-- IB
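
For reference, the thresholds that warning points at live in cassandra.yaml;
these are the stock 2.1 defaults:

# cassandra.yaml (2.1 defaults)
tombstone_warn_threshold: 1000       # warn when a slice reads more tombstones than this
tombstone_failure_threshold: 100000  # abort the read past this many tombstones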




   

Re: Cassandra data model right definition

2016-10-03 Thread Benedict Elliott Smith
Nobody is disputing that the docs can and should be improved to avoid this
misreading.  I've invited Ed to file a JIRA and/or pull request twice now.

You are of course just as welcome to do this.  Perhaps you will actually do
it, so we can all move on with our lives!




On 3 October 2016 at 17:45, Peter Lin  wrote:

> I've met clients that read the cassandra docs and then said in a big
> meeting "it's just like relational database, it has tables just like
> sqlserver/oracle."
>
> I'm not putting words in other people's mouth either, but I've heard that
> said enough times to want to puke. Does the docs claim cassandra is
> relational ? it absolutely doesn't make that claim, but the docs play
> loosey goosey with terminology. End result is it confuses new users that
> aren't experts, or technology managers that try to make a case for
> cassandra.
>
> we can make all the excuses we want, but that doesn't change the fact the
> docs aren't user friendly. writing great documentation is tough and most
> developers hate it. It's cuz we suck at it. There I said it, "we SUCK at
> writing user friendly documentation". As many people have pointed out, it's
> not unique to Cassandra. 80% of the tech docs out there suck, starting with
> IBM at the top.
>
> Saying the docs suck isn't an indictment of anyone, it's just the reality
> of writing good documentation.
>
> On Mon, Oct 3, 2016 at 12:33 PM, Jonathan Haddad 
> wrote:
>
>> Nobody is claiming Cassandra is relational; I'm not sure why that keeps
>> coming up.
>> On Mon, Oct 3, 2016 at 10:53 AM Edward Capriolo 
>> wrote:
>>
>>> My original point can be summed up as:
>>>
>>> Do not define cassandra in terms of SIMILES & METAPHORS. Such words include
>>> "like" and "close relative".
>>>
>>> For the specifics:
>>>
>>>
>>> Any relational db could (and I'm sure one does!) allow for sparse fields
>>> as well. MySQL can be backed by rocksdb now, does that make it not a row
>>> store?
>>>
>>>
>>> Lets draw some lines, a relational database is clearly defined.
>>>
>>> https://en.wikipedia.org/wiki/Edgar_F._Codd
>>>
>>> Codd's theorem, a result proven in his seminal work on the relational
>>> model, equates the expressive power of relational algebra and relational
>>> calculus (both of which, lacking recursion, are strictly less powerful
>>> than first-order logic).
>>>
>>> As the relational model started to become fashionable in the early
>>> 1980s, Codd fought a sometimes bitter campaign to prevent the term being
>>> misused by database vendors who had merely added a relational veneer to
>>> older technology. As part of this campaign, he published his 12 rules
>>>  to define what
>>> constituted a relational database. This made his position in IBM
>>> increasingly difficult, so he left to form his own consulting company with
>>> Chris Date and others.
>>>
>>> Cassandra is not a relational database.
>>>
>>> I have attempted to illustrate that a "row store" is defined as well.
>>> I do not believe Cassandra is a "row store".
>>>
>>>
>>>
>>> "Just because it uses log structured storage, sparse fields, and
>>> semi-flexible collections doesn't disqualify it from calling it a "row
>>> store""
>>>
>>> What is the definition of "row store". Is it a logical construct or a
>>> physical one?
>>>
>>> Why isn't mongo DB a "row store"? I can drop a schema on top of mongo
>>> and present it as rows and columns. It seems to pass the litmus test being
>>> presented.
>>>
>>> https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad 
>>> wrote:
>>>
>>> Sorry Ed, but you're really stretching here. A table in Cassandra is
>>> structured by a schema with the data for each row stored together in each
>>> data file. Just because it uses log structured storage, sparse fields, and
>>> semi-flexible collections doesn't disqualify it from calling it a "row
>>> store"
>>>
>>> Postgres added flexible storage through hstore, I don't hear anyone
>>> arguing that it needs to be renamed.
>>>
>>> Any relational db could (and I'm sure one does!) allow for sparse fields
>>> as well. MySQL can be backed by rocksdb now, does that make it not a row
>>> store?
>>>
>>> You're arguing that everything is wrong but you're not proposing an
>>> alternative, which is not productive.
>>> On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo 
>>> wrote:
>>>
>>> Also every piece of technical information that describes a rowstore
>>>
>>> http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
>>> 

Re: Cassandra data model right definition

2016-10-03 Thread Russell Bradberry
"X-store" refers to how data is stored, in almost every case it refers to
what logical constructs are grouped together physically on disk.  It has
nothing to do with whether a database is relational or not.

Cassandra does, in fact meet the definition of row-store, however, I would
like to re-iterate that it goes beyond that and stores all rows for a
single partition together on disk as well.  Therefore row-store does not do
it justice, which is why I like the term "Partitioned row-store"
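
To make that concrete, a small illustrative CQL table (names invented here):
all rows sharing a partition key are stored together on disk, and within the
partition the rows are ordered by the clustering column; hence "partitioned
row-store".

CREATE TABLE sensors.readings (
    sensor_id  int,        -- partition key: groups rows physically on disk
    read_time  timestamp,  -- clustering column: orders rows within the partition
    value      double,
    PRIMARY KEY ((sensor_id), read_time)
);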

On Mon, Oct 3, 2016 at 12:37 PM, Benedict Elliott Smith  wrote:

> ... and my response can be summed up as "you are not parsing English
> correctly."  The word "like" does not mean what you think it means in this
> context.  It does not mean "close relative."  It is constrained to the
> similarities expressed, and no others.  You don't seem to be reading any of
> my responses about this, though, so I'm not sure parsing is your issue.
>
> Postgresql has had arrays for years, and all RDBMS (pretty much) avoid
> persisting nulls in exactly the same way C* does - encoding their absence
> in the row header.
>
> I empathise with the recent unsubscriber.
>
>
>
> On 3 October 2016 at 15:53, Edward Capriolo  wrote:
>
>> My original point can be summed up as:
>>
>> Do not define cassandra in terms of SIMILES & METAPHORS. Such words include
>> "like" and "close relative".
>>
>> For the specifics:
>>
>> Any relational db could (and I'm sure one does!) allow for sparse fields
>> as well. MySQL can be backed by rocksdb now, does that make it not a row
>> store?
>>
>> Lets draw some lines, a relational database is clearly defined.
>>
>> https://en.wikipedia.org/wiki/Edgar_F._Codd
>>
>> Codd's theorem, a result proven in his seminal work on the relational
>> model, equates the expressive power of relational algebra and relational
>> calculus (both of which, lacking recursion, are strictly less powerful
>> than first-order logic).
>>
>> As the relational model started to become fashionable in the early 1980s,
>> Codd fought a sometimes bitter campaign to prevent the term being misused
>> by database vendors who had merely added a relational veneer to older
>> technology. As part of this campaign, he published his 12 rules
>>  to define what
>> constituted a relational database. This made his position in IBM
>> increasingly difficult, so he left to form his own consulting company with
>> Chris Date and others.
>>
>> Cassandra is not a relational database.
>>
>> I have attempted to illustrate that a "row store" is defined as well.
>> I do not believe Cassandra is a "row store".
>>
>> "Just because it uses log structured storage, sparse fields, and
>> semi-flexible collections doesn't disqualify it from calling it a "row
>> store""
>>
>> What is the definition of "row store". Is it a logical construct or a
>> physical one?
>>
>> Why isn't mongo DB a "row store"? I can drop a schema on top of mongo and
>> present it as rows and columns. It seems to pass the litmus test being
>> presented.
>>
>> https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage
>>
>>
>>
>>
>>
>> On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad 
>> wrote:
>>
>>> Sorry Ed, but you're really stretching here. A table in Cassandra is
>>> structured by a schema with the data for each row stored together in each
>>> data file. Just because it uses log structured storage, sparse fields, and
>>> semi-flexible collections doesn't disqualify it from calling it a "row
>>> store"
>>>
>>> Postgres added flexible storage through hstore, I don't hear anyone
>>> arguing that it needs to be renamed.
>>>
>>> Any relational db could (and I'm sure one does!) allow for sparse fields
>>> as well. MySQL can be backed by rocksdb now, does that make it not a row
>>> store?
>>>
>>> You're arguing that everything is wrong but you're not proposing an
>>> alternative, which is not productive.
>>> On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo 
>>> wrote:
>>>
 Also every piece of technical information that describes a rowstore

 http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
 https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems

 They depict it like this:

 001:10,Smith,Joe,4;
 002:12,Jones,Mary,5;
 003:11,Johnson,Cathy,44000;
 004:22,Jones,Bob,55000;



 They never depict a scenario where the data looks like this on disk:

 001:10,Smith

 001:10,4;

 Which is much closer to how Cassandra *stores* its data.



 On Fri, Sep 30, 2016 at 

Re: Cassandra data model right definition

2016-10-03 Thread Peter Lin
I've met clients that read the cassandra docs and then said in a big
meeting "it's just like relational database, it has tables just like
sqlserver/oracle."

I'm not putting words in other people's mouth either, but I've heard that
said enough times to want to puke. Does the docs claim cassandra is
relational ? it absolutely doesn't make that claim, but the docs play
loosey goosey with terminology. End result is it confuses new users that
aren't experts, or technology managers that try to make a case for
cassandra.

we can make all the excuses we want, but that doesn't change the fact the
docs aren't user friendly. writing great documentation is tough and most
developers hate it. It's cuz we suck at it. There I said it, "we SUCK at
writing user friendly documentation". As many people have pointed out, it's
not unique to Cassandra. 80% of the tech docs out there suck, starting with
IBM at the top.

Saying the docs suck isn't an indictment of anyone, it's just the reality
of writing good documentation.

On Mon, Oct 3, 2016 at 12:33 PM, Jonathan Haddad  wrote:

> Nobody is claiming Cassandra is relational; I'm not sure why that keeps
> coming up.
> On Mon, Oct 3, 2016 at 10:53 AM Edward Capriolo 
> wrote:
>
>> My original point can be summed up as:
>>
>> Do not define cassandra in terms of SIMILES & METAPHORS. Such words include
>> "like" and "close relative".
>>
>> For the specifics:
>>
>>
>> Any relational db could (and I'm sure one does!) allow for sparse fields
>> as well. MySQL can be backed by rocksdb now, does that make it not a row
>> store?
>>
>>
>> Lets draw some lines, a relational database is clearly defined.
>>
>> https://en.wikipedia.org/wiki/Edgar_F._Codd
>>
>> Codd's theorem, a result proven in his seminal work on the relational
>> model, equates the expressive power of relational algebra and relational
>> calculus (both of which, lacking recursion, are strictly less powerful
>> than first-order logic).
>>
>> As the relational model started to become fashionable in the early 1980s,
>> Codd fought a sometimes bitter campaign to prevent the term being misused
>> by database vendors who had merely added a relational veneer to older
>> technology. As part of this campaign, he published his 12 rules
>>  to define what
>> constituted a relational database. This made his position in IBM
>> increasingly difficult, so he left to form his own consulting company with
>> Chris Date and others.
>>
>> Cassandra is not a relational database.
>>
>> I have attempted to illustrate that a "row store" is defined as well.
>> I do not believe Cassandra is a "row store".
>>
>>
>>
>> "Just because it uses log structured storage, sparse fields, and
>> semi-flexible collections doesn't disqualify it from calling it a "row
>> store""
>>
>> What is the definition of "row store". Is it a logical construct or a
>> physical one?
>>
>> Why isn't mongo DB a "row store"? I can drop a schema on top of mongo and
>> present it as rows and columns. It seems to pass the litmus test being
>> presented.
>>
>> https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage
>>
>>
>>
>>
>>
>> On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad 
>> wrote:
>>
>> Sorry Ed, but you're really stretching here. A table in Cassandra is
>> structured by a schema with the data for each row stored together in each
>> data file. Just because it uses log structured storage, sparse fields, and
>> semi-flexible collections doesn't disqualify it from calling it a "row
>> store"
>>
>> Postgres added flexible storage through hstore, I don't hear anyone
>> arguing that it needs to be renamed.
>>
>> Any relational db could (and I'm sure one does!) allow for sparse fields
>> as well. MySQL can be backed by rocksdb now, does that make it not a row
>> store?
>>
>> You're arguing that everything is wrong but you're not proposing an
>> alternative, which is not productive.
>> On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo 
>> wrote:
>>
>> Also every piece of technical information that describes a rowstore
>>
>> http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
>> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems
>>
>> They depict it like this:
>>
>> 001:10,Smith,Joe,4;
>> 002:12,Jones,Mary,5;
>> 003:11,Johnson,Cathy,44000;
>> 004:22,Jones,Bob,55000;
>>
>>
>>
>> They never depict a scenario where the data looks like this on disk:
>>
>> 001:10,Smith
>>
>> 001:10,4;
>>
>> Which is much closer to how Cassandra *stores* its data.
>>
>>
>>
>> On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith <
>> 

Re: Cassandra data model right definition

2016-10-03 Thread Benedict Elliott Smith
... and my response can be summed up as "you are not parsing English
correctly."  The word "like" does not mean what you think it means in this
context.  It does not mean "close relative."  It is constrained to the
similarities expressed, and no others.  You don't seem to be reading any of
my responses about this, though, so I'm not sure parsing is your issue.

Postgresql has had arrays for years, and all RDBMS (pretty much) avoid
persisting nulls in exactly the same way C* does - encoding their absence
in the row header.

I empathise with the recent unsubscriber.



On 3 October 2016 at 15:53, Edward Capriolo  wrote:

> My original point can be summed up as:
>
> Do not define cassandra in terms of SIMILES & METAPHORS. Such words include
> "like" and "close relative".
>
> For the specifics:
>
> Any relational db could (and I'm sure one does!) allow for sparse fields
> as well. MySQL can be backed by rocksdb now, does that make it not a row
> store?
>
> Lets draw some lines, a relational database is clearly defined.
>
> https://en.wikipedia.org/wiki/Edgar_F._Codd
>
> Codd's theorem, a result proven in his seminal work on the relational
> model, equates the expressive power of relational algebra and relational
> calculus (both of which, lacking recursion, are strictly less powerful
> than first-order logic).
>
> As the relational model started to become fashionable in the early 1980s,
> Codd fought a sometimes bitter campaign to prevent the term being misused
> by database vendors who had merely added a relational veneer to older
> technology. As part of this campaign, he published his 12 rules
>  to define what
> constituted a relational database. This made his position in IBM
> increasingly difficult, so he left to form his own consulting company with
> Chris Date and others.
>
> Cassandra is not a relational database.
>
> I have attempted to illustrate that a "row store" is defined as well. I
> do not believe Cassandra is a "row store".
>
> "Just because it uses log structured storage, sparse fields, and
> semi-flexible collections doesn't disqualify it from calling it a "row
> store""
>
> What is the definition of "row store". Is it a logical construct or a
> physical one?
>
> Why isn't mongo DB a "row store"? I can drop a schema on top of mongo and
> present it as rows and columns. It seems to pass the litmus test being
> presented.
>
> https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage
>
>
>
>
>
> On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad 
> wrote:
>
>> Sorry Ed, but you're really stretching here. A table in Cassandra is
>> structured by a schema with the data for each row stored together in each
>> data file. Just because it uses log structured storage, sparse fields, and
>> semi-flexible collections doesn't disqualify it from calling it a "row
>> store"
>>
>> Postgres added flexible storage through hstore, I don't hear anyone
>> arguing that it needs to be renamed.
>>
>> Any relational db could (and I'm sure one does!) allow for sparse fields
>> as well. MySQL can be backed by rocksdb now, does that make it not a row
>> store?
>>
>> You're arguing that everything is wrong but you're not proposing an
>> alternative, which is not productive.
>> On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo 
>> wrote:
>>
>>> Also every piece of technical information that describes a rowstore
>>>
>>> http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
>>> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems
>>>
>>> They depict it like this:
>>>
>>> 001:10,Smith,Joe,4;
>>> 002:12,Jones,Mary,5;
>>> 003:11,Johnson,Cathy,44000;
>>> 004:22,Jones,Bob,55000;
>>>
>>>
>>>
>>> They never depict a scenario where the data looks like this on disk:
>>>
>>> 001:10,Smith
>>>
>>> 001:10,4;
>>>
>>> Which is much closer to how Cassandra *stores* its data.
>>>
>>>
>>>
>>> On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith <
>>> bened...@apache.org> wrote:
>>>
>>> Absolutely.  A "partitioned row store" is exactly what I would call it.
>>> As it happens, our README thinks the same, which is fantastic.
>>>
>>> I thought I'd take a look at the rest of our cohort, and didn't get far
>>> before disappointment.  HBase literally calls itself a "
>>> *column-oriented* store" - which is so totally wrong it's
>>> simultaneously hilarious and tragic.
>>>
>>> I guess we can't blame the wider internet for misunderstanding/misnaming
>>> us poor "wide column stores" if even one of the major examples doesn't know
>>> what it, itself, is!
>>>
>>>
>>>
>>>
>>> On 30 September 2016 at 21:47, Jonathan Haddad 

Re: Cassandra data model right definition

2016-10-03 Thread Jonathan Haddad
It's a row store because it has a schema (vs ad hoc documents), and data
(rows) are stored together. What would you call the things you iterate over
when you query a partition? Rows. That makes it a thing that stores "rows" of
data; "row store" isn't some crazy stretch.
On Mon, Oct 3, 2016 at 12:33 PM Jonathan Haddad  wrote:

> Nobody is claiming Cassandra is relational; I'm not sure why that keeps
> coming up.
> On Mon, Oct 3, 2016 at 10:53 AM Edward Capriolo 
> wrote:
>
> My original point can be summed up as:
>
> Do not define cassandra in terms of SIMILES & METAPHORS. Such words include
> "like" and "close relative".
>
> For the specifics:
>
>
> Any relational db could (and I'm sure one does!) allow for sparse fields
> as well. MySQL can be backed by rocksdb now, does that make it not a row
> store?
>
>
> Lets draw some lines, a relational database is clearly defined.
>
> https://en.wikipedia.org/wiki/Edgar_F._Codd
>
> Codd's theorem, a result proven in his seminal work on the relational
> model, equates the expressive power of relational algebra and relational
> calculus (both of which, lacking recursion, are strictly less powerful
> than first-order logic).
>
> As the relational model started to become fashionable in the early 1980s,
> Codd fought a sometimes bitter campaign to prevent the term being misused
> by database vendors who had merely added a relational veneer to older
> technology. As part of this campaign, he published his 12 rules
>  to define what
> constituted a relational database. This made his position in IBM
> increasingly difficult, so he left to form his own consulting company with
> Chris Date and others.
>
> Cassandra is not a relational database.
>
> I have attempted to illustrate that a "row store" is defined as well. I
> do not believe Cassandra is a "row store".
>
>
>
> "Just because it uses log structured storage, sparse fields, and
> semi-flexible collections doesn't disqualify it from calling it a "row
> store""
>
> What is the definition of "row store". Is it a logical construct or a
> physical one?
>
> Why isn't mongo DB a "row store"? I can drop a schema on top of mongo and
> present it as rows and columns. It seems to pass the litmus test being
> presented.
>
> https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage
>
>
>
>
>
> On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad 
> wrote:
>
> Sorry Ed, but you're really stretching here. A table in Cassandra is
> structured by a schema with the data for each row stored together in each
> data file. Just because it uses log structured storage, sparse fields, and
> semi-flexible collections doesn't disqualify it from calling it a "row
> store"
>
> Postgres added flexible storage through hstore, I don't hear anyone
> arguing that it needs to be renamed.
>
> Any relational db could (and I'm sure one does!) allow for sparse fields
> as well. MySQL can be backed by rocksdb now, does that make it not a row
> store?
>
> You're arguing that everything is wrong but you're not proposing an
> alternative, which is not productive.
> On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo 
> wrote:
>
> Also every piece of technical information that describes a rowstore
>
> http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems
>
> They depict it like this:
>
> 001:10,Smith,Joe,4;
> 002:12,Jones,Mary,5;
> 003:11,Johnson,Cathy,44000;
> 004:22,Jones,Bob,55000;
>
>
>
> They never depict a scenario where the data looks like this on disk:
>
> 001:10,Smith
>
> 001:10,4;
>
> Which is much closer to how Cassandra *stores* its data.
>
>
>
> On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith <
> bened...@apache.org> wrote:
>
> Absolutely.  A "partitioned row store" is exactly what I would call it.
> As it happens, our README thinks the same, which is fantastic.
>
> I thought I'd take a look at the rest of our cohort, and didn't get far
> before disappointment.  HBase literally calls itself a "*column-oriented* 
> store"
> - which is so totally wrong it's simultaneously hilarious and tragic.
>
> I guess we can't blame the wider internet for misunderstanding/misnaming
> us poor "wide column stores" if even one of the major examples doesn't know
> what it, itself, is!
>
>
>
>
> On 30 September 2016 at 21:47, Jonathan Haddad  wrote:
>
> +1000 to what Benedict says. I usually call it a "partitioned row store"
> which usually needs some extra explanation but is more accurate than
> "column family" or whatever other thrift era 

Re: Cassandra data model right definition

2016-10-03 Thread Jonathan Haddad
Nobody is claiming Cassandra is relational; I'm not sure why that keeps
coming up.
On Mon, Oct 3, 2016 at 10:53 AM Edward Capriolo 
wrote:

> My original point can be summed up as:
>
> Do not define cassandra in terms of SIMILES & METAPHORS. Such words include
> "like" and "close relative".
>
> For the specifics:
>
>
> Any relational db could (and I'm sure one does!) allow for sparse fields
> as well. MySQL can be backed by rocksdb now, does that make it not a row
> store?
>
>
> Lets draw some lines, a relational database is clearly defined.
>
> https://en.wikipedia.org/wiki/Edgar_F._Codd
>
> Codd's theorem, a result proven in his seminal work on the relational
> model, equates the expressive power of relational algebra and relational
> calculus (both of which, lacking recursion, are strictly less powerful
> than first-order logic).
>
> As the relational model started to become fashionable in the early 1980s,
> Codd fought a sometimes bitter campaign to prevent the term being misused
> by database vendors who had merely added a relational veneer to older
> technology. As part of this campaign, he published his 12 rules
>  to define what
> constituted a relational database. This made his position in IBM
> increasingly difficult, so he left to form his own consulting company with
> Chris Date and others.
>
> Cassandra is not a relational database.
>
> I have attempted to illustrate that a "row store" is defined as well. I
> do not believe Cassandra is a "row store".
>
>
>
> "Just because it uses log structured storage, sparse fields, and
> semi-flexible collections doesn't disqualify it from calling it a "row
> store""
>
> What is the definition of "row store". Is it a logical construct or a
> physical one?
>
> Why isn't mongo DB a "row store"? I can drop a schema on top of mongo and
> present it as rows and columns. It seems to pass the litmus test being
> presented.
>
> https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage
>
>
>
>
>
> On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad 
> wrote:
>
> Sorry Ed, but you're really stretching here. A table in Cassandra is
> structured by a schema with the data for each row stored together in each
> data file. Just because it uses log structured storage, sparse fields, and
> semi-flexible collections doesn't disqualify it from calling it a "row
> store"
>
> Postgres added flexible storage through hstore, I don't hear anyone
> arguing that it needs to be renamed.
>
> Any relational db could (and I'm sure one does!) allow for sparse fields
> as well. MySQL can be backed by rocksdb now, does that make it not a row
> store?
>
> You're arguing that everything is wrong but you're not proposing an
> alternative, which is not productive.
> On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo 
> wrote:
>
> Also every piece of technical information that describes a rowstore
>
> http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems
>
> They depict it like this:
>
> 001:10,Smith,Joe,4;
> 002:12,Jones,Mary,5;
> 003:11,Johnson,Cathy,44000;
> 004:22,Jones,Bob,55000;
>
>
>
> They never depict a scenario where the data looks like this on disk:
>
> 001:10,Smith
>
> 001:10,4;
>
> Which is much closer to how Cassandra *stores* its data.
>
>
>
> On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith <
> bened...@apache.org> wrote:
>
> Absolutely.  A "partitioned row store" is exactly what I would call it.
> As it happens, our README thinks the same, which is fantastic.
>
> I thought I'd take a look at the rest of our cohort, and didn't get far
> before disappointment.  HBase literally calls itself a "*column-oriented* 
> store"
> - which is so totally wrong it's simultaneously hilarious and tragic.
>
> I guess we can't blame the wider internet for misunderstanding/misnaming
> us poor "wide column stores" if even one of the major examples doesn't know
> what it, itself, is!
>
>
>
>
> On 30 September 2016 at 21:47, Jonathan Haddad  wrote:
>
> +1000 to what Benedict says. I usually call it a "partitioned row store"
> which usually needs some extra explanation but is more accurate than
> "column family" or whatever other thrift era terminology people still use.
> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan  wrote:
>
> I used to present Cassandra as a NoSQL datastore with "distributed" table.
> This definition is closer to CQL and has some academic background
> (distributed hash table).
>
>
> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
> 

Re: Way to write to dc1 but keep data only in dc2

2016-10-03 Thread INDRANIL BASU
Hello All,

I am getting the below error repeatedly in the system log of C* 2.1.0

WARN  [SharedPool-Worker-64] 2016-09-27 00:43:35,835 SliceQueryFilter.java:236 
- Read 0 live and 1923 tombstoned cells in test_schema.test_cf.test_cf_col1_idx 
(see tombstone_warn_threshold). 5000 columns was requested, slices=[-], 
delInfo={deletedAt=-9223372036854775808, localDeletion=2147483647}
After that, a NullPointerException and finally an OOM:
ERROR [CompactionExecutor:6287] 2016-09-29 22:09:13,546 
CassandraDaemon.java:166 - Exception in thread 
Thread[CompactionExecutor:6287,1,main]
java.lang.NullPointerException: null
    at 
org.apache.cassandra.service.CacheService$KeyCacheSerializer.serialize(CacheService.java:475)
 ~[apache-cassandra-2.1.0.jar:2.1.0]
    at 
org.apache.cassandra.service.CacheService$KeyCacheSerializer.serialize(CacheService.java:463)
 ~[apache-cassandra-2.1.0.jar:2.1.0]
    at 
org.apache.cassandra.cache.AutoSavingCache$Writer.saveCache(AutoSavingCache.java:225)
 ~[apache-cassandra-2.1.0.jar:2.1.0]
    at 
org.apache.cassandra.db.compaction.CompactionManager$11.run(CompactionManager.java:1061)
 ~[apache-cassandra-2.1.0.jar:2.1.0]
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) 
~[na:1.7.0_80]
    at java.util.concurrent.FutureTask.run(Unknown Source) ~[na:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) 
[na:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 
[na:1.7.0_80]
    at java.lang.Thread.run(Unknown Source) [na:1.7.0_80]
ERROR [CompactionExecutor:9712] 2016-10-01 10:09:13,871 
CassandraDaemon.java:166 - Exception in thread 
Thread[CompactionExecutor:9712,1,main]
java.lang.NullPointerException: null
ERROR [CompactionExecutor:10070] 2016-10-01 14:09:14,154 
CassandraDaemon.java:166 - Exception in thread 
Thread[CompactionExecutor:10070,1,main]
java.lang.NullPointerException: null
ERROR [CompactionExecutor:10413] 2016-10-01 18:09:14,265 
CassandraDaemon.java:166 - Exception in thread 
Thread[CompactionExecutor:10413,1,main]
java.lang.NullPointerException: null
ERROR [MemtableFlushWriter:2396] 2016-10-01 20:28:27,425 
CassandraDaemon.java:166 - Exception in thread 
Thread[MemtableFlushWriter:2396,5,main]
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method) ~[na:1.7.0_80]
    at java.lang.Thread.start(Unknown Source) ~[na:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor.addWorker(Unknown Source) 
~[na:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(Unknown 
Source) ~[na:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) 
~[na:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 
~[na:1.7.0_80]
    at java.lang.Thread.run(Unknown Source) ~[na:1.7.0_80]
-- IB




Re: Way to write to dc1 but keep data only in dc2

2016-10-03 Thread Dorian Hoxha
Thanks for the explanation Eric.

I would think of it as something like:
The keyspace will be on dc1 + dc2, with the option that no long-term data is
kept in dc1. So you write to dc1 (to the right nodes), the dc1 nodes write to
commitlog/memtable, and once they push the data to dc2 for inter-dc
replication, dc1 deletes the local data, while dc2 doesn't push data back to
dc1 for replication.

On Mon, Oct 3, 2016 at 5:25 PM, Eric Stevens  wrote:

> It sounds like you're trying to avoid the latency of waiting for a write
> confirmation to a remote data center?
>
> App ==> DC1 ==high-latency==> DC2
>
> If you need the write to be confirmed before you consider the write
> successful in your application (definitely recommended unless you're ok
> with losing data and the app having no idea), you're not going to solve the
> fundamental physics problem of having to wait for a round-trip between
> _something_ and DC2.  DC1 can't acknowledge the write until it's in
> memtables and commitlog of a node that owns that data, so under the hoods
> it's doing basically the same thing your app would have to do.  In fact,
> putting DC1 in the middle just introduces a (possibly trivial but
> definitely not zero) amount of additional latency over:
>
> App ==high-latency==> DC2
>
> The only exception would be if you had an expectation that latency between
> DC1 and DC2 would be lower than latency between App and DC2, which I admit
> is not impossible.
>
> On Fri, Sep 30, 2016 at 1:49 PM Dorian Hoxha 
> wrote:
>
>> Thanks Edward. Looks like what I really wanted (to use some kind of
>> quorum write, e.g.) is not possible.
>>
>> Note that the queue is ordered, but I just need the writes to eventually
>> happen, with more consistency than ANY (2 nodes or more).
>>
>> On Fri, Sep 30, 2016 at 12:25 AM, Edward Capriolo 
>> wrote:
>>
>>> You can do something like this, though your use of terminology like
>>> "queue" really do not apply.
>>>
>>> You can setup your keyspace with replication in only one data center.
>>>
>>> CREATE KEYSPACE NTSkeyspace WITH REPLICATION = { 'class' : 
>>> 'NetworkTopologyStrategy', 'dc2' : 3 };
>>>
>>> This will make the NTSkeyspace live only in one data center. You can
>>> always write to any Cassandra node, since they will transparently proxy the
>>> writes to the proper place. You can configure your client to ONLY bind to
>>> specific hosts or data centers/hosts DC1.
>>>
>>> You can use a write consistency level like ANY. If you use a consistency
>>> level like ONE, it will cause the write to block anyway, waiting for
>>> completion on the other datacenter.
>>>
>>> Since you mentioned the words "like a queue", I would suggest an
>>> alternative: write the data to a distributed commit log like Kafka.
>>> At that point you can decouple the write systems either through
>>> producer/consumer or through a tool like Kafka's MirrorMaker.
>>>
>>>
>>> On Thu, Sep 29, 2016 at 5:24 PM, Dorian Hoxha 
>>> wrote:
>>>
 I have dc1 and dc2.
 I want to keep a keyspace only on dc2.
 But I only have my app on dc1.
 And I want to write to dc1 (lower latency), which would not keep data
 locally but just push it to dc2,
 while reading would only work from dc2.
 Since my app is mostly writes, it ~will be faster without having
 to deploy the app to dc2 or write directly to dc2 with higher latency.

 dc1 would act like a queue or something and just push data + delete
 locally.

 Does this make sense ?

 Thank You

>>>
>>>
>>
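
As a rough illustration of Edward's suggestion in this thread (a minimal
sketch, assuming the DataStax Java driver 3.x; the contact point and the
events table are hypothetical): the client pins its coordinators to dc1 with
a DC-aware load balancing policy, while the keyspace's
NetworkTopologyStrategy places replicas only in dc2.

import java.util.UUID;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;

public class Dc1Writer {
    public static void main(String[] args) {
        // Contact a dc1 node only; the DC-aware policy keeps coordinators in
        // dc1 even though the keyspace stores its replicas in dc2.
        Cluster cluster = Cluster.builder()
                .addContactPoint("10.0.0.1") // hypothetical dc1 node
                .withLoadBalancingPolicy(DCAwareRoundRobinPolicy.builder()
                        .withLocalDc("dc1")
                        .build())
                .build();
        Session session = cluster.connect();

        // As Eric notes, with CL ONE the dc1 coordinator still waits for one
        // dc2 replica to acknowledge, so the cross-DC round trip remains.
        Statement write = new SimpleStatement(
                "INSERT INTO ntskeyspace.events (id, payload) VALUES (?, ?)",
                UUID.randomUUID(), "hello")
                .setConsistencyLevel(ConsistencyLevel.ONE);
        session.execute(write);

        cluster.close();
    }
}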


Re: Way to write to dc1 but keep data only in dc2

2016-10-03 Thread Eric Stevens
It sounds like you're trying to avoid the latency of waiting for a write
confirmation to a remote data center?

App ==> DC1 ==high-latency==> DC2

If you need the write to be confirmed before you consider the write
successful in your application (definitely recommended unless you're ok
with losing data and the app having no idea), you're not going to solve the
fundamental physics problem of having to wait for a round-trip between
_something_ and DC2.  DC1 can't acknowledge the write until it's in
memtables and commitlog of a node that owns that data, so under the hood
it's doing basically the same thing your app would have to do.  In fact,
putting DC1 in the middle just introduces a (possibly trivial but
definitely not zero) amount of additional latency over:

App ==high-latency==> DC2

The only exception would be if you had an expectation that latency between
DC1 and DC2 would be lower than latency between App and DC2, which I admit
is not impossible.

On Fri, Sep 30, 2016 at 1:49 PM Dorian Hoxha  wrote:

> Thanks Edward. Looks like what I really wanted isn't possible (to use
> some kind of quorum write, for example).
>
> Note that a queue is ordered, whereas I just need the writes to eventually
> happen, but with more consistency than ANY (2 nodes or more).
>
> On Fri, Sep 30, 2016 at 12:25 AM, Edward Capriolo 
> wrote:
>
>> You can do something like this, though your use of terminology like
>> "queue" really do not apply.
>>
>> You can setup your keyspace with replication in only one data center.
>>
>> CREATE KEYSPACE NTSkeyspace WITH REPLICATION = { 'class' : 
>> 'NetworkTopologyStrategy', 'dc2' : 3 };
>>
>> This will make the NTSkeyspace live only in one data center. You can
>> always write to any Cassandra node, since they will transparently proxy the
>> writes to the proper place. You can configure your client to ONLY bind to
>> specific hosts or data centers/hosts DC1.
>>
>> You can use a write consistency level like ANY. If you use a consistency
>> level like ONE, it will cause the write to block anyway, waiting for
>> completion on the other datacenter.
>>
>> Since you mentioned the words "like a queue", I would suggest an
>> alternative: write the data to a distributed commit log like Kafka.
>> At that point you can decouple the write systems either through
>> producer/consumer or through a tool like Kafka's MirrorMaker.
>>
>>
>> On Thu, Sep 29, 2016 at 5:24 PM, Dorian Hoxha 
>> wrote:
>>
>>> I have dc1 and dc2.
>>> I want to keep a keyspace only on dc2.
>>> But I only have my app on dc1.
>>> And I want to write to dc1 (lower latency), which would not keep data
>>> locally but just push it to dc2,
>>> while reading would only work from dc2.
>>> Since my app is mostly writes, it ~will be faster without having to
>>> deploy the app to dc2 or write directly to dc2 with higher latency.
>>>
>>> dc1 would act like a queue or something and just push data + delete
>>> locally.
>>>
>>> Does this make sense ?
>>>
>>> Thank You
>>>
>>
>>
>


Re: Repairing without -pr shows unexpected out-of-sync ranges

2016-10-03 Thread Stefano Ortolani
I was wondering: is (2) a direct consequence of a repair on the full
token range (and thus anti-compaction running only on a subset of the RF
nodes)? If I understand correctly, a repair with -pr should fix this,
at the cost of all nodes performing the anti-compaction phase?

Cheers,
Stefano

On Tue, Sep 27, 2016 at 4:09 PM, Stefano Ortolani  wrote:
> Didn't know about (2), and I actually have a time drift between the nodes.
> Thanks a lot Paulo!
>
> Regards,
> Stefano
>
> On Thu, Sep 22, 2016 at 6:36 PM, Paulo Motta 
> wrote:
>>
>> There are a couple of things that could be happening here:
>> - There will be time differences between when the nodes participating in
>> repair flush, so in write-heavy tables there will always be minor
>> differences during validation, and those could be accentuated by
>> low-resolution merkle trees, which will affect mostly larger tables.
>> - SSTables compacted during incremental repair will not be marked as
>> repaired, so nodes with different compaction cadences will have different
>> data in their unrepaired set, which will cause mismatches in the subsequent
>> incremental repairs. CASSANDRA-9143 will hopefully fix that limitation.
>>
>> 2016-09-22 7:10 GMT-03:00 Stefano Ortolani :
>>>
>>> Hi,
>>>
>>> I am seeing something weird while running repairs.
>>> I am testing 3.0.9 so I am running the repairs manually, node after node,
>>> on a cluster with RF=3. I am using a standard repair command (incremental,
>>> parallel, full range), and I just noticed that the third node detected some
>>> ranges out of sync with one of the nodes that just finished repairing.
>>>
>>> Since there was no dropped mutation, that sounds weird to me considering
>>> that the repairs are supposed to operate on the whole range.
>>>
>>> Any idea why?
>>> Maybe I am missing something?
>>>
>>> Cheers,
>>> Stefano
>>>
>>
>


Unsubscribe

2016-10-03 Thread Christof Bornhoevd
Unsubscribe

On Monday, October 3, 2016, Benedict Elliott Smith 
wrote:

> While that sentence leaves a lot to be desired (for me because it confers
> a different meaning on row store), it doesn't say "Cassandra is like an
> RDBMS" - it says "like an RDBMS, it organises data by rows and columns" -
> i.e., in this regard only it is like an RDBMS, not more generally.
>
> I believe it was meant to help people, especially those afraid of the
> NoSQL thrift world, understand that it still uses the basic concept of
> rows and columns they are used to.  I agree it could be improved to
> minimise the chance of misreading it, and I'm certain contributions would
> be welcome here.
>
> I don't personally want to get bogged down in analysing every piece of
> text anyone has ever written, so I'll bow out of further discussion on
> this.  These phrases may all be suboptimal, but they are certainly
> defensible.  Column store is not, that's all I wanted to contribute here.
>
>
>
>
>
> On 1 October 2016 at 19:35, Peter Lin wrote:
>
>> I'll second Ed's comment.
>>
>> The documentation should be more careful when using phrases "like
>> relational databases". When we look at the history of relational databases,
>> people expect certain things like ACID transactions, primary/foreign key
>> constraints, query planners, joins and relational algebra. Clearly
>> Cassandra's storage engine does not follow most of those principles for a
>> good reason.
>>
>> The term row oriented storage would be more descriptive and appropriate.
>> It avoids conflating Cassandra storage engine with "traditional" relational
>> storage engines. Those of us that have spent over a decade using IBM DB2,
>> Oracle, Sql Server and Sybase tend to think of relational databases in a
>> certain way. If we go back to 1998, most RDBMS storage engines had a max row
>> size limit. Databases like Sybase before version 9 preferred RAW disk for
>> optimal performance. I can go on and on, but there's no point really.
>>
>> Cassandra's storage engine is "row oriented", but it's not relational in
>> the RDBMS sense. We do everyone a huge disservice by using confusing
>> terminology and then making fun of those who get confused. No one wins when
>> that happens. At the end of the day, what differentiates cassandra's
>> storage engine is that it supports static and dynamic columns, which traditional
>> RDBMS don't support today. Calling Cassandra storage "distributed tables"
>> doesn't really help, in my biased opinion.
>>
>> For example, if you tell a SqlServer or Oracle RAC admin "cassandra uses
>> distributed tables" they might answer "so what, sql server and oracle can
>> do that too." The difference is with RDBMS the partitioning is optional and
>> requires more work to configure. Whereas with Cassandra you can have
>> everything in 1 node, which means there is only 1 partition and it is no
>> different from 1 instance of SQL Server. Where you win is when you need to
>> add 2 more nodes: Cassandra makes this easier, whereas with SQL Server and
>> Oracle you have to do a little bit more work. I've lost count of how many
>> times I've had to explain NoSQL databases to RDBMS admins and explain that
>> the official docs are stupid.
>>
>>
>>
>> On Sat, Oct 1, 2016 at 11:31 AM, Edward Capriolo wrote:
>>
>>> https://github.com/apache/cassandra
>>>
>>> Row store means that, like
>>> relational databases, Cassandra organizes data by rows and columns. The
>>> Cassandra Query Language (CQL) is a close relative of SQL.
>>>
>>> I generally do not know what to say about these high level
>>> "oversimplifications" like "firewalls block hackers". Are there "firewalls"
>>> or do they mean IP routers with layer 4 packet inspections and layer 3
>>> Access Control Lists?
>>>
>>> We say (and I catch myself doing it all the time) "like relational
>>> databases" often as if all relational databases work alike. A columnar
>>> store like HP Vertica is a relational database.MySql has different storage
>>> engines does MyIsam work like InnoDB?
>>>
>>> Google docs organizes data by rows and columns as well. You can wrap any
>>> storage system into an API that makes them look like rows and columns.
>>> Microsoft LINQ can enumerate your network cards and query them
>>> https://msdn.microsoft.com/en-us/library/bb308959.aspx , that really
>>> does not make your network cards a "row store"
>>>
>>> "Theoretically a row can have 2 billion columns, but in practice it
>>> shouldn't have more than 100 million columns."
>>> In practice (in my experience) the number is much lower than 100
>>> million, and if the data is actually deleted and re-added frequently, the
>>> number of live columns (rows, whatever) you can use happily is even lower.
>>>
>>>
>>> I believe on twitter (I am unable to find the tweet) someone 

Re: Cassandra data model right definition

2016-10-03 Thread Edward Capriolo
My original point can be summed up as:

Do not define Cassandra in terms of SIMILES & METAPHORS. Such words include
"like" and "close relative".

For the specifics:

Any relational db could (and I'm sure one does!) allow for sparse fields as
well. MySQL can be backed by rocksdb now, does that make it not a row store?

Let's draw some lines: a relational database is clearly defined.

https://en.wikipedia.org/wiki/Edgar_F._Codd

Codd's theorem, a result proven in his seminal work on the relational
model, equates the expressive power of relational algebra and relational
calculus (both of which, lacking recursion, are strictly less powerful
than first-order logic).

As the relational model started to become fashionable in the early 1980s,
Codd fought a sometimes bitter campaign to prevent the term being misused
by database vendors who had merely added a relational veneer to older
technology. As part of this campaign, he published his 12 rules to define
what constituted a relational database. This made his position in IBM
increasingly difficult, so he left to form his own consulting company with
Chris Date and others.

Cassandra is not a relational database.

I have attempted to illustrate that a "row store" is defined as well. I
do not believe Cassandra is a "row store".

"Just because it uses log structured storage, sparse fields, and
semi-flexible collections doesn't disqualify it from being called a "row
store""

What is the definition of "row store"? Is it a logical construct or a
physical one?

Why isn't MongoDB a "row store"? I can drop a schema on top of Mongo and
present it as rows and columns. It seems to pass the litmus test being
presented.

https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage





On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad  wrote:

> Sorry Ed, but you're really stretching here. A table in Cassandra is
> structured by a schema with the data for each row stored together in each
> data file. Just because it uses log structured storage, sparse fields, and
> semi-flexible collections doesn't disqualify it from being called a "row
> store"
>
> Postgres added flexible storage through hstore; I don't hear anyone
> arguing that it needs to be renamed.
>
> Any relational db could (and I'm sure one does!) allow for sparse fields
> as well. MySQL can be backed by rocksdb now, does that make it not a row
> store?
>
> You're arguing that everything is wrong but you're not proposing an
> alternative, which is not productive.
> On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo 
> wrote:
>
>> Also, every piece of technical information that describes a rowstore
>>
>> http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
>> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems
>>
>> does it like this:
>>
>> 001:10,Smith,Joe,40000;
>> 002:12,Jones,Mary,50000;
>> 003:11,Johnson,Cathy,44000;
>> 004:22,Jones,Bob,55000;
>>
>>
>>
>> They never depict a scenario where the data looks like this on disk:
>>
>> 001:10,Smith
>>
>> 001:10,40000;
>>
>> Which is much closer to how Cassandra *stores* its data.
>>
>>
>>
>> On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith <
>> bened...@apache.org> wrote:
>>
>> Absolutely.  A "partitioned row store" is exactly what I would call it.
>> As it happens, our README thinks the same, which is fantastic.
>>
>> I thought I'd take a look at the rest of our cohort, and didn't get far
>> before disappointment.  HBase literally calls itself a "*column-oriented* 
>> store"
>> - which is so totally wrong it's simultaneously hilarious and tragic.
>>
>> I guess we can't blame the wider internet for misunderstanding/misnaming
>> us poor "wide column stores" if even one of the major examples doesn't know
>> what it, itself, is!
>>
>>
>>
>>
>> On 30 September 2016 at 21:47, Jonathan Haddad  wrote:
>>
>> +1000 to what Benedict says. I usually call it a "partitioned row store"
>> which usually needs some extra explanation but is more accurate than
>> "column family" or whatever other thrift era terminology people still use.
>> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan  wrote:
>>
>> I used to present Cassandra as a NoSQL datastore with "distributed"
>> table. This definition is closer to CQL and has some academic background
>> (distributed hash table).
>>
>>
>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
>> bened...@apache.org> wrote:
>>
>> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
>> thrift users no longer think they have a schema (though they do), and
>> thrift is being deprecated.
>>
>> I really 

Re: Cassandra data model right definition

2016-10-03 Thread Russell Bradberry
A couple things I would like to note:

1. Cassandra does not determine how data is stored on disk; the compaction
strategy does.  One could, in theory (and I believe some are trying),
create a column-store compaction strategy.  There is a large effort in the
database community overall to separate the query execution from the storage
engine, so it is becoming increasingly incorrect to say a database is an
"X store" database.

2. "X-store" is not used, and never has been, to describe how data is
represented or queried.  When most database storage engines describe their
storage as "X-store" they are referring to contiguous bytes on disk.  In
traditional row-store engines, on a single node, the definition is as
follows: "All data for a row is stored as a single block of contiguous
bytes on disk".  Traditional column-stores are also defined as "All data
for a column is stored contiguously on disk".  Old-style Cassandra was a
key-value column-family store in that "all data for a family of columns
belonging to a given key were stored contiguously on disk"

So when talking about Cassandra and all currently merged compaction
strategies, yes, it fits the definition of a row-store in that "All data
for a row is stored as contiguous bytes on disk", however, it goes further
because "All data for all rows in a given partition are stored as
contiguous bytes on disk".  So at the highest level one could say it is a
"Partition-store" but that is pretty vague.   I think it is deserving of a
different naming definition which is why I like the term
"Partitioned-row-store"  which gives insight into the fact that it is rows
being stored on disk, in a partitioned format.

PS.
To address the pedants, yes, by these definitions you would have to assume
that a partition resides in a single SSTable. While most compaction
strategies try hard to achieve this, it is currently only guaranteed by one
that I know of. You could call it a
"Partitioned-row-dependent-upon-compaction-strategy-store", but that is
just terrible.



On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad  wrote:

> Sorry Ed, but you're really stretching here. A table in Cassandra is
> structured by a schema with the data for each row stored together in each
> data file. Just because it uses log structured storage, sparse fields, and
> semi-flexible collections doesn't disqualify it from being called a "row
> store"
>
> Postgres added flexible storage through hstore; I don't hear anyone
> arguing that it needs to be renamed.
>
> Any relational db could (and I'm sure one does!) allow for sparse fields
> as well. MySQL can be backed by rocksdb now, does that make it not a row
> store?
>
> You're arguing that everything is wrong but you're not proposing an
> alternative, which is not productive.
>
> On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo 
> wrote:
>
>> Also, every piece of technical information that describes a rowstore
>>
>> http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
>> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems
>>
>> does it like this:
>>
>> 001:10,Smith,Joe,40000;
>> 002:12,Jones,Mary,50000;
>> 003:11,Johnson,Cathy,44000;
>> 004:22,Jones,Bob,55000;
>>
>>
>>
>> They never depict a scenario where the data looks like this on disk:
>>
>> 001:10,Smith
>>
>> 001:10,40000;
>>
>> Which is much closer to how Cassandra *stores* its data.
>>
>>
>>
>> On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith <
>> bened...@apache.org> wrote:
>>
>> Absolutely.  A "partitioned row store" is exactly what I would call it.
>> As it happens, our README thinks the same, which is fantastic.
>>
>> I thought I'd take a look at the rest of our cohort, and didn't get far
>> before disappointment.  HBase literally calls itself a "*column-oriented* 
>> store"
>> - which is so totally wrong it's simultaneously hilarious and tragic.
>>
>> I guess we can't blame the wider internet for misunderstanding/misnaming
>> us poor "wide column stores" if even one of the major examples doesn't know
>> what it, itself, is!
>>
>>
>>
>>
>> On 30 September 2016 at 21:47, Jonathan Haddad  wrote:
>>
>> +1000 to what Benedict says. I usually call it a "partitioned row store"
>> which usually needs some extra explanation but is more accurate than
>> "column family" or whatever other thrift era terminology people still use.
>> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan  wrote:
>>
>> I used to present Cassandra as a NoSQL datastore with "distributed"
>> table. This definition is closer to CQL and has some academic background
>> (distributed hash table).
>>
>>
>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
>> bened...@apache.org> wrote:
>>
>> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
>> thrift users no longer think they have a schema (though they do), and
>> thrift is being deprecated.
>>
>> I really wish everyone would kill the term 

Re: Cassandra data model right definition

2016-10-03 Thread Peter Lin
Whether a storage engine requires a schema isn't really critical for
row-oriented storage. How about CSV that doesn't have a header row? CSV is
probably the most commonly used row-oriented storage, and tons of businesses
still use it for B2B transactions.

As you pointed out, some traditional RDBMS have been adding
"non-traditional" storage options, which is good for everyone. What RDBMS
still don't support is dynamic columns, and I really doubt the SQL working
group would add them in the near future. Though SqlServer and Oracle both
support an XML datatype, which one could argue "kind of" achieves similar
flexibility to dynamic columns.

Then there are RDBMS that are adding native support for JSON, which muddies
the water even more. As an English major, I find being precise and concise
with language important, even if 80% of the people in the IT field abuse it.
I've been on countless sales calls with management. More often than not,
they read the documentation written by developers and feel like they're
reading gibberish. It's best to avoid loaded terms like "row store". Just
because some people like it doesn't mean it achieves the goal of clear
communication.


On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad  wrote:

> Sorry Ed, but you're really stretching here. A table in Cassandra is
> structured by a schema with the data for each row stored together in each
> data file. Just because it uses log structured storage, sparse fields, and
> semi-flexible collections doesn't disqualify it from being called a "row
> store"
>
> Postgres added flexible storage through hstore; I don't hear anyone
> arguing that it needs to be renamed.
>
> Any relational db could (and I'm sure one does!) allow for sparse fields
> as well. MySQL can be backed by rocksdb now, does that make it not a row
> store?
>
> You're arguing that everything is wrong but you're not proposing an
> alternative, which is not productive.
>
> On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo 
> wrote:
>
>> Also, every piece of technical information that describes a rowstore
>>
>> http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
>> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems
>>
>> does it like this:
>>
>> 001:10,Smith,Joe,40000;
>> 002:12,Jones,Mary,50000;
>> 003:11,Johnson,Cathy,44000;
>> 004:22,Jones,Bob,55000;
>>
>>
>>
>> They never depict a scenario where the data looks like this on disk:
>>
>> 001:10,Smith
>>
>> 001:10,40000;
>>
>> Which is much closer to how Cassandra *stores* its data.
>>
>>
>>
>> On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith <
>> bened...@apache.org> wrote:
>>
>> Absolutely.  A "partitioned row store" is exactly what I would call it.
>> As it happens, our README thinks the same, which is fantastic.
>>
>> I thought I'd take a look at the rest of our cohort, and didn't get far
>> before disappointment.  HBase literally calls itself a "*column-oriented* 
>> store"
>> - which is so totally wrong it's simultaneously hilarious and tragic.
>>
>> I guess we can't blame the wider internet for misunderstanding/misnaming
>> us poor "wide column stores" if even one of the major examples doesn't know
>> what it, itself, is!
>>
>>
>>
>>
>> On 30 September 2016 at 21:47, Jonathan Haddad  wrote:
>>
>> +1000 to what Benedict says. I usually call it a "partitioned row store"
>> which usually needs some extra explanation but is more accurate than
>> "column family" or whatever other thrift era terminology people still use.
>> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan  wrote:
>>
>> I used to present Cassandra as a NoSQL datastore with "distributed"
>> table. This definition is closer to CQL and has some academic background
>> (distributed hash table).
>>
>>
>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
>> bened...@apache.org> wrote:
>>
>> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
>> thrift users no longer think they have a schema (though they do), and
>> thrift is being deprecated.
>>
>> I really wish everyone would kill the term "wide column store" with
>> fire.  It seems to have never meant anything beyond "schema-less,
>> row-oriented", and a "column store" means literally the opposite of this.
>>
>> Not only that, but people don't even seem to realise the term "column
>> store" existed long before "wide column store" and the latter is often
>> abbreviated to the former, as here:
>> http://www.planetcassandra.org/what-is-nosql/
>>
>> Since it no longer applies, let's all agree as a community to forget this
>> awful nomenclature ever existed.
>>
>>
>>
>> On 30 September 2016 at 18:09, Joaquin Casares wrote:
>>
>> Hi Mehdi,
>>
>> I can help clarify a few things.
>>
>> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row can
>> have 2 billion columns, but in practice it shouldn't have more than 100
>> million columns.
>>

Re: Cassandra data model right definition

2016-10-03 Thread Jonathan Haddad
Sorry Ed, but you're really stretching here. A table in Cassandra is
structured by a schema with the data for each row stored together in each
data file. Just because it uses log structured storage, sparse fields, and
semi-flexible collections doesn't disqualify it from being called a "row
store"

Postgres added flexible storage through hstore; I don't hear anyone arguing
that it needs to be renamed.

Any relational db could (and I'm sure one does!) allow for sparse fields as
well. MySQL can be backed by rocksdb now, does that make it not a row store?

You're arguing that everything is wrong but you're not proposing an
alternative, which is not productive.
On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo 
wrote:

> Also, every piece of technical information that describes a rowstore
>
> http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems
>
> does it like this:
>
> 001:10,Smith,Joe,40000;
> 002:12,Jones,Mary,50000;
> 003:11,Johnson,Cathy,44000;
> 004:22,Jones,Bob,55000;
>
>
>
> They never depict a scenario where the data looks like this on disk:
>
> 001:10,Smith
>
> 001:10,40000;
>
> Which is much closer to how Cassandra *stores* its data.
>
>
>
> On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith <
> bened...@apache.org> wrote:
>
> Absolutely.  A "partitioned row store" is exactly what I would call it.
> As it happens, our README thinks the same, which is fantastic.
>
> I thought I'd take a look at the rest of our cohort, and didn't get far
> before disappointment.  HBase literally calls itself a "*column-oriented* 
> store"
> - which is so totally wrong it's simultaneously hilarious and tragic.
>
> I guess we can't blame the wider internet for misunderstanding/misnaming
> us poor "wide column stores" if even one of the major examples doesn't know
> what it, itself, is!
>
>
>
>
> On 30 September 2016 at 21:47, Jonathan Haddad  wrote:
>
> +1000 to what Benedict says. I usually call it a "partitioned row store"
> which usually needs some extra explanation but is more accurate than
> "column family" or whatever other thrift era terminology people still use.
> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan  wrote:
>
> I used to present Cassandra as a NoSQL datastore with "distributed" table.
> This definition is closer to CQL and has some academic background
> (distributed hash table).
>
>
> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
> bened...@apache.org> wrote:
>
> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
> thrift users no longer think they have a schema (though they do), and
> thrift is being deprecated.
>
> I really wish everyone would kill the term "wide column store" with fire.
> It seems to have never meant anything beyond "schema-less, row-oriented",
> and a "column store" means literally the opposite of this.
>
> Not only that, but people don't even seem to realise the term "column
> store" existed long before "wide column store" and the latter is often
> abbreviated to the former, as here:
> http://www.planetcassandra.org/what-is-nosql/
>
> Since it no longer applies, let's all agree as a community to forget this
> awful nomenclature ever existed.
>
>
>
> On 30 September 2016 at 18:09, Joaquin Casares 
> wrote:
>
> Hi Mehdi,
>
> I can help clarify a few things.
>
> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row can
> have 2 billion columns, but in practice it shouldn't have more than 100
> million columns.
>
> Cassandra partitions data onto certain nodes based on the partition key(s),
> but does provide the option of setting zero or more clustering keys.
> Together, the partition key(s) and clustering key(s) form the primary key.
>
> When writing to Cassandra, you will need to provide the full primary key;
> however, when reading from Cassandra, you only need to provide the full
> partition key.
>
> When you only provide the partition key for a read operation, you're able
> to return all columns that exist on that partition with low latency. These
> columns are displayed as "CQL rows" to make it easier to reason about.
>
> Consider the schema:
>
> CREATE TABLE foo (
>   bar uuid,
>   boz uuid,
>   baz timeuuid,
>   data1 text,
>   data2 text,
>   PRIMARY KEY ((bar, boz), baz)
> );
>
>
> When you write to Cassandra you will need to send bar, boz, and baz and
> optionally data*, if it's relevant for that CQL row. If you choose not to
> define a data* field for a particular CQL row, then nothing is stored or
> allocated on disk. But I wouldn't consider that caveat to be "schema-less".
>
> However, all writes to the same bar/boz will end up on the same Cassandra
> replica set (a configurable number of nodes) and be stored on the same
> place(s) on disk within the SSTable(s). And on disk, each field that's not
> a partition key is stored as a column, 
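
As a minimal sketch of Joaquin's point that writes need the full primary key
while reads need only the partition key (assuming the DataStax Java driver
3.x; the keyspace name ks and the contact point are hypothetical):

import java.util.UUID;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.utils.UUIDs;

public class FooExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        UUID bar = UUID.randomUUID();
        UUID boz = UUID.randomUUID();

        // Writes supply the full primary key (bar, boz, baz); data1/data2
        // are optional and cost nothing on disk when omitted.
        session.execute(
                "INSERT INTO ks.foo (bar, boz, baz, data1) VALUES (?, ?, ?, ?)",
                bar, boz, UUIDs.timeBased(), "first");
        session.execute(
                "INSERT INTO ks.foo (bar, boz, baz) VALUES (?, ?, ?)",
                bar, boz, UUIDs.timeBased());

        // A read needs only the partition key (bar, boz) and returns every
        // CQL row in that partition, ordered by the clustering key baz.
        ResultSet rs = session.execute(
                "SELECT baz, data1 FROM ks.foo WHERE bar = ? AND boz = ?",
                bar, boz);
        for (Row row : rs) {
            System.out.println(row.getUUID("baz") + " -> " + row.getString("data1"));
        }
        cluster.close();
    }
}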

Re: Error while read after upgrade from 2.2.7 to 3.0.8

2016-10-03 Thread Oleg Krayushkin
Hi, Adil, thanks for response.

Both before and after the C* upgrade we're using Java driver 3.0.3, which
seems to be compatible with both 2.2.7 and 3.0.8.

Also, I forgot to mention that such errors occur even when there are no
clients connected to the cluster.


2016-10-02 7:57 GMT+00:00 Adil :

> Hi,
> That means that some client closes the connection; have you upgraded all
> clients?
>
> On 30 Sep 2016 at 14:25, "Oleg Krayushkin" wrote:
>
>> Hi,
>>
>> Since the upgrade from Cassandra version 2.2.7 to 3.0.8 we're getting the
>> following error every few minutes on every node. For the node at
>> 173.170.147.120 the error in system.log would be:
>>
>> INFO  [SharedPool-Worker-4] 2016-09-30 10:26:39,068 Message.java:605
>>- Unexpected exception during request; channel = [id: 0xfd64cd67, 
>> /173.170.147.120:50660 :> /18.4.63.191:9042]
>> java.io.IOException: Error while read(...): Connection reset by peer
>> at io.netty.channel.epoll.Native.readAddress(Native Method) 
>> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>> at 
>> io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.doReadBytes(EpollSocketChannel.java:675)
>>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>> at 
>> io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.epollInReady(EpollSocketChannel.java:714)
>>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>> at 
>> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:326) 
>> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>> at 
>> io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:264) 
>> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>> at 
>> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
>>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>> at 
>> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
>>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
>>
>> As far as I see, all such errors show [id: <...>,
>> /<broadcast_address>:<some_port> :> /<listen_address>:<native_transport_port>].
>> Also, broadcast_address and listen_address always belong to the current
>> node's addresses.
>>
>> What are the possible reasons for such errors, and how can I fix them? Any
>> thoughts would be appreciated.
>>
>


Cassandra 3 node cluster with intermittent network issues on one node

2016-10-03 Thread Girish Kamarthi
Hi All,

I want to test out a scenario where there are intermittent network issues on
one of the nodes.

I've got a Cassandra 3.7 cluster of 3 nodes with the keyspace replication
factor of 3.

All 3 nodes (node A, node B, node C) are started and are in sync. When
one of the Cassandra nodes went down (node A), I restarted Cassandra and
node A got back in sync with the other nodes B & C.

Now my question is about when one of the nodes has intermittent
network issues (Cassandra is still up and running). Say node A is having
network issues; the nodetool status on the other 2 nodes B & C shows that
node A is down.

*Debug.log of Node B & C:*

DEBUG [GossipTasks:1] 2016-10-03 11:46:18,922 Gossiper.java:337 -
Convicting /10.1.1.4 with status NORMAL - alive false

When the network is back on node A, its nodetool status shows that the
other nodes are down.

*Debug.log of Node A:*

DEBUG [GossipTasks:1] 2016-10-03 11:47:23,613 Gossiper.java:337 -
Convicting /10.1.1.5 with status NORMAL - alive false

DEBUG [GossipTasks:1] 2016-10-03 11:47:23,614 Gossiper.java:337 -
Convicting /10.1.1.6 with status NORMAL - alive false


Below are the configuration changes I made in the cassandra.yaml files.

Node01

cluster_name: 'Test Cluster'
num_tokens: 256
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.1.1.4,10.1.1.5,10.1.1.6"
listen_address: 10.1.1.4
broadcast_address: 10.1.1.4
rpc_address: 0.0.0.0
broadcast_rpc_address: 10.1.1.4


Node02

cluster_name: 'Test Cluster'
num_tokens: 256
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.1.1.4,10.1.1.5,10.1.1.6"
listen_address: 10.1.1.5
broadcast_address: 10.1.1.5
rpc_address: 0.0.0.0
broadcast_rpc_address: 10.1.1.5


Node03

cluster_name: 'Test Cluster'
num_tokens: 256
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.1.1.4,10.1.1.5,10.1.1.6"
listen_address: 10.1.1.6
broadcast_address: 10.1.1.6
rpc_address: 0.0.0.0
broadcast_rpc_address: 10.1.1.6


Nodetool status on node A, once its network is back up, shows that the other
nodes are down (DN).

Nodetool status on the other nodes B & C shows that node A is down (DN).

How does the handshaking work in this scenario?

Why is node A not back in sync with the other nodes when the network is up?

Please give me some input on resolving this issue.

Thanks & Regards,
Girish Kumar Kamarthi
+91-9986427891
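
As a rough way to watch which peers a given node currently considers up
while reproducing this scenario (a minimal sketch, assuming the DataStax
Java driver 3.x; this reflects the liveness view the driver builds from the
node it connects to, so run it once against each node — the gossip state
itself is visible via nodetool gossipinfo):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Host;

public class HostStateCheck {
    public static void main(String[] args) {
        // Connect through node A (10.1.1.4, from the configs above) and
        // print the liveness the driver observes for every known host.
        Cluster cluster = Cluster.builder()
                .addContactPoint("10.1.1.4")
                .build();
        cluster.init();
        for (Host host : cluster.getMetadata().getAllHosts()) {
            System.out.println(host.getAddress() + " up=" + host.isUp());
        }
        cluster.close();
    }
}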