How to autogenerate timestamp with Spark Cassandra connector
Hi,
I am trying to insert into the following column family using the Spark Cassandra connector.
    CREATE TABLE myks.mycf (
        id bigint,
        msg text,
        type text,
        ts timestamp,
        primary key (id, msg)
    )
Is there a way to have the ts field generated automatically?
    // dataRdd is of type RDD[(Int, String, String)].
    // Would like the ts field automatically filled in with the value from now().
    dataRdd.saveToCassandra(keySpace, cf, SomeColumns("id", "msg", "type"))
Do I need to implement my own RowWriterFactory?
Thanks in advance for any assistance!
Shing
Re: Apache Cassandra 2.1.0 : cassandra-stress performance discrepancy between SSD and SATA drive
I have run a sysbench file I/O test on my home PC and my office PC. The results are given below. They show that my office PC (with an SSD) is about 3 times more performant than my home PC (with a SATA hard disk).
Home PC :
    gauss:~ sysbench --test=fileio --file-total-size=50G prepare
    sysbench 0.5: multi-threaded system evaluation benchmark
    128 files, 409600Kb each, 51200Mb total
    Creating files for the test...
    Extra file open flags: 0
    Creating file test_file.0
    Creating file test_file.1
    Creating file test_file.2
    .
    Creating file test_file.125
    Creating file test_file.126
    Creating file test_file.127
    53687091200 bytes written in 626.30 seconds (81.75 MB/sec).
    matmsh@gauss:~ sysbench --test=fileio --file-total-size=50G --file-test-mode=rndrw --init-rng=on --max-time=300 --max-requests=0 run
    sysbench 0.5: multi-threaded system evaluation benchmark
    Running the test with following options:
    Number of threads: 1
    Random number generator seed is 0 and will be ignored
    Extra file open flags: 0
    128 files, 400Mb each
    50Gb total file size
    Block size 16Kb
    Number of IO requests: 0
    Read/Write ratio for combined random IO test: 1.50
    Periodic FSYNC enabled, calling fsync() each 100 requests.
    Calling fsync() at the end of test, Enabled.
    Using synchronous I/O mode
    Doing random r/w test
    Threads started!
    Operations performed: 14521 reads, 9680 writes, 30976 Other = 55177 Total
    Read 226.89Mb Written 151.25Mb Total transferred 378.14Mb (1.2605Mb/sec)
    80.67 Requests/sec executed
    General statistics:
        total time: 300.0030s
        total number of events: 24201
        total time taken by event execution: 186.0749s
        response time:
            min: 0.00ms
            avg: 7.69ms
            max: 132.43ms
            approx. 95 percentile: 19.57ms
    Threads fairness:
        events (avg/stddev): 24201./0.00
        execution time (avg/stddev): 186.0749/0.00
    gauss:~
===
Office PC :
    shing@cauchy:~ sysbench --test=fileio --file-total-size=50G prepare
    sysbench 0.5: multi-threaded system evaluation benchmark
    128 files, 409600Kb each, 51200Mb total
    Creating files for the test...
    Extra file open flags: 0
    Creating file test_file.0
    Creating file test_file.1
    Creating file test_file.2
    Creating file test_file.3
    ...
    Creating file test_file.122
    Creating file test_file.123
    Creating file test_file.124
    Creating file test_file.125
    Creating file test_file.126
    Creating file test_file.127
    53687091200 bytes written in 175.55 seconds (291.66 MB/sec).
    cauchy:~ sysbench --test=fileio --file-total-size=50G --file-test-mode=rndrw --init-rng=on --max-time=300 --max-requests=0 run
    sysbench 0.5: multi-threaded system evaluation benchmark
    Running the test with following options:
    Number of threads: 1
    Random number generator seed is 0 and will be ignored
    Extra file open flags: 0
    128 files, 400Mb each
    50Gb total file size
    Block size 16Kb
    Number of IO requests: 0
    Read/Write ratio for combined random IO test: 1.50
    Periodic FSYNC enabled, calling fsync() each 100 requests.
    Calling fsync() at the end of test, Enabled.
    Using synchronous I/O mode
    Doing random r/w test
    Threads started!
    Operations performed: 43020 reads, 28680 writes, 91723 Other = 163423 Total
    Read 672.19Mb Written 448.12Mb Total transferred 1.0941Gb (3.7344Mb/sec)
    239.00 Requests/sec executed
    General statistics:
        total time: 300.0007s
        total number of events: 71700
        total time taken by event execution: 7.5550s
        response time:
            min: 0.00ms
            avg: 0.11ms
            max: 12.89ms
            approx. 95 percentile: 0.22ms
    Threads fairness:
        events (avg/stddev): 71700./0.00
        execution time (avg/stddev): 7.5550/0.00
===
Shing
On Saturday, 27 September 2014, 10:24, Shing Hing Man mat...@yahoo.com wrote:
Hi Kevin,
Thanks for the reply! I do not know the exact brand of the SSD in my office PC, but the SSD is only 1 year old and it is far from full.
On both the office PC and the home PC, I untarred Apache Cassandra 2.1.0, ran cassandra -f with the default config, and then ran cassandra-stress. Both PCs have Oracle Java 1.7.0_40.
I have noticed there are some parameters for SSD in cassandra.yaml, which I have adjusted, but with no improvement.
It puzzles me that Cassandra on my office PC, with far better hardware, could be less than half as fast as on my home PC.
Shing
On Saturday, 27 September 2014, 5:12, Kevin Burton bur...@spinn3r.com wrote:
What SSD was it? There is a lot of variability in terms of SSD performance.
1. Is it a new vs old SSD? Old SSDs can become slower if they’re really worn out.
2. Was the office SSD near capacity holding other data?
3
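As a quick check on the "about 3 times" figure, the ratios can be worked out from the sysbench runs at the top of this message. The following is only a minimal Python sketch: the variable and function names are illustrative, and the numbers are copied from the two sysbench outputs (prepare throughput in MB/sec, requests per second, and average response time in ms).

```python
# Figures copied from the two sysbench runs; all names here are illustrative.
home = {"prepare_mb_s": 81.75, "requests_s": 80.67, "avg_ms": 7.69}
office = {"prepare_mb_s": 291.66, "requests_s": 239.00, "avg_ms": 0.11}


def ratio(a, b):
    """Return how many times larger a is than b."""
    return a / b


# Throughput figures: higher is better, so divide office by home.
prepare_ratio = ratio(office["prepare_mb_s"], home["prepare_mb_s"])
request_ratio = ratio(office["requests_s"], home["requests_s"])
# Response time: lower is better, so divide home by office.
latency_ratio = ratio(home["avg_ms"], office["avg_ms"])

print(f"prepare (sequential write) throughput: {prepare_ratio:.2f}x")
print(f"random r/w requests per second:        {request_ratio:.2f}x")
print(f"average response time:                 {latency_ratio:.1f}x lower")
```

The requests-per-second ratio is the one that corresponds to the "about 3 times" statement; the other two ratios are included only for context.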
Re: Apache Cassandra 2.1.0 : cassandra-stress performance discrepancy between SSD and SATA drive
Hi Kevin,
Thanks for the reply! I do not know the exact brand of the SSD in my office PC, but the SSD is only 1 year old and it is far from full.
On both the office PC and the home PC, I untarred Apache Cassandra 2.1.0, ran cassandra -f with the default config, and then ran cassandra-stress. Both PCs have Oracle Java 1.7.0_40.
I have noticed there are some parameters for SSD in cassandra.yaml, which I have adjusted, but with no improvement.
It puzzles me that Cassandra on my office PC, with far better hardware, could be less than half as fast as on my home PC.
Shing
On Saturday, 27 September 2014, 5:12, Kevin Burton bur...@spinn3r.com wrote:
What SSD was it? There is a lot of variability in terms of SSD performance.
1. Is it a new vs old SSD? Old SSDs can become slower if they’re really worn out.
2. Was the office SSD near capacity holding other data?
3. What models were they? SSD != SSD… there is a massive amount of performance variability out there.
… also … more data is needed. JDK versions the same? Cassandra versions the same? What about the config?
On Fri, Sep 26, 2014 at 2:39 PM, Shing Hing Man mat...@yahoo.com wrote:
Hi,
I have run cassandra-stress write and cassandra-stress read on my office PC and on my home PC.
Office PC : Intel Core i7-4479, 8 virtual core, 16G RAM, 500G SSD
Home PC : Intel Xeon E3-1230V3, 8 virtual core, 8G RAM, 500G SATA disk.
From the cassandra-stress result (please see below), it seems Cassandra is more than 100% more performant on my home PC than on the office PC. I was expecting the other way around, as my office PC has much better hardware.
Office : Intel Core i7-4479, 8 virtual cores, 16G RAM, 500G SSD
    cauchy:~/installed/cassandra/tools/bin ./cassandra-stress write
    Running with 8 threadCount
    Results:
    op rate : 11264
    partition rate : 11264
    row rate : 11264
    latency mean : 0.7
    latency median : 0.4
    latency 95th percentile : 0.9
    latency 99th percentile : 1.6
    latency 99.9th percentile : 5.3
    latency max : 325.3
    Total operation time : 00:02:40
    cauchy:~/installed/cassandra/tools/bin ./cassandra-stress read
    Running with 8 threadCount
    Results:
    op rate : 13702
    partition rate : 13702
    row rate : 13702
    latency mean : 0.5
    latency median : 0.5
    latency 95th percentile : 0.8
    latency 99th percentile : 1.4
    latency 99.9th percentile : 3.4
    latency max : 67.1
    Total operation time : 00:00:30
--- --
Home : Intel Xeon E3-1230V3, 8 virtual core, 8G RAM, 500G SATA disk.
    matmsh@gauss:~/installed/cassandra/tools/bin ./cassandra-stress write
    Running with 8 threadCount
    Results:
    op rate : 25181
    partition rate : 25181
    row rate : 25181
    latency mean : 0.3
    latency median : 0.2
    latency 95th percentile : 0.3
    latency 99th percentile : 0.5
    latency 99.9th percentile : 16.7
    latency max : 331.0
    Total operation time : 00:03:24
    gauss:~/installed/cassandra/tools/bin ./cassandra-stress read
    Results:
    op rate : 35338
    partition rate : 35338
    row rate : 35338
    latency mean : 0.2
    latency median : 0.2
    latency 95th percentile : 0.3
    latency 99th percentile : 0.4
    latency 99.9th percentile : 1.1
    latency max : 17.7
    Total operation time : 00:00:30
Is the above result expected?
Thanks in advance for any suggestions!
Shing
--
Founder/CEO Spinn3r.com
Location: San Francisco, CA
blog: http://burtonator.wordpress.com
Apache Cassandra 2.1.0 : cassandra-stress performance discrepancy between SSD and SATA drive
Hi,
I have run cassandra-stress write and cassandra-stress read on my office PC and on my home PC.
Office PC : Intel Core i7-4479, 8 virtual core, 16G RAM, 500G SSD
Home PC : Intel Xeon E3-1230V3, 8 virtual core, 8G RAM, 500G SATA disk.
From the cassandra-stress result (please see below), it seems Cassandra is more than 100% more performant on my home PC than on the office PC. I was expecting the other way around, as my office PC has much better hardware.
Office : Intel Core i7-4479, 8 virtual cores, 16G RAM, 500G SSD
    cauchy:~/installed/cassandra/tools/bin ./cassandra-stress write
    Running with 8 threadCount
    Results:
    op rate : 11264
    partition rate : 11264
    row rate : 11264
    latency mean : 0.7
    latency median : 0.4
    latency 95th percentile : 0.9
    latency 99th percentile : 1.6
    latency 99.9th percentile : 5.3
    latency max : 325.3
    Total operation time : 00:02:40
    cauchy:~/installed/cassandra/tools/bin ./cassandra-stress read
    Running with 8 threadCount
    Results:
    op rate : 13702
    partition rate : 13702
    row rate : 13702
    latency mean : 0.5
    latency median : 0.5
    latency 95th percentile : 0.8
    latency 99th percentile : 1.4
    latency 99.9th percentile : 3.4
    latency max : 67.1
    Total operation time : 00:00:30
--- --
Home : Intel Xeon E3-1230V3, 8 virtual core, 8G RAM, 500G SATA disk.
    matmsh@gauss:~/installed/cassandra/tools/bin ./cassandra-stress write
    Running with 8 threadCount
    Results:
    op rate : 25181
    partition rate : 25181
    row rate : 25181
    latency mean : 0.3
    latency median : 0.2
    latency 95th percentile : 0.3
    latency 99th percentile : 0.5
    latency 99.9th percentile : 16.7
    latency max : 331.0
    Total operation time : 00:03:24
    gauss:~/installed/cassandra/tools/bin ./cassandra-stress read
    Results:
    op rate : 35338
    partition rate : 35338
    row rate : 35338
    latency mean : 0.2
    latency median : 0.2
    latency 95th percentile : 0.3
    latency 99th percentile : 0.4
    latency 99.9th percentile : 1.1
    latency max : 17.7
    Total operation time : 00:00:30
Is the above result expected?
Thanks in advance for any suggestions!
Shing
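The "more than 100%" comparison can be checked directly from the op rates reported above. This is a minimal Python sketch rather than anything from the original post: the dictionary layout and function names are illustrative, and the op rates are copied from the cassandra-stress results in this message.

```python
# Op rates (operations per second) copied from the cassandra-stress results above.
# The dictionary layout and the function names are illustrative.
op_rate = {
    "write": {"office_ssd": 11264, "home_sata": 25181},
    "read": {"office_ssd": 13702, "home_sata": 35338},
}


def percent_faster(fast, slow):
    """Percentage by which `fast` exceeds `slow` (relative to `slow`)."""
    return (fast - slow) / slow * 100.0


def percent_slower(slow, fast):
    """Percentage by which `slow` falls short of `fast` (relative to `fast`)."""
    return (fast - slow) / fast * 100.0


results = {}
for test, rates in op_rate.items():
    results[test] = {
        "faster_pct": percent_faster(rates["home_sata"], rates["office_ssd"]),
        "slower_pct": percent_slower(rates["office_ssd"], rates["home_sata"]),
    }
    print(
        f"{test}: home is {results[test]['faster_pct']:.1f}% faster, "
        f"office is {results[test]['slower_pct']:.1f}% slower"
    )
```

The two percentages differ because they use different baselines: "faster" is measured against the office op rate and can exceed 100%, while "slower" is measured against the home op rate and always stays below 100%.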
Cassandra 2.0.5 : *-jb-27-Data.db (No such file or directory)
Hi,
I am running Cassandra 2.0.5 on my PC (with just one node and the default cassandra.yaml). I have inserted one million rows into a column family (each row has an int key and two small set<string> columns).
In cqlsh, when I did a select count
    cqlsh:testks> select count(*) from ips_table limit 200;
I got the following exception:
    ERROR 16:04:05,657 Exception in thread Thread[ReadStage:66,5,main]
    java.lang.RuntimeException: java.lang.RuntimeException: java.io.FileNotFoundException: /home/shing/installed/cassandras/filelogs/data/testks/ips_table/testks-ips_table-jb-27-Data.db (No such file or directory)
        at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1935)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
    Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: /home/shing/installed/cassandras/filelogs/data/testks/ips_table/testks-ips_table-jb-27-Data.db (No such file or directory)
        at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:59)
        at org.apache.cassandra.io.sstable.SSTableReader.openDataReader(SSTableReader.java:1362)
        at org.apache.cassandra.io.sstable.SSTableScanner.<init>(SSTableScanner.java:67)
        at org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:1147)
        at org.apache.cassandra.db.RowIteratorFactory.getIterator(RowIteratorFactory.java:69)
        at org.apache.cassandra.db.ColumnFamilyStore.getSequentialIterator(ColumnFamilyStore.java:1599)
        at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1718)
        at org.apache.cassandra.db.PagedRangeCommand.executeLocally(PagedRangeCommand.java:111)
        at org.apache.cassandra.service.StorageProxy$LocalRangeSliceRunnable.runMayThrow(StorageProxy.java:1418)
        at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1931)
        ... 3 more
Are there some Cassandra parameters I could set to get rid of the above exception?
Thanks in advance for any assistance!
Shing