How to autogenerate timestamp with Spark Cassandra connector

2014-11-11 Thread Shing Hing Man
Hi,

I am trying to insert into the following column family using the Spark Cassandra
connector.


CREATE TABLE myks.mycf (
id bigint,
msg text,
type text,
ts timestamp,
primary key (id, msg)
) 
Is there a way to have the ts field generated automatically?

// dataRdd is of type RDD[(Int,String,String)]. I would like the ts field
// to be filled in automatically with the value of now().
dataRdd.saveToCassandra(keySpace, cf, SomeColumns("id", "msg", "type"))
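
(In plain CQL I believe the server can generate the value itself, e.g.

INSERT INTO myks.mycf (id, msg, type, ts) VALUES (1, 'hello', 'x', dateOf(now()));

where the id/msg/type values above are just placeholders, but I do not see how to
make saveToCassandra emit that.)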

Do I need to implement my own RowWriterFactory?
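
The workaround I have in mind is to generate the timestamp on the Spark side
before saving. A minimal sketch (untested, and assuming the connector maps a
java.util.Date to a CQL timestamp column):

import com.datastax.spark.connector._
import java.util.Date

// Append a client-generated timestamp to every row, then save all four columns.
// Note: this uses the Spark workers' clocks rather than Cassandra's now().
val withTs = dataRdd.map { case (id, msg, tpe) => (id, msg, tpe, new Date()) }
withTs.saveToCassandra(keySpace, cf, SomeColumns("id", "msg", "type", "ts"))

The drawback is that the value comes from whichever worker writes the row,
which is why a server-generated now() would be preferable.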

Thanks in advance for any assistance!

Shing

Re: Apache Cassandra 2.1.0 : cassandra-stress performance discrepancy between SSD and SATA drive

2014-09-29 Thread Shing Hing Man
I have run a sysbench file I/O test on my home PC and my office PC. The results
are given below: the office PC (with an SSD) is roughly three times faster than
the home PC (with a SATA hard disk).
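
(From the numbers below: sequential write 291.66 / 81.75 ≈ 3.6x, and random r/w
throughput 3.7344 / 1.2605 ≈ 3.0x, in favour of the SSD.)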

Home PC :

gauss:~ sysbench --test=fileio --file-total-size=50G prepare
sysbench 0.5:  multi-threaded system evaluation benchmark

128 files, 409600Kb each, 51200Mb total
Creating files for the test...
Extra file open flags: 0
Creating file test_file.0
Creating file test_file.1
Creating file test_file.2
...
Creating file test_file.125
Creating file test_file.126
Creating file test_file.127
53687091200 bytes written in 626.30 seconds (81.75 MB/sec).
matmsh@gauss:~ sysbench --test=fileio --file-total-size=50G 
--file-test-mode=rndrw --init-rng=on --max-time=300 --max-requests=0 run
sysbench 0.5:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1
Random number generator seed is 0 and will be ignored


Extra file open flags: 0
128 files, 400Mb each
50Gb total file size
Block size 16Kb
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Threads started!

Operations performed:  14521 reads, 9680 writes, 30976 Other = 55177 Total
Read 226.89Mb  Written 151.25Mb  Total transferred 378.14Mb  (1.2605Mb/sec)
   80.67 Requests/sec executed

General statistics:
total time:  300.0030s
total number of events:  24201
total time taken by event execution: 186.0749s
response time:
 min:  0.00ms
 avg:  7.69ms
 max: 132.43ms
 approx.  95 percentile:  19.57ms

Threads fairness:
events (avg/stddev):   24201.0000/0.00
execution time (avg/stddev):   186.0749/0.00

gauss:~ 
===
Office PC :
shing@cauchy:~ sysbench --test=fileio --file-total-size=50G prepare
sysbench 0.5:  multi-threaded system evaluation benchmark

128 files, 409600Kb each, 51200Mb total
Creating files for the test...
Extra file open flags: 0
Creating file test_file.0
Creating file test_file.1
Creating file test_file.2
Creating file test_file.3
...
Creating file test_file.122
Creating file test_file.123
Creating file test_file.124
Creating file test_file.125
Creating file test_file.126
Creating file test_file.127
53687091200 bytes written in 175.55 seconds (291.66 MB/sec).
cauchy:~ sysbench --test=fileio --file-total-size=50G --file-test-mode=rndrw 
--init-rng=on --max-time=300 --max-requests=0 run
sysbench 0.5:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1
Random number generator seed is 0 and will be ignored

Extra file open flags: 0
128 files, 400Mb each
50Gb total file size
Block size 16Kb
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Threads started!

Operations performed:  43020 reads, 28680 writes, 91723 Other = 163423 Total
Read 672.19Mb  Written 448.12Mb  Total transferred 1.0941Gb  (3.7344Mb/sec)
  239.00 Requests/sec executed

General statistics:
total time:  300.0007s
total number of events:  71700
total time taken by event execution: 7.5550s
response time:
 min:  0.00ms
 avg:  0.11ms
 max: 12.89ms
 approx.  95 percentile:   0.22ms

Threads fairness:
events (avg/stddev):   71700.0000/0.00
execution time (avg/stddev):   7.5550/0.00
===



Shing




Re: Apache Cassandra 2.1.0 : cassandra-stress performance discrepancy between SSD and SATA drive

2014-09-27 Thread Shing Hing Man
Hi Kevin,
   Thanks for the reply!
I do not know the exact brand of the SSD in my office PC, but it is only a year
old and far from full.

On both the office PC and the home PC, I untarred Apache Cassandra 2.1.0, ran
cassandra -f with the default config, and then ran cassandra-stress.

Both PCs have Oracle Java 1.7.0_40.

I have noticed there are some parameters for SSDs in cassandra.yaml, which I
have adjusted, but with no improvement.
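
For reference, the settings I adjusted were along these lines (a sketch of
cassandra.yaml excerpts from memory; the comments summarise the guidance shipped
with the 2.1 defaults, not my own measurements):

# cassandra.yaml (Cassandra 2.1) - SSD-related knobs
trickle_fsync: true                 # shipped comment recommends true on SSDs (default false)
trickle_fsync_interval_in_kb: 10240
concurrent_reads: 32                # rule of thumb: 16 * number of drives
concurrent_writes: 32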


It puzzles me that Cassandra on my office PC, with far better hardware, runs at
less than half the speed of my home PC.



Shing







On Saturday, 27 September 2014, 5:12, Kevin Burton bur...@spinn3r.com wrote:
 


What SSD was it?  There is a lot of variability in SSD performance.

1.  Is it a new or an old SSD?  Old SSDs can become slower if they're really
worn out.

2.  Was the office SSD near capacity, holding other data?

3.  What models were they?

SSD != SSD… there is a massive amount of performance variability out there.

… also … more data is needed.  JDK versions the same?  Cassandra versions the
same?

What about the config?


On Fri, Sep 26, 2014 at 2:39 PM, Shing Hing Man mat...@yahoo.com wrote:

Hi,
  I have run cassandra-stress write and cassandra-stress read on my office PC
and on my home PC.

Office PC : Intel Core i7-4479, 8 virtual cores, 16G RAM, 500G SSD
Home PC : Intel Xeon E3-1230V3, 8 virtual cores, 8G RAM, 500G SATA disk.

From the cassandra-stress results (please see below), Cassandra is more than
twice as fast on my home PC as on my office PC.  I was expecting the other way
around, as my office PC has much better hardware.



Office : Intel Core i7-4479, 8 virtual cores, 16G RAM, 500G SSD
cauchy:~/installed/cassandra/tools/bin ./cassandra-stress write
Running with 8 threadCount
Results:
op rate   : 11264
partition rate: 11264
row rate  : 11264
latency mean  : 0.7
latency median: 0.4
latency 95th percentile   : 0.9
latency 99th percentile   : 1.6
latency 99.9th percentile : 5.3
latency max   : 325.3
Total operation time  : 00:02:40




cauchy:~/installed/cassandra/tools/bin ./cassandra-stress read 
Running with 8 threadCount
Results:
op rate   : 13702
partition rate: 13702
row rate  : 13702
latency mean  : 0.5
latency median: 0.5
latency 95th percentile   : 0.8
latency 99th percentile   : 1.4
latency 99.9th percentile : 3.4
latency max   : 67.1
Total operation time  : 00:00:30


-----

Home : Intel Xeon E3-1230V3, 8 virtual core,  8G RAM, 500G SATA disk.


matmsh@gauss:~/installed/cassandra/tools/bin ./cassandra-stress write
Running with 8 threadCount


Results:
op rate   : 25181
partition rate: 25181
row rate  : 25181
latency mean  : 0.3
latency median: 0.2
latency 95th percentile   : 0.3
latency 99th percentile   : 0.5
latency 99.9th percentile : 16.7
latency max   : 331.0
Total operation time  : 00:03:24


gauss:~/installed/cassandra/tools/bin ./cassandra-stress read
Results:
op rate   : 35338
partition rate: 35338
row rate  : 35338
latency mean  : 0.2
latency median: 0.2
latency 95th percentile   : 0.3
latency 99th percentile   : 0.4
latency 99.9th percentile : 1.1
latency max   : 17.7
Total operation time  : 00:00:30




Is the above result expected?
Thanks in advance for any suggestions!


Shing







-- 

Founder/CEO Spinn3r.com

Location: San Francisco, CA

blog: http://burtonator.wordpress.com
… or check out my Google+ profile

Apache Cassandra 2.1.0 : cassandra-stress performance discrepancy between SSD and SATA drive

2014-09-26 Thread Shing Hing Man
Hi,
  I have run cassandra-stress write and cassandra-stress read on my office PC
and on my home PC.

Office PC : Intel Core i7-4479, 8 virtual cores, 16G RAM, 500G SSD
Home PC : Intel Xeon E3-1230V3, 8 virtual cores, 8G RAM, 500G SATA disk.

From the cassandra-stress results (please see below), Cassandra is more than
twice as fast on my home PC as on my office PC.  I was expecting the other way
around, as my office PC has much better hardware.
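
(A quick check of the ratios from the results below: write op rate
25181 / 11264 ≈ 2.2x and read op rate 35338 / 13702 ≈ 2.6x, both in favour of
the home PC.)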


Office : Intel Core i7-4479, 8 virtual cores, 16G RAM, 500G SSD
cauchy:~/installed/cassandra/tools/bin ./cassandra-stress write
Running with 8 threadCount
Results:
op rate   : 11264
partition rate: 11264
row rate  : 11264
latency mean  : 0.7
latency median: 0.4
latency 95th percentile   : 0.9
latency 99th percentile   : 1.6
latency 99.9th percentile : 5.3
latency max   : 325.3
Total operation time  : 00:02:40


cauchy:~/installed/cassandra/tools/bin ./cassandra-stress read 
Running with 8 threadCount
Results:
op rate   : 13702
partition rate: 13702
row rate  : 13702
latency mean  : 0.5
latency median: 0.5
latency 95th percentile   : 0.8
latency 99th percentile   : 1.4
latency 99.9th percentile : 3.4
latency max   : 67.1
Total operation time  : 00:00:30

-----

Home: Intel Xeon E3-1230V3, 8 virtual core,  8G RAM, 500G SATA disk.

matmsh@gauss:~/installed/cassandra/tools/bin ./cassandra-stress write
Running with 8 threadCount

Results:
op rate   : 25181
partition rate: 25181
row rate  : 25181
latency mean  : 0.3
latency median: 0.2
latency 95th percentile   : 0.3
latency 99th percentile   : 0.5
latency 99.9th percentile : 16.7
latency max   : 331.0
Total operation time  : 00:03:24

gauss:~/installed/cassandra/tools/bin ./cassandra-stress read
Results:
op rate   : 35338
partition rate: 35338
row rate  : 35338
latency mean  : 0.2
latency median: 0.2
latency 95th percentile   : 0.3
latency 99th percentile   : 0.4
latency 99.9th percentile : 1.1
latency max   : 17.7
Total operation time  : 00:00:30


Is the above result expected?
Thanks in advance for any suggestions!

Shing

Cassandra 2.0.5 : *-jb-27-Data.db (No such file or directory)

2014-09-08 Thread Shing Hing Man
Hi,
   I am running Cassandra 2.0.5 on my PC (with just one node and the default
cassandra.yaml).

I have inserted one million rows into a column family (each row has an int key
and two small set<string> columns).

In cqlsh, when I did a select count:

cqlsh:testks> select count(*) from ips_table limit 200;

I got the following exception:

ERROR 16:04:05,657 Exception in thread Thread[ReadStage:66,5,main]
java.lang.RuntimeException: java.lang.RuntimeException: java.io.FileNotFoundException: /home/shing/installed/cassandras/filelogs/data/testks/ips_table/testks-ips_table-jb-27-Data.db (No such file or directory)
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1935)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: /home/shing/installed/cassandras/filelogs/data/testks/ips_table/testks-ips_table-jb-27-Data.db (No such file or directory)
at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:59)
at org.apache.cassandra.io.sstable.SSTableReader.openDataReader(SSTableReader.java:1362)
at org.apache.cassandra.io.sstable.SSTableScanner.<init>(SSTableScanner.java:67)
at org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:1147)
at org.apache.cassandra.db.RowIteratorFactory.getIterator(RowIteratorFactory.java:69)
at org.apache.cassandra.db.ColumnFamilyStore.getSequentialIterator(ColumnFamilyStore.java:1599)
at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1718)
at org.apache.cassandra.db.PagedRangeCommand.executeLocally(PagedRangeCommand.java:111)
at org.apache.cassandra.service.StorageProxy$LocalRangeSliceRunnable.runMayThrow(StorageProxy.java:1418)
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1931)
... 3 more


Are there some Cassandra parameters I could set to get rid of the above
exception?

Thanks in advance for any assistance !

Shing