[ 
https://issues.apache.org/jira/browse/CASSANDRA-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136991#comment-13136991
 ] 

Brandon Williams commented on CASSANDRA-3248:
---------------------------------------------

ext3 w/fsync:
{noformat}

Operations performed:  0 Read, 262144 Write, 262144 Other = 524288 Total
Read 0b  Written 512Mb  Total transferred 512Mb  (9.6049Mb/sec)
 4917.70 Requests/sec executed

Test execution summary:
    total time:                          53.3062s
    total number of events:              262144
    total time taken by event execution: 53.1314
    per-request statistics:
         min:                                  0.13ms
         avg:                                  0.20ms
         max:                                 50.49ms
         approx.  95 percentile:               0.21ms
{noformat}

ext3 w/fdatasync:
{noformat}
Operations performed:  0 Read, 262144 Write, 262144 Other = 524288 Total
Read 0b  Written 512Mb  Total transferred 512Mb  (9.6357Mb/sec)
 4933.49 Requests/sec executed

Test execution summary:
    total time:                          53.1356s
    total number of events:              262144
    total time taken by event execution: 52.9635
    per-request statistics:
         min:                                  0.13ms
         avg:                                  0.20ms
         max:                                 67.45ms
         approx.  95 percentile:               0.21ms

Threads fairness:
    events (avg/stddev):           262144.0000/0.00
    execution time (avg/stddev):   52.9635/0.00
{noformat}

xfs w/fsync:
{noformat}
Operations performed:  0 Read, 262144 Write, 262144 Other = 524288 Total
Read 0b  Written 512Mb  Total transferred 512Mb  (10.406Mb/sec)
 5327.67 Requests/sec executed

Test execution summary:
    total time:                          49.2043s
    total number of events:              262144
    total time taken by event execution: 49.0501
    per-request statistics:
         min:                                  0.12ms
         avg:                                  0.19ms
         max:                                 31.80ms
         approx.  95 percentile:               0.26ms

Threads fairness:
    events (avg/stddev):           262144.0000/0.00
    execution time (avg/stddev):   49.0501/0.00
{noformat}

xfs w/fdatasync:
{noformat}
Operations performed:  0 Read, 262144 Write, 262144 Other = 524288 Total
Read 0b  Written 512Mb  Total transferred 512Mb  (10.387Mb/sec)
 5317.93 Requests/sec executed

Test execution summary:
    total time:                          49.2944s
    total number of events:              262144
    total time taken by event execution: 49.1413
    per-request statistics:
         min:                                  0.13ms
         avg:                                  0.19ms
         max:                                 30.86ms
         approx.  95 percentile:               0.26ms

Threads fairness:
    events (avg/stddev):           262144.0000/0.00
    execution time (avg/stddev):   49.1413/0.00
{noformat}
                
> CommitLog writer should call fdatasync instead of fsync
> -------------------------------------------------------
>
>                 Key: CASSANDRA-3248
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3248
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.6.13, 0.7.9, 0.8.6, 1.0.0, 1.1
>         Environment: Linux
>            Reporter: Zhu Han
>            Assignee: Brandon Williams
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> CommitLogSegment uses SequentialWriter to flush the buffered data to the log 
> device. It depends on FileDescriptor#sync(), which invokes fsync() and so also 
> forces the file attributes to disk.
> However, at least on Linux, fdatasync() is good enough for commit log flush:
> bq. fdatasync() is similar to fsync(), but does not flush modified metadata 
> unless that metadata is needed in order to allow a subsequent data retrieval 
> to be  correctly handled.  For example, changes to st_atime or st_mtime 
> (respectively, time of last access and time of last modification; see 
> stat(2)) do not require flushing because they are not necessary for a 
> subsequent data read to be handled correctly.  On the other hand, a change to 
> the file size (st_size,  as  made  by  say  ftruncate(2)),  would require a 
> metadata flush.
> File size is synced to disk by fdatasync() as well. Although the commit log 
> recovery logic sorts the commit log segments by their modification timestamp, 
> that sort can be removed safely, IMHO.
> I checked the native code of JRE 6. On Linux and Solaris, 
> FileChannel#force(false) invokes fdatasync(). On Windows, the false flag does 
> not have any impact.
> On my log device (commodity SATA HDD, write cache disabled), there is a large 
> performance gap between fsync() and fdatasync():
> {quote}
> $ sysbench --test=fileio --num-threads=1  --file-num=1 --file-total-size=10G 
> --file-fsync-all=on --file-fsync-mode={color:red}fdatasync{color} 
> --file-test-mode=seqwr --max-time=600 --file-block-size=2K  --max-requests=0 
> run
> {color:blue}54.90{color} Requests/sec executed
>    per-request statistics:
>          min:                                  8.29ms
>          avg:                                 18.18ms
>          max:                                108.36ms
>          approx.  95 percentile:              25.02ms
> $ sysbench --test=fileio --num-threads=1  --file-num=1 --file-total-size=10G 
> --file-fsync-all=on --file-fsync-mode={color:red}fsync{color} 
> --file-test-mode=seqwr --max-time=600 --file-block-size=2K  --max-requests=0 
> run
> {color:blue}28.08{color} Requests/sec executed
>     per-request statistics:
>          min:                                 33.28ms
>          avg:                                 35.61ms
>          max:                                911.87ms
>          approx.  95 percentile:              41.69ms
> {quote}
> I do think this is a very critical performance improvement.
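
To illustrate the change proposed in the description, here is a minimal sketch of syncing a log append with FileChannel#force(false) instead of FileDescriptor#sync(). The class and method names (ForceSketch, appendAndSync) are hypothetical and are not Cassandra code. The Java API only documents that metaData=false means just the file content needs to be forced. The mapping to fdatasync() on Linux and Solaris is the reporter's reading of the JRE 6 native code, not something this sketch verifies.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical illustration only; not the Cassandra CommitLogSegment code.
public class ForceSketch {

    // Writes one record at the channel's current position, then forces it to
    // the storage device. syncMetadata=false requests a content-only sync
    // (reported to be fdatasync() on Linux); true also forces file metadata.
    static long appendAndSync(FileChannel channel, byte[] record, boolean syncMetadata)
            throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(record);
        while (buf.hasRemaining()) {
            channel.write(buf);
        }
        channel.force(syncMetadata);
        return channel.size();
    }

    public static void main(String[] args) throws IOException {
        File log = File.createTempFile("commitlog-sketch", ".log");
        try (RandomAccessFile raf = new RandomAccessFile(log, "rw")) {
            // 2K record, matching the block size used in the sysbench runs.
            long size = appendAndSync(raf.getChannel(), new byte[2048], false);
            System.out.println("bytes in file after sync: " + size);
        } finally {
            log.delete();
        }
    }
}
```

Passing true instead of false keeps the metadata-forcing behaviour, so the flag can be toggled to compare the two modes on a given device.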


        
