[ https://issues.apache.org/jira/browse/CASSANDRA-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136991#comment-13136991 ]
Brandon Williams commented on CASSANDRA-3248:
---------------------------------------------

ext3 w/fsync:
{noformat}
Operations performed:  0 Read, 262144 Write, 262144 Other = 524288 Total
Read 0b  Written 512Mb  Total transferred 512Mb  (9.6049Mb/sec)
 4917.70 Requests/sec executed

Test execution summary:
    total time:                          53.3062s
    total number of events:              262144
    total time taken by event execution: 53.1314
    per-request statistics:
         min:                            0.13ms
         avg:                            0.20ms
         max:                            50.49ms
         approx.  95 percentile:         0.21ms
{noformat}

ext3 w/fdatasync:
{noformat}
Operations performed:  0 Read, 262144 Write, 262144 Other = 524288 Total
Read 0b  Written 512Mb  Total transferred 512Mb  (9.6357Mb/sec)
 4933.49 Requests/sec executed

Test execution summary:
    total time:                          53.1356s
    total number of events:              262144
    total time taken by event execution: 52.9635
    per-request statistics:
         min:                            0.13ms
         avg:                            0.20ms
         max:                            67.45ms
         approx.  95 percentile:         0.21ms

Threads fairness:
    events (avg/stddev):           262144.0000/0.00
    execution time (avg/stddev):   52.9635/0.00
{noformat}

xfs w/fsync:
{noformat}
Operations performed:  0 Read, 262144 Write, 262144 Other = 524288 Total
Read 0b  Written 512Mb  Total transferred 512Mb  (10.406Mb/sec)
 5327.67 Requests/sec executed

Test execution summary:
    total time:                          49.2043s
    total number of events:              262144
    total time taken by event execution: 49.0501
    per-request statistics:
         min:                            0.12ms
         avg:                            0.19ms
         max:                            31.80ms
         approx.  95 percentile:         0.26ms

Threads fairness:
    events (avg/stddev):           262144.0000/0.00
    execution time (avg/stddev):   49.0501/0.00
{noformat}

xfs w/fdatasync:
{noformat}
Operations performed:  0 Read, 262144 Write, 262144 Other = 524288 Total
Read 0b  Written 512Mb  Total transferred 512Mb  (10.387Mb/sec)
 5317.93 Requests/sec executed

Test execution summary:
    total time:                          49.2944s
    total number of events:              262144
    total time taken by event execution: 49.1413
    per-request statistics:
         min:                            0.13ms
         avg:                            0.19ms
         max:                            30.86ms
         approx.  95 percentile:         0.26ms

Threads fairness:
    events (avg/stddev):           262144.0000/0.00
    execution time (avg/stddev):   49.1413/0.00
{noformat}

> CommitLog writer should call fdatasync instead of fsync
> -------------------------------------------------------
>
>                 Key: CASSANDRA-3248
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3248
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.6.13, 0.7.9, 0.8.6, 1.0.0, 1.1
>         Environment: Linux
>            Reporter: Zhu Han
>            Assignee: Brandon Williams
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> CommitLogSegment uses SequentialWriter to flush the buffered data to the log
> device. It depends on FileDescriptor#sync(), which invokes fsync() and
> therefore also forces the file attributes to disk.
> However, at least on Linux, fdatasync() is good enough for commit log flush:
> bq. fdatasync() is similar to fsync(), but does not flush modified metadata
> unless that metadata is needed in order to allow a subsequent data retrieval
> to be correctly handled. For example, changes to st_atime or st_mtime
> (respectively, time of last access and time of last modification; see
> stat(2)) do not require flushing because they are not necessary for a
> subsequent data read to be handled correctly. On the other hand, a change to
> the file size (st_size, as made by say ftruncate(2)), would require a
> metadata flush.
> The file size is synced to disk by fdatasync() as well. Although the commit
> log recovery logic sorts the commit log segments by their modification
> timestamp, that sorting can be removed safely, IMHO.
> I checked the native code of JRE 6. On Linux and Solaris,
> FileChannel#force(false) invokes fdatasync(). On Windows, the false flag does
> not have any impact.
> On my log device (commodity SATA HDD, write cache disabled), there is a large
> performance gap between fsync() and fdatasync():
> {quote}
> $ sysbench --test=fileio --num-threads=1 --file-num=1 --file-total-size=10G
> --file-fsync-all=on --file-fsync-mode={color:red}fdatasync{color}
> --file-test-mode=seqwr --max-time=600 --file-block-size=2K --max-requests=0
> run
> {color:blue}54.90{color} Requests/sec executed
> per-request statistics:
> min: 8.29ms
> avg: 18.18ms
> max: 108.36ms
> approx. 95 percentile: 25.02ms
> $ sysbench --test=fileio --num-threads=1 --file-num=1 --file-total-size=10G
> --file-fsync-all=on --file-fsync-mode={color:red}fsync{color}
> --file-test-mode=seqwr --max-time=600 --file-block-size=2K --max-requests=0
> run
> {color:blue}28.08{color} Requests/sec executed
> per-request statistics:
> min: 33.28ms
> avg: 35.61ms
> max: 911.87ms
> approx. 95 percentile: 41.69ms
> {quote}
> I do think this is a very critical performance improvement.
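
For anyone who wants to try the two sync modes from Java rather than sysbench, here is a minimal sketch. It relies on the description's statement that FileChannel#force(false) maps to fdatasync() on Linux, while force(true) also flushes metadata. The class name SyncModeSketch, the helper timedAppend, and the write count and block size are made up for illustration; none of this is Cassandra's CommitLog code.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical micro-benchmark; not part of Cassandra.
public class SyncModeSketch {

    /**
     * Writes `writes` blocks of `blockSize` bytes sequentially, forcing the
     * channel after every block. Returns the elapsed time in nanoseconds.
     */
    public static long timedAppend(Path file, int writes, int blockSize,
                                   boolean syncMetadata) throws IOException {
        ByteBuffer block = ByteBuffer.allocate(blockSize);
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            long start = System.nanoTime();
            for (int i = 0; i < writes; i++) {
                block.clear();
                while (block.hasRemaining()) {
                    ch.write(block);
                }
                // force(false): file data only (assumed fdatasync() on Linux,
                // per the issue description); force(true): data plus metadata.
                ch.force(syncMetadata);
            }
            return System.nanoTime() - start;
        }
    }

    public static void main(String[] args) throws IOException {
        int writes = 200;      // arbitrary, small enough to finish quickly
        int blockSize = 2048;  // mirrors --file-block-size=2K above
        Path file = Files.createTempFile("commitlog-sketch", ".log");
        try {
            long dataOnly = timedAppend(file, writes, blockSize, false);
            long withMeta = timedAppend(file, writes, blockSize, true);
            System.out.printf("force(false): %.3f ms/write%n", dataOnly / 1e6 / writes);
            System.out.printf("force(true):  %.3f ms/write%n", withMeta / 1e6 / writes);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

The temp file lands wherever java.io.tmpdir points, so on tmpfs or a device with a volatile write cache the two modes may be indistinguishable. To get numbers comparable to the ones above, point it at a file on the actual log device.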