[ https://issues.apache.org/jira/browse/CASSANDRA-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175841#comment-13175841 ]
Pavel Yaskevich commented on CASSANDRA-3623:
--------------------------------------------
bq. Meanwhile, your claim here is that the snappy library is taking more CPU
because we give it DirectBB?
First of all, I don't claim that it takes more CPU; I claim that it takes
longer to decompress data compared to normal reads. Second, I don't think the
problem is with the direct BB itself (btw, there is no way to pass a non-direct
buffer) but rather with mmap'ed I/O in this case.
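To illustrate the parenthetical: snappy-java's ByteBuffer API accepts only
direct buffers and rejects heap buffers at runtime, so the caller has no
choice there. A minimal, self-contained sketch (illustration only, not code
from either patch):
{code}
// Illustration: snappy-java's ByteBuffer entry points only work with direct
// buffers; heap buffers fail at runtime with SnappyError(NOT_A_DIRECT_BUFFER).
import java.io.IOException;
import java.nio.ByteBuffer;
import org.xerial.snappy.Snappy;
import org.xerial.snappy.SnappyError;

public class SnappyDirectBufferDemo
{
    public static void main(String[] args) throws IOException
    {
        byte[] raw = "some test data".getBytes();
        byte[] compressed = Snappy.compress(raw);

        // Direct buffers: the path the library supports.
        ByteBuffer src = ByteBuffer.allocateDirect(compressed.length);
        src.put(compressed);
        src.flip();
        ByteBuffer dst = ByteBuffer.allocateDirect(Snappy.uncompressedLength(compressed));
        Snappy.uncompress(src, dst); // decompresses into dst

        // Heap buffers: rejected by the library itself.
        try
        {
            Snappy.uncompress(ByteBuffer.wrap(compressed), ByteBuffer.allocate(raw.length));
        }
        catch (SnappyError notDirect)
        {
            System.out.println("heap buffer rejected: " + notDirect);
        }
    }
}
{code}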
bq. Can you please confirm that you tried v2 and it gives worse performance
than trunk, and that this is on Linux (v1 doesn't give performance gains
whereas v2 does)?
Yes, I tried v2, and it wasn't easy: first of all it wasn't rebased, then I
figured out that I needed to apply CASSANDRA-3611 and change the call to
FBUtilities.newCRC32() to "new CRC32()" for it to compile. After that I added
"disk_access_mode: mmap" to conf/cassandra.yaml, used stress
("./bin/stress -n 300000 -S 512 -I SnappyCompressor") to insert test data
(which doesn't fit into the page cache), and tried to read with
"./bin/stress -n 300000 -I SnappyCompressor -o read", but got the following
exceptions:
{code}
java.lang.RuntimeException: java.lang.UnsupportedOperationException
    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1283)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.UnsupportedOperationException
    at org.apache.cassandra.io.compress.CompressedMappedFileDataInput.mark(CompressedMappedFileDataInput.java:212)
    at org.apache.cassandra.db.columniterator.SimpleSliceReader.<init>(SimpleSliceReader.java:62)
    at org.apache.cassandra.db.columniterator.SSTableSliceIterator.createReader(SSTableSliceIterator.java:90)
    at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:66)
    at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:66)
    at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:78)
    at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:232)
    at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:62)
    at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1283)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1169)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1136)
    at org.apache.cassandra.db.Table.getRow(Table.java:375)
    at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:69)
    at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:800)
    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1279)
    ... 3 more
{code}
and
{code}
java.lang.RuntimeException: java.lang.UnsupportedOperationException
    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1283)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.UnsupportedOperationException
    at org.apache.cassandra.io.compress.CompressedMappedFileDataInput.reset(CompressedMappedFileDataInput.java:207)
    at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:78)
    at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:40)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
    at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:107)
    at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:145)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.<init>(MergeIterator.java:88)
    at org.apache.cassandra.utils.MergeIterator.get(MergeIterator.java:47)
    at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:137)
    at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:246)
    at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:62)
    at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1283)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1169)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1136)
    at org.apache.cassandra.db.Table.getRow(Table.java:375)
    at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:69)
    at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:800)
    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1279)
    ... 3 more
{code}
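Both failures come from the same gap: CompressedMappedFileDataInput simply
doesn't implement mark()/reset(). For a buffer-backed input the methods are
conceptually simple; a minimal sketch (hypothetical class, not Cassandra's
actual FileDataInput/FileMark interfaces):
{code}
// Hypothetical sketch, not the actual CompressedMappedFileDataInput code:
// for a buffer-backed input, mark() only needs to snapshot the current
// position and reset() only needs to seek back to it.
import java.nio.ByteBuffer;

final class MappedSegmentInput
{
    private final ByteBuffer segment; // one mmap'ed segment

    MappedSegmentInput(ByteBuffer segment)
    {
        this.segment = segment;
    }

    // A mark is just the absolute position within the segment.
    int mark()
    {
        return segment.position();
    }

    // Rewind the segment to a previously recorded mark.
    void reset(int mark)
    {
        segment.position(mark);
    }

    byte readByte()
    {
        return segment.get();
    }
}
{code}
For the compressed case the mark additionally has to record which chunk is
current and the offset within the decompressed chunk, but the principle is the
same.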
After implementing mark()/reset() along those lines, I got the following
results: current trunk takes 67 sec and your patch 101 sec to read 300000
rows. I tested everything on a server free of network interference, so my
results should be cleaner of side effects than yours. I'm still not convinced
that mmap'ed I/O is better for compressed data than syscalls, and I know it
has side effects that we can't control from Java (mentioned above), so I'm
waiting for convincing results, or we should close this ticket...
> use MMapedBuffer in CompressedSegmentedFile.getSegment
> ------------------------------------------------------
>
> Key: CASSANDRA-3623
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3623
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Affects Versions: 1.1
> Reporter: Vijay
> Assignee: Vijay
> Labels: compression
> Fix For: 1.1
>
> Attachments: 0001-MMaped-Compression-segmented-file-v2.patch,
> 0001-MMaped-Compression-segmented-file.patch,
> 0002-tests-for-MMaped-Compression-segmented-file-v2.patch
>
>
> CompressedSegmentedFile.getSegment seems to open a new file and doesn't seem
> to use the MMap, hence higher CPU on the nodes and higher latencies on reads.
> This ticket is to implement the TODO mentioned in CompressedRandomAccessReader:
> // TODO refactor this to separate concept of "buffer to avoid lots of read()
> syscalls" and "compression buffer"
> but I think a separate class for the buffer would be better.
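To make the quoted TODO concrete, here is a minimal sketch of what a separate
buffer class could look like (hypothetical names and structure, assuming
snappy-java's ByteBuffer API; this is not code from either patch):
{code}
// Hypothetical sketch of the separation the TODO asks for: one buffer that
// batches raw disk reads (to avoid many small read() syscalls) and a second
// buffer that holds the decompressed contents of the current chunk.
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import org.xerial.snappy.Snappy;

final class ChunkBuffers
{
    // Raw compressed bytes, read from disk in large batches.
    private final ByteBuffer ioBuffer;
    // Decompressed contents of the chunk currently being served.
    private final ByteBuffer chunkBuffer;

    ChunkBuffers(int ioSize, int chunkSize)
    {
        // Both direct, so they can be handed to snappy-java's ByteBuffer API.
        this.ioBuffer = ByteBuffer.allocateDirect(ioSize);
        this.chunkBuffer = ByteBuffer.allocateDirect(chunkSize);
    }

    // Read one compressed chunk at the given offset and decompress it;
    // returns the uncompressed length. (A real implementation would loop
    // until compressedLength bytes have actually been read.)
    int readChunk(FileChannel channel, long chunkOffset, int compressedLength) throws IOException
    {
        ioBuffer.clear();
        ioBuffer.limit(compressedLength);
        channel.read(ioBuffer, chunkOffset);
        ioBuffer.flip();
        chunkBuffer.clear();
        return Snappy.uncompress(ioBuffer, chunkBuffer);
    }
}
{code}
The point of the split is that the I/O buffer can be sized to amortize syscall
cost while the chunk buffer is sized to exactly one uncompressed chunk.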