[ 
https://issues.apache.org/jira/browse/CASSANDRA-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875727#action_12875727
 ] 

Matthew F. Dennis commented on CASSANDRA-1046:
----------------------------------------------

My profiler (not sure I trust it at the moment) showed different things (and at 
no point was I able to get timeouts in the client, even using numbers an order 
of magnitude higher than originally reported).

So I created some scripts to help test this (I still didn't get client 
timeouts, perhaps because of the UUID changes previously made). The insert 
script prints the time UUIDs for the start, ~middle and end of what was 
inserted. These can be fed into the reader to start from the middle and read 
the specified number of columns out. I ran the scripts by piping the insertator 
output to `tee uuids` and then calling the readarator with `cat uuids`.

On my laptop these changes reduced the run time of the scripts from about 2.5 
minutes to less than 15 seconds (with reversed slices taking a couple seconds 
more in total).

In addition, I reviewed the callers of ColumnFamily.getSortedColumns (I did not 
review any test classes). Everything was already iterating. In particular:

{code}
SSTableExport.SerializeRow already iterates
[avro|thrift].CassandraServer
  .thriftifyColumns already iterates
  .thriftify[Super]Columns already iterates
Migration.getLocalMigrations already iterates
SSTableNameIterator.<init> only creates an iterator for later use
QueryFilter.getReduced only creates an iterator and then calls next()
Table.load already iterates
HintedHandoffManager
  .pagingFinished just calls size
  .deliverHintsToEndpoint already iterates
  .deliverAllHints already iterates
DefsTable.loadFromStorage already iterates
CompactionManager.submitGraveyardCleanup already iterates
ColumnIndexer.serialize already iterates
ColumnFamilySerializer.serializeForSSTable already iterates
ColumnFamily
  .toString already iterates
  .addAll already iterates 
{code}
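
For anyone following along, here is a minimal sketch of the difference between the two approaches discussed in this issue: copying the sorted columns and binary searching for the start, versus iterating and skipping with a while loop. It is not the patch itself. The class and method names (SliceSketch, sliceByCopy, sliceByIteration) and the use of Integer keys in place of column names are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical illustration only; none of these names exist in Cassandra.
class SliceSketch {

    // Copying approach: snapshot every key into an array, then binary
    // search the array to find where the slice should start.
    static List<Integer> sliceByCopy(ConcurrentSkipListMap<Integer, String> columns,
                                     int start, int count) {
        Integer[] names = columns.keySet().toArray(new Integer[0]);
        int idx = Arrays.binarySearch(names, start);
        if (idx < 0) {
            idx = -idx - 1; // not found: binarySearch encodes the insertion point
        }
        List<Integer> out = new ArrayList<>();
        for (int i = idx; i < names.length && out.size() < count; i++) {
            out.add(names[i]);
        }
        return out;
    }

    // Iterating approach: the map is already sorted, so walk it in order,
    // skip names before the start, and stop once enough have been collected.
    static List<Integer> sliceByIteration(ConcurrentSkipListMap<Integer, String> columns,
                                          int start, int count) {
        List<Integer> out = new ArrayList<>();
        Iterator<Integer> it = columns.keySet().iterator();
        while (it.hasNext() && out.size() < count) {
            Integer name = it.next();
            if (name >= start) {
                out.add(name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<Integer, String> columns = new ConcurrentSkipListMap<>();
        for (int i = 0; i < 10; i++) {
            columns.put(i * 10, "value" + i);
        }
        System.out.println(sliceByCopy(columns, 35, 3));
        System.out.println(sliceByIteration(columns, 35, 3));
    }
}
```

Both methods return the same slice; the second simply avoids allocating a full copy of the column names on every read. ConcurrentSkipListMap also offers tailMap(), which can seek to the start directly, but the sketch sticks to the iterator and while loop described in the issue.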

> optimize Memtable.getSliceIterator
> ----------------------------------
>
>                 Key: CASSANDRA-1046
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1046
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Matthew F. Dennis
>             Fix For: 0.7
>
>
> As reported by James Golick, about 30% of the time in a read is spent in 
> SliceQueryFilter.getMemColumnIterator, virtually all of which is in 
> ConcurrentSkipListMap$Values.toArray().
> I wrote on the ML:
> Besides the UUID optimization you posted, we should do an audit of 
> ColumnFamily.getSortedColumns and replace with iteration where possible (in 
> this case, we'd be left with one copy of most of the columns, but that's 
> better than two).
> We can get rid of the other copy by fixing the logic in 
> Memtable.getSliceIterator, which says "copy all the columns, so we can do a 
> binary search on them to find where to start," but since columns are natively 
> in sorted order we could just use an iterator and a while loo

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
