[jira] [Commented] (CASSANDRA-2843) better performance on long row read

Sylvain Lebresne (JIRA) Wed, 20 Jul 2011 01:09:31 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068212#comment-13068212
 ]


Sylvain Lebresne commented on CASSANDRA-2843:
---------------------------------------------

bq. Doesn't it make sense then to change the AL fallback-to-bsearch into an 
assertion failure?

I kind of have a small preference for not turning it into a failure, but if we 
don't see a use case where we would want to use AL even though not all input 
are sorted in the near future, I could be convinced to doing it for now. There 
is one such use case though that I think could be reasonable, and that's the 
new SSTableSimple*Writer classes introduced by CASSANDRA-2911. The thing is, 
you need to have quite a bit of input out of order for AL to start being slower 
than the alternatives (even a TreeMap one). I tried a modified version of my 
micro-benchmark adding columns in reverse sorted order (worst case for AL), and 
on my machine, you need to add 1000 columns to start seeing AL slower than the 
others. Which leads me to believe that AL could be a good default for the CF 
used in SSTableSimple*Writer.

bq. I see a typo in Sylvain's patch, and added a comment on 
ArrayBackedColumnMap: its addAll() does assume sorted input

AddAll does assume sorted input, but that's because IColumnMap is meant to be a 
sorted map (IColumnMap doesn't extend SortedMap for technical reason, but it is 
really meant as a SortedMap). That is, I agree that comments could be improved, 
but I think that it is the IColumnMap one that needs to be fixed by specifying 
that the iterator should iter in sorted order. Anyway, that cosmetic and could 
be changed during commit if not before.

bq. AbstractColumnContainer.DeletionInfo has to be protected, otherwise eclipse 
gives a compile error

Is there another reason than "otherwise eclipse gives a compile error". Because 
it doesn't seem very related to this patch and I'm keen on not changing stuff 
randomly without understanding why.

bq.  in the 95th percentile there was an approximate 10% gain across the board

Great, thanks Brandon for the testing.

> better performance on long row read
> -----------------------------------
>
>                 Key: CASSANDRA-2843
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2843
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Yang Yang
>         Attachments: 2843.patch, 2843_c.patch, fast_cf_081_trunk.diff, 
> incremental.diff, microBenchmark.patch
>
>
> currently if a row contains > 1000 columns, the run time becomes considerably 
> slow (my test of 
> a row with 30 00 columns (standard, regular) each with 8 bytes in name, and 
> 40 bytes in value, is about 16ms.
> this is all running in memory, no disk read is involved.
> through debugging we can find
> most of this time is spent on 
> [Wall Time]  org.apache.cassandra.db.Table.getRow(QueryFilter)
> [Wall Time]  
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(QueryFilter, 
> ColumnFamily)
> [Wall Time]  
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(QueryFilter, int, 
> ColumnFamily)
> [Wall Time]  
> org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(QueryFilter, 
> int, ColumnFamily)
> [Wall Time]  
> org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(ColumnFamily,
>  Iterator, int)
> [Wall Time]  
> org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(IColumnContainer,
>  Iterator, int)
> [Wall Time]  org.apache.cassandra.db.ColumnFamily.addColumn(IColumn)
> ColumnFamily.addColumn() is slow because it inserts into an internal 
> concurrentSkipListMap() that maps column names to values.
> this structure is slow for two reasons: it needs to do synchronization; it 
> needs to maintain a more complex structure of map.
> but if we look at the whole read path, thrift already defines the read output 
> to be List<ColumnOrSuperColumn> so it does not make sense to use a luxury map 
> data structure in the interium and finally convert it to a list. on the 
> synchronization side, since the return CF is never going to be 
> shared/modified by other threads, we know the access is always single thread, 
> so no synchronization is needed.
> but these 2 features are indeed needed for ColumnFamily in other cases, 
> particularly write. so we can provide a different ColumnFamily to 
> CFS.getTopLevelColumnFamily(), so getTopLevelColumnFamily no longer always 
> creates the standard ColumnFamily, but take a provided returnCF, whose cost 
> is much cheaper.
> the provided patch is for demonstration now, will work further once we agree 
> on the general direction. 
> CFS, ColumnFamily, and Table  are changed; a new FastColumnFamily is 
> provided. the main work is to let the FastColumnFamily use an array  for 
> internal storage. at first I used binary search to insert new columns in 
> addColumn(), but later I found that even this is not necessary, since all 
> calling scenarios of ColumnFamily.addColumn() has an invariant that the 
> inserted columns come in sorted order (I still have an issue to resolve 
> descending or ascending  now, but ascending works). so the current logic is 
> simply to compare the new column against the end column in the array, if 
> names not equal, append, if equal, reconcile.
> slight temporary hacks are made on getTopLevelColumnFamily so we have 2 
> flavors of the method, one accepting a returnCF. but we could definitely 
> think about what is the better way to provide this returnCF.
> this patch compiles fine, no tests are provided yet. but I tested it in my 
> application, and the performance improvement is dramatic: it offers about 50% 
> reduction in read time in the 3000-column case.
> thanks
> Yang

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2843) better performance on long row read

Reply via email to