[jira] [Updated] (CASSANDRA-2843) better performance on long row read

Sylvain Lebresne (JIRA) Mon, 25 Jul 2011 08:48:35 -0700

     [ 
https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Sylvain Lebresne updated CASSANDRA-2843:
----------------------------------------

    Attachment: 2843_e.patch

Attaching rebased and update patch.

This patch fixes reversed slices. Turns out previous patch wasn't handling them 
correctly. This is also a little bit more annoying to do that one would hope 
because the code assumes that the ColumnFamily object is itself sorted in 
forward sorted order but the insertion are made in reverse sorted order 
(changing that is much more involved than it appears unfortunately), so it's 
not as simple as feeding a reverse comparator to the map. So the patch takes an 
"hint" at the construction of the ColumnFamily, indicating how the insert will 
be done, but it does not influence the sorting of the cf itself.  Internally, 
AL keeps the elements in reverse sorted order as one could expect, but its 
iterator still have to return elements in sorted order, so there is a bit of 
boilerplate involved.

bq. Actually I just realized that there is one place where we do add a column 
after a read not at the end of the CF.

I was a little bit quick on that, the aforementioned place actually uses 
addAll(). There is no real efficiency problem with addAll() however (it does a 
merge). So the patch actually adds an assert in add() that the added column is 
in sorted order.
I found another place though where we did do a add() not in sorted order. This 
is due to a case where we actually want to replace a column, and we end up 
removing then adding. I've added a replace method to handle that case instead 
since anyway it's "the right thing to do". Note that this example was actually 
on the compaction path (for counters), not the read one, but it feels better to 
fix it now as it is related.

bq. Is there a reason we can't a test for that path?

Not really. I've attached such a test to CASSANDRA-2945.

The patch also fix the code style in the added test, add a few more ones and 
add a bunch of comments.

All the tests (unit and system) are passing (that is, excluding the unrelated 
ones that happens to fail in trunk right now).


> better performance on long row read
> -----------------------------------
>
>                 Key: CASSANDRA-2843
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2843
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Yang Yang
>         Attachments: 2843.patch, 2843_d.patch, 2843_e.patch, 
> microBenchmark.patch, patch_timing, std_timing
>
>
> currently if a row contains > 1000 columns, the run time becomes considerably 
> slow (my test of 
> a row with 30 00 columns (standard, regular) each with 8 bytes in name, and 
> 40 bytes in value, is about 16ms.
> this is all running in memory, no disk read is involved.
> through debugging we can find
> most of this time is spent on 
> [Wall Time]  org.apache.cassandra.db.Table.getRow(QueryFilter)
> [Wall Time]  
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(QueryFilter, 
> ColumnFamily)
> [Wall Time]  
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(QueryFilter, int, 
> ColumnFamily)
> [Wall Time]  
> org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(QueryFilter, 
> int, ColumnFamily)
> [Wall Time]  
> org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(ColumnFamily,
>  Iterator, int)
> [Wall Time]  
> org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(IColumnContainer,
>  Iterator, int)
> [Wall Time]  org.apache.cassandra.db.ColumnFamily.addColumn(IColumn)
> ColumnFamily.addColumn() is slow because it inserts into an internal 
> concurrentSkipListMap() that maps column names to values.
> this structure is slow for two reasons: it needs to do synchronization; it 
> needs to maintain a more complex structure of map.
> but if we look at the whole read path, thrift already defines the read output 
> to be List<ColumnOrSuperColumn> so it does not make sense to use a luxury map 
> data structure in the interium and finally convert it to a list. on the 
> synchronization side, since the return CF is never going to be 
> shared/modified by other threads, we know the access is always single thread, 
> so no synchronization is needed.
> but these 2 features are indeed needed for ColumnFamily in other cases, 
> particularly write. so we can provide a different ColumnFamily to 
> CFS.getTopLevelColumnFamily(), so getTopLevelColumnFamily no longer always 
> creates the standard ColumnFamily, but take a provided returnCF, whose cost 
> is much cheaper.
> the provided patch is for demonstration now, will work further once we agree 
> on the general direction. 
> CFS, ColumnFamily, and Table  are changed; a new FastColumnFamily is 
> provided. the main work is to let the FastColumnFamily use an array  for 
> internal storage. at first I used binary search to insert new columns in 
> addColumn(), but later I found that even this is not necessary, since all 
> calling scenarios of ColumnFamily.addColumn() has an invariant that the 
> inserted columns come in sorted order (I still have an issue to resolve 
> descending or ascending  now, but ascending works). so the current logic is 
> simply to compare the new column against the end column in the array, if 
> names not equal, append, if equal, reconcile.
> slight temporary hacks are made on getTopLevelColumnFamily so we have 2 
> flavors of the method, one accepting a returnCF. but we could definitely 
> think about what is the better way to provide this returnCF.
> this patch compiles fine, no tests are provided yet. but I tested it in my 
> application, and the performance improvement is dramatic: it offers about 50% 
> reduction in read time in the 3000-column case.
> thanks
> Yang

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-2843) better performance on long row read

Reply via email to