Matthias Mullie has uploaded a new change for review.

Change subject: DataModel
......................................................................
DataModel

This is currently an over-engineered piece of code: for the purposes of AFTv5, and after the disappearance of RDBStore, a simpler solution could be implemented.

DataModel.php allows data to be written to/read from a backend (e.g. MySQL), while caching the results with BagOStuff (e.g. Memcached, Redis, ...). This cache was especially important if the backend was to be sharded, where fetching a number of entries could theoretically mean that 10 servers would need to be queried.

DataModel.php can easily be extended (as DataModelSample.php does) for a specific implementation. Basically, these methods can then be used to get/manipulate the data:

::get( $id, $shard )
    Will return an object (e.g. DataModelSample) that represents all data for the requested id

::getList( $name, $shard = null, $offset = 0, $sort = null, $order = 'ASC' )
    Will return an instance of DataModelList (implements Iterator) that contains objects (e.g. DataModelSample) for a certain "list" (more info on lists later)

::getCount( $name, $shard = null )
    Will return the number (integer) of entries in a certain list

->insert()
    Will insert the data of a newly created object (e.g. DataModelSample) into the DB (and cache it temporarily)

->update()
    Will update altered data of an existing object in the DB (and cache)

->delete()
    Will delete DB & cache data

Lists (I promised to get back to this):
Performing a selection query (e.g. "... WHERE visible = 1 ORDER BY title DESC") is quite a PITA when the data is possibly spread over multiple sharded DBs (this is why, now that RDBStore has been abandoned, this is a bit over-engineered). Instead, a specific DataModel implementation (e.g. DataModelSample, which extends DataModel) should define "public static $lists = array( <data-here> )". The array can contain multiple entries (the key is the "list name", i.e. the first parameter to ::getList) that each contain one or more conditions (the WHERE alternative).
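As a rough illustration of the idea of named lists with conditions, here is a minimal sketch. It is written in Python rather than the extension's PHP, and the shape of the list definitions is an assumption (the real $lists format is not spelled out in this message); LISTS and matching_lists are hypothetical names.

```python
# Hypothetical list definitions: list name -> conditions (the WHERE alternative).
# The real $lists format is not shown in this message; this shape is an assumption.
LISTS = {
    "all": [],  # no conditions: every entry belongs to this list
    "visible": [lambda entry: entry["visible"] == 1],
    "hidden": [lambda entry: entry["visible"] == 0],
}


def matching_lists(entry, lists=LISTS):
    """Return the names of all lists whose conditions the entry satisfies."""
    return sorted(
        name
        for name, conditions in lists.items()
        if all(condition(entry) for condition in conditions)
    )


# An entry is tested against every list, so it can be in several at once.
print(matching_lists({"title": "Some entry", "visible": 1}))
```

Because membership is computed per entry, no cross-shard query is needed to know which lists an entry belongs to.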
Sorts are defined through ::$sort (the ORDER BY alternative). Upon saving (->insert, ->update or ->delete), the entry's data will be re-evaluated against all these conditions and sorts to update the list totals (which are expensive to fetch from DB) and purge caches (only when data has changed). When e.g. offset=0 limit=25 is requested, it will pre-fetch more data and cache it right away, reducing potential follow-up queries.

static $table, $idColumn, $shardColumn:
These static vars should be set in the extending class (e.g. DataModelSample.php) for the DataModel code to know which table to write the data to, which unique id to save for the lists and which value the key should be sharded over (well, not anymore).

Backend:
So, initially, RDBStore stuff was built into DataModel.php. I have now pulled it out of there, into DataModelBackend.RDBStore.php. DataModel now has a method ::getBackend() that will return an object (depending on $wgDataModelBackendClass) that extends from DataModelBackend.php (e.g. DataModelBackend.RDBStore.php or DataModelBackend.LBFactory.php). The *.RDBStore.php one will obviously not work. *.LBFactory.php is the alternative to make it all work on one single DB using wfGetDB(). I'm not yet entirely satisfied with how the DataModel and DataModelBackend classes are currently tied together - suggestions to improve are welcome (for other parts of the code as well, obviously). In theory, if some day an alternative to RDBStore to shard data over multiple DBs pops up, a new class extending DataModelBackend.php could be written and everything else should still work just fine.

BagOStuff:
The cache aspect was originally more important: data sharded over multiple servers could mean multiple queries to multiple servers to fetch all the data we need. Now that all data will be on 1 machine, it'd be less of a problem to have to fetch all raw data from DB. That said, the code is there already & will still greatly help reduce reads to the DB.
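To make the "reduce reads to the DB" point concrete, here is a minimal read-through cache sketch. It is Python, not the extension's PHP; DictBackend and CachedModel are hypothetical stand-ins (a plain dict plays the role of BagOStuff, and an in-memory table plays the role of the DB backend), and cache expiry is left out for brevity.

```python
class DictBackend:
    """Stand-in for a DB backend: rows kept in memory, reads counted."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.reads = 0

    def get(self, entry_id):
        self.reads += 1
        return self.rows.get(entry_id)


class CachedModel:
    """Read-through cache in front of a backend (a dict stands in for the cache)."""

    def __init__(self, backend):
        self.backend = backend
        self.cache = {}

    def get(self, entry_id):
        key = ("get", entry_id)
        if key in self.cache:
            return self.cache[key]  # cache hit: no DB read
        row = self.backend.get(entry_id)
        if row is not None:
            self.cache[key] = row  # only cache entries that exist
        return row

    def update(self, entry_id, row):
        self.backend.rows[entry_id] = row
        self.cache[("get", entry_id)] = row  # keep the cache in step with the DB


backend = DictBackend({1: {"title": "first"}})
model = CachedModel(backend)
model.get(1)
model.get(1)
print(backend.reads)  # the second get() is served from cache
```

Even with a single DB, repeated requests for the same entry only hit the backend once in this sketch; writes refresh the cached copy so readers do not see stale data.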
Data to be cached, per "type":

'get':
    All data for ::get() is cached for an hour every time data is requested. This will ensure that "popular" entries are in cache pretty much all the time, while old & neglected entries do not occupy a cache slot when they are only rarely requested.

'getList':
    All data for ::getList() is cached for an hour. Same reason as for "get" (e.g. for AFT, the list of "deleted" items is only visible to oversighters, who will probably not view it all that often, let alone the latest 50 entries of it).

'getListValidity':
    Lists are saved to cache in several smaller chunks (for an hour). If data is added to or updated in a list, its cache should be purged. Instead of looping over and purging all chunks, this will save the purge date, and a chunk's cache will not be purged until it is actually requested.

'getCount':
    For all list/shard combinations, the number of matching entries (integer) will be stored indefinitely. It's just an int so it won't consume much memory, and the alternative is a slightly more expensive "SELECT COUNT(*) FROM ..." query.

'generateId':
    Will briefly save a value when generating a new id, to ensure that the id is unique and no two identical ids are generated at the same time.

DataModelList:
This is just a simple Iterator (extends from FakeResultWrapper) that will allow all entries from the requested list to be traversed.
It also adds 2 new methods:

->hasMore(): will return true/false as an indicator of whether or not there is additional data to be fetched after the requested chunk

->nextoffset(): will return the value that should be used as $offset to fetch the next chunk

Change-Id: I9cdc523dad52943296924bb99a8a68ebe7afda1e
---
A DataModel.class.php
A DataModel.i18n.php
A DataModel.php
A DataModelBackend.LBFactory.php
A DataModelBackend.php
A DataModelList.php
A maintenance/DataModelPurgeCache.php
A sample/DataModelSample.php
A sample/sql/datamodel_sample.sql
A tests/DataModelSampleTest.php
10 files changed, 1,720 insertions(+), 0 deletions(-)

  git pull ssh://gerrit.wikimedia.org:29418/mediawiki/extensions/DataModel refs/changes/53/42953/1

--
To view, visit https://gerrit.wikimedia.org/r/42953
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I9cdc523dad52943296924bb99a8a68ebe7afda1e
Gerrit-PatchSet: 1
Gerrit-Project: mediawiki/extensions/DataModel
Gerrit-Branch: master
Gerrit-Owner: Matthias Mullie <[email protected]>

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
