Matthias Mullie has uploaded a new change for review.

Change subject: DataModel
......................................................................

DataModel

This is currently an over-engineered piece of code: for the purpose of AFTv5 & 
after the disappearance of RDBStore, a simpler solution could be implemented.

DataModel.php will allow data to be written to/read from a backend (e.g. 
MySQL), while caching the results with BagOStuff (e.g. Memcached, Redis, ...). 
This cache was especially important if the backend was to be sharded, in which 
case fetching a number of entries could theoretically mean that 10 servers 
would need to be queried.

DataModel.php can be easily extended (as DataModelSample.php does) for a 
specific implementation. Basically, these methods can then be used to 
get/manipulate the data:

::get( $id, $shard )
Will return an object (e.g. DataModelSample) that represents all data for the 
id requested
::getList( $name, $shard = null, $offset = 0, $sort = null, $order = 'ASC' )
Will return an instance of DataModelList (implements Iterator) that contains 
objects (e.g. DataModelSample) for a certain "list" (more info on lists later)
::getCount( $name, $shard = null )
Will return the number (integer) of entries in a certain list
->insert()
Will insert the data of a newly created object (e.g. DataModelSample) into the 
DB (and cache it temporarily)
->update()
Will update altered data of an existing object in the DB (and cache)
->delete()
Will delete DB & cache data
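
The calls above could be exercised roughly as follows. This is a minimal, 
self-contained sketch and not the DataModel code itself: SketchModel, its 
in-memory $db and $cache arrays and the "title" property are hypothetical 
stand-ins for the real backend and BagOStuff.

```php
<?php
// Hypothetical stand-in for a DataModel subclass; arrays replace the DB and BagOStuff.
class SketchModel {
    public static $db = [];    // pretend backend: "shard:id" => row
    public static $cache = []; // pretend cache:   "shard:id" => row
    public $id;
    public $shard;
    public $title;

    private static function key( $id, $shard ) {
        return "$shard:$id";
    }

    // Read-through: try the cache first, fall back to the backend and cache the result.
    public static function get( $id, $shard ) {
        $key = self::key( $id, $shard );
        if ( !isset( self::$cache[$key] ) ) {
            if ( !isset( self::$db[$key] ) ) {
                return null;
            }
            self::$cache[$key] = self::$db[$key];
        }
        $entry = new self();
        foreach ( self::$cache[$key] as $name => $value ) {
            $entry->$name = $value;
        }
        return $entry;
    }

    // In this sketch insert() and update() both write the row and refresh the cache.
    public function insert() {
        return $this->save();
    }

    public function update() {
        return $this->save();
    }

    // Remove the row from both the backend and the cache.
    public function delete() {
        $key = self::key( $this->id, $this->shard );
        unset( self::$db[$key], self::$cache[$key] );
        return $this;
    }

    private function save() {
        $key = self::key( $this->id, $this->shard );
        self::$db[$key] = self::$cache[$key] = get_object_vars( $this );
        return $this;
    }
}

$entry = new SketchModel();
$entry->id = 1;
$entry->shard = 1;
$entry->title = 'First';
$entry->insert();

$entry->title = 'Renamed';
$entry->update();
$fetched = SketchModel::get( 1, 1 );

$entry->delete();
$afterDelete = SketchModel::get( 1, 1 );
```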

Lists (I promised to get back to this):
Performing a selection query (e.g. "... WHERE visible = 1 ORDER BY title DESC") 
is quite a PITA when the data may be spread over multiple sharded DBs 
(this is why, now that RDBStore has been abandoned, this is a bit 
over-engineered). Instead, a specific DataModel implementation (e.g. 
DataModelSample, which extends DataModel) should define "public static $lists = 
array( <data-here> )".
The array can contain multiple entries (the key is the "list name": the first 
parameter to ::getList) that each contain one or more conditions (the WHERE 
alternative). Sorts are defined through ::$sort (the ORDER BY alternative). 
Upon saving (->insert, ->update or ->delete), the entry's data will be 
re-evaluated against all these conditions and sorts to update the list totals 
(which are expensive to fetch from DB) and purge caches (only when data has 
changed).
When e.g. offset=0 limit=25 is requested, it will pre-fetch more data and cache 
it right away, reducing potential follow-up queries.
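
The re-evaluation on save could look something like the sketch below. The 
actual $lists format is not shown in this message, so the "list name => 
array( field => required value )" layout, the matchingLists() and 
updateCounts() helpers and the sample fields are all assumptions made for 
illustration. Sorting is left out; only list membership and totals are covered.

```php
<?php
// Return the names of all lists whose conditions the row satisfies.
// Hypothetical condition format: field => required value; no conditions matches everything.
function matchingLists( array $lists, array $row ) {
    $matches = [];
    foreach ( $lists as $name => $conditions ) {
        foreach ( $conditions as $field => $expected ) {
            if ( !array_key_exists( $field, $row ) || $row[$field] != $expected ) {
                continue 2; // one failed condition excludes the row from this list
            }
        }
        $matches[] = $name;
    }
    return $matches;
}

// Adjust cached totals when a row changes: -1 for lists it left, +1 for lists it joined.
// Pass null as $old for an insert, or null as $new for a delete.
// Returns the lists whose membership changed, i.e. the ones whose caches need purging.
function updateCounts( array &$counts, array $lists, $old, $new ) {
    $before = $old === null ? [] : matchingLists( $lists, $old );
    $after = $new === null ? [] : matchingLists( $lists, $new );
    $left = array_diff( $before, $after );
    $joined = array_diff( $after, $before );
    foreach ( $left as $name ) {
        $counts[$name] = ( $counts[$name] ?? 0 ) - 1;
    }
    foreach ( $joined as $name ) {
        $counts[$name] = ( $counts[$name] ?? 0 ) + 1;
    }
    return array_merge( $left, $joined );
}

// Hypothetical list definitions for a sample model.
$lists = [
    'visible' => [ 'visible' => 1 ],
    'hidden' => [ 'visible' => 0 ],
    'all' => [],
];

$counts = [];
$row = [ 'id' => 1, 'title' => 'First', 'visible' => 1 ];

// Insert: the row joins "visible" and "all".
updateCounts( $counts, $lists, null, $row );

// Update: hiding the row moves it from "visible" to "hidden"; "all" is untouched.
$hiddenRow = [ 'visible' => 0 ] + $row;
$changed = updateCounts( $counts, $lists, $row, $hiddenRow );
```

Only the lists returned in $changed would need their totals and caches touched, 
which is the point of evaluating the conditions in code rather than re-running 
the selection queries.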

static $table, $idColumn, $shardColumn:
These static vars should be set in the extending class (e.g. 
DataModelSample.php) for the DataModel code to know which table to write the 
data to, which unique id to save for the lists and which column the data 
should be sharded over (well, not anymore).

Backend:
So, initially, RDBStore stuff was built into DataModel.php. I have now pulled 
it out of there, into DataModelBackend.RDBStore.php. DataModel now has a method 
::getBackend() that will return an object (depending on 
$wgDataModelBackendClass) that extends from DataModelBackend.php (e.g. 
DataModelBackend.RDBStore.php or DataModelBackend.LBFactory.php). The 
*.RDBStore.php one will obviously not work. *.LBFactory.php is the alternative 
to make it all work on one single DB using wfGetDB().
I'm not yet entirely satisfied with how the DataModel and DataModelBackend 
classes are currently tied together - suggestions to improve are welcome (for 
other parts of the code as well, obviously).
In theory, if some day an alternative to RDBStore to shard data over multiple 
DBs pops up, a new class extending DataModelBackend.php could be written and 
everything else should still work just fine.
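
The general shape of that split could be sketched as below. The class and 
method names here (SketchBackend, SketchArrayBackend, SketchModelWithBackend, 
and the get/save/delete signatures) are hypothetical and do not reflect the 
real DataModelBackend interface; the static $backendClass property stands in 
for $wgDataModelBackendClass.

```php
<?php
// Hypothetical base class: every backend exposes the same storage operations.
abstract class SketchBackend {
    abstract public function get( $table, $id );
    abstract public function save( $table, $id, array $row );
    abstract public function delete( $table, $id );
}

// Hypothetical single-store backend; an array stands in for one database.
class SketchArrayBackend extends SketchBackend {
    private $rows = [];

    public function get( $table, $id ) {
        return $this->rows[$table][$id] ?? null;
    }

    public function save( $table, $id, array $row ) {
        $this->rows[$table][$id] = $row;
    }

    public function delete( $table, $id ) {
        unset( $this->rows[$table][$id] );
    }
}

class SketchModelWithBackend {
    // Stand-in for the $wgDataModelBackendClass setting.
    public static $backendClass = 'SketchArrayBackend';
    private static $backend = null;

    // Instantiate the configured backend once and reuse it afterwards.
    public static function getBackend() {
        if ( self::$backend === null ) {
            $class = self::$backendClass;
            if ( !is_subclass_of( $class, 'SketchBackend' ) ) {
                throw new InvalidArgumentException( "$class must extend SketchBackend" );
            }
            self::$backend = new $class();
        }
        return self::$backend;
    }
}

$backend = SketchModelWithBackend::getBackend();
$backend->save( 'datamodel_sample', 1, [ 'title' => 'First' ] );
$row = SketchModelWithBackend::getBackend()->get( 'datamodel_sample', 1 );
```

Swapping in a sharded backend would then only mean pointing the setting at 
another subclass, as long as the model code talks to the backend exclusively 
through the shared base class.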

BagOStuff:
The cache aspect was originally more important: data sharded over multiple 
servers could mean multiple queries to multiple servers to fetch all the data 
we need. Now that all data will be on 1 machine, it'd be less of a problem to have 
to fetch all raw data from DB. That said, the code is there already & will 
still greatly help reduce reads to the DB.
Data to be cached, per "type":
'get': all data for ::get() is cached for an hour every time data is requested. 
This will ensure that "popular" entries are in cache pretty much all the time, 
while old & neglected entries do not occupy a cache slot when they're only 
rarely requested.
'getList': all data for ::getList() is cached for an hour. Same reason as for 
"get" (e.g. for AFT, the list of "deleted" items is only visible to 
oversighters, who will probably not view it all that often, let alone the 
latest 50 entries of it..)
'getListValidity': lists are saved to cache in several smaller chunks (for an 
hour); if data is added to or updated in a list, its cache should be purged. 
Instead of looping over and purging all chunks, this will save the purge date, 
and a chunk's cache will not be purged until it is actually requested.
'getCount': for all list/shard combinations, the number of matching entries 
(integer) will be stored indefinitely. It's just an int so won't consume much 
memory, and the alternative is a slightly more expensive "SELECT COUNT(*) FROM 
..." query.
'generateId': will briefly save a value when generating a new id, to ensure 
that the id is unique and no two identical ids are generated at the same time.
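
The 'getListValidity' idea (store one purge date per list and check it lazily 
per chunk) can be sketched as follows. SketchListCache, its key names and the 
explicit $now arguments are assumptions for illustration: an array stands in 
for BagOStuff, and the one-hour expiry is left out to keep the sketch short.

```php
<?php
// Hypothetical chunked list cache; an array stands in for BagOStuff and the
// clock is passed in explicitly so the behaviour is easy to follow.
class SketchListCache {
    private $store = [];

    // Cache one chunk of a list, remembering when it was written.
    public function setChunk( $list, $chunk, array $ids, $now ) {
        $this->store["getList:$list:$chunk"] = [ 'ids' => $ids, 'cachedAt' => $now ];
    }

    // "Purge" a list by recording a single timestamp instead of deleting every chunk.
    public function invalidate( $list, $now ) {
        $this->store["getListValidity:$list"] = $now;
    }

    // Return the chunk's ids, or null if it is missing or not newer than the last purge.
    public function getChunk( $list, $chunk ) {
        $key = "getList:$list:$chunk";
        if ( !isset( $this->store[$key] ) ) {
            return null;
        }
        $purgedAt = $this->store["getListValidity:$list"] ?? null;
        if ( $purgedAt !== null && $this->store[$key]['cachedAt'] <= $purgedAt ) {
            unset( $this->store[$key] ); // stale: drop it now that it was requested
            return null;
        }
        return $this->store[$key]['ids'];
    }
}

$cache = new SketchListCache();
$cache->setChunk( 'visible', 0, [ 1, 2, 3 ], 100 );
$cache->setChunk( 'visible', 1, [ 4, 5 ], 100 );
$fresh = $cache->getChunk( 'visible', 0 );

// An entry was added to the list at t=200: one write marks all chunks as stale.
$cache->invalidate( 'visible', 200 );
$stale = $cache->getChunk( 'visible', 0 );

// The caller re-fetches from the DB and caches the chunk again at t=300.
$cache->setChunk( 'visible', 0, [ 9, 1, 2 ], 300 );
$rebuilt = $cache->getChunk( 'visible', 0 );
```

The purge costs a single cache write no matter how many chunks a list has; 
chunks that are never requested again simply expire on their own.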

DataModelList:
This is just a simple Iterator (extends from FakeResultWrapper) that will allow 
all entries from the requested list to be traversed. It also adds 2 new methods:
->hasMore(): will return true/false as an indicator of whether or not there is 
additional data to be fetched after the requested chunk
->nextoffset(): will return the value that should be used as $offset to fetch 
the next chunk
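
A paging loop built on these two methods might look like the sketch below. 
SketchList is a hypothetical stand-in that extends ArrayIterator instead of 
FakeResultWrapper, and detecting "more data" by fetching one row beyond the 
limit is an assumption, not necessarily how DataModelList does it.

```php
<?php
// Hypothetical stand-in for DataModelList; ArrayIterator replaces FakeResultWrapper.
class SketchList extends ArrayIterator {
    private $nextOffset;

    // $rows may hold one row more than $limit; that extra row signals more data.
    public function __construct( array $rows, $offset, $limit ) {
        $more = count( $rows ) > $limit;
        parent::__construct( array_slice( $rows, 0, $limit ) );
        $this->nextOffset = $more ? $offset + $limit : null;
    }

    // True if there is additional data after the requested chunk.
    public function hasMore() {
        return $this->nextOffset !== null;
    }

    // The $offset to pass when fetching the next chunk, or null if there is none.
    public function nextOffset() {
        return $this->nextOffset;
    }
}

// Pretend data source with 60 entries, read in chunks of 25.
$all = range( 1, 60 );
$offset = 0;
$limit = 25;
$pages = [];
do {
    $rows = array_slice( $all, $offset, $limit + 1 ); // ask for one extra row
    $list = new SketchList( $rows, $offset, $limit );
    $pages[] = iterator_to_array( $list );
    $offset = $list->nextOffset();
} while ( $list->hasMore() );
```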

Change-Id: I9cdc523dad52943296924bb99a8a68ebe7afda1e
---
A DataModel.class.php
A DataModel.i18n.php
A DataModel.php
A DataModelBackend.LBFactory.php
A DataModelBackend.php
A DataModelList.php
A maintenance/DataModelPurgeCache.php
A sample/DataModelSample.php
A sample/sql/datamodel_sample.sql
A tests/DataModelSampleTest.php
10 files changed, 1,720 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/mediawiki/extensions/DataModel refs/changes/53/42953/1
--
To view, visit https://gerrit.wikimedia.org/r/42953
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I9cdc523dad52943296924bb99a8a68ebe7afda1e
Gerrit-PatchSet: 1
Gerrit-Project: mediawiki/extensions/DataModel
Gerrit-Branch: master
Gerrit-Owner: Matthias Mullie <[email protected]>

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits