Re: [Koha-devel] RFC: Koha::Persistent - for plack and general performance

Dobrica Pavlinusic Fri, 06 Apr 2012 07:49:20 -0700

I will try to include answers to all the questions which were addressed to
me in the thread in this single message. If I missed something, sorry :-)


I played around with the code a bit more (and plan to spend a few more days
on it to get some idea of how much *measurable* change it will make), but I
will try to simplify my proposal a bit:

My goals are short-term (one release cycle), as opposed to a long-term
refactor (with which I agree)

1) existing code accessors cleanup

Existing code already has accessors which can be reused and memoized,
so I started submitting cleanups of existing code which make use of
those as preparation for memoize. I'm hoping that those patches are
nice code cleanups which will help code maintenance, along with the primary
goal of providing memoizable functions in a single place.

See: (I'm including github links because it looks even scarier with color diff)
https://github.com/dpavlin/Koha/commit/eb1102ab3c096b80663b878a378741c08369fb6c
https://github.com/dpavlin/Koha/commit/ae926c02fe1e4b579ff42f880681bc30692602c3

Q: my current strategy is to create one commit per changed function. I'm
planning to attach all of them to Bug 7872, is that OK? I don't
think that test scenarios are the best way to test these changes (since most
of them might be "run any code that uses this function"), so what can I do
to make SO and QA easier?

From what I can see, they will touch C4::Items and C4::Biblio (for which
I will open a separate bug)

2) smarter memoize with per-request and persistent cache

Existing memoize does not have correct cache invalidation under plack
(--max-requests is a gross hack and I would recommend running only
anonymous OPAC with that ;-) so I will write an alternative memoize which
will call unmemoize on each request. This mimics CGI, provides a
performance improvement and should be safe to run.
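
A minimal sketch of the per-request idea, using only the core Memoize
module. The names memoize_per_request, reset_per_request_cache and
get_itemtype_description are hypothetical and not existing Koha functions;
the accessor body is a stand-in for a real database lookup.

```perl
use strict;
use warnings;
use Memoize qw(memoize unmemoize);

# Hypothetical registry of functions whose cache lives for one request only.
my @per_request;

# Counts how often the "expensive" code really runs.
my $db_calls = 0;

# Stand-in for an expensive accessor; real code would query the database.
sub get_itemtype_description {
    my ($itemtype) = @_;
    $db_calls++;
    return "description of $itemtype";
}

# Hypothetical helper: memoize a fully qualified function name and
# remember it so that it can be reset later.
sub memoize_per_request {
    my ($name) = @_;
    memoize($name);
    push @per_request, $name;
    return;
}

# Hypothetical hook, meant to be called once per request (for example
# from plack middleware or at the top of the app handler).
sub reset_per_request_cache {
    for my $name (@per_request) {
        unmemoize($name);    # restore the original function, dropping its cache
        memoize($name);      # wrap it again with an empty cache
    }
    return;
}

memoize_per_request('main::get_itemtype_description');
```

Within one request, repeated calls with the same arguments run the body
once; after reset_per_request_cache() the next call runs it again, which
is the behaviour a fresh CGI process would give.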

Existing memoize_memcached functions will become memoize_persistent
which will be able to cache in memcache, shared memory (why run memcache
when you don't need to for small instances), Redis or DB_File (Ian
mentioned files). We could use Memoize::Expire for that purpose to keep
the in-memory cache under control.

My goal is to have pluggable caching back-ends without changing Koha
code (or via system preference!)
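
One way this could look, as a sketch only: Memoize accepts an external
hash as its cache (SCALAR_CACHE => [HASH => ...]), so any tied hash can
act as a back-end. The %backends table, the 'memory' back-end name and
this memoize_persistent signature are assumptions for illustration; only
a plain in-memory hash is wired up here.

```perl
use strict;
use warnings;
use Memoize qw(memoize);

# Hypothetical back-end table: each entry returns a hash reference
# (plain or tied) that Memoize will use as its cache. A DB_File or
# memcached-backed tied hash could be registered in the same way.
my %backends = (
    memory => sub { my %h; return \%h },
);

# Hypothetical wrapper; the back-end name could come from a system
# preference instead of being passed in by the caller.
sub memoize_persistent {
    my ($name, $backend) = @_;
    my $cache = $backends{$backend}->();
    # LIST_CACHE => 'MERGE' makes list-context calls share the scalar cache.
    memoize($name, SCALAR_CACHE => [ HASH => $cache ], LIST_CACHE => 'MERGE');
    return $cache;
}

my $calls = 0;

# Stand-in for a function worth caching across requests.
sub get_framework_name {
    my ($code) = @_;
    $calls++;
    return "framework $code";
}

my $cache = memoize_persistent('main::get_framework_name', 'memory');
```

Because Memoize uses the supplied hash directly, emptying that hash (or
deleting entries from it) is enough to invalidate cached values, and the
calling code does not change when the back-end does.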

Q: for this I will try to submit a bug with a perl memoize function,
together with the cache invalidation hooks needed to make it work under
plack. OK?

I would really love to move to Redis (with which I have very good
experience since I'm the original author of the perl bindings ;-) mostly
because it has per-key cache invalidation which memcache lacks (so
invalidation there flushes the whole cache at once). Redis's ability to
look up cache keys using globs (e.g. *item*12342*) might help a *lot* in
cache invalidation.
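
To illustrate glob-based invalidation without needing a Redis server, here
is a sketch in which a plain hash stands in for the key/value store. The
key naming scheme and the helpers glob_to_regex and invalidate_matching
are hypothetical; with Redis, the matching step would be done by the
server (its KEYS command accepts such patterns).

```perl
use strict;
use warnings;

# Plain hash standing in for the key/value store; key names are made up.
my %cache = (
    'GetItem:item:12342'  => 'cached item record',
    'item:12342:status'   => 'available',
    'item:999:status'     => 'on loan',
    'biblio:12342:title'  => 'cached title',
);

# Translate a simple glob (only * and ? are supported here) into a regex.
sub glob_to_regex {
    my ($glob) = @_;
    my $re = join '', map {
        $_ eq '*' ? '.*' : $_ eq '?' ? '.' : quotemeta($_)
    } split //, $glob;
    return qr/\A$re\z/s;
}

# Delete every key matching the glob; returns the number of keys removed.
sub invalidate_matching {
    my ($cache, $glob) = @_;
    my $re     = glob_to_regex($glob);
    my @doomed = grep { $_ =~ $re } keys %$cache;
    delete @{$cache}{@doomed};
    return scalar @doomed;
}
```

Calling invalidate_matching(\%cache, '*item*12342*') removes the two
entries for item 12342 and leaves the others alone. A loose pattern like
this would also match a key such as item:112342, so the key naming scheme
needs some care.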

memoize_persistent definitions should be accompanied by calls to
invalidation. I do agree that inserting the new value into the cache is a
better solution so I will keep that in mind.

3) caching of full database tables

This point is subtly different from memoize: sometimes we benefit from
caching a whole data structure from the database (frameworks, itemtypes,
languages, systempreferences) but in a way that allows us to retrieve single
items.

Another example is functions which would be memoizable but take some
kind of parameter ($opac, $user, $branch, $selected or so) which would
require memoizing the whole structure again and again.

This is how we are using our $cache variables now.
I propose to move all of them to Koha::Persistent so we can do
full_size on that class to get memory usage and enable correct
invalidation in a single place (per-request) or have proper invalidation
functions.

This is also the reason why the sql_cache function tries to create a proper
multi-level hash with cached values instead of just memoizing
$sth->fetchrow_hashref.
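
A sketch of what such a multi-level hash could look like (table name,
then key, then row). This is not the actual sql_cache code: all function
names are hypothetical, and fetch_all_itemtypes returns a fixed list
where real code would loop over $sth->fetchrow_hashref.

```perl
use strict;
use warnings;

# Hypothetical container for whole-table caches, keyed by table name.
my %table_cache;

# Counts how often the table is really loaded.
my $loads = 0;

# Stand-in for a database query returning one hashref per row.
sub fetch_all_itemtypes {
    $loads++;
    return (
        { itemtype => 'BK',  description => 'Book',      notforloan => 0 },
        { itemtype => 'REF', description => 'Reference', notforloan => 1 },
    );
}

# Load the whole table once and index the rows by their key column.
sub itemtypes {
    unless ($table_cache{itemtypes}) {
        my %by_code;
        $by_code{ $_->{itemtype} } = $_ for fetch_all_itemtypes();
        $table_cache{itemtypes} = \%by_code;
    }
    return $table_cache{itemtypes};
}

# Single-item retrieval comes straight out of the cached structure.
sub get_itemtype {
    my ($code) = @_;
    return itemtypes()->{$code};
}

# Parameter-dependent views filter the cached rows instead of being
# memoized once per parameter value.
sub loanable_itemtypes {
    my $types = itemtypes();
    my @codes = grep { !$types->{$_}{notforloan} } keys %$types;
    return sort @codes;
}

# Invalidation happens in one place, per table.
sub invalidate_table {
    my ($table) = @_;
    delete $table_cache{$table};
    return;
}
```

Keeping every table under one top-level hash is what makes it possible to
measure the total memory used and to drop everything, or a single table,
from one place.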

With proper invalidation those values might be stored in shared memory
so that all plack threads have access to them.

4) performance patches

Non-caching related but still important, like missing indexes or
my favorite example of this (so far):

Bug 7846 - get_batch_summary reimplements GROUP BY in perl code
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=7846

(which needs SO :-)

5) testing and statistics

I do intend to run this code in production. To be honest, plack didn't
bring all the performance improvements I was hoping for, and I think that
we can fix that in the 3.10 release cycle (the search page is my favorite
one to test with).

To achieve that, I will collect statistical data about cache usage
(hit/miss). I have found it very valuable during development so far, and
it's always nice to have some idea how the cache is performing.
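
Hit/miss accounting can be as small as the following sketch. The cached
and hit_ratio helpers and the %stats layout are hypothetical, shown here
around a plain in-memory hash rather than any real Koha cache.

```perl
use strict;
use warnings;

my %cache;
my %stats = ( hit => 0, miss => 0 );

# Hypothetical helper: return the cached value for $key, or compute it
# with the supplied code reference and store the result.
sub cached {
    my ($key, $compute) = @_;
    if ( exists $cache{$key} ) {
        $stats{hit}++;
        return $cache{$key};
    }
    $stats{miss}++;
    return $cache{$key} = $compute->();
}

# Fraction of lookups answered from the cache (0 when nothing was looked up).
sub hit_ratio {
    my $total = $stats{hit} + $stats{miss};
    return $total ? $stats{hit} / $total : 0;
}
```

Logging %stats at the end of each request (or periodically) gives a quick
view of which caches earn their memory and which are mostly misses.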

I must say that developing under plack is a joy: fast page load time is
nice and you get used to it quite quickly, so slow parts of code pop up
even without DBIProfile or NYTProf :-)

-- 
Dobrica Pavlinusic               2share!2flame            [email protected]
Unix addict. Internet consultant.             http://www.rot13.org/~dpavlin
