Hi folks,

I know that the official position is that cTAKES is not thread-safe. I'm wondering, however, whether anyone has looked into using multiple processing pipelines (via the processingUnitThreadCount directive in a CPE descriptor) and documenting where the thread-safety problems actually lie.
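For anyone who wants to poke at the same thing: the setup I mean is just the stock UIMA CPM mechanism, roughly the following in the CPE descriptor (attribute and element names are from the UIMA CPE spec; the pool size, thread count, name, and descriptor path below are only placeholders, and I've trimmed the error-handling boilerplate down to the usual defaults):

    <casProcessors casPoolSize="4" processingUnitThreadCount="3" dropCasOnException="true">
      <!-- With processingUnitThreadCount > 1, the CPM runs that many processing
           pipelines in parallel, each pulling CASes from the shared CAS pool. -->
      <casProcessor deployment="integrated" name="cTAKES aggregate (placeholder)">
        <descriptor>
          <import location="desc/your-aggregate-descriptor.xml"/>
        </descriptor>
        <deploymentParameters/>
        <errorHandling>
          <errorRateThreshold action="terminate" value="0/1000"/>
          <maxConsecutiveRestarts action="terminate" value="3"/>
          <timeout max="100000" default="-1"/>
        </errorHandling>
        <checkpoint batch="1" time="1000ms"/>
      </casProcessor>
    </casProcessors>

My understanding is that casPoolSize needs to be at least as large as the thread count (with a little headroom), otherwise the pipelines just block waiting for CASes.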
I've given it a bit of a try, and at first glance the biggest issue seems to be the LVG API, which isn't thread-safe at all. (They seem to claim it is thread-safe as long as API instances are not shared, but that doesn't appear to hold in practice, since it throws errors when multiple pipelines are used, and each pipeline *should* be creating its own LVG API instance.) I haven't found any other serious issues, but perhaps some of you are familiar with others.

There is, of course, the memory issue: cTAKES' memory footprint on my machine, with a single pipeline and a MySQL UMLS database, is over 2GB. Presumably that is roughly the cost of each additional pipeline as well, though I can't actually figure out what all that memory is being used for, since none of the in-memory DBs and indexes seems to be anywhere near that size.

It is, of course, possible to split datasets and simply run multiple processes, but my feeling is that there must be a lot of unnecessary overhead there, since all the operations we actually perform (other than the CAS consumers) are read-only. Given the nature of the load, it seems to me that cTAKES ought to be limited only by disk/memory throughput and total CPU capacity.

Anyway, if anyone else has thoughts, I'd be interested. This is something I'd like to take a stab at resolving, since I've been poking around in this direction behind the scenes for some time now. My group has access to huge databases but limited computational resources, and I'd like to make the most of what we've got!

Karthik

--
Karthik Sarma
UCLA Medical Scientist Training Program Class of 20??
Member, UCLA Medical Imaging & Informatics Lab
Member, CA Delegation to the House of Delegates of the American Medical Association

[email protected]
gchat: [email protected]
linkedin: www.linkedin.com/in/ksarma
