Hi Eric,

Very interesting read. Thanks for that. From our experience here:

Our dataset consists of roughly 650,000 Fedora objects and 30 million 
relationships. We use Fedora 3.2 with Mulgara as the triple store and Solr as 
the search engine. What we noticed:


*         Ingest into Fedora and Mulgara is ok-ish and scales very well. It 
takes us about 1-2 seconds to ingest an object (adding the object, about 4 very 
small datastreams and a couple of relationships, all through 
fedora-client.jar), and this time stays more or less constant as the 
repository grows. While not even close to a DB system in performance, it's 
probably fast enough for our purposes. We found that different SOAP clients 
(generated by different WSDL-to-Java tools) vary a lot in performance and 
recommend sticking to fedora-client.jar for Java.

*         Using the message queue with fedoragsearch to update Solr does not 
seem to be a good idea for larger datasets. I suspect using the message queue 
for anything with a large dataset is not a good idea. It seems to consume a lot 
of resources when running, does not seem to shut down reliably (resulting in a 
very long rebuild time the next time you start Tomcat) and seems to scale very 
badly (performance goes down significantly the more objects we add). There are 
a lot of "seems" in this statement because we didn't actually measure this; it 
is based only on our subjective observations.

*         Building a Solr index from scratch through the fedoragsearch web 
interface is quite fast (much, much faster than ingesting objects into Fedora; 
about half a day for the above dataset).

*         Query times in Fedora are quite good (basically instant).

*         Query times in Solr are quite good (basically instant).

*         Query times in Mulgara are very bad. Using Fedora's built-in 
risearch web interface, a count of all triples takes more than 20 minutes. Even 
simple SPARQL queries, like asking for all subjects with a given predicate and 
object, often take more than 10 seconds. Query times vary quite a bit for 
reasons that are not clear to us, but it seems that the further into the 
dataset the results are, the longer the query takes. This suggests that Mulgara 
might not be using any indexes on the data, or only very poor ones. Other 
triple stores, especially Jena SDB (+MySQL) and TDB, seem to be significantly 
faster. There might be ways of optimizing Mulgara that we are not aware of.
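
For anyone who wants to try the "all subjects for a given predicate and 
object" case themselves, here is a minimal Python sketch of how such a query 
could be built and timed against risearch. It is not the code we ran. The 
host, port and path in RISEARCH_URL are assumptions about a default local 
install, and the parameter names (type, lang, format, query, limit) are taken 
from the Fedora 3.x Resource Index Search interface as I understand it. Your 
installation may also require authentication, which this sketch does not 
handle.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: default location of the risearch servlet on a local install.
RISEARCH_URL = "http://localhost:8080/fedora/risearch"


def subjects_query(predicate, obj):
    """Build a SPARQL query for all subjects with the given predicate/object URIs."""
    return "SELECT ?s WHERE { ?s <%s> <%s> }" % (predicate, obj)


def risearch_url(query, base=RISEARCH_URL, limit=None):
    """Build a risearch GET URL asking for tuple results in CSV format."""
    params = {
        "type": "tuples",
        "lang": "sparql",
        "format": "CSV",
        "query": query,
    }
    if limit is not None:
        params["limit"] = str(limit)
    return base + "?" + urlencode(params)


def timed_query(url):
    """Fetch the URL and return (elapsed_seconds, response_body)."""
    start = time.perf_counter()
    with urlopen(url) as resp:
        body = resp.read().decode("utf-8")
    return time.perf_counter() - start, body
```

To use it, build a query with subjects_query(), passing for example the 
RELS-EXT predicate info:fedora/fedora-system:def/relations-external#isMemberOf 
and the URI of one of your own collection objects, turn it into a URL with 
risearch_url(), and pass that to timed_query(). Running the same query for 
objects ingested early and late should show whether query time depends on the 
position in the dataset.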

Carsten Friedrich
Research Team Leader
CSIRO ICT Centre
M: +61 (0) 2 6216 7019
www.ict.csiro.au

From: Eric Melz [mailto:em...@alelo.com]
Sent: Monday, 26 October 2009 13:30
To: fedora-commons-users@lists.sourceforge.net
Subject: [Fedora-commons-users] Fedora Performance

Hi-

I've done some investigation into Fedora performance using a variety of 
configurations and access patterns.  I've posted a report at 
http://technotes.emelz.com/fedora-performance.  I'd be interested to receive 
feedback, particularly if people have additional insights on this topic.

Cheers,

eric
