Hi

10 days and no replies. That's not nice of people. So here I go.

I think I can follow the design you propose, even though I am not really
into the database code part of Fedora. 
To retell it, so you can check my understanding: the config in
DefaultDOManager.dbspec determines which parts of a Fedora object are
cached in the database. You extend that config with an include, so that
the user can supply their own config file and have additional content
cached.
That's all there is, right?
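
To check that I read the value getter right, here is a minimal sketch of
what I assume the xPath getter does. This is not code from your patch: the
class and method names are made up, and my handling of the two delimiter
types (row = one row per match, normal = all matches joined into one
string) is only my interpretation of your description.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: names are illustrative, not taken from the patch.
class XPathValueGetterSketch {

    // Parses the XML, evaluates one XPath with a single bound namespace
    // prefix, and returns the text content of every matching node.
    static List<String> getValues(String xml, String xPath,
                                  final String nsPrefix, final String nsUri) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // needed for prefixed XPath steps
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xp = XPathFactory.newInstance().newXPath();
            xp.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return nsPrefix.equals(prefix) ? nsUri : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) {
                    return nsUri.equals(uri) ? nsPrefix : null;
                }
                public Iterator<String> getPrefixes(String uri) {
                    return nsUri.equals(uri)
                            ? Collections.singletonList(nsPrefix).iterator()
                            : Collections.<String>emptyIterator();
                }
            });

            NodeList nodes = (NodeList) xp.evaluate(xPath, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<String>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("XPath value getter failed", e);
        }
    }

    // Assumed semantics: "row" keeps one entry per value (one database row
    // each); anything else joins all values into a single delimited string.
    static List<String> toRows(List<String> values, String delimiterType, String delimiter) {
        if ("row".equals(delimiterType)) {
            return new ArrayList<String>(values);
        }
        List<String> single = new ArrayList<String>();
        single.add(String.join(delimiter, values));
        return single;
    }

    public static void main(String[] args) {
        String xml = "<f:file xmlns:f=\"http://easy.dans.knaw.nl/files\">"
                + "<f:filename>a.txt</f:filename><f:filename>b.txt</f:filename></f:file>";
        List<String> values = getValues(xml, "//easyfile:filename",
                "easyfile", "http://easy.dans.knaw.nl/files");
        System.out.println(toRows(values, "row", ","));
        System.out.println(toRows(values, "normal", ","));
    }
}
```

Note that the whole document is parsed before the query runs, which is
the same cost I complain about below for small queries against Fedora.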

I am not against the idea, but I consider it a stopgap measure.
The problem you outline is that querying the foxml files directly is too
slow in the Fedora design. You want a faster way to access the contents,
and thus you propose to store them in a database. So far I agree: the
Fedora backend is not fast for small queries (as the entire object is
parsed for any query), and some indexed frontend is sometimes required.
Now, I do not know the performance of the various open source xml
databases, but it sounds radically simpler to store/backup the foxml
objects in an xml database than to write complex expressions for mapping
selected parts to a relational database.

Having such a database, which could be either a cache of the foxml
files or their primary store, would allow fast queries about properties
of the objects or datastreams. This should probably be the design we work
towards, but your idea could easily serve as an interim way of doing
database integration while we have no xml database.

Regards


On Fri, 2009-10-16 at 20:32 +0200, Lodewijk Bogaards wrote:
> Hi,
> 
> For speed reasons we wanted a database that contains the same information
> Fedora contains. I have emailed before (subject: gDatabase) that I figured
> that Fedora already has a feature to do so, for the dublin core and some
> other digital object properties, and that with some work Fedora can be made
> to keep the database synchronized for its user-made XML data as well.
> Currently I have this working within Fedora.
> 
> I am sending you the source which was made on top of the Fedora 3.2.1 source
> release, an example foxml and database schema.
> 
> The idea is that DefaultDOManager.dbspec is extended with this line:
> 
>     <include href="server/config/custom-db.xml" />
> 
> Then in that file under the Fedora home dir you can put your own database
> schema, which is an extension of the database schema used in the dbspec
> file.
> 
> Columns get their data by value getters. Currently I have implemented one
> value getter that uses an xPath query to get a value. This value getting
> code does not necessarily run for all digital objects. It is possible to
> choose a content model and/or datastream id that must be present for the
> tables to be updated by the digital object. Here is an example of a table
> with a column:
> 
> <table name="easyFiles" contentModel="info:fedora/fedora-system:easyfile"
>        datastreamId="file">
>   <column name="filename" type="varchar(256)" notNull="true"
>           index="filename" default="-">
>     <value delimiterType="row" delimiter=",">
>       <valuegetter type="xPath" xPath="//easyfile:filename"
>                    nsPrefix="easyfile"
>                    nsUri="http://easy.dans.knaw.nl/files"
>                    delimiterType="normal" delimiter="," />
>     </value>
>   </column>
> </table>
>   
> An xPath query may return several values. For that, two kinds of delimiters
> may be used: a row delimiter (meaning several rows are created for each
> value) and a normal delimiter (meaning a string value is inserted after
> every row). Also, a value tag may contain several valuegetter tags, which
> can be delimited in the same two ways.
> If two columns return two rows, those two rows are added together as one row.
> Also, a defaultvalue for a second valuegetter may be used, thus creating the
> possibility of composing rows almost any way one wants based on Fedora data.
> 
> A pid must always be present, but does not need to be the primary key
> (primaryKey attribute of the table). It is thus up to the user how the data
> is composed into tables. If the user makes a mistake, an SQLException is
> thrown and the digital object is not ingested/updated, which forms
> another kind of safety net that does not necessarily work so well if the
> database were filled from within the user's application.
> 
> With this simple system it is possible to do almost any kind of database
> synchronization based on Fedora data. I have seen many projects based on
> Fedora that employ a database alongside Fedora in order to speed up the
> querying process. I therefore think this might be useful for many.
> 
> Of course the search interface that comes with Fedora may also be extended
> to make use of this new feature, but since that is not a need for our
> project at the moment I have not taken the time to do so.
> 
> I would be very pleased if this could become part of subsequent Fedora
> releases. Hopefully others think so too.
> 
> Kind regards,
> 
> Lodewijk Bogaards 
>  
> 


_______________________________________________
Fedora-commons-developers mailing list
Fedora-commons-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers
