Hi Aaron,

I read your writeup again last night, and it got me thinking in more
detail about a couple of topics.

First, I like the idea of concentrating on what amounts to
whole-object CRUD operations.  I think your starting point was the
notion that a running repository has a primary instance of this
interface, through which all reads and writes of DigitalObject
instances must pass.  As Asger pointed out, the "write-oriented" side
of the interface (or something close to it) could be leveraged to
provide a common way of sending object updates to various independent
modules (alternate indexes, etc.) within a running repository.  The
actual work of sending object updates to the registered modules could
be handled by a given HighLevelStorage impl, but at least one of the
"sinks" would also need to be capable of providing a DigitalObject
getter.
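
Just to make that concrete, here's a rough sketch of the fan-out
idea.  The ObjectSink interface and DelegatingHighLevelStorage class
are names I'm inventing purely for illustration; DigitalObject and
HighLevelStorage are the types from your writeup, and the method
signatures follow the interface as you've defined it (quoted below).

import java.util.List;

// Rough sketch only: one impl acts as the authoritative store (and
// answers reads), while writes are also fanned out to any registered
// sinks (alternate indexes, messaging modules, etc.).
interface ObjectSink {
    void objectAdded(DigitalObject object);
    void objectUpdated(DigitalObject oldVers, DigitalObject newVers);
    void objectRemoved(String pid);
}

class DelegatingHighLevelStorage implements HighLevelStorage {

    private final HighLevelStorage primary;  // the sink that can also read
    private final List<ObjectSink> sinks;    // secondary consumers of writes

    DelegatingHighLevelStorage(HighLevelStorage primary,
                               List<ObjectSink> sinks) {
        this.primary = primary;
        this.sinks = sinks;
    }

    public void add(DigitalObject object) {
        primary.add(object);
        for (ObjectSink sink : sinks) sink.objectAdded(object);
    }

    public void update(DigitalObject oldVers, DigitalObject newVers) {
        primary.update(oldVers, newVers);
        for (ObjectSink sink : sinks) sink.objectUpdated(oldVers, newVers);
    }

    public DigitalObject read(String pid) {
        return primary.read(pid);  // reads always go to the primary store
    }

    public void remove(String pid) {
        primary.remove(pid);
        for (ObjectSink sink : sinks) sink.objectRemoved(pid);
    }
}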

Looking at the interface as you've currently defined it on page 5:

> interface HighLevelStorage {
>  void add(DigitalObject object)

This is straightforward.  I'm immediately thinking of memory
constraints, though.  In particular, since managed content needs to
come through this door, it needs to be accessible as a stream.  That
concern isn't particular to this method, but since we'd want this
interface to be pretty stable long-term, it would be worth
re-evaluating the DigitalObject interface's suitability for this
purpose.
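
To illustrate the access pattern I mean (purely hypothetical names
here, not a proposal for the actual DigitalObject API): managed
content would be exposed as a stream that an impl can copy through a
fixed-size buffer, rather than as bytes held in memory on the object.

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical illustration only -- the names are made up.  The point
// is that content arriving through add()/update() should be readable
// as a stream so an impl can persist it in bounded memory.
interface ManagedContent {
    InputStream getContentStream() throws IOException;
    long getSize();  // -1 if unknown
}

class StreamUtil {
    // The kind of copy loop an impl would use internally: constant
    // memory regardless of content size.
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    }
}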

>  void update(DigitalObject oldVers, DigitalObject newVers)

Yes, providing both versions to this method is important for the
reasons you describe.

>  DigitalObject read(String pid)

How about org.fcrepo.common.PID instead of String?

>  void remove(String pid)
> }

Initially, I thought providing the PID was enough here, but for the
same concurrency control reasons behind read(..), why not:

void remove(DigitalObject oldVers)

This would provide an opportunity to veto a remove() if the object had
recently been changed by another thread.  Related: should
veto-if-changed be part of the published contract of this interface?
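
Pulling those suggestions together, the interface might end up
looking something like this.  ObjectConcurrentModificationException
is a name I'm making up just so the veto-if-changed behavior shows up
somewhere explicit; if we do want that behavior, I think it belongs
in the published contract one way or another.

import org.fcrepo.common.PID;

// Sketch of the interface with the tweaks suggested above; the
// exception type is invented for illustration.
class ObjectConcurrentModificationException extends Exception { }

interface HighLevelStorage {

    void add(DigitalObject object);

    // May veto if the stored version no longer matches oldVers.
    void update(DigitalObject oldVers, DigitalObject newVers)
            throws ObjectConcurrentModificationException;

    DigitalObject read(PID pid);

    // Takes the last-read version so an impl can veto the remove if
    // another thread has changed the object in the meantime.
    void remove(DigitalObject oldVers)
            throws ObjectConcurrentModificationException;
}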

RE: Multiplexing

I basically agree with what you say in section 3.1: that
HighLevelStorage would be the right place to provide per-blob "hints"
for underlying Akubra impls to make multiplexing decisions.  This
makes sense because Akubra is (by design) ignorant of Fedora
semantics, so whatever is sending content in must be able to do
whatever translation is needed.

Where it gets interesting, I think, is when you consider the kinds of
hints that should be passed from one level of storage abstraction to
the next.  For example, do you simply pass in the id of the target
Akubra store as a "hint", as you did in your writeup?  Or do you pass
in things like mime type, size, etc., and delegate to whatever rule
processing exists in the underlying storage system to make the final
decision about which store the content actually goes to?

My intuition is that a combination may be best: if the multiplexing
decision can be made on basic blob metadata like size, mime type,
etc., it's best to delegate it rather than making the decision in a
layer that *must* be Fedora-aware.  If it can't, it's best to either
translate to hints that can be used at the next level or, failing
that, provide the target store id based on rules that were evaluated
at the higher level.
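
As a sketch of what that combination might look like in practice (the
hint keys and the DatastreamInfo type below are made up, and I'm
deliberately not assuming anything about the exact Akubra
signatures): the Fedora-aware layer just describes the blob, and only
names a target store when a Fedora-specific rule has already forced
its hand.

import java.util.HashMap;
import java.util.Map;

// DatastreamInfo is a stand-in for whatever per-datastream metadata
// the Fedora-aware layer has on hand.
interface DatastreamInfo {
    String getMimeType();
    long getSize();
    String getPreferredStoreId();  // null unless a Fedora-level rule applies
}

// Illustration only: build per-blob hints from basic datastream
// metadata, leaving the actual store selection to the layer below
// whenever possible.
class BlobHints {

    static Map<String, String> forDatastream(DatastreamInfo ds) {
        Map<String, String> hints = new HashMap<String, String>();
        hints.put("mimeType", ds.getMimeType());
        hints.put("size", String.valueOf(ds.getSize()));
        String storeId = ds.getPreferredStoreId();
        if (storeId != null) {
            // Only set when a Fedora-aware rule has already decided;
            // otherwise the underlying storage makes the call.
            hints.put("storeId", storeId);
        }
        return hints;
    }
}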

RE: Non-blob storage

As you point out, this interface provides a clear plug-in point for
non-blob-oriented persistence strategies.  One of the big strengths
of Fedora has been the ability to "crawl the files" and reconstitute
indexes in disaster recovery scenarios, and this obviously doesn't
preclude keeping that as an option.  But it reminds me that having
(at a minimum) the ability to iterate over what's been stored is
important for any kind of reconstitution of secondary indexes, and
that may apply here.  In other words, should the HighLevelStorage
interface be Iterable<DigitalObject>?
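
If so, rebuilding a secondary index becomes a simple walk over
storage, something like the following (the ObjectIndexer type and the
rebuild method are hypothetical, just to show the shape of it):

// Hypothetical: what iterability would buy us for disaster recovery
// of secondary indexes.
interface ObjectIndexer {
    void index(DigitalObject object);
}

class IndexRebuilder {
    static void rebuild(Iterable<DigitalObject> storage,
                        ObjectIndexer indexer) {
        for (DigitalObject object : storage) {
            indexer.index(object);
        }
    }
}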

- Chris
