Re: [fcrepo-dev] More food for 4.0 thought: fcrepo-store

Chris Wilper Thu, 22 Mar 2012 08:42:33 -0700

Thanks for your thoughts, Frank. Comments inline below.

On Tue, Mar 20, 2012 at 4:01 AM, frank <frank.as...@congrace.de> wrote:
> Hola Guys,
>
> here are some comments about the Storage ideas from my point of view:
>> So here's a provocative question to start: Assuming for a moment that
>> the core Fedora object model (versioning warts and all) stays the same
>> for 4.0, would something like this interface actually be compatible
>> with the major objectives we've talked about with respect to High
>> Level Storage?
> I think the interface is fine, and just like Akubra it should be
> straightforward to implement on e.g. HDFS/HBase. But shouldn't there be
> a locking mechanism at the Digital Object level? That way, if this were
> used in a distributed environment, Fedora instance A could lock
> object X while, e.g., it's updating the object's datastreams.


I started to look at how locking might be done, assuming that we'd
want it supported for a cluster of Fedora webapps, and quickly came
across Hazelcast as a prevailing, robust solution for sharing such
state across multiple nodes. The interesting thing about Hazelcast is
that it's API-compatible with a lot of the collections in java.util
and the locks in java.util.concurrent.

In particular, for locks, it seems that obtaining the lock, if
you're using something that's API-compatible with java.util.concurrent
locks, doesn't really *need* to be coupled with the API that provides
read/write access to the resource you're interested in locking.

Forgetting transactions for a second, I was kind of wondering if some
sort of LockProvider service would be useful (one implementation of
which would be a Hazelcast-backed, cluster-capable one). Higher-level code
that works with fcrepo-store would do something like:

Lock lock = lockProvider.getLock(pid);
lock.lock();
try {
  store.addObject(fedoraObject);
} finally {
  lock.unlock();
}
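
To make that a bit more concrete, here is a minimal sketch of what such a
service might look like. The names LockProvider, LocalLockProvider and
LockProviderDemo are made up for illustration and are not part of
fcrepo-store. The implementation shown only works within a single JVM; a
cluster-capable one would return a distributed lock from the same method,
relying on the java.util.concurrent API compatibility mentioned above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical interface: hands out one Lock per Fedora object PID.
interface LockProvider {
    Lock getLock(String pid);
}

// Assumed single-JVM implementation, for illustration only.
class LocalLockProvider implements LockProvider {
    private final ConcurrentMap<String, Lock> locks = new ConcurrentHashMap<>();

    @Override
    public Lock getLock(String pid) {
        // computeIfAbsent ensures every caller gets the same Lock for a PID.
        return locks.computeIfAbsent(pid, k -> new ReentrantLock());
    }
}

class LockProviderDemo {
    public static void main(String[] args) {
        LockProvider lockProvider = new LocalLockProvider();
        Lock lock = lockProvider.getLock("demo:1");
        lock.lock();
        try {
            // The store call from the snippet above would go here.
            System.out.println("holding lock for demo:1");
        } finally {
            lock.unlock();
        }
    }
}
```

Note that this sketch never evicts entries, so the map grows with the
number of distinct PIDs seen; a real implementation would need to deal
with that.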

> It would be nice if the setContent() method could supply the size of the
> datastreams, so that implementations could choose a storage layer based
> on the size of the stream, to mitigate e.g. Hadoop's "Small files
> problem", so you can choose to write small files in a SequenceFile or
> even an HBase table. Maybe some kind of Hints as they are present in the
> Akubra API would make sense, so that arbitrary information can be
> passed down into the storage layer.

I agree that size and other info (mime type, etc.) are important for
implementations to have access to at storage time. Interestingly, the
way the API currently works, the associated FedoraObject *must* be
provided to the impl prior to the call to setContent(). So most such
info could be obtained from that. Are there other hints, not
necessarily present in the FedoraObject, that we can envision being
important to making content storage decisions?
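
As a rough illustration of that ordering (metadata first, content second),
here is a sketch in which every name is hypothetical: DatastreamInfo stands
in for whatever an impl could read from the FedoraObject, register() stands
in for the earlier call that supplies the object, and the 1 MiB cutoff is
an arbitrary assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for per-datastream metadata taken from a FedoraObject.
class DatastreamInfo {
    final long size;
    final String mimeType;

    DatastreamInfo(long size, String mimeType) {
        this.size = size;
        this.mimeType = mimeType;
    }
}

// Hypothetical storage targets an implementation might choose between.
enum Backend { PACKED_SMALL_FILES, PLAIN_FILE }

class SizeAwareStore {
    // Assumed cutoff; a real impl would make this configurable.
    static final long SMALL_FILE_LIMIT = 1L << 20;

    private final Map<String, DatastreamInfo> known = new HashMap<>();

    // Stands in for the call that hands the object's metadata to the impl.
    void register(String datastreamKey, DatastreamInfo info) {
        known.put(datastreamKey, info);
    }

    // Stands in for the decision an impl could make when content arrives.
    Backend chooseBackend(String datastreamKey) {
        DatastreamInfo info = known.get(datastreamKey);
        if (info == null) {
            throw new IllegalStateException(
                "metadata must be provided before content: " + datastreamKey);
        }
        return info.size < SMALL_FILE_LIMIT
            ? Backend.PACKED_SMALL_FILES
            : Backend.PLAIN_FILE;
    }
}
```

Anything that can't be derived from the object itself would still need
some separate hint mechanism.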

Note that I'm not convinced that having stream-oriented (for managed
content) and object-oriented (for FedoraObjects) methods at the same
level in the API is the right move necessarily -- it just seemed more
practical to implement in the short term because embedding
stream-getting/setting functionality directly inside the FedoraObject
interface would tie instances to a particular FedoraStore impl...which
makes them harder to move around, if that makes sense.

>> c) Transactions.
>> Unsure. But I think it's worth stepping back and considering the
>> cost/benefit of implementing true ACID transactions across Fedora's
>> API for Fedora 4. I know the discussion of HLStorage has touched on
>> the possibility of doing this in the past, but it's been very short on
>> detail.
> IMHO transactions would be a feature that the users would like to see
> very much. Transactions seem to be a feature that invokes a feeling of
> trust in users.
> And I recently played a bit with implementing a custom
> PlatformTransactionManager from Spring, which gives you the possibility
> to use those beautiful @Transactional annotations instead of handling
> each transaction programmatically. It's quite easy to implement,
> although there still is the hard part of rolling back unsuccessful
> transactions.

In your experience, were you working with already-transactional
resources (via JTA)? As mentioned on the call, I think if we attempt
to implement transactions ourselves, there's all kinds of opportunity
for failure. But if we can "wrap" already-transactional resources
while still keeping the ability to integrate non-transactional blob
storage, that seems more palatable to me.

>> e) Lock-free concurrent updates
>> No. I think some way of declaring the previously seen state would be
>> necessary to achieve this. But again, I'm not sure that
>> whole-Fedora-object-locking at a higher level is such a bad thing if
>> it's done correctly and doesn't make the single-node-Fedora assumption
>> that the locking in DOManager does today.
> Yes, as you might have guessed from the previous paragraph, I think object
> locking would be hugely beneficial in the context of asynchronous writes
> or a federation of Fedora instances.

The original point made in the paper was that there was a way to not
*force* locking to occur (via optimistic concurrency control) if the
storage interface provided a way to declare the previously-seen state
with each request. There is certainly overhead in obtaining
cluster-wide locks...but I'm not sure that would actually become a
significant bottleneck for quite a while for typical uses of Fedora.
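
For reference, the optimistic approach can be sketched roughly as follows.
This is an in-memory toy, not a proposal for the actual interface:
OptimisticStore, Versioned and the integer version counter are all
assumptions made for the example. The caller passes the version it last
saw, and the write is rejected if the object has changed since.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Immutable snapshot of an object: a version counter plus its content.
final class Versioned {
    final long version;
    final String content;

    Versioned(long version, String content) {
        this.version = version;
        this.content = content;
    }
}

class OptimisticStore {
    private final ConcurrentMap<String, Versioned> objects = new ConcurrentHashMap<>();

    Versioned get(String pid) {
        return objects.get(pid);
    }

    // Applies the write only if seenVersion matches the stored version.
    // seenVersion == 0 means "I believe the object does not exist yet".
    // Returns false when the caller's view was stale; no lock is held.
    boolean update(String pid, long seenVersion, String newContent) {
        Versioned current = objects.get(pid);
        if (current == null) {
            return seenVersion == 0
                && objects.putIfAbsent(pid, new Versioned(1, newContent)) == null;
        }
        // replace() atomically swaps only if 'current' is still the stored value.
        return current.version == seenVersion
            && objects.replace(pid, current, new Versioned(seenVersion + 1, newContent));
    }
}
```

A caller whose update() returns false would re-read the object and retry,
which is where the cost moves to when contention is high.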

- Chris
