Hi Frank,

I have some unpublished code that served as a proof of concept when I
initially proposed HighLevelStorage. Let me see if I can put it up on
github so you can take a look. It's an experiment in using HBase, and
in investigating what would be required to use Fedora in a distributed
fashion (that is to say, any number of Fedora instances operating
independently against the same HBase repository). The key was in
exploiting HBase's atomic checkAndPut() operation, and in creating a
new DOManager implementation that decouples some functionality and
uses a HighLevelStorage interface.
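
To illustrate the idea (a sketch, not the actual PoC code):
checkAndPut() is an atomic compare-and-set - the write only lands if
the cell still holds the value you expect, which is what lets
independent instances detect conflicting updates. A minimal in-memory
stand-in for that contract, with a plain map in place of a real HBase
table:

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for an HBase table, illustrating
// the checkAndPut contract: the write succeeds only if the cell
// currently holds the expected value. HBase guarantees this
// atomically per row, which is what lets independent Fedora
// instances detect conflicting updates.
class CheckAndPutTable {
    private final ConcurrentHashMap<String, String> cells =
            new ConcurrentHashMap<>();

    // Returns true if the put was applied, false if another writer
    // changed the cell since we last read it (caller re-reads and
    // retries). Pass expected = null to assert the cell is empty.
    public synchronized boolean checkAndPut(String key, String expected,
                                            String newValue) {
        String current = cells.get(key);
        if (!Objects.equals(current, expected)) {
            return false; // lost the race
        }
        cells.put(key, newValue);
        return true;
    }

    public String get(String key) {
        return cells.get(key);
    }
}
```

In the real client API this is HTable.checkAndPut(row, family,
qualifier, value, put) (if I remember the signature right), which
likewise returns false when another writer got there first.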

Some considerations for other components:

PidGenerator:
This would need a new implementation to avoid shared state between
Fedora instances where possible. The approach I took in the proof of
concept is simply a random UUID-based PID generator. Using ZooKeeper
or HBase itself to coordinate between PidGenerator impls could work
too - it looks like you are investigating that approach.
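
The UUID approach needs no coordination at all, since collisions are
effectively impossible. A sketch of what I mean (against a simplified
interface, not Fedora's actual PIDGenerator signature):

```java
import java.util.UUID;

// Sketch of a coordination-free PID generator: each Fedora instance
// can mint PIDs independently with no shared state. The class and
// method names here are illustrative, not Fedora's real API.
class UuidPidGenerator {
    private final String namespace;

    UuidPidGenerator(String namespace) {
        this.namespace = namespace;
    }

    // e.g. "demo:0f8fad5b-d9cb-469f-a165-70867728950e"
    public String generatePid() {
        return namespace + ":" + UUID.randomUUID();
    }
}
```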

FieldSearch:
In a distributed environment, it probably does not make sense for
each instance to have its own independent field search index. In the
proof of concept, I left this empty (i.e. a field search impl that
never indexes or returns results). I don't think it would be easy to
implement field search on top of HBase, but I have never tried. I was
thinking it would be necessary to stand up field search as a
standalone service that updates itself asynchronously in response to
messages from the various running Fedora instances.
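
The "empty" impl amounts to something like this (sketched against a
simplified interface, not Fedora's actual FieldSearch API):

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Sketch of a no-op field search: indexing is silently dropped and
// every query returns an empty result. In the distributed picture,
// the real index would live in an external service fed
// asynchronously by messages from each Fedora instance.
class NoOpFieldSearch {
    public void update(String pid, Map<String, String> fields) {
        // intentionally does nothing; an external indexer would
        // receive this update via a message queue instead
    }

    public List<String> findPids(String query) {
        return Collections.emptyList(); // never any local results
    }
}
```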

CModel/SDef/SDep management:
Currently, the database is used to persist the 'model deployment
map' - i.e. which services are bound to particular deployments for
given content models. I think the table is actually named
'ModelDeploymentMap'. In my proof of concept, there is an interface
called DeploymentManager which fulfills this role. This mapping can
change if someone modifies a CModel, or adds a new SDep or SDef to
the repository - so it is the responsibility of the manager to
provide accurate updates and lookups to each instance. I do not
think I got to the point of providing a concrete implementation,
though.
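
Roughly, the shape of the interface is as below - method names are
hypothetical (I'm sketching the contract from memory), and the
in-memory impl is there only to show how it would be used; a real
distributed version would be backed by HBase or a shared service:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the DeploymentManager role: map a
// (content model, service definition) pair to the SDep that
// deploys it. Names other than DeploymentManager are hypothetical.
interface DeploymentManager {
    // PID of the SDep bound to this cModel/sDef pair, or null if
    // no deployment is registered.
    String lookupDeployment(String cModelPid, String sDefPid);

    // Called when a CModel, SDef, or SDep changes, so every
    // instance sees an up-to-date mapping.
    void updateDeployment(String cModelPid, String sDefPid,
                          String sDepPid);
}

// Trivial in-memory implementation, just to illustrate the contract.
class InMemoryDeploymentManager implements DeploymentManager {
    private final Map<String, String> map = new ConcurrentHashMap<>();

    public String lookupDeployment(String cModelPid, String sDefPid) {
        return map.get(cModelPid + "|" + sDefPid);
    }

    public void updateDeployment(String cModelPid, String sDefPid,
                                 String sDepPid) {
        map.put(cModelPid + "|" + sDefPid, sDepPid);
    }
}
```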

DOManager:
This was heavily modified! I created a new class called
DistributedDOManager, which uses an instance of HighLevelStorage and
would, in theory, be appropriate for use in a distributed fashion.
That is to say, each Fedora instance using a properly configured
DistributedDOManager instance should be able to operate independently
of the others.

ResourceIndex:
I ignored this in the proof of concept, but in a distributed
environment, it would likely need to be deployed as an external service,
updating itself through messages or RPC.

Rebuilder:
In a distributed environment where none of the local fedora instances
retain any state, the meaning of Rebuilder becomes blurry.  There is no
need for a doRegistry table any more, and the various indexes/services
(fieldSearch, RI, deployment manager) are external, so I would say that
a Rebuilder for a fedora instance could become irrelevant.  Rebuilding
would be an operation for the individual shared services.

  -Aaron

On Thu, 2011-06-16 at 09:46 -0400, Asseg, Frank wrote:
> Hola Guys!
> 
> We are currently trying to combine fedora with HBase/HDFS as a 
> Data/Metadata store and after implementing a low level storage proof of 
> concept for HDFS (https://github.com/smeg4brains/akubra-hdfs) the next 
> step would be to have fedora write the metadata information that is 
> currently written to a relational database to a HBase BigTable.
> 
> That's why I have these questions:
> 
> 1.) From what I've seen in the fedora code, having fedora use HBase 
> instead of a relational DB would encompass implementations of:
>   - org.fcrepo.server.management.PIDGenerator
>   - org.fcrepo.server.storage.DOManager
>   - org.fcrepo.server.storage.lowlevel.PathRegistry
>   - org.fcrepo.server.utilities.rebuild.Rebuilder
> Is this correct, or am I missing some classes/interfaces here?
> 
> 2.) On the wiki there is page about the HighLevelStorage
> (https://wiki.duraspace.org/display/FCREPO/High+Level+Storage) which 
> sounds a lot like the thing we would need for implementing some 
> functionality we might need later on, like deciding where to store an 
> object based on its size. Can you elaborate a bit on the state of this 
> feature? Are there already some concrete plans or even code?
> 
> regards,
> 
> frank
> 
> 


