Hi Frank,

I have some unpublished code that served as a proof of concept when I initially proposed HighLevelStorage. Let me see if I can put it up on GitHub so you can take a look. It is an experiment in using HBase, and in investigating what would be required for Fedora to be used in a distributed fashion (that is to say, any number of Fedora instances operating independently against the same HBase repository). The key was exploiting HBase's atomic checkAndPut() operation and creating a new DOManager implementation that decouples some functionality and uses a HighLevelStorage interface.
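To illustrate why an atomic check-and-put matters when several Fedora instances write to the same store, here is a minimal sketch. It is not the proof-of-concept code: InMemoryCellStore and OptimisticWriter are hypothetical names, and a ConcurrentHashMap stands in for an HBase table so the example runs without a cluster. The real HBase call works on byte arrays and a Put, but the compare-and-set idea is the same.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical in-memory stand-in for one HBase column; not the real client API. */
class InMemoryCellStore {
    private final ConcurrentMap<String, String> cells = new ConcurrentHashMap<>();

    /**
     * Atomically writes newValue only if the current value equals expected.
     * expected == null means "the row must not exist yet".
     * Returns true if the write happened, false if the check failed.
     */
    boolean checkAndPut(String row, String expected, String newValue) {
        if (expected == null) {
            return cells.putIfAbsent(row, newValue) == null;
        }
        return cells.replace(row, expected, newValue);
    }

    String get(String row) {
        return cells.get(row);
    }
}

/** Hypothetical writer, one per Fedora instance, sharing a single store. */
class OptimisticWriter {
    private final InMemoryCellStore store;

    OptimisticWriter(InMemoryCellStore store) {
        this.store = store;
    }

    /** Creates an object; fails if another instance already created the same pid. */
    boolean create(String pid, String content) {
        return store.checkAndPut(pid, null, content);
    }

    /** Updates an object only if it is unchanged since the caller last read it. */
    boolean update(String pid, String previouslyRead, String content) {
        return store.checkAndPut(pid, previouslyRead, content);
    }
}
```

A caller whose update returns false would re-read the object and retry or report a conflict; no lock shared between instances is needed.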
Some considerations for other components:

PidGenerator: This would need a new implementation to avoid shared state between Fedora instances where possible. The approach I took in the proof of concept is simply a random UUID-based pid generator. Using ZooKeeper or HBase itself to coordinate between PidGenerator impls could work too - it looks like you are investigating that approach.

FieldSearch: In a distributed environment, it probably does not make sense for each instance to have its own independent field search index. In the proof of concept, I left this empty (i.e. a field search impl that never indexes or returns results). I don't think it would be easy to have HBase implement field search, but I have never tried. I was thinking that it would be necessary to set up field search as a standalone service that updates itself asynchronously in response to messages from the various running Fedora instances.

CModel/SDef/SDep management: Currently, the database is used to persist the 'model deployment map' - i.e. which services are bound to particular deployments for given content models. I think the table is actually named 'ModelDeploymentMap'. In my proof of concept, there is an interface called DeploymentManager which fulfills this role. This mapping can change if someone modifies a CModel, or adds a new SDep or SDef to the repository - so it is the responsibility of the manager to provide accurate updates and lookups to each instance. I do not think I got to the point of providing a concrete implementation, though.

DOManager: This was heavily modified! I created a new class called DistributedDOManager, which uses an instance of HighLevelStorage and would theoretically be appropriate for use in a distributed fashion. That is to say, each Fedora instance using a properly configured DistributedDOManager should be able to operate independently of the others.
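For the PidGenerator point above, a minimal sketch of a random UUID-based generator might look like the following. UuidPidGenerator and generatePid are hypothetical names and do not match Fedora's actual PIDGenerator interface; the namespace:id layout is an assumption for illustration.

```java
import java.util.UUID;

/** Hypothetical stateless pid generator; names are illustrative only. */
class UuidPidGenerator {

    /**
     * Returns a pid of the form "namespace:uuid". Nothing is stored or
     * shared, so independent instances need no coordination.
     */
    String generatePid(String namespace) {
        if (namespace == null || namespace.isEmpty()) {
            throw new IllegalArgumentException("namespace is required");
        }
        return namespace + ":" + UUID.randomUUID();
    }
}
```

The trade-off is that the resulting pids are long and not sequential, in exchange for needing no shared counter.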
ResourceIndex: I ignored this in the proof of concept, but in a distributed environment, it would likely need to be deployed as an external service, updating itself through messages or RPC.

Rebuilder: In a distributed environment where none of the local Fedora instances retain any state, the meaning of Rebuilder becomes blurry. There is no need for a doRegistry table any more, and the various indexes/services (fieldSearch, RI, deployment manager) are external, so I would say that a Rebuilder for a Fedora instance could become irrelevant. Rebuilding would be an operation for the individual shared services.

-Aaron

On Thu, 2011-06-16 at 09:46 -0400, Asseg, Frank wrote:
> Hola Guys!
>
> We are currently trying to combine fedora with HBase/HDFS as a
> Data/Metadata store and after implementing a low level storage proof of
> concept for HDFS (https://github.com/smeg4brains/akubra-hdfs) the next
> step would be to have fedora write the metadata information that is
> currently written to a relational database to a HBase BigTable.
>
> Thats why i have those questions:
>
> 1.) From what i've seen in the fedora code, having fedora use HBase
> instead of a relational DB, would encompass implementations for:
> - org.fcrepo.server.management.PIDGenerator
> - org.fcrepo.server.storage.DOManagar
> - org.fcrepo.server.storage.lowlevel.PathRegistry
> - org.fcrepo.server.utilities.rebuild.Rebuilder
> Is this correct or am i missing some classes/interfaces here?
>
> 2.) On the wiki there is page about the HighLevelStorage
> (https://wiki.duraspace.org/display/FCREPO/High+Level+Storage) which
> sounds a lot like the thing we would need for implementing some
> functionality we might need later on, like deciding where to store an
> object based on it's size. Can you elaborate a bit on the state of this
> feature? Are there already some concrete plans or even code?
>
> regards,
>
> frank