Hi Frank,

I put my initial HighlevelStorage proof of concept up on GitHub, in my
forked fcrepo repo:
https://github.com/birkland/fcrepo.git
branch: hlstore_hbase_poc

(you can see it as part of the fcrepo network view:
https://github.com/fcrepo/fcrepo/network)

Everything of interest is in the fcrepo-hlstore module, including
- HighlevelStorage interface, and an HBaseHighlevelStorage
implementation
- DistributedDOManager - alternate "drop-in" DOManager implementation
that uses HighlevelStorage
- Example Spring config files in src/main/resources/config/spring

This is a proof of concept created in early 2010, originally against
Fedora 3.3.  It has since been updated for the Fedora 3.5 snapshot.  It was
created prior to several design/discussion meetings, so it does not reflect
the entirety of current thinking (such as splitting the HighlevelStorage
interface into 'Readable' and 'Writable').  Nevertheless, it may be a
good starting point to work from.  There are some ugly hacks and
workarounds in order to require minimal changes to existing code.
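
For illustration only, here is a rough sketch of what a 'Readable' /
'Writable' split might look like.  Everything in it is hypothetical: the
interface names, the method names and signatures, and the in-memory
implementation are assumptions made for this example, and are not taken
from the fcrepo-hlstore code or from the design discussions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical read-only half of a storage interface (names are assumptions).
interface ReadableStorageSketch {
    byte[] read(String pid);
    boolean exists(String pid);
}

// Hypothetical write half of a storage interface (names are assumptions).
interface WritableStorageSketch {
    void write(String pid, byte[] serializedObject);
    void purge(String pid);
}

// A single backend may implement both halves; callers that only need to
// read can be handed the ReadableStorageSketch view.
class InMemoryStorageSketch implements ReadableStorageSketch, WritableStorageSketch {
    private final Map<String, byte[]> objects = new HashMap<>();

    @Override
    public byte[] read(String pid) {
        return objects.get(pid);
    }

    @Override
    public boolean exists(String pid) {
        return objects.containsKey(pid);
    }

    @Override
    public void write(String pid, byte[] serializedObject) {
        objects.put(pid, serializedObject);
    }

    @Override
    public void purge(String pid) {
        objects.remove(pid);
    }
}
```

The point of such a split would be that read-only consumers depend only
on the smaller interface.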

To try it out:
- check out the hlstore_hbase_poc branch, and build it
- run the resulting fedora installer as usual
- edit fedora.fcfg, and remove the following modules:
org.fcrepo.oai.OAIProvider, org.fcrepo.server.management.PIDGenerator,
org.fcrepo.server.storage.DOManager,
org.fcrepo.server.search.FieldSearch
- remove akubra-llstore.xml from server/config/spring
- copy the contents of
fcrepo-hlstore/src/main/resources/config/spring/highlevel_hbase/ into
server/config/spring
- edit server/config/spring/HighLevelStorage_hbase.xml and either (a)
modify the value of the HBaseRoot parameter to point to your standalone
HBase table location, or (b) if you are using HBase in distributed mode,
remove the HBaseRoot parameter, and instead put the hostname and port in a
property called 'HBaseMaster'
- start Tomcat

It will create an HBase table called 'fedora' if one does not exist.  The
table has three column families:
- 'object', containing the serialized Fedora object
- 'meta', containing the last modified datestamp
- 'datastream', containing all managed datastreams.  Each managed
datastream gets a column named after its datastream ID
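
The row layout above can be pictured with a small plain-Java model.  This
is only a sketch and does not use the HBase client API: the class name
FedoraRowSketch, its methods, and the qualifier names used for the
'object' and 'meta' families in the usage note are assumptions made for
illustration.  Only the three family names and the "one column per
datastream ID" rule come from the description above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Plain-Java model of one row of the 'fedora' table (not real HBase code).
class FedoraRowSketch {
    // column family name -> (column qualifier -> cell value)
    private final Map<String, Map<String, byte[]>> families = new LinkedHashMap<>();

    FedoraRowSketch() {
        // The three column families described above.
        families.put("object", new LinkedHashMap<>());
        families.put("meta", new LinkedHashMap<>());
        families.put("datastream", new LinkedHashMap<>());
    }

    // Store a value in the given family; unknown families are rejected,
    // since column families are fixed when the table is created.
    void put(String family, String qualifier, byte[] value) {
        Map<String, byte[]> columns = families.get(family);
        if (columns == null) {
            throw new IllegalArgumentException("Unknown column family: " + family);
        }
        columns.put(qualifier, value);
    }

    // Each managed datastream gets its own column, named after its datastream ID.
    void putManagedDatastream(String datastreamId, byte[] content) {
        put("datastream", datastreamId, content);
    }

    // Column qualifiers currently present in a family.
    Set<String> columns(String family) {
        Map<String, byte[]> columns = families.get(family);
        if (columns == null) {
            throw new IllegalArgumentException("Unknown column family: " + family);
        }
        return columns.keySet();
    }
}
```

For example, calling putManagedDatastream("DS1", bytes) and
putManagedDatastream("THUMBNAIL", bytes) on one row leaves the
'datastream' family with two columns, 'DS1' and 'THUMBNAIL'.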

I think HBase defaults to keeping three versions of each cell.  Cell
versions are used for implementing datastream versioning, so the table's
version limit will need to be raised if you want to retain more than
three versions of a datastream.
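
To make the effect of the version limit concrete, here is a small
plain-Java model of a single versioned cell.  It is a sketch under stated
assumptions, not HBase code: the class VersionedCellSketch and its
methods are invented for this example, and it drops old versions
immediately on write, which is a simplification of how a real store
would clean them up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Plain-Java model of one cell that keeps at most maxVersions values.
class VersionedCellSketch {
    private final int maxVersions;
    // timestamp -> value, ordered newest first
    private final TreeMap<Long, String> versions =
            new TreeMap<>(Comparator.<Long>reverseOrder());

    VersionedCellSketch(int maxVersions) {
        if (maxVersions < 1) {
            throw new IllegalArgumentException("maxVersions must be at least 1");
        }
        this.maxVersions = maxVersions;
    }

    // Record a new version; once the limit is exceeded the oldest is dropped.
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) {
            // With reverse ordering, the last entry has the smallest timestamp.
            versions.pollLastEntry();
        }
    }

    // Most recent value, or null if nothing has been written.
    String latest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    // All retained values, newest first.
    List<String> history() {
        return new ArrayList<>(versions.values());
    }
}
```

With a limit of three, writing a fourth version of a datastream means the
first one is no longer retrievable, which is why the limit matters for
datastream versioning.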

Note: as this was just a proof of concept, there is much that is not
implemented.  FieldSearch has been replaced with an empty stub that always
returns zero results, as has DeploymentManager, so searches and
disseminations won't work until real implementations have been
developed.

Take a look, browse the code, read the javadocs (where present) - I'd be
happy to answer questions.

  -Aaron



_______________________________________________
Fedora-commons-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers
