Hi Frank!

Without a fairly significant rewrite of key bits of Fedora,
implementing persistence on HBase+HDFS would currently be pretty
difficult. Supporting that kind of big change is what the High Level
Storage effort is all about. So far, in our development discussions,
we have been talking about High Level Storage as a 4.0 thing, which is
probably at least a year away.

To be honest, the High Level Storage effort has moved much more slowly
than most of us would have liked (nobody's fault -- just most of us
have higher priorities we're busy working on), and I think we all
agree that some real prototyping and experimentation is needed at this
point to move the work forward. So I think it's great that you're
digging in and experimenting with HBase+HDFS. I hope some of your
findings can help to inform the High Level Storage effort down the
road (whatever that becomes).

> 1.) From what i've seen in the fedora code, having fedora use HBase
> instead of a relational DB, would encompass implementations for:
>  - org.fcrepo.server.management.PIDGenerator
>  - org.fcrepo.server.storage.DOManager
>  - org.fcrepo.server.storage.lowlevel.PathRegistry
>  - org.fcrepo.server.utilities.rebuild.Rebuilder
> Is this correct or am i missing some classes/interfaces here?

I don't think a PathRegistry is really necessary, as that's an
implementation detail of the legacy llstore implementation. If you're
using an akubra-based llstore plugin, I don't think that class should
be in use at all.

A couple of missing classes that come to mind here are the
ResourceIndex and FieldSearch modules. By design, these are not
critical to the operation of Fedora as a service; in fact, risearch is
explicitly optional. However, parts of the REST API as currently
defined won't work if you don't have a FieldSearch replacement in
place, in particular /fedora/objects?(search criteria). I think,
longer-term, both of these components really belong outside the core
repository service. So if I take a long view of what you're doing, I
see absolutely no problem with ignoring them for now.

--

As a related issue, I know a lot of folks have been thinking lately
about what it would take to make Fedora scale horizontally. There are
many possible approaches that could be taken: some are more
traditional Java clustering approaches that allow for shared state
(e.g. Terracotta), and others are more "web-scale" approaches that
involve minimal shared state and minimal points of failure. HBase+HDFS
falls into the latter category.

I have actually been wondering about Apache Cassandra lately as a
possible solution. As you may know, Cassandra is really not designed
for dealing with very large files, but it is truly a "shared nothing"
persistence solution that does not have a single point of failure.
HBase also looked promising to me, but I noticed that it does have at
least one SPOF by design (the HDFS NameNode). Cassandra also appeared to
have a slightly larger community around it at the moment. Did you also
consider Cassandra in your effort? I'm curious what your evaluation
criteria were if you did.

Thanks,
Chris

_______________________________________________
Fedora-commons-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers