Hi Ari,

On Wed, Jun 15, 2011 at 1:37 PM, Ari Davidow <aridavi...@gmail.com> wrote:
> A couple of quick questions for Fedora admins using Amazon Web Services
> (AWS):
> 1. What are the real memory requirements of Fedora, excluding ingest? How do
> these change when Fedora is ingesting new materials? (Bonus question: If
> ingest is the memory hog, any way to launch a special "ingest" Fedora
> instance that would process new materials, place the ingested materials in
> appropriate location(s), pass on metadata, write log, and uninstantiate?
> That would let us take advantage of AWS for ingest without needing a massive
> server instance running Fedora the rest of the time

The most memory-intensive thing that Fedora does is reading and
writing FOXML. This currently happens on just about *every* read or
write request...though there is a very simple cache that obviates the
need for concurrent reads to the same object.  Since FOXML encodes
metadata about all datastream versions and actually includes the
content of inline XML datastreams, version history and inline XML are
the two areas to watch when thinking about the server's memory use. A
couple of simple guidelines:

1) For datastreams that are frequently modified, consider switching
VERSIONABLE to false, or periodically purging old history.
2) Avoid using inline XML as much as possible. Historically, Fedora's
"special" datastreams (DC, RELS-EXT, etc) have been required to be
inline XML, but the latest version drops that requirement.
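
To make guideline 1 a bit more concrete, here is a rough sketch of
building a REST request URL that turns versioning off for one
datastream. It assumes the modifyDatastream REST call accepts a
"versionable" query parameter, which is my recollection of the 3.x
REST API and worth checking against the docs for your version. The
base URL, PID, and datastream ID are made up for illustration, and the
request itself (an authenticated HTTP PUT) is left out.

```python
from urllib.parse import quote, urlencode


def modify_datastream_url(base_url, pid, ds_id, versionable):
    """Build the URL for a modifyDatastream request (to be sent with HTTP PUT).

    Assumption: the REST API takes a "versionable" query parameter.
    """
    # Percent-encode the PID and datastream ID so the ':' in a PID is safe.
    path = "/objects/{}/datastreams/{}".format(
        quote(pid, safe=""), quote(ds_id, safe="")
    )
    query = urlencode({"versionable": "true" if versionable else "false"})
    return base_url.rstrip("/") + path + "?" + query


# Hypothetical server, PID, and datastream ID.
url = modify_datastream_url("http://localhost:8080/fedora", "demo:1", "NOTES", False)
print(url)
# prints http://localhost:8080/fedora/objects/demo%3A1/datastreams/NOTES?versionable=false
```

You would then issue a PUT to that URL with admin credentials, using
whatever HTTP client you prefer.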

> 2. Apparently Fedora currently only works with Amazon's EBS for storing
> objects right now. This is an issue, since EBS is not considered "permanent"
> storage (Amazon notes a failure rate between 1/100 and 1/1000 and highly
> recommends storing a backup of the instance on S3) What to do?

I don't know if any real best practices have emerged on this yet, but
I'll offer a couple thoughts:

EBS is pretty nice from an operational standpoint (fast snapshots),
but as you've noted, it's not as suitable for long-term storage as S3.
One approach that can mitigate risk quite a bit (and actually we
happen to do this for a big portion of the DuraSpace wiki/jira/etc
services right now) is to use EBS as your day-to-day store, do
periodic snapshots against it, and tar the whole snapshot up
periodically, keeping multiple copies of that in S3 and wherever else
you see fit.
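
The "tar it up and keep multiple copies" step can be sketched roughly
as below. This is only an illustration, not our actual script: it
assumes the snapshot has already been restored to a volume and mounted
at a local path, and it uses plain local directories as stand-ins for
the destinations. Pushing the archive to S3 would be a separate upload
step with your tool of choice. The function name and paths are
hypothetical.

```python
import shutil
import tarfile
import time
from pathlib import Path


def archive_snapshot(mount_dir, dest_dirs, stamp=None):
    """Tar a mounted snapshot directory and place a copy in every destination.

    Returns the list of archive paths that were written.
    """
    mount_dir = Path(mount_dir)
    stamp = stamp or time.strftime("%Y%m%dT%H%M%S")
    name = "snapshot-{}.tar.gz".format(stamp)
    dests = [Path(d) for d in dest_dirs]

    # Write the archive once, into the first destination.
    dests[0].mkdir(parents=True, exist_ok=True)
    first = dests[0] / name
    with tarfile.open(str(first), "w:gz") as tar:
        tar.add(str(mount_dir), arcname=mount_dir.name)

    # Copy the finished archive to every remaining destination.
    copies = [first]
    for dest in dests[1:]:
        dest.mkdir(parents=True, exist_ok=True)
        copies.append(Path(shutil.copy2(str(first), str(dest / name))))
    return copies
```

Calling archive_snapshot("/mnt/snap", ["/backup/a", "/backup/b"]) would
leave one timestamped tarball in each directory. Keeping the copies in
independent places is the point, so that losing one store doesn't lose
the backup.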

We're currently also looking at other ways of (asynchronously) backing
up Fedora instances in the cloud (whether the instance primarily runs
in your local datacenter or also happens to be hosted in the cloud).
You might be interested to look at the Fedora CloudSync service here:

https://wiki.duraspace.org/display/FEDORACREATE/Fedora+CloudSync

It is designed to operate at the Fedora Object level rather than the
blob or block level.

> 3. For ingesting large amounts of data, one can use Amazon's import/export
> service. We would send Amazon a hard disk with our data and they upload the
> data to the S3 bucket of our choice. Right. S3, again. Anyone have any
> experience with this? Duracloud folks, what are y'all doing?
> Thanks,
> Ari
