> 
> Understood. The single Fedora object add/update/delete being the
> fundamental atomic unit of work seems like a reasonable place to
> start, and is an assumption I also made with fcrepo-store. Going after
> a smaller level of granularity (e.g. the datastream) would be a
> different challenge which, if folks want to go down that road as a
> thought exercise, modeling

Yes, that would be a very fundamental change.  I'm not sure it even
makes sense without changes to the object model.

> However, I still don't see exactly how the alternative could be done.
> That is, making all content available somehow within the passed-in
> FedoraObject instance without resorting to passing around actual
> managed content streams all the time.

*references* to streams need to be passed on.   These can/will/must be
dereferenced into a stream at the appropriate time using the appropriate
resolver.  For the case of "get the stream of an existing stored
datastream", the dereferencer impl would get the stream from some
storage provider somehow.  For the case of "get the stream of this newly
deposited/uploaded stream to put into storage", the dereferencer impl
will get the stream from some sort of cache or buffer.  That's very
similar to how it works today.
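
To make that concrete, here is a minimal sketch of the idea, not the actual
Fedora API.  Every name in it (ContentResolver, StoredContentResolver,
UploadBufferResolver, ContentReader) is hypothetical, and the in-memory maps
are only stand-ins for a real storage provider and a real upload cache.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical: dereferences a content reference into a stream on demand. */
interface ContentResolver {
    InputStream resolve(String contentRef) throws IOException;
}

/** Hypothetical: "get the stream of an existing stored datastream". */
class StoredContentResolver implements ContentResolver {
    // Assumption: a map stands in for the real storage provider.
    private final Map<String, byte[]> storage = new HashMap<>();

    void store(String contentRef, byte[] bytes) {
        storage.put(contentRef, bytes.clone());
    }

    @Override
    public InputStream resolve(String contentRef) throws IOException {
        byte[] bytes = storage.get(contentRef);
        if (bytes == null) {
            throw new IOException("No stored content for " + contentRef);
        }
        return new ByteArrayInputStream(bytes);
    }
}

/** Hypothetical: "get the stream of this newly uploaded content". */
class UploadBufferResolver implements ContentResolver {
    // Assumption: a map stands in for the upload cache or buffer.
    private final Map<String, byte[]> buffer = new HashMap<>();

    void upload(String contentRef, byte[] bytes) {
        buffer.put(contentRef, bytes.clone());
    }

    @Override
    public InputStream resolve(String contentRef) throws IOException {
        byte[] bytes = buffer.get(contentRef);
        if (bytes == null) {
            throw new IOException("No buffered upload for " + contentRef);
        }
        return new ByteArrayInputStream(bytes);
    }
}

/** Hypothetical helper: the caller only ever sees a reference and a resolver. */
final class ContentReader {
    private ContentReader() {}

    static String readAsString(ContentResolver resolver, String contentRef) {
        try (InputStream in = resolver.resolve(contentRef)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point is that the code passing references around never needs to know which
of the two resolvers is behind a given reference; only the reference travels.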


> Maybe if the DatastreamVersion class had a getManagedContent() method
> that returned an inputstream if the control group was "M". And in the
> case where the FedoraObject was being updated, that could be null,
> indicating that the intent is to keep the value the same?

Not necessary to do tricks with null - the object metadata is enough to
express intent.

> In general, I like the idea of the FedoraObject instances being "dumb"
> value objects that don't do any sort of computation or validation, but
> just provide getters/setters. But things seem to get harder to reason
> about when the objects need to encapsulate arbitrarily large content
> streams. Do you have any ideas for how things might look at this
> level?

At its core, FedoraObject instances will just be *metadata*.  Any stream
accessor methods within the object would be there for "convenience" (that's
not the right word).  Datastream content fetching would ultimately
depend on some implementing class resolving some sort of content
identifier present in the datastream metadata.  Packaging the correct
resolver impls within the FedoraObject would be one way to make this
convenient, but is not the only way to do it.
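
As a rough illustration only: the sketch below assumes a hypothetical
DatastreamMetadata value class and a hypothetical ResolvableDatastream wrapper.
Neither exists in Fedora, and the "pid+dsId+dsVersionId" style identifier is
just an example of what a content identifier could look like.

```java
import java.io.InputStream;
import java.util.function.Function;

/** Hypothetical "dumb" value object: metadata only, never content bytes. */
final class DatastreamMetadata {
    private final String dsId;
    private final String controlGroup;
    private final String contentRef; // resolvable content identifier

    DatastreamMetadata(String dsId, String controlGroup, String contentRef) {
        this.dsId = dsId;
        this.controlGroup = controlGroup;
        this.contentRef = contentRef;
    }

    String getDsId() { return dsId; }
    String getControlGroup() { return controlGroup; }
    String getContentRef() { return contentRef; }
}

/**
 * Hypothetical wrapper that packages a resolver alongside the metadata so
 * callers get a stream accessor.  The resolver is a plain function here;
 * what backs it (storage, upload buffer, ...) is outside this sketch.
 */
final class ResolvableDatastream {
    private final DatastreamMetadata metadata;
    private final Function<String, InputStream> resolver;

    ResolvableDatastream(DatastreamMetadata metadata,
                         Function<String, InputStream> resolver) {
        this.metadata = metadata;
        this.resolver = resolver;
    }

    DatastreamMetadata getMetadata() { return metadata; }

    /** Dereferences lazily: nothing is fetched until this is called. */
    InputStream getContent() {
        return resolver.apply(metadata.getContentRef());
    }
}
```

The same metadata could just as well be handed to a resolver that lives
somewhere else entirely; bundling the two is only one option.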


> Which brings me to a related, but less important question: If managed
> content can be passed into storage by value in this way, why should
> the serialized FOXML (or whatever) actually hold any kind of reference
> to it?  The pid+dsId+dsVersionId thing isn't actually useful once the
> object has been stored.

I don't think I understand why some sort of resolvable content
identifier in the FOXML wouldn't be necessary.

> I'm personally able to reason
> better about this stuff after looking at/writing actual code.

Take a look at the proof of concept branch hlstore_hbase_poc in my fork
(birkland/fcrepo on github).  In particular, look at
org.fcrepo.server.storage.distributed.DistributedObjectSource and
org.fcrepo.server.storage.distributed.DistributedDOManager

Trace the code to see how datastreams are handled.  It's shoehorned into
the "existing way" to retain compatibility with most of the rest of the
Fedora code.

   -Aaron


_______________________________________________
Fedora-commons-developers mailing list
Fedora-commons-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers
