> Understood. The single Fedora object add/update/delete being the
> fundamental atomic unit of work seems like a reasonable place to
> start, and is an assumption I also made with fcrepo-store. Going after
> a smaller level of granularity (e.g. the datastream) would be a
> different challenge which, if folks want to go down that road as a
> thought exercise, modeling
Yes, that would be a very fundamental change. I'm not sure it even
makes sense without changes to the object model.

> However, I still don't see exactly how the alternative could be done.
> That is, making all content available somehow within the passed-in
> FedoraObject instance without resorting to passing around actual
> managed content streams all the time.

*References* to streams need to be passed around instead. These
can/will/must be dereferenced into a stream at the appropriate time
using the appropriate resolver. For the case of "get the stream of an
existing stored datastream", the dereferencer impl would get the stream
from some storage provider. For the case of "get the stream of this
newly deposited/uploaded content to put into storage", the dereferencer
impl would get the stream from some sort of cache or buffer. That's
very similar to how it works today.

> Maybe if the DatastreamVersion class had a getManagedContent() method
> that returned an InputStream if the control group was "M". And in the
> case where the FedoraObject was being updated, that could be null,
> indicating that the intent is to keep the value the same?

There's no need to do tricks with null - the object metadata is enough
to express intent.

> In general, I like the idea of the FedoraObject instances being "dumb"
> value objects that don't do any sort of computation or validation, but
> just provide getters/setters. But things seem to get harder to reason
> about when the objects need to encapsulate arbitrarily large content
> streams. Do you have any ideas for how things might look at this
> level?

At its core, FedoraObject instances will just be *metadata*. Any stream
accessor methods within the object would be there for "convenience"
(that's not quite the right word). Datastream content fetching would
ultimately depend on some implementing class resolving a content
identifier present in the datastream metadata. Packaging the correct
resolver impls within the FedoraObject would be one way to make this
convenient, but it is not the only way to do it. (There's a rough
sketch of this idea at the end of this message.)

> Which brings me to a related, but less important question: If managed
> content can be passed into storage by value in this way, why should
> the serialized FOXML (or whatever) actually hold any kind of reference
> to it? The pid+dsId+dsVersionId thing isn't actually useful once the
> object has been stored.

I don't think I understand why some sort of resolvable content
identifier in the FOXML wouldn't still be necessary.

> I'm personally able to reason
> better about this stuff after looking at/writing actual code.

Take a look at the proof of concept branch hlstore_hbase_poc in my fork
(birkland/fcrepo on github). In particular, look at
org.fcrepo.server.storage.distributed.DistributedObjectSource and
org.fcrepo.server.storage.distributed.DistributedDOManager. Trace the
code to see how datastreams are handled. It's shoehorned into the
"existing way" to retain compatibility with most of the rest of the
Fedora code.

-Aaron
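
Since the thread asked for something concrete: below is a minimal
sketch of the "references + dereferencer" idea. It is *not* taken from
the hlstore_hbase_poc branch - the names (ContentResolver,
StorageContentResolver, UploadCacheResolver, the contentReference
field) are made up for illustration only. The point is just the shape:
the value object carries only metadata, including an opaque content
reference, and turning that reference into an InputStream is the job of
whichever resolver impl is appropriate.

    import java.io.ByteArrayInputStream;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.net.URI;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    /** Hypothetical "dumb" value object: metadata only, never the bytes. */
    class DatastreamVersion {
        private String dsId;
        private String versionId;
        private String controlGroup;   // e.g. "M" for managed content
        private URI contentReference;  // opaque reference, dereferenced later

        public URI getContentReference()         { return contentReference; }
        public void setContentReference(URI ref) { contentReference = ref; }
        public String getControlGroup()          { return controlGroup; }
        public void setControlGroup(String cg)   { controlGroup = cg; }
        // dsId/versionId getters/setters omitted for brevity
    }

    /** Hypothetical dereferencer: turns a content reference into a stream. */
    interface ContentResolver {
        InputStream resolve(URI contentReference) throws IOException;
    }

    /** Case 1: the reference points at content already held by a storage
        provider. */
    class StorageContentResolver implements ContentResolver {
        public InputStream resolve(URI ref) throws IOException {
            // Toy example using the local filesystem; a real impl would hit
            // whatever low-level store is actually in use.
            return new FileInputStream(ref.getPath());
        }
    }

    /** Case 2: the reference points at newly uploaded content sitting in a
        cache or buffer, waiting to be put into storage. */
    class UploadCacheResolver implements ContentResolver {
        private final Map<URI, byte[]> buffer =
                new ConcurrentHashMap<URI, byte[]>();

        public void put(URI ref, byte[] bytes) {
            buffer.put(ref, bytes);
        }

        public InputStream resolve(URI ref) throws IOException {
            byte[] bytes = buffer.get(ref);
            if (bytes == null) {
                throw new IOException("No buffered upload for " + ref);
            }
            return new ByteArrayInputStream(bytes);
        }
    }

A storage module (or the FedoraObject itself, if we chose to package
resolvers inside it) would pick the resolver based on where the
reference points, e.g. by URI scheme. An update that keeps existing
content is then just a version whose reference is unchanged - no null
tricks needed - and the serialized object still carries a resolvable
content identifier, which is why I'd expect the FOXML to keep one.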