On Thu, Mar 29, 2012 at 3:47 PM, aj...@virginia.edu <aj...@virginia.edu> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On Mar 29, 2012, at 3:10 PM, Chris Wilper wrote:
>
>> Going after a smaller level of granularity (e.g. the datastream) would be a
>> different challenge which, if folks want to go down that road as a thought
>> exercise, I'm happy, but in my mind there's already a pretty easy answer
>> for anyone who has use cases that really provoke different-datastream-
>> same-object-update concurrency: Use atomistic content modeling
>
> Hm. I'm not sure whether I agree or disagree with you, Chris. {grin}
>
> Couldn't this be construed as forcing people to conflate two sets of
> values: accuracy in domain modeling and policies around the curation of
> content?
> If my identities really are well-chosen to model my domain, having to
> break them apart to satisfy operational concerns feels wrong.

I'm not sure I see how a domain model's accuracy could be degraded by
increasing the atomicity of the persistence entities.

Stepping back for a second, I think it's interesting that the line between
datastream and object has become less distinct in recent years. I'm not
sure that's such a bad thing, and it makes me question the value of
continuing to treat datastreams as second-class citizens in the
architecture. Are there good pure domain modeling reasons? Information
hiding is all I can think of, but that seems like a concern you'd want to
layer on top of whatever persistence mechanism you had at your disposal.
In other words, you're welcome to think of a certain set of your Fedora
objects as private members of some other set of Fedora objects.

> There
> might also be scaling concerns: in an example of objects that contains
> both metadata and data, which might be subject to different constraints
> of transactionality, breaking one or the other out into a separate object
> could instantly double the number of objects in the repo.

Ahh, but in the brave new pluggable world, there is no requirement that
you allocate a filesystem inode (for example) for every Fedora object.
What if you opted to store each Fedora object in a highly scalable
relational database instead?

> I do see your point, and I accept that Fedora can't offer all things to all
> people,
> but I wonder if we can find a way to leave the door open for
> "other-than-object"
> atomicity, perhaps without building it out immediately?

I'd be interested to get others' thoughts on this too, but it seems like
if we wanted to accommodate "more-granular-than-object-level" atomicity in
the design, it would look quite different from what's been discussed so
far with HLStorage/fcrepo-store. And so far in the discussion, keeping it
at the Fedora object level, we've been able to punt on/ignore some of the
details with the FedoraObject design.
It'd be good to talk about those in some detail as well, regardless of
where we end up on this thread, so I welcome the discussion.

>> FedoraStoreSession session = fedoraStore.getSession();
>> <snipped>
>> session.close();
>> }
>>
>> Now, unless someone has gone a bit wild with datastreams,
>> FedoraObject.copy(), a "deep" copy, is going to be fairly cheap on its
>> own. But what do we actually do with managed datastream content?
> I'm not sure I understand the question... wouldn't it be a pointer and nothing
> more? If you provide access to it through the FedoraObject object, we can
> still treat an URI as a value. And if someone creates a new URI (e.g. by
> offering new content in a modifyByValue) then you can change the URI to
> the URI of the new content (value-for-value). But perhaps I'm
> misunderstanding the question... especially because I'm not sure I
> understand what "deep" copy means-- a copy wherein all datastream
> content is also duplicated?

To clarify, by "deep" copy, I mean that it's a full copy of whatever
members the original object had, so that changing a value or field
somewhere in one does not affect the other. The important bit is just
that oldObj does not change.

Currently in the dto design, FedoraObject instances do not provide access
to binary data, even if managed. They just point to it via the
DatastreamVersion.contentLocation (URL) accessor. If that approach were
used here (just a pointer and nothing more, as you suggest), let's think
about what would happen within the fcrepo-store/HLStore impl in response
to the following requests:

.add(FedoraObject obj):
  Any managed datastream referenced in obj would need to be resolved by
  the store impl, then stored.
  Key Question: Is the original reference (dsLocation) retained? Today in
  Fedora it's in fact not retained; it's changed. Options are: Keep it
  as-is, change it to pid+dsId+dsVersionId (as is done today), or remove
  it.
  Personally I am beginning to think removing it might be the right
  move, but that fights with my instinct to store the FedoraObject
  instance "exactly as given".

.update(FedoraObject oldObj, FedoraObject newObj):
  For each managed datastream referenced in oldObj:
  - if it's not in newObj, delete it from storage
  - if it's in newObj but with a different location, replace the old
    content
  - if it's in newObj but with the same location, maybe we still need to
    replace the old content. How do we know if that's necessary?

.get(pid):
  In this case, if only a reference (URL) to the content is provided,
  the caller (client of the store) needs to resolve it to get the content.

A whole 'nother possibility is that FedoraObject instances don't pass
managed datastreams in by reference (URL) at all. Instead they're passed
in by value (via a getManagedContent() method). In that design, the store
impl doesn't need to be responsible for resolving anything on an add
request...it just streams the given content to storage.

- Chris

> - ---
> A. Soroka
> Software & Systems Engineering :: Online Library Environment
> the University of Virginia Library
>
>>
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG/MacGPG2 v2.0.17 (Darwin)
> Comment: GPGTools - http://gpgtools.org
>
> iQEcBAEBAgAGBQJPdLxKAAoJEATpPYSyaoIkkmMH/1oqHFzljZT9/Tq17XhyKdHv
> wlyZOx0uMfRx+JepTI2xh7CTHigtxTemKLtIuc3EK/XtU1M+0Tb34vez2kfjOO6C
> TM50BtU/7dT2MQmg6zZdhCCh15i7pifL97DrxzrzHYbuv1jKvV4bsOGBDJsM67iD
> ZAjZSvOZlJZ8ob18fvGuMttfZ29K74gz0wHeEMuyTG0s5WPfiy/q/Ft2X3+Hc/CB
> LE1o3tM0yPyi7mmEhMGYMnkfXjKQckVCYJ0DkJwRU0JeVog/UlM1Orl0f2gxpDPW
> xmWpSHfFMnDDM6cY0Jlns+err/PI4bJ5qU5NmM6keDgMII1LwFgXrpOyrVUYzjA=
> =0tNK
> -----END PGP SIGNATURE-----
>
> ------------------------------------------------------------------------------
> This SF email is sponsosred by:
> Try Windows Azure free for 90 days Click Here
> http://p.sf.net/sfu/sfd2d-msazure
> _______________________________________________
> Fedora-commons-developers mailing list
> Fedora-commons-developers@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers
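
P.S. For what it's worth, the three .update(oldObj, newObj) cases can be
sketched roughly as below. This is only an illustration under my own
assumptions, not the fcrepo-store API: ManagedContentPlanner, Action, and
planUpdate are hypothetical names, and each object is reduced to a map from
datastream id to its content location (URL). The STORE case for datastreams
that appear only in newObj is also an assumption; it isn't covered in the
list above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch only; none of these names exist in fcrepo-store. */
final class ManagedContentPlanner {

    /** What the store impl would do with one managed datastream's content. */
    enum Action { STORE, REPLACE, DELETE, UNDECIDED }

    /**
     * Plans storage actions for update(oldObj, newObj).
     * Each map goes from datastream id to content location (a URL string).
     */
    static Map<String, Action> planUpdate(Map<String, String> oldLocations,
                                          Map<String, String> newLocations) {
        Map<String, Action> plan = new LinkedHashMap<>();
        for (Map.Entry<String, String> old : oldLocations.entrySet()) {
            String dsId = old.getKey();
            if (!newLocations.containsKey(dsId)) {
                // Not in newObj: delete it from storage.
                plan.put(dsId, Action.DELETE);
            } else if (!Objects.equals(old.getValue(), newLocations.get(dsId))) {
                // In newObj with a different location: replace the old content.
                plan.put(dsId, Action.REPLACE);
            } else {
                // Same location: the URL alone cannot say whether the bytes
                // changed, which is exactly the open question.
                plan.put(dsId, Action.UNDECIDED);
            }
        }
        for (String dsId : newLocations.keySet()) {
            if (!oldLocations.containsKey(dsId)) {
                // Assumption: a datastream only in newObj is treated like add().
                plan.put(dsId, Action.STORE);
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        Map<String, String> oldLocs = new LinkedHashMap<>();
        oldLocs.put("DC", "http://example.org/dc-v1");
        oldLocs.put("IMG", "http://example.org/img-v1");
        oldLocs.put("TXT", "http://example.org/txt-v1");

        Map<String, String> newLocs = new LinkedHashMap<>();
        newLocs.put("IMG", "http://example.org/img-v2");
        newLocs.put("TXT", "http://example.org/txt-v1");
        newLocs.put("PDF", "http://example.org/pdf-v1");

        System.out.println(planUpdate(oldLocs, newLocs));
    }
}
```

The UNDECIDED case is the one that makes me lean toward passing managed
content by value: with only a URL to compare, the store has no way to tell
whether the content behind an unchanged location is itself unchanged.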