My biggest concern about option one is how unruly it could get, very quickly, given that some sites have to be archived five days a week (Monday to Friday). That would be 261 versions of a single object per year.
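As a quick check on that figure, here is a minimal Python sketch that counts the Monday-to-Friday dates in a calendar year. It ignores holidays and assumes exactly one crawl per weekday. Depending on the year the count is 260, 261 or 262; for 2013 it is 261.

```python
from datetime import date, timedelta


def weekdays_in_year(year: int) -> int:
    """Count Monday-Friday dates in a calendar year (holidays are not excluded)."""
    start = date(year, 1, 1)
    days = (date(year + 1, 1, 1) - start).days  # 365 or 366
    # weekday(): Monday == 0 ... Friday == 4, Saturday == 5, Sunday == 6
    return sum(1 for i in range(days) if (start + timedelta(days=i)).weekday() < 5)


print(weekdays_in_year(2013))  # 261
```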
What I had worked out in my head was a combination of 1 and 2: I create a collection object (in Islandora parlance) and then an object for every crawl, each of which would be a child of the collection object. If I go that direction, I think I would be able to quickly display the number of crawls for a given site, then provide some "serendipitous" browse options for a user: paging through screenshots, date lists, full-text searching, etc.

I just really have to figure out how to get fcrepo and the Wayback Machine to talk to each other :-)

Also, FWIW, we had a fruitful discussion[1] on Twitter earlier, if anybody is interested in the thread. It seemed to jump off the list and back on.

Thanks for jumping in, Ben!

-nruest

[1] https://twitter.com/mjgiarlo/status/308788759246303233

On 13-03-04 11:36 PM, Benjamin Armintor wrote:
> I can immediately think of two approaches I'd consider:
> 1. Each Fedora object represents a website. There is a content DS that
> is a WARC; it is versioned. There is a screenshot DS and a full-text DS;
> these are unversioned and presumed to relate to the most recent WARC.
> You could have a similar DS for descriptive metadata. This model hinges
> on being able to assume the non-WARC datastreams only really have a
> relationship to the object, not a particular crawl.
>
> 2. Every crawl is an object, meant never to be altered. The related DSs
> might change between versions; you would have to directly compare the
> analogous DSs in two objects. You would refer to the website across all
> versions either in a metadatum (a RELS-EXT relationship, or something
> consistent in the descMetadata like a MODS relatedItem) or as an
> umbrella object across all the versions (obviously still marking up the
> isVersionOf/isMemberOf relationships, but having a place to locate the
> generic descriptions separate from the crawl descriptions).
>
> The latter is probably more "correct", but it will also be more
> cumbersome to work with.
> If you wanted to actually extract resources
> from a WARC (a particular file asset, for example), I think you really
> have to follow a plan like the second option.
>
> - Ben
>
>
> On Mon, Mar 4, 2013 at 10:56 PM, Nick Ruest <rue...@gmail.com
> <mailto:rue...@gmail.com>> wrote:
>
> > Hi folks,
> >
> > I began working on an Islandora Solution Pack for web archives a while
> > back, and the more I work on it and think about it, the more I find
> > myself a little stuck on a foundational aspect: what is the object?
> >
> > The way I had initially constructed it as a proof of concept was just
> > ingesting and disseminating WARC files. But as I learn more and more
> > about web archiving, there is more I'd like to do, dissemination-wise,
> > with associated datastreams (screenshots, PDFs) and full-text searching
> > of WARCs.
> >
> > So, here is my issue. Is an object a given crawl of a site? For example,
> > a web crawl of http://yfile.news.yorku.ca on March 4, 2013? Or is an
> > object a given website (the yfile example), with each crawl a version
> > of a datastream?
> >
> > To me it all seems like a matter of how a given collection is arranged
> > and described, and both solutions are technically correct. But is one
> > way better than the other?
> >
> > If you'll indulge me, I'd love to hear your input.
> >
> > cheers!
> >
> > --
> > -nruest
> >
> >
> > ------------------------------------------------------------------------------
> > Everyone hates slow websites. So do we.
> > Make your web apps faster with AppDynamics
> > Download AppDynamics Lite for free today:
> > http://p.sf.net/sfu/appdyn_d2d_feb
> > _______________________________________________
> > Fedora-commons-users mailing list
> > Fedora-commons-users@lists.sourceforge.net
> > <mailto:Fedora-commons-users@lists.sourceforge.net>
> > https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
--
-nruest
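For anyone wanting to prototype the combined model discussed in this thread (an umbrella collection object per website, with one never-altered object per crawl, i.e. Ben's option 2), here is a minimal Python sketch. It assumes a Fedora 3-style RELS-EXT datastream and the `isMemberOfCollection` predicate that Islandora collections typically use. The PIDs, the `Crawl` type and both helper functions are hypothetical, and `crawls_for_site` only stands in for what would really be a Resource Index (triplestore) query.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import date

# Namespaces of a Fedora 3-style RELS-EXT datastream (assumed repository version).
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
REL = "info:fedora/fedora-system:def/relations-external#"
ET.register_namespace("rdf", RDF)
ET.register_namespace("rel", REL)


@dataclass(frozen=True)
class Crawl:
    """One immutable crawl object (option 2); all PIDs here are hypothetical."""
    pid: str          # e.g. "webarchive:yfile-20130304"
    site_pid: str     # the umbrella/collection object for the whole website
    crawled_on: date


def rels_ext(crawl: Crawl) -> str:
    """Serialize a minimal RELS-EXT making the crawl a member of its site collection."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(
        root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": f"info:fedora/{crawl.pid}"}
    )
    ET.SubElement(
        desc,
        f"{{{REL}}}isMemberOfCollection",
        {f"{{{RDF}}}resource": f"info:fedora/{crawl.site_pid}"},
    )
    return ET.tostring(root, encoding="unicode")


def crawls_for_site(crawls: list[Crawl], site_pid: str) -> list[Crawl]:
    """In-memory stand-in for a Resource Index query: one site's crawls, oldest first."""
    return sorted((c for c in crawls if c.site_pid == site_pid), key=lambda c: c.crawled_on)


if __name__ == "__main__":
    crawls = [
        Crawl("webarchive:yfile-20130305", "webarchive:yfile", date(2013, 3, 5)),
        Crawl("webarchive:yfile-20130304", "webarchive:yfile", date(2013, 3, 4)),
        Crawl("webarchive:other-20130304", "webarchive:other", date(2013, 3, 4)),
    ]
    yfile = crawls_for_site(crawls, "webarchive:yfile")
    print(len(yfile))  # 2 crawls of the yfile site
    print(rels_ext(yfile[0]))
```

The design choice mirrors Ben's point: because each crawl is its own object, its screenshot and full-text datastreams belong to that crawl alone, and counting or browsing a site's crawls becomes a relationship query against the collection object.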