My biggest concern about option 1 is how unruly it could get, and how 
quickly, given that some sites have to be archived five days a week, 
Monday through Friday. That would be 261 versions of an object per year.
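
For anyone who wants to check that figure, here is a minimal sketch that counts the Monday-to-Friday dates in a calendar year. The function name weekday_count is just an illustration, and it assumes a crawl runs on every weekday with no holidays skipped.

```python
from datetime import date, timedelta


def weekday_count(year: int) -> int:
    """Count the Monday-Friday dates in the given calendar year."""
    day = date(year, 1, 1)
    count = 0
    while day.year == year:
        if day.weekday() < 5:  # 0 = Monday ... 4 = Friday
            count += 1
        day += timedelta(days=1)
    return count


# Assumption: one crawl per weekday, no holidays skipped.
print(weekday_count(2013))
```

The total shifts by a day or so from year to year depending on which weekday January 1 falls on, so the per-site version count is roughly 260 either way.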

What I had worked out in my head was a combination of 1 and 2: I 
create a collection object (Islandora parlance) and then an object for 
every crawl, each of which would be a child of the collection object.

I think if I go that direction, I would be able to quickly display the 
number of crawls for a given site, then provide some "serendipitous" 
browse options for a user - paging through screenshots, date lists, 
full-text searching, etc. I just really have to figure out how to get 
fcrepo and the wayback machine to talk to each other :-)
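
To make the combined model a bit more concrete, here is a rough sketch in plain Python of a collection object with one child object per crawl. It does not use any Fedora or Islandora API; the class names, the PID scheme, and the collection_pid field (standing in for an isMemberOf-style relationship) are all assumptions for illustration only.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Crawl:
    pid: str             # hypothetical identifier for the crawl object
    crawl_date: date
    collection_pid: str  # stands in for an isMemberOf-style relationship


@dataclass
class SiteCollection:
    pid: str             # hypothetical identifier for the collection object
    url: str
    crawls: List[Crawl] = field(default_factory=list)

    def add_crawl(self, crawl_date: date) -> Crawl:
        """Create a child crawl object that points back at this collection."""
        crawl = Crawl(
            pid=f"{self.pid}-{crawl_date.isoformat()}",
            crawl_date=crawl_date,
            collection_pid=self.pid,
        )
        self.crawls.append(crawl)
        return crawl

    def crawl_count(self) -> int:
        """Number of crawls held for this site."""
        return len(self.crawls)

    def date_list(self) -> List[date]:
        """Crawl dates in chronological order, e.g. for a browse page."""
        return sorted(c.crawl_date for c in self.crawls)


site = SiteCollection(pid="yfile", url="http://yfile.news.yorku.ca")
site.add_crawl(date(2013, 3, 5))
site.add_crawl(date(2013, 3, 4))
print(site.crawl_count(), [d.isoformat() for d in site.date_list()])
```

With this shape, the crawl count and the date list come straight from the collection's children, and each crawl object can carry its own screenshot and full-text datastreams.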

Also, FWIW, we had a fruitful discussion[1] on Twitter earlier, if anybody 
is interested in the thread. It seemed to jump off list and back on.

Thanks for jumping in, Ben!

-nruest

[1] https://twitter.com/mjgiarlo/status/308788759246303233

On 13-03-04 11:36 PM, Benjamin Armintor wrote:
> I can immediately think of two approaches I'd consider:
> 1. Each Fedora object represents a website. There is a content DS that
> is a WARC; it is versioned. There is a screenshot DS and a full-text DS-
> these are unversioned and presumed to relate to the most recent WARC.
> You could have a similar DS for descriptive metadata. This model hinges
> on being able to assume the non-WARC datastreams only really have a
> relationship to the object, not a particular crawl.
>
> 2. Every crawl is an object, meant never to be altered. The related DSs
> might change between versions- you would have to directly compare the
> analogous DSs in two objects. You would refer to the website across all
> versions either in a metadatum (a RELS-EXT relationship, or something
> consistent in the descMetadata like a MODS relatedItem) or as an
> umbrella object across all the versions (obviously still marking up the
> isVersionOf/isMemberOf relationships, but having a place to locate the
> generic descriptions separate from the crawl descriptions).
>
> The latter is probably more "correct", but it will also be more
> cumbersome to work with.  If you wanted to actually extract resources
> from a WARC (a particular file asset, for example), I think you really
> have to follow a plan like the second option.
>
> - Ben
>
>
> On Mon, Mar 4, 2013 at 10:56 PM, Nick Ruest <rue...@gmail.com
> <mailto:rue...@gmail.com>> wrote:
>
>     Hi folks,
>
>     I began working on an Islandora Solution Pack for web archives a while
>     back, and the more I work on it and think about it I'm a little stuck on
>     a foundational aspect: what is the object?
>
>     The way I had initially constructed it as a proof of concept was just
>     ingesting and disseminating warc files. But, as I learn more and more
>     about web archiving, there is more I'd like to do dissemination wise
>     with associated datastreams (screenshots, pdfs) and full-text searching
>     of warcs.
>
>     So, here is my issue. Is an object a given crawl of a site? For example,
>     a web crawl of http://yfile.news.yorku.ca on March 4, 2013? Or is an
>     object a given website, the yfile example, and each crawl is a version
>     of a datastream?
>
>     To me it all seems like a matter of how a given collection is arranged
>     and described, and both solutions are technically correct. But, is one
>     way better than the other?
>
>     If you'll indulge me, I'd love to hear your input.
>
>     cheers!
>
>     --
>     -nruest
>
>     
> ------------------------------------------------------------------------------
>     Everyone hates slow websites. So do we.
>     Make your web apps faster with AppDynamics
>     Download AppDynamics Lite for free today:
>     http://p.sf.net/sfu/appdyn_d2d_feb
>     _______________________________________________
>     Fedora-commons-users mailing list
>     Fedora-commons-users@lists.sourceforge.net
>     <mailto:Fedora-commons-users@lists.sourceforge.net>
>     https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
>

-- 
-nruest
