Hi folks,

I began working on an Islandora Solution Pack for web archives a while
back, and the more I work on it and think about it, the more I'm stuck
on a foundational aspect: what is the object?

The way I initially built it, as a proof of concept, was just
ingesting and disseminating WARC files. But as I learn more about web
archiving, there is more I'd like to do on the dissemination side, with
associated datastreams (screenshots, PDFs) and full-text searching of
WARCs.

So, here is my issue. Is an object a given crawl of a site? For example,
the web crawl of http://yfile.news.yorku.ca on March 4, 2013? Or is an
object a given website (the yfile example), with each crawl stored as a
new version of a datastream?
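
To make the two options concrete, here is a rough sketch in Python. The
class names, fields, WARC file names, and the second crawl date are all
made up for illustration; none of this is Islandora or Fedora API, it
only shows the shape of each model.

```python
from dataclasses import dataclass, field
from datetime import date


# Option A (hypothetical model): every crawl is its own object,
# carrying its own WARC plus derivative datastreams.
@dataclass
class CrawlObject:
    site_url: str
    crawl_date: date
    datastreams: dict = field(default_factory=dict)


# Option B (hypothetical model): the website is the object, and each
# crawl is appended as a new version of a single WARC datastream.
@dataclass
class SiteObject:
    site_url: str
    warc_versions: list = field(default_factory=list)

    def add_crawl(self, crawl_date, warc_name):
        # Each crawl becomes one (date, file) version entry.
        self.warc_versions.append((crawl_date, warc_name))

    def latest(self):
        # The most recent crawl is the version with the greatest date.
        return max(self.warc_versions, key=lambda version: version[0])


# Option A: one object per crawl (file names are placeholders).
crawl = CrawlObject(
    site_url="http://yfile.news.yorku.ca",
    crawl_date=date(2013, 3, 4),
    datastreams={"WARC": "yfile-20130304.warc", "SCREENSHOT": "yfile-20130304.png"},
)

# Option B: one object per site, crawls as datastream versions.
site = SiteObject(site_url="http://yfile.news.yorku.ca")
site.add_crawl(date(2013, 3, 4), "yfile-20130304.warc")
site.add_crawl(date(2013, 3, 11), "yfile-20130311.warc")  # hypothetical later crawl
```

In the first shape, screenshots and PDFs sit naturally beside the WARC
they came from; in the second, they would need their own versioning to
stay matched to the right crawl.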

To me it all seems like a matter of how a given collection is arranged
and described, and both solutions are technically correct. But is one
way better than the other?

If you'll indulge me, I'd love to hear your input.

cheers!

-- 
-nruest

_______________________________________________
Fedora-commons-users mailing list
Fedora-commons-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
