Harvard University Library is pleased to announce the public launch of 
Harvard's new Web Archive Collection Service (WAX) 
http://wax.lib.harvard.edu.

WAX began as a pilot project in July 2006, funded by the University's 
Library Digital Initiative (LDI) to help collection managers manage web 
sites for long-term archiving. It was the first LDI project 
specifically oriented toward preserving "born-digital" material.

The pilot was designed to address the capture, management, storage, and 
display of web sites for long-term archiving. It was a collaboration of 
the University Library's Office for Information Systems with three 
University partners, each fielding a single project: the Harvard 
University Archives (Harvard University Library); the Arthur and 
Elizabeth Schlesinger Library on the History of Women in America 
(Radcliffe Institute for Advanced Study); and the Edwin O. Reischauer 
Institute of Japanese Studies (Faculty of Arts and Sciences, with 
sponsorship from Harvard College Library).

During the pilot, we explored the legal terrain and implemented several 
methods of mitigating risks. We investigated various technologies and 
developed workflow efficiencies for the collection managers and the 
technologists. We analyzed and implemented the metadata and deposit 
requirements for long-term preservation in our repository. We continue 
to look at ways to ease the labor-intensive nature of the QA process, to 
improve display as the software matures, and to assess additional 
requirements for long-term preservation.

To date, we are storing 5,159 ARC files for 1,405 WAX harvests 
representing 141 seeds (starting URLs) in our Digital Repository Service 
(DRS). These include 335 MIME types, 12,133,528 resources (individual 
HTML pages, images, graphics, audio or video clips, style sheets, 
scripts, etc.) for a total of 392 gigabytes.

WAX was built using several open-source tools developed by the Internet 
Archive and other International Internet Preservation Consortium (IIPC) 
members. These IIPC tools include the Heritrix web crawler; the Wayback 
index and rendering tool; and the NutchWAX index and search tool. WAX 
also uses Quartz, open-source job scheduling software from OpenSymphony.

In February 2009, the pilot public interface was launched and announced 
to the University community. WAX has now transitioned to a production 
system supported by the University Library's central infrastructure.

To view the collections, visit: http://wax.lib.harvard.edu. For more 
information, visit: http://hul.harvard.edu/ois/systems/wax, consult the 
May 2009 PowerPoint presentation: 
http://hul.harvard.edu/ois/support/docs-wax.html, or contact Wendy 
Gogel: [email protected]

Wendy Marcus Gogel
Digital Projects Program Librarian
HUL - Office for Information Systems
90 Mt. Auburn Street
Cambridge, MA 02138
phone: (617) 495-3724
fax: (617) 496-5600
[email protected]
http://digitalcollections.harvard.edu

