[ https://jira.duraspace.org/browse/DS-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Donohue updated DS-1382:
----------------------------
Description:
The DSpace 3.0 model for storing Item Versions in AIPs is to generate a
*separate* AIP for each version of the Item.
Suppose you have an Item "123/45" with old versions "123/45.1" and "123/45.2".
To export all versions, you'd need to export a total of 3 AIPs (123-45.zip,
123-45.1.zip and 123-45.2.zip), one for each version.
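The following is a minimal sketch of the current one-AIP-per-version model. It assumes, based only on the example names above, that an AIP file name is the handle with "/" replaced by "-" plus ".zip". `aip_filename` and `aip_filenames` are hypothetical helpers, not DSpace API calls.

```python
def aip_filename(handle: str) -> str:
    """Derive an AIP zip name from a handle, e.g. "123/45.1" -> "123-45.1.zip".

    Assumption: mirrors the naming in the example above; not taken from
    DSpace's packager code.
    """
    return handle.replace("/", "-") + ".zip"


def aip_filenames(item_handle: str, version_suffixes: list[str]) -> list[str]:
    """Return one AIP name per version: the current Item plus each old version."""
    handles = [item_handle] + [f"{item_handle}.{v}" for v in version_suffixes]
    return [aip_filename(h) for h in handles]


print(aip_filenames("123/45", ["1", "2"]))
# ['123-45.zip', '123-45.1.zip', '123-45.2.zip']
```

Each entry in that list is a complete, self-contained package, which is why unchanged content files end up copied into every one of them.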
Although this may sound reasonable, it can lead to "ballooning storage costs"
as you version Items. Since 3 AIPs are generated in the above example, each of
the 3 AIPs must contain its own copy of every content file, even files that did
not change between versions. So, if the initial AIP is 100KB, after 10 versions
you may be storing around 10 x 100KB = ~1MB of content, much of it duplicated.
Two possible ways around this issue would be to either:
(a) store AIPs as "unzipped" (so they could link to the same content
files & avoid some content duplication), OR
(b) generate a single AIP zip package which describes all versions of
the Item (again avoiding content file duplication). This single AIP zip
package could either describe all versions in a single METS file, or
potentially include a separate METS file for each version.
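Here is a minimal sketch of the deduplication idea behind options (a) and (b), applied to the storage example above. It assumes each version is just a mapping from file name to file bytes and uses SHA-256 checksums to recognize unchanged files. The function names, the manifest structure and the sample data are hypothetical and do not reflect the actual DSpace AIP/METS format.

```python
import hashlib


def naive_size(versions):
    """Bytes stored when every version is exported as its own self-contained AIP."""
    return sum(len(data) for files in versions for data in files.values())


def dedup_store(versions):
    """Store each distinct file body once, keyed by its SHA-256 checksum.

    Returns (store, manifests): `store` maps checksum -> bytes, and each
    manifest maps a file name in one version to the checksum of its body.
    """
    store = {}
    manifests = []
    for files in versions:
        manifest = {}
        for name, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            store.setdefault(digest, data)  # an unchanged file is stored only once
            manifest[name] = digest
        manifests.append(manifest)
    return store, manifests


def restore(store, manifest):
    """Rebuild one version's files from the shared store and its manifest."""
    return {name: store[digest] for name, digest in manifest.items()}


# Hypothetical Item: a 100KB content file that never changes, plus a small
# metadata file that differs in each of 10 versions.
content = b"x" * 100_000
versions = [
    {"content.pdf": content, "metadata.xml": f"version {i}".encode()}
    for i in range(1, 11)
]

store, manifests = dedup_store(versions)
print(naive_size(versions))                 # 1000091 bytes (~1MB)
print(sum(len(d) for d in store.values()))  # 100091 bytes (~100KB)
```

Roughly speaking, option (a) would keep `store` as files on disk that several unzipped AIPs point to, while option (b) would write `store` and every per-version manifest into one zip. In a real AIP those manifests would presumably be METS files rather than Python dicts.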
Whichever option we take, it will require some (likely major) rework of the
AIP format. Obviously we'd need to keep it backwards compatible with past AIP
formats.
https://wiki.duraspace.org/display/DSDOC3x/DSpace+AIP+Format
> AIP Backup & Restore functionality should not duplicate unchanged files
> across Item Versions
> --------------------------------------------------------------------------------------------
>
> Key: DS-1382
> URL: https://jira.duraspace.org/browse/DS-1382
> Project: DSpace
> Issue Type: Improvement
> Components: DSpace API
> Affects Versions: 3.0
> Reporter: Tim Donohue
> Priority: Major
>
_______________________________________________
Dspace-devel mailing list
https://lists.sourceforge.net/lists/listinfo/dspace-devel