[ https://jira.duraspace.org/browse/DS-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Donohue updated DS-1382:
----------------------------

    Description: 
The DSpace 3.0 model for storing Item Versions in AIPs is to generate a 
*separate* AIP for each version of the Item. 

Suppose you have an Item "123/45" with old versions "123/45.1" and "123/45.2". 
To export all versions, you'd need to export a total of 3 AIPs (123-45.zip, 
123-45.1.zip and 123-45.2.zip), one for each version. 
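
To make the one-AIP-per-version model concrete, here is a minimal Python sketch that lists the AIP files needed to export an Item and its old versions. It assumes the naming pattern from the example above (the "/" in the handle becomes "-", and the ".N" version suffix is kept). The helper names are hypothetical and are not part of the DSpace API.

```python
def aip_filename(handle: str) -> str:
    """Map a handle such as '123/45' or '123/45.1' to an AIP zip name.

    Assumes the naming pattern from the example: '/' becomes '-' and any
    '.N' version suffix is kept unchanged.
    """
    return handle.replace("/", "-") + ".zip"


def aips_for_item(handle: str, old_versions: list[int]) -> list[str]:
    """List every AIP needed to export an Item plus its old versions
    under the one-AIP-per-version model."""
    handles = [handle] + [f"{handle}.{v}" for v in old_versions]
    return [aip_filename(h) for h in handles]


if __name__ == "__main__":
    print(aips_for_item("123/45", [1, 2]))
    # ['123-45.zip', '123-45.1.zip', '123-45.2.zip']
```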

Although this may sound reasonable, it can lead to "ballooning storage costs"
as you version Items. Because each version gets its own AIP, each of the 3 AIPs
in the above example must contain its own copy of every content file, even files
that did not change between versions. So, if the initial AIP is 100KB, after 10
versions you may be storing around 10 x 100KB = ~1MB of content, much of it
duplicated. There are two possible ways around this issue:
       (a) store AIPs "unzipped", so that they could link to the same content
           files and avoid some of the duplication, OR
       (b) generate a single AIP zip package that describes all versions of
           the Item, again avoiding content file duplication. This single AIP
           zip package could either describe all versions in a single METS
           file, or include a separate METS file for each version.
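
The storage argument and option (b) can be sketched in Python as follows. This is a hypothetical model, not the DSpace AIP format: each version's files are given as name-to-bytes mappings, the combined package stores each unique file once (keyed by an MD5 checksum, chosen here only for illustration), and each version keeps a small manifest of file names to checksums, roughly the role a per-version METS file could play.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class CombinedPackage:
    """Hypothetical single-package layout for option (b).

    Each unique content file is stored once, keyed by its MD5 checksum,
    and each version keeps a manifest mapping file names to checksums.
    """
    blobs: dict[str, bytes] = field(default_factory=dict)
    manifests: dict[str, dict[str, str]] = field(default_factory=dict)

    def add_version(self, version_id: str, files: dict[str, bytes]) -> None:
        manifest = {}
        for name, data in files.items():
            digest = hashlib.md5(data).hexdigest()
            # Keep the bytes only the first time this content is seen;
            # later versions just reference the same checksum.
            self.blobs.setdefault(digest, data)
            manifest[name] = digest
        self.manifests[version_id] = manifest

    def stored_bytes(self) -> int:
        """Total content bytes actually stored in the package."""
        return sum(len(data) for data in self.blobs.values())

    def restore(self, version_id: str) -> dict[str, bytes]:
        """Rebuild one version's files from its manifest."""
        return {name: self.blobs[digest]
                for name, digest in self.manifests[version_id].items()}


def separate_aip_bytes(versions: dict[str, dict[str, bytes]]) -> int:
    """Content bytes under the one-AIP-per-version model, where every
    AIP carries its own copy of every file."""
    return sum(len(data) for files in versions.values() for data in files.values())


def example_versions() -> dict[str, dict[str, bytes]]:
    """10 hypothetical versions: an unchanged 100,000-byte PDF plus a
    small notes file that changes every time."""
    pdf = b"x" * 100_000
    return {f"v{n}": {"thesis.pdf": pdf, "notes.txt": f"rev {n:02d}".encode()}
            for n in range(1, 11)}


if __name__ == "__main__":
    versions = example_versions()
    package = CombinedPackage()
    for version_id, files in versions.items():
        package.add_version(version_id, files)
    print(separate_aip_bytes(versions))  # 1000060
    print(package.stored_bytes())        # 100060
```

In this made-up example only the 6-byte notes file changes across 10 versions, so separate AIPs carry 1,000,060 bytes of content while the combined package stores 100,060, and any single version can still be rebuilt from its manifest.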

Whichever option we take, this will require some (likely major) rework of the AIP
format, and the new format would need to remain backwards compatible with past
AIP formats.
https://wiki.duraspace.org/display/DSDOC3x/DSpace+AIP+Format

> AIP Backup & Restore functionality should not duplicate unchanged files across Item Versions
> --------------------------------------------------------------------------------------------
>
>                 Key: DS-1382
>                 URL: https://jira.duraspace.org/browse/DS-1382
>             Project: DSpace
>          Issue Type: Improvement
>          Components: DSpace API
>    Affects Versions: 3.0
>            Reporter: Tim Donohue
>            Priority: Major

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

_______________________________________________
Dspace-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-devel
