Vincent Massol wrote:
> Nice work Sergiu. We should turn this into a JIRA issue so we don't
> forget it.
> 

We should vote for it first.

> One other idea: store attachments on the file system and not in the DB.
> 
> Thanks
> -Vincent
> 
> On Feb 27, 2008, at 3:48 PM, Sergiu Dumitriu wrote:
> 
>> Hi devs,
>>
>> Last night I checked what happens when uploading a file, and why that
>> action requires huge amounts of memory.
>>
>> So, whenever a file is uploaded, there are several places where the
>> file content is loaded into memory:
>> - as a byte[] in the XWikiAttachment ~= filesize
>> - as a Base64-encoded string in the XWikiAttachmentArchive
>>   ~= 2*4*filesize
>> - as hibernate tokens that are sent to the database, clones of the
>>   XWikiAttachment and XWikiAttachmentArchive data ~= 9*filesize
>> - as cached attachment and attachment archive, clones of the same two
>>   objects ~= 9*filesize
>>
>> Total: ~27*filesize bytes in memory.
>>
>> So, for a 10M file, we need at least 270M of memory.
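>>
>> A back-of-the-envelope sketch of where those multipliers come from
>> (the class is illustrative, not actual XWiki code; the per-stage
>> factors are just the estimates above):
>>
>>   public class UploadMemoryEstimate {
>>       public static void main(String[] args) {
>>           long filesize = 10L * 1024 * 1024; // a 10M upload
>>           long attachment = 1 * filesize;    // byte[] in XWikiAttachment
>>           long archive = 2 * 4 * filesize;   // Base64 String: 2-byte chars, ~4x expansion
>>           long dbTokens = 9 * filesize;      // hibernate clones sent to the DB
>>           long cached = 9 * filesize;        // cached clones of both objects
>>           long total = attachment + archive + dbTokens + cached;
>>           System.out.println(total / (double) filesize); // prints 27.0
>>       }
>>   }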
>>
>> Worse, if this is not the first version of the attachment, the
>> complete attachment history is also loaded into memory, so add roughly
>> another 24*versionsize*versions bytes needed during the upload.
>>
>> After the upload is done, most of these are cleared; only the cached
>> objects remain in memory.
>>
>> However, a problem remains with the cache. It is an LRU cache with a
>> fixed capacity, so even when memory is full, the cached attachments
>> will not be released.
>>
>> Things we can improve:
>> - Make the cache use References. This would allow cached attachments
>>   to be reclaimed when more memory is needed (see the sketch after
>>   this list).
>> - Build a better attachment archive system. I'm not sure it is a good
>>   idea to have diff-based versioning of attachments. In theory it
>>   saves space when versions are much alike, but it does not really
>>   work in practice, because it does a line diff and a Base64-encoded
>>   string has no newlines. What's more, the space gain would only pay
>>   off with many versions, as a single version already takes 4 times
>>   more space than a binary dump of the content.
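>>
>> A minimal sketch of the References idea, using SoftReference values
>> (the class name and shape are illustrative, not an existing XWiki
>> API); the GC may then reclaim cached attachments under memory
>> pressure:
>>
>>   import java.lang.ref.SoftReference;
>>   import java.util.Collections;
>>   import java.util.HashMap;
>>   import java.util.Map;
>>
>>   public class SoftCache<K, V> {
>>       // Values are held through SoftReferences, so the garbage
>>       // collector is free to reclaim them when memory runs low.
>>       private final Map<K, SoftReference<V>> map =
>>           Collections.synchronizedMap(new HashMap<K, SoftReference<V>>());
>>
>>       public void put(K key, V value) {
>>           map.put(key, new SoftReference<V>(value));
>>       }
>>
>>       public V get(K key) {
>>           SoftReference<V> ref = map.get(key);
>>           // null if never cached, or if the GC already reclaimed it
>>           return ref == null ? null : ref.get();
>>       }
>>   }
>>
>> One trade-off: eviction timing becomes unpredictable, so this would
>> complement the fixed LRU capacity rather than replace it.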
>>
>> Suppose we switch to a "one version per table row" scheme for the
>> attachment history, with a direct binary dump of the content; then the
>> memory needed for an upload would be ~6*filesize, which is much less.
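>>
>> For example, the history could be mapped through a small persistent
>> class, one row per version (hypothetical names, not an actual schema
>> proposal):
>>
>>   // One table row per attachment version, holding the raw binary
>>   // content instead of a Base64-encoded diff of the whole history.
>>   public class AttachmentVersion {
>>       private long id;           // primary key of the row
>>       private long attachmentId; // the attachment this version belongs to
>>       private String version;    // e.g. "1.3"
>>       private byte[] content;    // direct binary dump: no Base64, no diff
>>       // getters/setters omitted; hibernate would map this to its own table
>>   }
>>
>> Loading or saving one version then only touches that version's bytes,
>> instead of deserializing the whole Base64-encoded archive.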


-- 
Sergiu Dumitriu
http://purl.org/net/sergiu/