On Mar 4, 2008, at 10:07 AM, Paul Libbrecht wrote:
> Could I add yet another idea that has been floating around for a long
> time: the Java Content Repository (JCR)?
>
> It may have catches in its licensing (as any of these JCR efforts do),
> but I believe it is a sturdy way to expose streams of varying size.
> Indeed, it would need file-system storage, but that's certainly a good
> thing, isn't it?
>
> I am not really an expert there, unfortunately, but the last time I
> played with Jackrabbit it really seemed like a sturdy piece you could
> rely on.
Yes, agreed. BTW JCR doesn't mandate file system storage; it abstracts
it. The eXo implementation, for example, has several different storage
implementations available.

Even better, it's already been implemented for more than a year! If you
check the source code you'll see it's there, and it has even been
bundled with XWiki since the 1.0 release... Now the bad news: I don't
think it's working, and it would need someone to work on it.

(Rough sketches of the JCR call sequence and of the cache and storage
ideas quoted below are appended at the end of this mail.)

-Vincent

> On Mar 3, 2008, at 5:28 PM, Sergiu Dumitriu wrote:
>
>> Vincent Massol wrote:
>>> Nice work Sergiu. We should turn this into a JIRA issue so we don't
>>> forget it.
>>
>> We should vote on it first.
>>
>>> One other idea: store attachments on the file system and not in
>>> the DB.
>>>
>>> Thanks
>>> -Vincent
>>>
>>> On Feb 27, 2008, at 3:48 PM, Sergiu Dumitriu wrote:
>>>
>>>> Hi devs,
>>>>
>>>> Last night I checked what happens when uploading a file, and why
>>>> that action requires huge amounts of memory.
>>>>
>>>> Whenever a file is uploaded, there are several places where the
>>>> file content is loaded into memory:
>>>> - as an XWikiAttachment, as a byte[] ~= filesize
>>>> - as an XWikiAttachmentArchive, as a Base64-encoded string ~=
>>>> 2*4*filesize
>>>> - as Hibernate tokens that are sent to the database, clones of the
>>>> XWikiAttachment and XWikiAttachmentArchive data ~= 9*filesize
>>>> - as the cached attachment and attachment archive, clones of the
>>>> same 2 objects ~= 9*filesize
>>>>
>>>> Total: ~27*filesize bytes in memory.
>>>>
>>>> So, for a 10M file, we need at least 270M of memory.
>>>>
>>>> Worse, if this is not the first version of the attachment, then the
>>>> complete attachment history is loaded into memory, so add another
>>>> 24*versionsize*versions of memory needed during the upload.
>>>>
>>>> After the upload is done, most of these are cleared; only the
>>>> cached objects remain in memory.
>>>>
>>>> However, a problem still remains with the cache. It is an LRU cache
>>>> with a fixed capacity, so even if the memory is full, the cached
>>>> attachments will not be released.
>>>>
>>>> Things we can improve:
>>>> - Make the cache use References. This would allow cached
>>>> attachments to be removed from memory when there is a need for
>>>> more memory.
>>>> - Build a better attachment archive system. I'm not sure it is a
>>>> good idea to have diff-based versioning of attachments. In theory
>>>> it saves space when versions are much alike, but it does not really
>>>> work in practice, because it does a line diff and a Base64-encoded
>>>> string has no newlines. What's more, the space gain would only pay
>>>> off when there are many versions, since one version alone takes 4
>>>> times more space than a binary dump of the content.
>>>>
>>>> Suppose we switch to a "one version per table row" scheme for the
>>>> attachment history, with a direct binary dump; then the memory
>>>> needed for uploading would be 6*filesize, which is much less.
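Sketch 1: a minimal example of what storing and reading an attachment
stream through the JCR API could look like, assuming a local Jackrabbit
TransientRepository. The node layout and names ("Main_WebHome",
"report.pdf") are made up for the illustration; this is not the existing
XWiki/eXo store, just the call sequence showing that callers only deal
with nodes and streams while the repository decides where the bytes
actually live.

import java.io.FileInputStream;
import java.io.InputStream;

import javax.jcr.Node;
import javax.jcr.Repository;
import javax.jcr.Session;
import javax.jcr.SimpleCredentials;

import org.apache.jackrabbit.core.TransientRepository;

public class JcrAttachmentSketch
{
    public static void main(String[] args) throws Exception
    {
        // The repository decides where the bytes actually live (file system,
        // RDBMS, ...); callers only see nodes, properties and streams.
        Repository repository = new TransientRepository();
        Session session =
            repository.login(new SimpleCredentials("admin", "admin".toCharArray()));
        try {
            Node root = session.getRootNode();

            // Illustrative layout: one node per document, one child node per
            // attachment.
            Node doc = root.addNode("Main_WebHome");
            Node attachment = doc.addNode("report.pdf");

            // JCR 1.0 takes binary content as an InputStream, so the caller
            // never has to hold the whole file in memory.
            InputStream in = new FileInputStream("report.pdf");
            attachment.setProperty("jcr:data", in);
            session.save();
            in.close();

            // Reading it back is also stream-based.
            InputStream out =
                doc.getNode("report.pdf").getProperty("jcr:data").getStream();
            out.close();
        } finally {
            session.logout();
        }
    }
}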

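Sketch 2, on the "make the cache use References" point: a toy sketch, not
the actual XWiki cache API, of an LRU map that holds its values through
SoftReferences, so the garbage collector can drop cached attachment data
when memory runs low.

import java.lang.ref.SoftReference;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Toy LRU cache holding its values through SoftReferences: an entry goes away
 * either when the capacity is exceeded (least recently used first) or when the
 * garbage collector needs memory and clears the reference.
 */
public class SoftLruCache<K, V>
{
    private final Map<K, SoftReference<V>> map;

    public SoftLruCache(final int capacity)
    {
        // accessOrder = true makes the LinkedHashMap iterate in LRU order.
        this.map = new LinkedHashMap<K, SoftReference<V>>(16, 0.75f, true)
        {
            protected boolean removeEldestEntry(Map.Entry<K, SoftReference<V>> eldest)
            {
                return size() > capacity;
            }
        };
    }

    public synchronized void put(K key, V value)
    {
        this.map.put(key, new SoftReference<V>(value));
    }

    public synchronized V get(K key)
    {
        SoftReference<V> reference = this.map.get(key);
        if (reference == null) {
            return null;
        }
        V value = reference.get();
        if (value == null) {
            // The GC already reclaimed the value; drop the stale entry.
            this.map.remove(key);
        }
        return value;
    }
}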

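Sketch 3, for the "one version per table row, direct binary dump" idea: a
sketch of streaming one version into a row over plain JDBC, so the client
never builds a byte[] or Base64 copy of the content. The table name, the
columns and the in-memory HSQLDB URL are made up for the example; the
real schema and Hibernate mapping would of course look different.

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class AttachmentVersionRowSketch
{
    public static void main(String[] args) throws Exception
    {
        // In-memory HSQLDB just so the example runs standalone; the real
        // store would be the wiki's configured database.
        Class.forName("org.hsqldb.jdbcDriver");
        Connection connection =
            DriverManager.getConnection("jdbc:hsqldb:mem:sketch", "sa", "");
        connection.createStatement().execute(
            "CREATE TABLE attachment_version (docid BIGINT, filename VARCHAR(255), "
            + "version VARCHAR(31), content LONGVARBINARY)");

        File file = new File("report.pdf");
        InputStream content = new FileInputStream(file);
        try {
            PreparedStatement statement = connection.prepareStatement(
                "INSERT INTO attachment_version (docid, filename, version, content) "
                + "VALUES (?, ?, ?, ?)");
            statement.setLong(1, 42L);
            statement.setString(2, "report.pdf");
            statement.setString(3, "1.2");
            // The driver reads the stream in chunks: no byte[] and no Base64
            // copy of the whole content is built on the client side.
            statement.setBinaryStream(4, content, (int) file.length());
            statement.executeUpdate();
            statement.close();
        } finally {
            content.close();
            connection.close();
        }
    }
}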