Vincent Massol wrote:
> On Mar 4, 2008, at 10:07 AM, Paul Libbrecht wrote:
> 
>> Could I add yet another idea that has been around for a long time, I
>> think: the Java Content Repository (JCR)?
>>
>> It may have licensing catches (as any of these JCR efforts do), but I
>> believe it is a sturdy way to expose streams of varying size. It would
>> need file system storage, but that's certainly a good thing, isn't it?
>>
>> I am not really an expert there, unfortunately, but the last time I
>> played with Jackrabbit it really seemed like a sturdy piece you could
>> rely on.
> 
> Yes, agreed. BTW JCR doesn't mandate file system storage; it abstracts
> it. The eXo implementation, for example, has several different storage
> implementations available.
> 
> Even better, it's already been implemented for more than a year! If
> you check the source code you'll see it's there, and it has even been
> bundled in XWiki since the 1.0 release...
> 
> Now the bad news: I don't think it's working, and it would need
> someone to work on it.

Could be a SoC project?
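
For whoever picks it up, the JCR side is pretty small: storing an
attachment as a stream looks roughly like the code below. This is only a
sketch against the plain javax.jcr API with an embedded Jackrabbit
TransientRepository; the node layout, credentials and class name are made
up for illustration, it is not the eXo module bundled in XWiki.

import java.io.InputStream;
import java.util.Calendar;

import javax.jcr.Node;
import javax.jcr.Repository;
import javax.jcr.Session;
import javax.jcr.SimpleCredentials;

import org.apache.jackrabbit.core.TransientRepository;

public class JcrAttachmentSketch
{
    /**
     * Stores the given stream under /attachments/<name> in an embedded
     * Jackrabbit repository, without buffering the whole content in memory.
     */
    public static void store(String name, String mimeType, InputStream content)
        throws Exception
    {
        // Embedded repository, persisted on the local file system.
        Repository repository = new TransientRepository();
        Session session = repository.login(
            new SimpleCredentials("admin", "admin".toCharArray()));
        try {
            Node root = session.getRootNode();
            Node folder = root.hasNode("attachments")
                ? root.getNode("attachments")
                : root.addNode("attachments", "nt:folder");
            // nt:file + nt:resource is the standard node type pair for
            // file-like content.
            Node file = folder.addNode(name, "nt:file");
            Node resource = file.addNode("jcr:content", "nt:resource");
            resource.setProperty("jcr:mimeType", mimeType);
            resource.setProperty("jcr:lastModified", Calendar.getInstance());
            // The binary property is fed directly from the stream, so the
            // repository can spool it to its storage instead of keeping the
            // whole attachment in memory.
            resource.setProperty("jcr:data", content);
            session.save();
        } finally {
            session.logout();
        }
    }
}

The point being that the content goes in and out as a stream, so the
repository implementation decides where it lives, not the Java heap.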

> -Vincent
> 
>> On Mar 3, 2008, at 5:28 PM, Sergiu Dumitriu wrote:
>>
>>> Vincent Massol wrote:
>>>> Nice work, Sergiu. We should transform this into a JIRA issue so we
>>>> don't forget it.
>>>>
>>> We should vote for it first.
>>>
>>>> One other idea: store attachments on the file system and not in  
>>>> the DB.
>>>>
>>>> Thanks
>>>> -Vincent
>>>>
>>>> On Feb 27, 2008, at 3:48 PM, Sergiu Dumitriu wrote:
>>>>
>>>>> Hi devs,
>>>>>
>>>>> Last night I checked what happens when uploading a file, and why
>>>>> that action requires huge amounts of memory.
>>>>>
>>>>> So, whenever a file is uploaded, there are several places where the
>>>>> file content is loaded into memory:
>>>>> - in an XWikiAttachment, as a byte[] ~= filesize
>>>>> - in an XWikiAttachmentArchive, as a Base64-encoded string ~=
>>>>> 2*4*filesize
>>>>> - as Hibernate tokens that are sent to the database, clones of the
>>>>> XWikiAttachment and XWikiAttachmentArchive data ~= 9*filesize
>>>>> - as cached attachments and attachment archive, clones of the same 2
>>>>> objects ~= 9*filesize
>>>>>
>>>>> Total: ~27*filesize bytes in memory.
>>>>>
>>>>> So, for a 10M file, we need at least 270M of memory.
>>>>>
>>>>> Worse, if this is not the first version of the attachment, then the
>>>>> complete attachment history is loaded into memory, so add another
>>>>> 24*versionsize*versions bytes of memory needed during the upload.
>>>>>
>>>>> After the upload is done, most of these are cleared; only the cached
>>>>> objects remain in memory.
>>>>>
>>>>> However, a problem still remains with the cache. It is an LRU cache
>>>>> with a fixed capacity, so even when memory is full, the cached
>>>>> attachments will not be released.
>>>>>
>>>>> Things we can improve:
>>>>> - Make the cache use References. This would allow cached attachments
>>>>> to be dropped from memory when more memory is needed (see the sketch
>>>>> below).
>>>>> - Build a better attachment archive system. I'm not sure it is a good
>>>>> idea to have diff-based versioning of attachments. In theory it saves
>>>>> space when versions are much alike, but it does not really work in
>>>>> practice, because it does a line-based diff and a Base64-encoded
>>>>> string has no newlines. What's more, the space gain would only pay
>>>>> off when there are many versions, since a single version already
>>>>> takes 4 times more space than a binary dump of the content.
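
Regarding the References idea above, here is roughly what I have in mind.
This is just a minimal sketch on top of java.lang.ref and LinkedHashMap,
not the existing cache code; the class name is made up.

import java.lang.ref.SoftReference;
import java.util.LinkedHashMap;
import java.util.Map;

public class SoftReferenceCache<K, V>
{
    private final Map<K, SoftReference<V>> entries;

    public SoftReferenceCache(final int capacity)
    {
        // Keep LRU ordering and a capacity bound, like the current cache,
        // but hold the values through SoftReferences so the GC can reclaim
        // cached attachments under memory pressure.
        this.entries = new LinkedHashMap<K, SoftReference<V>>(capacity, 0.75f, true)
        {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, SoftReference<V>> eldest)
            {
                return size() > capacity;
            }
        };
    }

    public synchronized void put(K key, V value)
    {
        this.entries.put(key, new SoftReference<V>(value));
    }

    public synchronized V get(K key)
    {
        SoftReference<V> reference = this.entries.get(key);
        V value = reference == null ? null : reference.get();
        if (reference != null && value == null) {
            // The referent was garbage collected; drop the stale entry.
            this.entries.remove(key);
        }
        return value;
    }
}
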
>>>>>
>>>>> If we switched to a "one version per table row" scheme for the
>>>>> attachment history, with a direct binary dump of the content, the
>>>>> memory needed for uploading would be 6*filesize, which is much less.
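
And to make that last point concrete, the per-version history row could
be as simple as the class below. The class and field names are
hypothetical, purely for illustration; the actual persistence would of
course go through a Hibernate mapping with the content in a BLOB column.

/**
 * Hypothetical "one row per attachment version" history entry. Each
 * version keeps a direct binary dump of the content, so saving or loading
 * one version touches a single byte[] instead of the whole Base64-encoded,
 * diff-based archive.
 */
public class AttachmentVersionRow
{
    private long attachmentId; // which attachment this version belongs to
    private String version;    // e.g. "1.1", "1.2"
    private byte[] content;    // binary dump of this version's content

    public AttachmentVersionRow(long attachmentId, String version, byte[] content)
    {
        this.attachmentId = attachmentId;
        this.version = version;
        this.content = content;
    }

    public long getAttachmentId()
    {
        return this.attachmentId;
    }

    public String getVersion()
    {
        return this.version;
    }

    public byte[] getContent()
    {
        return this.content;
    }
}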


-- 
Sergiu Dumitriu
http://purl.org/net/sergiu/