Vincent Massol wrote:
> On Mar 4, 2008, at 10:07 AM, Paul Libbrecht wrote:
>
>> Could I add yet another idea that has been hanging around for a long
>> time, I think: the Java Content Repository?
>>
>> It may have catches in licensing (just as any of these JCR efforts),
>> but I believe it is a sturdy way to expose streams of varying size.
>> Admittedly it would need a file-system storage, but surely that's a
>> good thing, no?
>>
>> I am not really an expert there, unfortunately, but the last time I
>> played with Jackrabbit it really seemed like a sturdy piece you could
>> rely on.
>
> Yes, agreed. BTW, JCR doesn't mandate a file system storage; it
> abstracts it. The eXo implementation, for example, has several
> different storage backends available.
>
> Even better, it has already been implemented for more than a year! If
> you check the source code you'll see it's there, and it has even been
> bundled with XWiki since the 1.0 release...
>
> Now the bad news: I don't think it's working, and it would need
> someone to work on it. Could be a SoC project?
>
> -Vincent
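On the JCR idea: the main attraction for attachments is that JSR-170 exposes
binary content as streams, so a large file never has to be materialized as a
byte[] or a Base64 string. Below is a minimal sketch of what that could look
like against a Jackrabbit TransientRepository; the node name, credentials and
the use of the standard nt:file / nt:resource types are only illustrative,
and this is not the JCR store already bundled in XWiki.

// Sketch: storing and reading an attachment through the JCR API (JSR-170).
// Assumes jackrabbit-core on the classpath; names are made up for the example.
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Calendar;

import javax.jcr.Node;
import javax.jcr.Repository;
import javax.jcr.Session;
import javax.jcr.SimpleCredentials;

import org.apache.jackrabbit.core.TransientRepository;

public class JcrAttachmentSketch {
    public static void main(String[] args) throws Exception {
        Repository repository = new TransientRepository();
        Session session = repository.login(
            new SimpleCredentials("admin", "admin".toCharArray()));
        try {
            // Create an nt:file node and stream the content into it.
            Node root = session.getRootNode();
            Node file = root.addNode("bigAttachment.bin", "nt:file");
            Node content = file.addNode("jcr:content", "nt:resource");
            InputStream in = new FileInputStream("bigAttachment.bin");
            try {
                // The repository reads from the stream; the whole file never
                // has to sit in memory as a byte[] or a Base64 string.
                content.setProperty("jcr:data", in);
                content.setProperty("jcr:mimeType", "application/octet-stream");
                content.setProperty("jcr:lastModified", Calendar.getInstance());
                session.save();
            } finally {
                in.close();
            }

            // Reading it back is also a stream.
            InputStream readBack = content.getProperty("jcr:data").getStream();
            readBack.close();
        } finally {
            session.logout();
        }
    }
}

The point is only that both writing and reading go through InputStreams,
which is what makes JCR attractive for attachments of varying size.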
>> On Mar 3, 2008, at 17:28, Sergiu Dumitriu wrote:
>>
>>> Vincent Massol wrote:
>>>> Nice work Sergiu. We should turn this into a JIRA issue so we
>>>> don't forget it.
>>>>
>>> We should vote on it first.
>>>
>>>> One other idea: store attachments on the file system and not in
>>>> the DB.
>>>>
>>>> Thanks
>>>> -Vincent
>>>>
>>>> On Feb 27, 2008, at 3:48 PM, Sergiu Dumitriu wrote:
>>>>
>>>>> Hi devs,
>>>>>
>>>>> Last night I checked what happens when uploading a file, and why
>>>>> that action requires huge amounts of memory.
>>>>>
>>>>> Whenever a file is uploaded, there are several places where the
>>>>> file content is loaded into memory:
>>>>> - as an XWikiAttachment, holding a byte[] ~= filesize
>>>>> - as an XWikiAttachmentArchive, holding a Base64-encoded string
>>>>> ~= 2*4*filesize
>>>>> - as Hibernate tokens sent to the database, clones of the
>>>>> XWikiAttachment and XWikiAttachmentArchive data ~= 9*filesize
>>>>> - as the cached attachment and attachment archive, clones of the
>>>>> same two objects ~= 9*filesize
>>>>>
>>>>> Total: ~27*filesize bytes in memory.
>>>>>
>>>>> So, for a 10M file, we need at least 270M of memory.
>>>>>
>>>>> Worse, if this is not the first version of the attachment, the
>>>>> complete attachment history is loaded into memory as well, so add
>>>>> another 24*versionsize*versions of memory needed during the
>>>>> upload.
>>>>>
>>>>> After the upload is done, most of these are cleared; only the
>>>>> cached objects remain in memory.
>>>>>
>>>>> However, a problem still remains with the cache. It is an LRU
>>>>> cache with a fixed capacity, so even if the memory is full, the
>>>>> cached attachments will not be released.
>>>>>
>>>>> Things we can improve:
>>>>> - Make the cache use References. This would allow cached
>>>>> attachments to be removed from memory when more memory is needed.
>>>>> - Build a better attachment archive system. I'm not sure it is a
>>>>> good idea to have diff-based versioning of attachments. In theory
>>>>> it saves space when versions are much alike, but it does not
>>>>> really work in practice, because it does a line diff and a
>>>>> Base64-encoded string has no newlines. What's more, the space
>>>>> gain would only pay off when there are many versions, as one
>>>>> version alone takes 4 times more space than a binary dump of the
>>>>> content.
>>>>>
>>>>> Suppose we switch to "one version per table row" for the
>>>>> attachment history, with a direct binary dump; then the memory
>>>>> needed for uploading would be 6*filesize, which is much less.
>>>>>
>>>>> --
>>>>> Sergiu Dumitriu
>>>>> http://purl.org/net/sergiu/
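To make Sergiu's numbers above a bit more concrete, here is a toy Java
illustration (not the actual XWiki code) of how one upload turns into several
full in-memory copies of the content; the 10 MB figure and the commons-codec
Base64 call are just for the example.

// Toy illustration of the copies that pile up while an upload is saved.
import org.apache.commons.codec.binary.Base64;

public class UploadFootprintSketch {
    public static void main(String[] args) {
        // 1. The uploaded content itself, held by the attachment object.
        byte[] content = new byte[10 * 1024 * 1024];

        // 2. The attachment archive keeps a Base64 text form of the content:
        //    4 Base64 characters for every 3 bytes, each stored as a 2-byte
        //    Java char, so this String alone is larger than the file.
        String archivedForm = new String(Base64.encodeBase64(content));

        // 3. Saving through Hibernate and putting the attachment in the
        //    document cache both work on further clones of these two
        //    representations, which is how the total climbs toward the
        //    ~27*filesize figure measured above.
        byte[] persistedCopy = content.clone();
        char[] cachedArchiveCopy = archivedForm.toCharArray();

        System.out.println(content.length + " bytes raw, "
            + archivedForm.length() + " Base64 chars, plus "
            + (persistedCopy.length + cachedArchiveCopy.length)
            + " more elements held in clones");
    }
}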

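For the first improvement, "make the cache use References", a
SoftReference-backed map is the usual pattern: the garbage collector is then
free to reclaim cached attachments when memory gets tight, instead of them
staying pinned until the fixed LRU capacity pushes them out. A rough sketch,
not the actual XWiki cache component:

// Sketch of a Reference-based cache; entries may vanish under memory pressure.
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SoftAttachmentCache<K, V> {
    private final Map<K, SoftReference<V>> entries =
        new ConcurrentHashMap<K, SoftReference<V>>();

    public void put(K key, V value) {
        entries.put(key, new SoftReference<V>(value));
    }

    /**
     * Returns the cached value, or null if it was never cached or was
     * reclaimed by the garbage collector under memory pressure.
     */
    public V get(K key) {
        SoftReference<V> ref = entries.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            // The referent was collected; drop the stale entry.
            entries.remove(key);
        }
        return value;
    }
}

Callers would put attachments in on load and check get() for null: entries
can disappear at any time, so the code must always be ready to reload from
the store.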

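And for the "one version per table row, with a direct binary dump"
alternative, here is a sketch of what the storage could look like at the
plain JDBC level; the attachment_version table, the column names and the
in-memory HSQLDB are invented for the example, and in XWiki the real mapping
would of course go through Hibernate.

// Sketch: each attachment version is its own row, raw bytes streamed into a
// binary column, so no Base64 text form and no diff over the whole history.
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BinaryVersionStoreSketch {
    public static void main(String[] args) throws Exception {
        // In-memory HSQLDB just to make the sketch self-contained.
        Class.forName("org.hsqldb.jdbcDriver");
        Connection connection = DriverManager.getConnection(
            "jdbc:hsqldb:mem:attachments", "sa", "");
        connection.createStatement().executeUpdate(
            "CREATE TABLE attachment_version ("
            + "attachment_id BIGINT, version VARCHAR(30), content LONGVARBINARY)");

        File file = new File("bigAttachment.bin");
        InputStream in = new FileInputStream(file);
        PreparedStatement insert = connection.prepareStatement(
            "INSERT INTO attachment_version (attachment_id, version, content) "
            + "VALUES (?, ?, ?)");
        try {
            insert.setLong(1, 42L);
            insert.setString(2, "1.1");
            // One row per version; the driver reads from the stream instead
            // of requiring the whole content as a byte[] in memory.
            insert.setBinaryStream(3, in, (int) file.length());
            insert.executeUpdate();
        } finally {
            in.close();
            insert.close();
            connection.close();
        }
    }
}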