On Mar 4, 2008, at 10:07 AM, Paul Libbrecht wrote:
> Could I add yet another idea that has been floating around for a long
> time: the Java Content Repository (JCR)?
>
> It may have catches in its licensing (as any of these JCR efforts do),
> but I believe it is a sturdy way to expose streams of varying size.
> Indeed, it would need file-system storage, but that's certainly a good
> thing, isn't it?
>
> I am not really an expert there, unfortunately, but the last time I
> played with Jackrabbit it really seemed like a sturdy piece you could
> rely on.
Yes, agreed. BTW JCR doesn't mandate file system storage; it abstracts
it. The eXo implementation, for example, has several different storage
implementations available.

Even better, it's already been implemented for more than a year! If you
check the source code you'll see it's there, and it has even been
bundled with XWiki since the 1.0 release... Now the bad news: I don't
think it's working, and it would need someone to work on it.

(Rough sketches of the JCR call sequence and of the cache and storage
ideas quoted below are appended at the end of this mail.)

-Vincent

> On Mar 3, 2008, at 5:28 PM, Sergiu Dumitriu wrote:
>
>> Vincent Massol wrote:
>>> Nice work Sergiu. We should turn this into a JIRA issue so we don't
>>> forget it.
>>
>> We should vote on it first.
>>
>>> One other idea: store attachments on the file system and not in
>>> the DB.
>>>
>>> Thanks
>>> -Vincent
>>>
>>> On Feb 27, 2008, at 3:48 PM, Sergiu Dumitriu wrote:
>>>
>>>> Hi devs,
>>>>
>>>> Last night I checked what happens when uploading a file, and why
>>>> that action requires huge amounts of memory.
>>>>
>>>> Whenever a file is uploaded, there are several places where the
>>>> file content is loaded into memory:
>>>> - as an XWikiAttachment, as a byte[] ~= filesize
>>>> - as an XWikiAttachmentArchive, as a Base64-encoded string ~=
>>>> 2*4*filesize
>>>> - as Hibernate tokens that are sent to the database, clones of the
>>>> XWikiAttachment and XWikiAttachmentArchive data ~= 9*filesize
>>>> - as the cached attachment and attachment archive, clones of the
>>>> same 2 objects ~= 9*filesize
>>>>
>>>> Total: ~27*filesize bytes in memory.
>>>>
>>>> So, for a 10M file, we need at least 270M of memory.
>>>>
>>>> Worse, if this is not the first version of the attachment, then the
>>>> complete attachment history is loaded into memory, so add another
>>>> 24*versionsize*versions of memory needed during the upload.
>>>>
>>>> After the upload is done, most of these are cleared; only the
>>>> cached objects remain in memory.
>>>>
>>>> However, a problem still remains with the cache. It is an LRU cache
>>>> with a fixed capacity, so even if the memory is full, the cached
>>>> attachments will not be released.
>>>>
>>>> Things we can improve:
>>>> - Make the cache use References. This would allow cached
>>>> attachments to be removed from memory when there is a need for
>>>> more memory.
>>>> - Build a better attachment archive system. I'm not sure it is a
>>>> good idea to have diff-based versioning of attachments. In theory
>>>> it saves space when versions are much alike, but it does not really
>>>> work in practice, because it does a line diff and a Base64-encoded
>>>> string has no newlines. What's more, the space gain would only pay
>>>> off when there are many versions, since one version alone takes 4
>>>> times more space than a binary dump of the content.
>>>>
>>>> Suppose we switch to a "one version per table row" scheme for the
>>>> attachment history, with a direct binary dump; then the memory
>>>> needed for uploading would be 6*filesize, which is much less.
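Sketch 1: a minimal example of what storing and reading an attachment
stream through the JCR API could look like, assuming a local Jackrabbit
TransientRepository. The node layout and names ("Main_WebHome",
"report.pdf") are made up for the illustration; this is not the existing
XWiki/eXo store, just the call sequence showing that callers only deal
with nodes and streams while the repository decides where the bytes
actually live.

import java.io.FileInputStream;
import java.io.InputStream;

import javax.jcr.Node;
import javax.jcr.Repository;
import javax.jcr.Session;
import javax.jcr.SimpleCredentials;

import org.apache.jackrabbit.core.TransientRepository;

public class JcrAttachmentSketch
{
    public static void main(String[] args) throws Exception
    {
        // The repository decides where the bytes actually live (file system,
        // RDBMS, ...); callers only see nodes, properties and streams.
        Repository repository = new TransientRepository();
        Session session =
            repository.login(new SimpleCredentials("admin", "admin".toCharArray()));
        try {
            Node root = session.getRootNode();

            // Illustrative layout: one node per document, one child node per
            // attachment.
            Node doc = root.addNode("Main_WebHome");
            Node attachment = doc.addNode("report.pdf");

            // JCR 1.0 takes binary content as an InputStream, so the caller
            // never has to hold the whole file in memory.
            InputStream in = new FileInputStream("report.pdf");
            attachment.setProperty("jcr:data", in);
            session.save();
            in.close();

            // Reading it back is also stream-based.
            InputStream out =
                doc.getNode("report.pdf").getProperty("jcr:data").getStream();
            out.close();
        } finally {
            session.logout();
        }
    }
}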

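Sketch 2, on the "make the cache use References" point: a toy sketch, not
the actual XWiki cache API, of an LRU map that holds its values through
SoftReferences, so the garbage collector can drop cached attachment data
when memory runs low.

import java.lang.ref.SoftReference;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Toy LRU cache holding its values through SoftReferences: an entry goes away
 * either when the capacity is exceeded (least recently used first) or when the
 * garbage collector needs memory and clears the reference.
 */
public class SoftLruCache<K, V>
{
    private final Map<K, SoftReference<V>> map;

    public SoftLruCache(final int capacity)
    {
        // accessOrder = true makes the LinkedHashMap iterate in LRU order.
        this.map = new LinkedHashMap<K, SoftReference<V>>(16, 0.75f, true)
        {
            protected boolean removeEldestEntry(Map.Entry<K, SoftReference<V>> eldest)
            {
                return size() > capacity;
            }
        };
    }

    public synchronized void put(K key, V value)
    {
        this.map.put(key, new SoftReference<V>(value));
    }

    public synchronized V get(K key)
    {
        SoftReference<V> reference = this.map.get(key);
        if (reference == null) {
            return null;
        }
        V value = reference.get();
        if (value == null) {
            // The GC already reclaimed the value; drop the stale entry.
            this.map.remove(key);
        }
        return value;
    }
}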

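Sketch 3, for the "one version per table row, direct binary dump" idea: a
sketch of streaming one version into a row over plain JDBC, so the client
never builds a byte[] or Base64 copy of the content. The table name, the
columns and the in-memory HSQLDB URL are made up for the example; the
real schema and Hibernate mapping would of course look different.

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class AttachmentVersionRowSketch
{
    public static void main(String[] args) throws Exception
    {
        // In-memory HSQLDB just so the example runs standalone; the real
        // store would be the wiki's configured database.
        Class.forName("org.hsqldb.jdbcDriver");
        Connection connection =
            DriverManager.getConnection("jdbc:hsqldb:mem:sketch", "sa", "");
        connection.createStatement().execute(
            "CREATE TABLE attachment_version (docid BIGINT, filename VARCHAR(255), "
            + "version VARCHAR(31), content LONGVARBINARY)");

        File file = new File("report.pdf");
        InputStream content = new FileInputStream(file);
        try {
            PreparedStatement statement = connection.prepareStatement(
                "INSERT INTO attachment_version (docid, filename, version, content) "
                + "VALUES (?, ?, ?, ?)");
            statement.setLong(1, 42L);
            statement.setString(2, "report.pdf");
            statement.setString(3, "1.2");
            // The driver reads the stream in chunks: no byte[] and no Base64
            // copy of the whole content is built on the client side.
            statement.setBinaryStream(4, content, (int) file.length());
            statement.executeUpdate();
            statement.close();
        } finally {
            content.close();
            connection.close();
        }
    }
}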