On 10/14/2010 12:53 PM, Paul Libbrecht wrote:
> Caleb,
>
> your analysis seems to match very well what I had observed.
> Does it mean that such changes would also affect the XML-serialization?
If we were to implement a "binary" database table, we would have the flexibility to decide whether we wish to keep using the XML format or not. The XML serializer, despite increasing the size by 30%, is already well streamlined for large content. The JRCS versioning store, on the other hand, is not prepared to handle large content, so with a binary database table we would have the option of donating a patch set to the JRCS people or choosing a different versioning system.

Caleb

> paul
>
>
> On 14 oct. 2010, at 15:24, Caleb James DeLisle wrote:
>
>> Hi,
>> I have some changes to the attachment system which will allow XWiki to handle much larger attachments without memory exhaustion. I have found that there are some places where I cannot make any changes because the code is not in XWiki but rather in JRCS.
>>
>> XWiki versions attachments by creating a JRCS node for the XML version of each version of each attachment. This means that memory consumption improvements hit a hard wall at 2 * 2 * 1.3 * the size of the attachment: base-64 encoding for XML increases the size by 1.3 times, storage as a String (an array of 16-bit chars) doubles the size, and the need to copy the String doubles the size again.
>>
>> The second issue is that the database and JDBC do not handle multiple hundreds of megabytes in a single query well. If I try to attach a 500MB attachment with attachment versioning disabled, my changes allow the attachment to be streamed to the database, but PostgreSQL is not able to save it. I am able to attach a 256MB attachment, but with 512MB of heap space the attachment cannot be loaded from the database because JDBC lacks the necessary streaming functionality.
>>
>> An option which I am now considering is adding a binary table to the database schema. The table would contain a composite id made of the id of the data and the part number of that entry, and a data column slightly smaller than 1MB (the default max_allowed_packet in MySQL). All interaction with this table would go through a storage engine which would require InputStreams and OutputStreams, and the streams would be written and read by the storage mechanism, which would ID-tag them and break them up into parts to be sent to the database individually.
>>
>> WDYT?
>>
>> Caleb
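To put rough numbers on the 2 * 2 * 1.3 wall described above (an illustrative calculation from those stated factors, not a measurement), a 500MB attachment would work out to approximately:

500MB * 1.3 (base-64 text) = 650MB
650MB * 2 (held as a 16-bit-char String) = 1.3GB
1.3GB * 2 (the copy of that String) = 2.6GB of heap

before JRCS does any actual version handling.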
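The part-based binary table could be sketched roughly as below. This is only a sketch under assumed names: the attachment_parts table, the PartedBinaryStore class, and the 960KB part size are illustrative, not actual XWiki or JRCS code. It writes an InputStream as sub-1MB rows keyed by (id, part) and reads them back through a SequenceInputStream so that only one part is materialized at a time.

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Enumeration;

/**
 * Hypothetical sketch of the "binary table" idea: content is split into
 * parts slightly smaller than 1MB and stored under a composite key
 * (content id, part number), so no single query carries the whole blob.
 */
public class PartedBinaryStore
{
    /** Part size kept under MySQL's default max_allowed_packet (1MB). */
    private static final int PART_SIZE = 960 * 1024;

    /** Write the stream into numbered parts, one INSERT per part. */
    public void write(Connection db, long contentId, InputStream in)
        throws SQLException, IOException
    {
        PreparedStatement insert = db.prepareStatement(
            "INSERT INTO attachment_parts (id, part, data) VALUES (?, ?, ?)");
        byte[] buffer = new byte[PART_SIZE];
        int partNumber = 0;
        int length;
        while ((length = fill(in, buffer)) > 0) {
            byte[] part = new byte[length];
            System.arraycopy(buffer, 0, part, 0, length);
            insert.setLong(1, contentId);
            insert.setInt(2, partNumber++);
            insert.setBytes(3, part);
            insert.executeUpdate();
        }
        insert.close();
    }

    /** Read the parts back in order and expose them as one InputStream. */
    public InputStream read(final Connection db, final long contentId)
        throws SQLException
    {
        final PreparedStatement select = db.prepareStatement(
            "SELECT data FROM attachment_parts WHERE id = ? ORDER BY part");
        select.setLong(1, contentId);
        final ResultSet results = select.executeQuery();
        // Pull one row at a time so only about one part is in memory at once.
        Enumeration<InputStream> parts = new Enumeration<InputStream>() {
            public boolean hasMoreElements()
            {
                try {
                    return results.next();
                } catch (SQLException e) {
                    throw new RuntimeException(e);
                }
            }

            public InputStream nextElement()
            {
                try {
                    return new ByteArrayInputStream(results.getBytes(1));
                } catch (SQLException e) {
                    throw new RuntimeException(e);
                }
            }
        };
        return new SequenceInputStream(parts);
    }

    /** Read up to buffer.length bytes, looping until the buffer is full or EOF. */
    private int fill(InputStream in, byte[] buffer) throws IOException
    {
        int total = 0;
        while (total < buffer.length) {
            int read = in.read(buffer, total, buffer.length - total);
            if (read < 0) {
                break;
            }
            total += read;
        }
        return total;
    }
}

One INSERT per part keeps every statement comfortably under the default max_allowed_packet, and neither saving nor loading ever needs the whole attachment in heap, which is the point of the proposal.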

