On 10/14/2010 12:53 PM, Paul Libbrecht wrote:
> Caleb,
> 
> your analysis seems to match very well what I had observed.
> Does it mean that such changes would also affect the XML serialization?

If we were to implement a "binary" database table, we would have the
flexibility to decide whether we wish to keep using the XML format or not.
The XML serializer, despite increasing the size by 30%, is already well
streamlined for large content. The JRCS versioning store, on the other hand,
is not prepared to handle large content, so with a binary database table we
would have the option of donating a patch set to the JRCS people or choosing
a different versioning system.

Caleb

> 
> paul
> 
> 
> On 14 oct. 2010, at 15:24, Caleb James DeLisle wrote:
> 
>> Hi,
>> I have some changes to the attachment system which will allow XWiki to
>> handle much larger attachments without memory exhaustion. I have found
>> that there are some places where I cannot make any changes because the
>> code is not in XWiki but rather in JRCS.
>>
>> XWiki versions attachments by creating a JRCS node for the XML version
>> of each version of each attachment. This means that memory consumption
>> improvements hit a hard wall at 2 * 2 * 1.3 * the size of the attachment:
>> base-64 encoding for XML increases the size by 1.3 times, storage as a
>> String (an array of 16-bit chars) doubles the size, and the need to copy
>> the String doubles the size again.
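>>
>> To put numbers on it: a 100MB attachment would peak at roughly
>> 100MB * 1.3 * 2 * 2 = 520MB of heap.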
>>
>> The second issue is that the database and JDBC do not handle hundreds of
>> megabytes in a single query well. If I try to attach a 500MB attachment
>> with attachment versioning disabled, my changes allow the attachment to be
>> streamed to the database, but PostgreSQL is not able to save it. I am able
>> to attach a 256MB attachment, but with 512MB of heap space the attachment
>> cannot be loaded from the database because JDBC lacks the necessary
>> streaming functionality.
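>>
>> To illustrate the asymmetry (the table and class names below are made up
>> for illustration), the write side can hand the driver an InputStream, but
>> on the read side the whole value has effectively been materialized in
>> memory before we ever see a stream:
>>
>>     import java.io.InputStream;
>>     import java.sql.Connection;
>>     import java.sql.PreparedStatement;
>>     import java.sql.ResultSet;
>>     import java.sql.SQLException;
>>
>>     public class AttachmentContentSketch
>>     {
>>         /** Save: the driver reads from the stream, so the heap never holds the whole file. */
>>         public static void save(Connection conn, long id, InputStream content, int length)
>>             throws SQLException
>>         {
>>             PreparedStatement ps = conn.prepareStatement(
>>                 "INSERT INTO attachment_content (id, content) VALUES (?, ?)");
>>             ps.setLong(1, id);
>>             ps.setBinaryStream(2, content, length);
>>             ps.executeUpdate();
>>             ps.close();
>>         }
>>
>>         /** Load: getBinaryStream() exists, but by then the driver has loaded the whole value. */
>>         public static InputStream load(Connection conn, long id) throws SQLException
>>         {
>>             PreparedStatement ps = conn.prepareStatement(
>>                 "SELECT content FROM attachment_content WHERE id = ?");
>>             ps.setLong(1, id);
>>             ResultSet rs = ps.executeQuery();
>>             rs.next();
>>             return rs.getBinaryStream(1);
>>         }
>>     }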
>>
>> An option which I am now considering is adding a binary table to the
>> database schema. The table would contain a composite id made of the id of
>> the data and the part number of that entry, and a data column slightly
>> smaller than 1MB (the default max_allowed_packet in MySQL). All
>> interaction with this table would go through a storage engine that works
>> with InputStreams and OutputStreams; the storage mechanism would tag each
>> stream with an id and break it up into parts to be sent to the database
>> individually.
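>>
>> As a very rough sketch of what I have in mind (the table, column, and
>> class names below are made up, and the DDL is MySQL-flavoured):
>>
>>     import java.io.ByteArrayInputStream;
>>     import java.io.IOException;
>>     import java.io.InputStream;
>>     import java.sql.Connection;
>>     import java.sql.PreparedStatement;
>>     import java.sql.SQLException;
>>
>>     public class BinaryChunkStore
>>     {
>>         // CREATE TABLE binary_part (
>>         //     data_id     BIGINT NOT NULL,
>>         //     part_number INT NOT NULL,
>>         //     part_data   MEDIUMBLOB NOT NULL,
>>         //     PRIMARY KEY (data_id, part_number)
>>         // );
>>
>>         /** Stay safely under MySQL's default 1MB max_allowed_packet. */
>>         private static final int PART_SIZE = 900 * 1024;
>>
>>         /** Break the stream into numbered parts, one statement per part. */
>>         public void write(Connection conn, long dataId, InputStream in)
>>             throws IOException, SQLException
>>         {
>>             PreparedStatement ps = conn.prepareStatement(
>>                 "INSERT INTO binary_part (data_id, part_number, part_data)"
>>                     + " VALUES (?, ?, ?)");
>>             byte[] buffer = new byte[PART_SIZE];
>>             int partNumber = 0;
>>             int length;
>>             while ((length = fill(in, buffer)) > 0) {
>>                 ps.setLong(1, dataId);
>>                 ps.setInt(2, partNumber++);
>>                 ps.setBinaryStream(3, new ByteArrayInputStream(buffer, 0, length), length);
>>                 ps.executeUpdate();
>>             }
>>             ps.close();
>>         }
>>
>>         /** Fill the buffer as far as possible; returns 0 only at end of stream. */
>>         private static int fill(InputStream in, byte[] buffer) throws IOException
>>         {
>>             int total = 0;
>>             int read;
>>             while (total < buffer.length
>>                 && (read = in.read(buffer, total, buffer.length - total)) > 0) {
>>                 total += read;
>>             }
>>             return total;
>>         }
>>     }
>>
>> Reading would walk the parts in order for a given data_id and feed them
>> back out through an InputStream, so neither side ever needs to hold more
>> than one ~1MB part in memory at a time.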
>>
>> WDYT?
>>
>> Caleb
>>
>>
