As a summary:
I'm +1 for alternate storage mechanisms as long as we can turn them on optionally.
The storage systems should be:
* binary chunk
* file system

We also need a migration strategy, and we need to decide which one would be the default. In any case, in the short term we should test a new system well before we make it the default.
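Purely as an illustration of "turn them on optionally": an administrator-facing switch could look something like the snippet below in xwiki.cfg. The property name and values are invented for this sketch; no such setting exists today.

#-# [Hypothetical] Select the attachment content store. Possible values could be
#-# "hibernate" (current database storage), "binarychunk" (chunked database rows)
#-# or "file" (filesystem storage). The database store would stay the default
#-# until the alternatives are well tested.
# xwiki.store.attachment.backend=hibernate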
On 20/10/10 12:55, Caleb James DeLisle wrote:
On 10/20/2010 06:28 AM, Ludovic Dubost wrote:
On 20/10/10 12:03, Caleb James DeLisle wrote:
On 10/20/2010 05:33 AM, Ludovic Dubost wrote:

Hi,

We do want the availability of file attachment storage (Sergiu has done an implementation during the summer hackathon), but as Guillaume said it should be left to the choice of the administrator.

Now concerning database storage, about Hibernate: does it mean streams are not available at all in Hibernate, or does it mean they don't always work? If streams are available for the databases that support them, which ones support them?

They are available; they require use of the blob type, so we would have to add a column. I was warned about incompatibility issues. I understand that mysql and mssql stream the content back onto the heap before saving, which ruins any memory savings. Postgres seems to support blobs but I was warned about strange issues, specifically this: http://in.relation.to/Bloggers/PostgreSQLAndBLOBs I was told that Oracle has the best streaming support, but I also read that Oracle blob support requires the use of a proprietary API. This is why I had opted for chunking the data instead of using blobs.

Indeed, if we are positive that mysql will use the heap to store the full BLOB then there is no point to this solution, since it is our main database.

Concerning your proposal, it's interesting: indeed, if we use streams for everything else, we do get rid of the memory consumption issue for attachments. Now I have a few concerns:

- complexity and management of the data. What happens if we have a corrupted DB and one of the chunks fails to save? We might end up with invalid content.

I had planned on committing the transaction only after all chunks are saved. If the database has memory issues with large commits, another possibility would be to verify after saving and throw an exception if that fails.

That might indeed help if everything is in one transaction, except that MyISAM is not transactional, so we can end up with incomplete data. We do need a way to verify the coherency. We could consider that if the size is incorrect we don't accept the result.

It sounds like there might be a need for coherency verification which runs only on mysql with MyISAM.

- we also have to solve other large items (like attachment history or the recycle bin of attachments)

This is why I favor a generic BinaryStore rather than a change to XWikiHibernateAttachmentStore. Another issue which will have to be addressed is the memory consumption of JRCS for AttachmentArchive.

At the same time we should avoid mixing apples and oranges. We should not have data with different meanings in different tables.

Do you mean not have data with different meanings in the same table? If so, I'm not sure I'm sold on the idea, since it's how XWikiStringProperty works (it holds string content for many different types of objects). A BinaryChunk table would hold data which would not make sense to query, so I think anything which needs to store binary content in the database should be able to use the same mechanism.
It's true, so we could go with one big table.
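To make the size-verification idea concrete, here is a rough, untested sketch of what a post-save coherency check could look like. The table and column names come from the binarychunk mapping quoted later in the thread, but the class itself and the refuse-on-mismatch policy are only an illustration of the idea, not existing XWiki code.

import java.io.IOException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/**
 * Hypothetical post-save check: re-read the total size of the chunks belonging
 * to one binary object and compare it to the number of bytes we streamed in.
 * On MyISAM nothing can be rolled back, so a mismatch can only be detected and
 * reported, leaving the caller to delete the partial chunks.
 */
public class BinaryChunkCoherencyCheck
{
    public void verify(Connection connection, int id, long expectedSize) throws SQLException, IOException
    {
        PreparedStatement statement =
            connection.prepareStatement("SELECT SUM(LENGTH(content)) FROM binarychunk WHERE id = ?");
        try {
            statement.setInt(1, id);
            ResultSet result = statement.executeQuery();
            // SUM() is NULL when no chunks exist; getLong() then returns 0.
            long storedSize = result.next() ? result.getLong(1) : 0L;
            if (storedSize != expectedSize) {
                throw new IOException("Incomplete attachment content: expected " + expectedSize
                    + " bytes but found " + storedSize + " in the binarychunk table.");
            }
        } finally {
            statement.close();
        }
    }
}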
For Attachment Archive, I'm not against a solution which stops doing RCS. It has never been efficient anyway.

+1. Trying to imagine how that would be done.

On a side note, concerning the max_allowed_packet issue in MySQL: I was able to change that value at runtime (from the mysql console). If this also works over a remote connection, maybe we could hack and force a big value at runtime. This would be really great, because max_allowed_packet is killing us. XWiki does not report it well in many cases, and almost no customer reads the documentation and sets the value properly. We have also seen many cases where the database is shared with other applications and there is little access to the database configuration and to the ability to restart it. To make it short, the max_allowed_packet issue is a major issue when operating XWiki.

"Little access to the database configuration": this may also mean the xwiki user does not have permission to change the setting at runtime.

What I meant is not being allowed to restart it.

Before we go into large fixes for that problem, could we maybe at least check that we report errors properly? (On a 2.0.5 we were not, for sure, at least for attachment saving failures.)

The fix to http://jira.xwiki.org/jira/browse/XWIKI-5405 has changed attachments so that the content and metadata are all saved in a single transaction, and http://jira.xwiki.org/jira/browse/XWIKI-5474 prevents documents from being cached on save, so we should have no more attachments which disappear when the cache is purged.

Great. This will at least make the problem show up right away. Does 5405 protect us from having the attachment in the attachment list but with no content?

If the content fails to save (in a transactional database) the attachment will not save either.

We should also make sure we can always delete, even when we cannot read the data into memory. This is also not the case today when we cannot read the data because it's too big or because one of the tables does not have any data.

Sounds like a test ;)

You mean a test for you? A test in the code? Or an XWiki test suite? It's a bit of a complex test, which requires screwing up attachment data in every possible way and proving that you can still delete everything that is left.

I have thus far been abusing ui-tests for the types of tests which require the presence of the database. Adding a set of unit-style tests which have a database present might be a good idea.

Caleb

Ludovic

Caleb

Ludovic
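Regarding the runtime max_allowed_packet hack discussed above, a minimal sketch of the attempt over a plain JDBC connection follows. Two caveats match Caleb's concern: SET GLOBAL requires the SUPER privilege (so it will fail on many shared or locked-down databases), and the new value only applies to connections opened after the change, so the connection pool would need to hand out fresh connections. The class name and the 32 MB value are arbitrary choices for the example.

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

/**
 * Sketch: try to raise MySQL's max_allowed_packet from the application instead
 * of requiring the administrator to edit my.cnf and restart the server.
 */
public final class MaxAllowedPacketHack
{
    /** 32 MB, an arbitrary example value well above MySQL's 1 MB default. */
    private static final long DESIRED_PACKET_SIZE = 32L * 1024 * 1024;

    /** @return true if the global value was changed, false if the xwiki user may not do it */
    public static boolean tryRaisePacketSize(Connection connection)
    {
        try {
            Statement statement = connection.createStatement();
            try {
                // Requires the SUPER privilege; affects only connections opened afterwards.
                statement.execute("SET GLOBAL max_allowed_packet = " + DESIRED_PACKET_SIZE);
                return true;
            } finally {
                statement.close();
            }
        } catch (SQLException e) {
            // Most likely a privilege problem; fall back to reporting the
            // configuration issue clearly instead of failing silently later.
            return false;
        }
    }
}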
On 18/10/10 19:55, Caleb James DeLisle wrote:

I talked with the Hibernate people about using streams and was told that it is not supported by all databases. As an alternative to the proposal below, I would like to propose a filesystem-based storage mechanism.

The main advantage of using the database to store everything is that administrators need only use mysql_dump and they have their entire wiki backed up. If we are to abandon that requirement, we can have much faster attachment storage by using the filesystem.

For this, I propose the BinaryStore interface remains the same, but com.xpn.xwiki.doc.BinaryObject would contain:

void addContent(InputStream content)
OutputStream addContent()
void clear()
InputStream getContent()
void getContent(OutputStream writeTo)

clear() would clear the underlying file whereas addContent would always append to it.

The added column would look like this:

<class name="com.xpn.xwiki.store.doc.FilesystemBinaryObject" table="filesystembinaryobject">
  <id name="id" column="id">
    <generator class="native" />
  </id>
  <property name="fileURI" type="string">
    <column name="fileuri" length="255" not-null="true"/>
  </property>
</class>

This would, as with the original proposal, be useful for storing not only attachment content but also attachment history, deleted attachments and even document history or deleted documents.

WDYT?

Caleb

On 10/15/2010 04:21 PM, Caleb James DeLisle wrote:

Because the storage of large attachments is limited by database constraints and by the fact that JDBC does not allow us to stream content out of the database, I propose we add a new database table binarychunk. The mapping will read as follows:

<class name="com.xpn.xwiki.store.hibernate.HibernateBinaryStore$BinaryChunk" table="binarychunk">
  <composite-id unsaved-value="undefined">
    <key-property name="id" column="id" type="integer" />
    <key-property name="chunkNumber" column="chunknumber" type="integer" />
  </composite-id>
  <property name="content" type="binary">
    <column name="content" length="983040" not-null="true"/>
  </property>
</class>

Notice that the maximum length (983040 bytes) is a number which is divisible by many common buffer sizes and is slightly less than the default max_allowed_packet in mysql, which means that using the xwikibinary table we could store attachments of arbitrary size without hitting mysql's default limits.

com.xpn.xwiki.store.BinaryStore will contain:

@param toLoad a binary object with an id number set, will be loaded.
void loadObject(BinaryObject toLoad)

@param toStore a binary object, if no id is present then it will be given one upon successful store, if an id is present then that id number will be used.
void storeObject(BinaryObject toStore)

This will be implemented by:
com.xpn.xwiki.store.hibernate.HibernateBinaryStore

com.xpn.xwiki.doc.BinaryObject will contain:

void setContent(InputStream content)
OutputStream setContent()
InputStream getContent()
void getContent(OutputStream writeTo)

Note: the get and set functions are duplicated with input and output streams to maximize ease of use.

This will be implemented by com.xpn.xwiki.doc.TempFileBinaryObject, which will store the binary content in a temporary FileItem (see Apache commons fileupload).

+ This will be able to provide a back end for not only attachment content, but for attachment archive and document archive if it is so desired.
+ I have no intent of exposing it as public API at the moment.

WDYT?

Caleb
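For readability, here is a compact Java restatement of the interfaces sketched in the two quoted proposals. It is only a sketch: the javadoc wording, the exception declarations and the CHUNK_SIZE constant are additions made here (the constant simply records the arithmetic behind 983040 = 15 x 65536, which divides evenly by common 4 KB to 64 KB buffer sizes and sits 64 KB under MySQL's default 1 MB max_allowed_packet).

package com.xpn.xwiki.store;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Restatement of the proposed store interface (exception declarations are guesses). */
public interface BinaryStore
{
    /**
     * 15 x 65536 = 983040 bytes: divides evenly by common 4 KB to 64 KB buffer
     * sizes and stays 64 KB under MySQL's default 1 MB (1048576 byte) max_allowed_packet.
     */
    int CHUNK_SIZE = 983040;

    /** @param toLoad a binary object with an id number set, will be loaded */
    void loadObject(BinaryObject toLoad) throws IOException;

    /**
     * @param toStore a binary object; if no id is present it will be given one upon
     *            successful store, if an id is present then that id number will be used
     */
    void storeObject(BinaryObject toStore) throws IOException;
}

/**
 * The content holder, using the later filesystem-oriented variant of the methods
 * (addContent/clear). In the proposal this lives in com.xpn.xwiki.doc; it is kept
 * in the same file here only so the sketch is self-contained.
 */
interface BinaryObject
{
    /** Append the bytes read from the given stream to the content. */
    void addContent(InputStream content) throws IOException;

    /** @return a stream to write (append) new content into */
    OutputStream addContent() throws IOException;

    /** Clear the underlying storage. */
    void clear() throws IOException;

    /** @return a stream over the stored content */
    InputStream getContent() throws IOException;

    /** Copy the stored content into the given stream. */
    void getContent(OutputStream writeTo) throws IOException;
}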
--
Ludovic Dubost
Blog: http://blog.ludovic.org/
XWiki: http://www.xwiki.com
Skype: ldubost
GTalk: ldubost
_______________________________________________
devs mailing list
[email protected]
http://lists.xwiki.org/mailman/listinfo/devs

