Ok, 5 +1 including myself, no 0, no -1 Merging it now, thanks for the review.
Caleb On 08/09/2012 09:54 PM, Sergiu Dumitriu wrote: > On 08/09/2012 04:28 AM, Caleb James DeLisle wrote: >> My changes are in a the feature-attachment-outputstream branch >> https://github.com/xwiki/xwiki-platform/commits/feature-attachment-outputstream >> >> IMO the Hibernate blob handling is never really going to work in a >> cross-platform way. >> Even if that is not true, it absolutely won't work with Cassandra. >> >> For loading an attachment piecewise, OutputStreams are not the only possible >> solution >> but they are (in my analysis) the best. >> >> If we hold off on adding the API because there is no consumer in platform >> proper, I'll >> have to code around it and find another (inferior) solution and if piecewise >> attachment >> loading ever does reach platform proper, everything will be developed and >> debugged around >> inferior solution. >> >> That's my case for it, let me know if you have any specific concerns or if >> your veto is on the general principle. >> I want to get an idea of whether this should be cleaned up or killed quickly. >> >> Thanks, >> >> Caleb >> > > After discussing with Caleb on IRC [1], I've come to the conclusion that > while the OutputStream method isn't mandatory, it would make things easier > for the NoSQL engines which don't actually have data streaming at their core. > So, working with byte array chunks is going to be a (the?) way of supporting > large attachments. And gathering a sequence of BlobChunks and serializing > them into an attachment works better with an OutputStream where they can be > written. > > So, I'm re-changing my vote to +1 for the getContentOutputStream(). > > [1] http://dev.xwiki.org/xwiki/bin/view/IRC/xwikiArchive20120810 > >> >> On 08/07/2012 11:31 PM, Caleb James DeLisle wrote: >>> >>> >>> On 08/07/2012 10:52 PM, Sergiu Dumitriu wrote: >>>> On 08/07/2012 03:44 PM, Sergiu Dumitriu wrote: >>>>> On 08/07/2012 03:43 PM, Denis Gervalle wrote: >>>>>> From the current interface, I would use getContentOutputStream(), since >>>>>> this would be the opposite of getContentInputStream(). This seems to me >>>>>> very descriptive compare to a setContent() returning an OutputStream, >>>>>> since >>>>>> a set is not supposed to return anything. I would use >>>>>> setContent(OutputStream) if it was your goal, but this probably not what >>>>>> you expect. >>>>>> >>>>>> So, +1 for getContentOutputStream() >>>>> >>>>> +1 as well. >>>> >>>> Actually, one thing I don't like is that the API is getting even bigger, >>>> with lots of different ways of doing the same thing. Ideally, there should >>>> be only one way of providing the content, and only one way of retrieving >>>> it. And IMO working with byte streams is the right way, and I think that >>>> the current setContent(InputStream) method is the one that's best. >>>> >>>> Returning an OutputStream where the caller can write the content is >>>> opening the door to lots of potential problems. Providing such a method >>>> becomes API, and APIs should be well designed. And an OutputStream in my >>>> head means some implicit constraints that must be well documented: >>>> >>> >>> I can answer for the implementation which I wrote as a result of this >>> discussion. >>> >>>> - should the stream be closed at the end? >>> >>> Yes, if it is not closed then the content does not become "live". >>> >>>> - must the response be written before a transaction ends? >>> >>> It doesn't matter when it is written since it's just dealing with FileItems. >>> >>>> - what if nothing is written in the stream, does that mean that the old >>>> content is preserved, or that the attachment is going to be truncated? >>> >>> The attachment becomes is made empty. It is legal to have a 0 byte file. >>> >>>> >>>> Basically, this is a _push_ API, and I for one prefer _pull_ APIs, since >>>> the backend will get the data when it needs it, it doesn't have to >>>> document how long it's going to wait for something to be pushed. >>>> >>>> >>>> I agree that our current way of dealing with Hibernate is not right. We >>>> should use proper blob streams for big data, like attachment content. And >>>> Hibernate's LobCreator expects a Blob object that offers access to its >>>> content via an InputStream that can be read. Again, Hibernate expects an >>>> input stream from which it will read when it needs to, it doesn't allow >>>> you to write the content into an OutputStream when you want. >>> >>> There are a few issues here. When I investigated this, Hibernate's blob >>> handling was (still is?) just a wrapper around blob management in the >>> database which basically doesn't work. >>> Oracle has a proprietary blob storage method which works with streams, I >>> think Hibernate does not support this though. >>> As I remember, mysql and postgres have blob APIs but they buffer everything >>> in ram making it nonsensical. >>> This was investigated to solve attachment memory consumption issues and it >>> was concluded that it does not work, leading to FS attachment store. >>> >>> A solution which will work is to piece out the attachment and store each >>> piece separately, flushing the PersistenceManager or Session between >>> save/load operations. >>> they can be run in discrete transactions as long as the metadata contains a >>> pointer to the pieces and the old pieces are not removed until after all of >>> the new pieces are saved. >>> On the load, you are loading a list of attachment pieces one by one and >>> their content must be written out before it is removed from memory. >>> >>> The options for making this work are: >>> >>> * copy the content into a temp file then copy it again the InputStream from >>> that file. This double buffers the content. >>> * use PipedOutputStream to bounce it between two threads. (which I am doing >>> now but need to debug a problem with it.) >>> * extend XWikiAttachmentContent for this use case. >>> * add a method to XWikiAttachmentContent which allows for writing to it >>> using an OutputStream. >>> >>> Since this pattern would be applicable to Hibernate and the technology >>> which came from Cassandra attachments >>> could be ported into the main trunk, I wanted to do the latter. >>> If you are not swayed then I will return to evaluating the former methods. >>> >>> Caleb >>> >>> >>>> >>>> >>>> So, consider this a -1 until you can convince me that it's indeed >>>> something we want to include in our APIs. >>>> >>>>>> +0 for getOutputStream() >>>>>> -0 for other previous proposals. >>>>>> >>>>>> On Tue, Aug 7, 2012 at 9:01 PM, Caleb James DeLisle < >>>>>> [email protected]> wrote: >>>>>> >>>>>>> getOutputStream() is not very descriptive although I suppose a good >>>>>>> javadoc comment would alleviate >>>>>>> the issue, I wrestled with the name myself and settled on setContent() >>>>>>> because it overloads the existing setContent() so it should be a bit >>>>>>> easier to remember. >>>>>>> >>>>>>> If you guys like getOutputStream(), I'm happy enough with it. >>>>>>> >>>>>>> Caleb >>>>>>> >>>>>>> >>>>>>> On 08/07/2012 03:24 AM, Thomas Mortagne wrote: >>>>>>>> +1 for the idea in general but same comment than Marius >>>>>>>> >>>>>>>> On Tue, Aug 7, 2012 at 7:35 AM, Marius Dumitru Florea >>>>>>>> <[email protected]> wrote: >>>>>>>>> I understand the need and I'm +1 but I don't like the method name >>>>>>>>> (neither setContent() nor addContent()). WDYT about getOutputStream() >>>>>>>>> ? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Marius >>>>>>>>> >>>>>>>>> On Tue, Aug 7, 2012 at 12:06 AM, Caleb James DeLisle >>>>>>>>> <[email protected]> wrote: >>>>>>>>>> Hi, >>>>>>>>>> In the development of Cassandra attachments, I found I want to >>>>>>>>>> load an >>>>>>> attachment one chunk at a time and >>>>>>>>>> write that chunk to a provided OutputStream, this is how I envision >>>>>>> next generation Hibernate attachments working too. >>>>>>>>>> >>>>>>>>>> I would like to add to XWikiAttachmentContent: >>>>>>>>>> >>>>>>>>>> public OutputStream addContent(); >>>>>>>>>> >>>>>>>>>> which returns an OutputStream that allows writing the attachment >>>>>>> content and upon close, >>>>>>>>>> sets the attachment content as dirty and resets the size field. >>>>>>>>>> >>>>>>>>>> WDYT? >>>>>>>>>> >>>>>>>>>> Caleb >>>> > > _______________________________________________ devs mailing list [email protected] http://lists.xwiki.org/mailman/listinfo/devs

