Ok, 5 +1 including myself, no 0, no -1
Merging it now, thanks for the review.

Caleb

On 08/09/2012 09:54 PM, Sergiu Dumitriu wrote:
> On 08/09/2012 04:28 AM, Caleb James DeLisle wrote:
>> My changes are in a the feature-attachment-outputstream branch
>> https://github.com/xwiki/xwiki-platform/commits/feature-attachment-outputstream
>>
>> IMO the Hibernate blob handling is never really going to work in a 
>> cross-platform way.
>> Even if that is not true, it absolutely won't work with Cassandra.
>>
>> For loading an attachment piecewise, OutputStreams are not the only possible 
>> solution
>> but they are (in my analysis) the best.
>>
>> If we hold off on adding the API because there is no consumer in platform 
>> proper, I'll
>> have to code around it and find another (inferior) solution and if piecewise 
>> attachment
>> loading ever does reach platform proper, everything will be developed and 
>> debugged around
>> inferior solution.
>>
>> That's my case for it, let me know if you have any specific concerns or if 
>> your veto is on the general principle.
>> I want to get an idea of whether this should be cleaned up or killed quickly.
>>
>> Thanks,
>>
>> Caleb
>>
> 
> After discussing with Caleb on IRC [1], I've come to the conclusion that 
> while the OutputStream method isn't mandatory, it would make things easier 
> for the NoSQL engines which don't actually have data streaming at their core. 
> So, working with byte array chunks is going to be a (the?) way of supporting 
> large attachments. And gathering a sequence of BlobChunks and serializing 
> them into an attachment works better with an OutputStream where they can be 
> written.
> 
> So, I'm re-changing my vote to +1 for the getContentOutputStream().
> 
> [1] http://dev.xwiki.org/xwiki/bin/view/IRC/xwikiArchive20120810
> 
>>
>> On 08/07/2012 11:31 PM, Caleb James DeLisle wrote:
>>>
>>>
>>> On 08/07/2012 10:52 PM, Sergiu Dumitriu wrote:
>>>> On 08/07/2012 03:44 PM, Sergiu Dumitriu wrote:
>>>>> On 08/07/2012 03:43 PM, Denis Gervalle wrote:
>>>>>>   From the current interface, I would use getContentOutputStream(), since
>>>>>> this would be the opposite of getContentInputStream(). This seems to me
>>>>>> very descriptive compare to a setContent() returning an OutputStream,
>>>>>> since
>>>>>> a set is not supposed to return anything. I would use
>>>>>> setContent(OutputStream) if it was your goal, but this probably not what
>>>>>> you expect.
>>>>>>
>>>>>> So, +1 for  getContentOutputStream()
>>>>>
>>>>> +1 as well.
>>>>
>>>> Actually, one thing I don't like is that the API is getting even bigger, 
>>>> with lots of different ways of doing the same thing. Ideally, there should 
>>>> be only one way of providing the content, and only one way of retrieving 
>>>> it. And IMO working with byte streams is the right way, and I think that 
>>>> the current setContent(InputStream) method is the one that's best.
>>>>
>>>> Returning an OutputStream where the caller can write the content is 
>>>> opening the door to lots of potential problems. Providing such a method 
>>>> becomes API, and APIs should be well designed. And an OutputStream in my 
>>>> head means some implicit constraints that must be well documented:
>>>>
>>>
>>> I can answer for the implementation which I wrote as a result of this 
>>> discussion.
>>>
>>>> - should the stream be closed at the end?
>>>
>>> Yes, if it is not closed then the content does not become "live".
>>>
>>>> - must the response be written before a transaction ends?
>>>
>>> It doesn't matter when it is written since it's just dealing with FileItems.
>>>
>>>> - what if nothing is written in the stream, does that mean that the old 
>>>> content is preserved, or that the attachment is going to be truncated?
>>>
>>> The attachment becomes is made empty. It is legal to have a 0 byte file.
>>>
>>>>
>>>> Basically, this is a _push_ API, and I for one prefer _pull_ APIs, since 
>>>> the backend will get the data when it needs it, it doesn't have to 
>>>> document how long it's going to wait for something to be pushed.
>>>>
>>>>
>>>> I agree that our current way of dealing with Hibernate is not right. We 
>>>> should use proper blob streams for big data, like attachment content. And 
>>>> Hibernate's LobCreator expects a Blob object that offers access to its 
>>>> content via an InputStream that can be read. Again, Hibernate expects an 
>>>> input stream from which it will read when it needs to, it doesn't allow 
>>>> you to write the content into an OutputStream when you want.
>>>
>>> There are a few issues here. When I investigated this, Hibernate's blob 
>>> handling was (still is?) just a wrapper around blob management in the 
>>> database which basically doesn't work.
>>> Oracle has a proprietary blob storage method which works with streams, I 
>>> think Hibernate does not support this though.
>>> As I remember, mysql and postgres have blob APIs but they buffer everything 
>>> in ram making it nonsensical.
>>> This was investigated to solve attachment memory consumption issues and it 
>>> was concluded that it does not work, leading to FS attachment store.
>>>
>>> A solution which will work is to piece out the attachment and store each 
>>> piece separately, flushing the PersistenceManager or Session between 
>>> save/load operations.
>>> they can be run in discrete transactions as long as the metadata contains a 
>>> pointer to the pieces and the old pieces are not removed until after all of 
>>> the new pieces are saved.
>>> On the load, you are loading a list of attachment pieces one by one and 
>>> their content must be written out before it is removed from memory.
>>>
>>> The options for making this work are:
>>>
>>> * copy the content into a temp file then copy it again the InputStream from 
>>> that file. This double buffers the content.
>>> * use PipedOutputStream to bounce it between two threads. (which I am doing 
>>> now but need to debug a problem with it.)
>>> * extend XWikiAttachmentContent for this use case.
>>> * add a method to XWikiAttachmentContent which allows for writing to it 
>>> using an OutputStream.
>>>
>>> Since this pattern would be applicable to Hibernate and the technology 
>>> which came from Cassandra attachments
>>> could be ported into the main trunk, I wanted to do the latter.
>>> If you are not swayed then I will return to evaluating the former methods.
>>>
>>> Caleb
>>>
>>>
>>>>
>>>>
>>>> So, consider this a -1 until you can convince me that it's indeed 
>>>> something we want to include in our APIs.
>>>>
>>>>>> +0 for getOutputStream()
>>>>>> -0 for other previous proposals.
>>>>>>
>>>>>> On Tue, Aug 7, 2012 at 9:01 PM, Caleb James DeLisle <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> getOutputStream() is not very descriptive although I suppose a good
>>>>>>> javadoc comment would alleviate
>>>>>>> the issue, I wrestled with the name myself and settled on setContent()
>>>>>>> because it overloads the existing setContent() so it should be a bit
>>>>>>> easier to remember.
>>>>>>>
>>>>>>> If you guys like getOutputStream(), I'm happy enough with it.
>>>>>>>
>>>>>>> Caleb
>>>>>>>
>>>>>>>
>>>>>>> On 08/07/2012 03:24 AM, Thomas Mortagne wrote:
>>>>>>>> +1 for the idea in general but same comment than Marius
>>>>>>>>
>>>>>>>> On Tue, Aug 7, 2012 at 7:35 AM, Marius Dumitru Florea
>>>>>>>> <[email protected]> wrote:
>>>>>>>>> I understand the need and I'm +1 but I don't like the method name
>>>>>>>>> (neither setContent() nor addContent()). WDYT about getOutputStream()
>>>>>>>>> ?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Marius
>>>>>>>>>
>>>>>>>>> On Tue, Aug 7, 2012 at 12:06 AM, Caleb James DeLisle
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>> Hi,
>>>>>>>>>> In the development of Cassandra attachments, I found I want to
>>>>>>>>>> load an
>>>>>>> attachment one chunk at a time and
>>>>>>>>>> write that chunk to a provided OutputStream, this is how I envision
>>>>>>> next generation Hibernate attachments working too.
>>>>>>>>>>
>>>>>>>>>> I would like to add to XWikiAttachmentContent:
>>>>>>>>>>
>>>>>>>>>> public OutputStream addContent();
>>>>>>>>>>
>>>>>>>>>> which returns an OutputStream that allows writing the attachment
>>>>>>> content and upon close,
>>>>>>>>>> sets the attachment content as dirty and resets the size field.
>>>>>>>>>>
>>>>>>>>>> WDYT?
>>>>>>>>>>
>>>>>>>>>> Caleb
>>>>
> 
> 


_______________________________________________
devs mailing list
[email protected]
http://lists.xwiki.org/mailman/listinfo/devs

Reply via email to