On 08/07/2012 10:52 PM, Sergiu Dumitriu wrote:
On 08/07/2012 03:44 PM, Sergiu Dumitriu wrote:
On 08/07/2012 03:43 PM, Denis Gervalle wrote:
From the current interface, I would use getContentOutputStream(), since
this would be the opposite of getContentInputStream(). This seems to me
very descriptive compared to a setContent() returning an OutputStream,
since a set is not supposed to return anything. I would use
setContent(OutputStream) if that was your goal, but this is probably not what
you expect.
So, +1 for getContentOutputStream()
+1 as well.
Actually, one thing I don't like is that the API is getting even bigger, with
lots of different ways of doing the same thing. Ideally, there should be only
one way of providing the content, and only one way of retrieving it. And IMO
working with byte streams is the right way, and I think that the current
setContent(InputStream) method is the one that's best.
Returning an OutputStream where the caller can write the content is opening the
door to lots of potential problems. Providing such a method becomes API, and
APIs should be well designed. And an OutputStream in my head means some
implicit constraints that must be well documented:
I can answer for the implementation which I wrote as a result of this
discussion.
- should the stream be closed at the end?
Yes, if it is not closed then the content does not become "live".
- must the response be written before a transaction ends?
It doesn't matter when it is written since it's just dealing with FileItems.
- what if nothing is written in the stream, does that mean that the old content
is preserved, or that the attachment is going to be truncated?
The attachment is made empty. It is legal to have a 0 byte file.
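
The semantics given in these answers (content goes live only on close, and closing without writing leaves a legal 0 byte attachment) could be sketched roughly as below. This is a minimal in-memory illustration only, not the actual XWikiAttachmentContent implementation, which according to the reply above deals with FileItems. The class name CommitOnCloseContent and its accessors are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical stand-in for an attachment content holder (assumption, not XWiki code).
class CommitOnCloseContent {
    private byte[] content = new byte[0];
    private boolean dirty = false;

    /** Returns a stream whose bytes replace the content only when it is closed. */
    OutputStream getContentOutputStream() {
        return new ByteArrayOutputStream() {
            private boolean closed = false;

            @Override
            public void close() throws IOException {
                if (closed) {
                    return; // closing twice must not commit twice
                }
                closed = true;
                content = toByteArray(); // the content becomes "live" here
                dirty = true;            // mark as needing to be saved
            }
        };
    }

    int getSize() {
        return content.length;
    }

    boolean isDirty() {
        return dirty;
    }
}
```

In this sketch a caller that forgets to close the stream leaves the old content untouched, which is exactly the kind of implicit constraint that would need to be documented.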
Basically, this is a _push_ API, and I for one prefer _pull_ APIs, since the
backend will get the data when it needs it, it doesn't have to document how
long it's going to wait for something to be pushed.
I agree that our current way of dealing with Hibernate is not right. We should
use proper blob streams for big data, like attachment content. And Hibernate's
LobCreator expects a Blob object that offers access to its content via an
InputStream that can be read. Again, Hibernate expects an input stream from
which it will read when it needs to, it doesn't allow you to write the content
into an OutputStream when you want.
There are a few issues here. When I investigated this, Hibernate's blob
handling was (still is?) just a wrapper around blob management in the database
which basically doesn't work.
Oracle has a proprietary blob storage method which works with streams, I think
Hibernate does not support this though.
As I remember, MySQL and Postgres have blob APIs but they buffer everything in
RAM, making it nonsensical.
This was investigated to solve attachment memory consumption issues and it was
concluded that it does not work, leading to FS attachment store.
A solution which will work is to piece out the attachment and store each piece
separately, flushing the PersistenceManager or Session between save/load
operations.
They can be run in discrete transactions as long as the metadata contains a
pointer to the pieces and the old pieces are not removed until after all of the
new pieces are saved.
On the load, you are loading a list of attachment pieces one by one and their
content must be written out before it is removed from memory.
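
A rough sketch of the piece-by-piece idea described above, under heavy assumptions: plain in-memory maps stand in for the database tables, and there is no Hibernate Session or PersistenceManager involved. The class ChunkedAttachmentStore and all of its members are hypothetical names used for illustration. It requires Java 9 or later for InputStream.readNBytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical chunked store; maps stand in for database tables (assumption).
class ChunkedAttachmentStore {
    private final int chunkSize;
    // Stand-in for a table of pieces keyed by a generated id.
    private final Map<Long, byte[]> pieces = new HashMap<>();
    // Stand-in for the attachment metadata: ordered ids of the live pieces.
    private List<Long> livePieceIds = new ArrayList<>();
    private long nextId = 0;

    ChunkedAttachmentStore(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
    }

    /** Saves the stream one piece at a time, so only one piece is held in the buffer. */
    void save(InputStream in) throws IOException {
        List<Long> newIds = new ArrayList<>();
        byte[] buf = new byte[chunkSize];
        int n;
        while ((n = in.readNBytes(buf, 0, chunkSize)) > 0) {
            long id = nextId++;
            pieces.put(id, Arrays.copyOf(buf, n)); // a real store would flush here
            newIds.add(id);
        }
        // Repoint the metadata first, then remove the old pieces.
        List<Long> oldIds = livePieceIds;
        livePieceIds = newIds;
        for (Long id : oldIds) {
            pieces.remove(id);
        }
    }

    /** Loads the pieces one by one and writes each out before moving to the next. */
    void load(OutputStream out) throws IOException {
        for (Long id : livePieceIds) {
            out.write(pieces.get(id));
        }
    }

    int pieceCount() {
        return livePieceIds.size();
    }
}
```

Note that load() has to push into an OutputStream, which is the mismatch with the existing InputStream-based API that the options below try to resolve.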
The options for making this work are:
* copy the content into a temp file, then copy it again from the InputStream of
that file. This double buffers the content.
* use PipedOutputStream to bounce it between two threads. (which I am doing now
but need to debug a problem with it.)
* extend XWikiAttachmentContent for this use case.
* add a method to XWikiAttachmentContent which allows for writing to it using
an OutputStream.
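
For illustration, the PipedOutputStream option from the list above might look roughly like this: a background thread pushes the chunks into the pipe, and the caller pulls them through an ordinary InputStream. This is only a sketch of the general technique using the standard java.io piped streams, not the code referred to in this thread. The class PipedChunkBridge and its methods are hypothetical, and the list of byte arrays stands in for pieces loaded from a store.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.List;

// Hypothetical bridge from a push-style producer to a pull-style InputStream.
class PipedChunkBridge {

    /** Returns an InputStream fed by a background thread that writes each chunk in order. */
    static InputStream open(List<byte[]> chunks) throws IOException {
        PipedInputStream in = new PipedInputStream();
        PipedOutputStream out = new PipedOutputStream(in);
        Thread writer = new Thread(() -> {
            // try-with-resources closes the pipe, which signals end of stream to the reader
            try (PipedOutputStream o = out) {
                for (byte[] chunk : chunks) {
                    o.write(chunk); // blocks while the pipe buffer is full
                }
            } catch (IOException e) {
                // The reader went away early; this sketch simply stops writing.
            }
        });
        writer.setDaemon(true);
        writer.start();
        return in;
    }

    /** Drains a stream into a byte array; used here only to demonstrate the bridge. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        int n;
        while ((n = in.read(buf)) != -1) {
            sink.write(buf, 0, n);
        }
        return sink.toByteArray();
    }
}
```

The cost of this approach is an extra thread per read and awkward error propagation between the two sides, which may be part of why the alternatives in the list are worth weighing.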
Since this pattern would be applicable to Hibernate, and the technology which
came from Cassandra attachments could be ported into the main trunk, I wanted to
do the latter.
If you are not swayed then I will return to evaluating the former methods.
Caleb
So, consider this a -1 until you can convince me that it's indeed something we
want to include in our APIs.
+0 for getOutputStream()
-0 for other previous proposals.
On Tue, Aug 7, 2012 at 9:01 PM, Caleb James DeLisle <
[email protected]> wrote:
getOutputStream() is not very descriptive, although I suppose a good
javadoc comment would alleviate the issue. I wrestled with the name myself
and settled on setContent() because it overloads the existing setContent(),
so it should be a bit easier to remember.
If you guys like getOutputStream(), I'm happy enough with it.
Caleb
On 08/07/2012 03:24 AM, Thomas Mortagne wrote:
+1 for the idea in general but same comment as Marius
On Tue, Aug 7, 2012 at 7:35 AM, Marius Dumitru Florea
<[email protected]> wrote:
I understand the need and I'm +1 but I don't like the method name
(neither setContent() nor addContent()). WDYT about getOutputStream()
?
Thanks,
Marius
On Tue, Aug 7, 2012 at 12:06 AM, Caleb James DeLisle
<[email protected]> wrote:
Hi,
In the development of Cassandra attachments, I found I want to load an
attachment one chunk at a time and write that chunk to a provided
OutputStream. This is how I envision next generation Hibernate attachments
working too.
I would like to add to XWikiAttachmentContent:
public OutputStream addContent();
which returns an OutputStream that allows writing the attachment
content and upon close,
sets the attachment content as dirty and resets the size field.
WDYT?
Caleb