Thanks everyone for the discussion and suggestions!

I did some digging, learned how to use PipedStreams, and this seems to be working well for me.
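
For anyone digging this thread up in the archives later, here is a minimal sketch of the piped-stream bridge I ended up with. The names are hypothetical: the S3Client interface and its putObject method stand in for the Amazon client code, which consumes an InputStream and does not expose an OutputStream.

import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PipedS3Bridge {

    // Stand-in for the Amazon client code, which takes an InputStream
    // and does not expose an OutputStream.
    interface S3Client {
        void putObject(String bucket, String key, InputStream data)
                throws IOException;
    }

    private final S3Client s3;
    private final ExecutorService executor = Executors.newCachedThreadPool();

    public PipedS3Bridge(S3Client s3) {
        this.s3 = s3;
    }

    // Returns an OutputStream that the FTP data connection can write to;
    // a second thread reads the paired InputStream and feeds it to S3.
    // Piped streams require the reader and writer to be different threads.
    public PipedOutputStream openUpload(final String bucket, final String key)
            throws IOException {
        PipedOutputStream out = new PipedOutputStream();
        final PipedInputStream in = new PipedInputStream(out);

        executor.submit(new Runnable() {
            public void run() {
                try {
                    s3.putObject(bucket, key, in);
                } catch (IOException e) {
                    // if the upload dies, the writer side gets an
                    // IOException on its next write, which can be turned
                    // into an error reply to the FTP client
                } finally {
                    try {
                        in.close();
                    } catch (IOException ignored) {
                    }
                }
            }
        });
        return out;
    }
}

The FTP side just writes the client's data to the returned stream and closes it when the transfer is done; closing it is what signals end-of-file to the S3 side.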

I, too, am concerned about the reliability problems with S3, but my experience over the past year and a half has been that the failure rate is less than 1% and improving. My users seem to be satisfied with an occasional error; if I get an S3 failure, I return an error to the client. My concern about buffering whole files is the performance and memory cost, which would limit the number of users that could be supported.

Thanks for the help!
Charles


Brad McEvoy wrote:

hey,

just one experience i've had with s3 which might impact the design.
i've found that uploads into s3 (when under load) fail surprisingly
frequently, at a rate of about 1 in a hundred. the s3 docs say that
upload failures are a normal and expected part of the service, but i
didn't really think it would fail that often.

so my conclusion is that i have to buffer uploads and then load into s3
asynchronously to ensure that end users don't get impacted. if you do
this then you end up doing two separate file transfers anyway, so
there's no big deal about going inputstream to inputstream.

another consideration is that uploads to s3 are non-transactional. so if
you stream directly from the end user to s3, and the end user aborts the
upload, then you'll end up with a useless partial file stored in s3. if
you buffer locally and only load into s3 on success you won't have this
problem.
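
A rough sketch of that buffer-then-upload approach might look like the following. The S3Client interface and putObject signature are stand-ins for the real Amazon client, and the retry count and backoff are arbitrary:

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BufferedS3Uploader {

    // Stand-in for the real Amazon client code, which consumes an InputStream.
    interface S3Client {
        void putObject(String bucket, String key, InputStream data, long length)
                throws IOException;
    }

    private final S3Client s3;
    private final ExecutorService executor = Executors.newFixedThreadPool(4);

    public BufferedS3Uploader(S3Client s3) {
        this.s3 = s3;
    }

    // Call this only after the FTP transfer has completed successfully,
    // so an aborted upload never leaves a partial object in s3.
    public void uploadLater(final File buffered, final String bucket,
                            final String key) {
        executor.submit(new Runnable() {
            public void run() {
                for (int attempt = 1; attempt <= 3; attempt++) {
                    try {
                        InputStream in = new FileInputStream(buffered);
                        try {
                            s3.putObject(bucket, key, in, buffered.length());
                        } finally {
                            in.close();
                        }
                        buffered.delete();  // success: drop the local buffer
                        return;
                    } catch (IOException e) {
                        // failures are expected now and then; back off and retry
                        try {
                            Thread.sleep(1000L * attempt);
                        } catch (InterruptedException ie) {
                            Thread.currentThread().interrupt();
                            return;
                        }
                    }
                }
                // all attempts failed: keep the file for a later retry sweep
            }
        });
    }
}

If all the attempts fail, the buffered file is left on disk so a later sweep (or an operator) can deal with it, rather than silently dropping the user's data.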

cheers,
brad

On Wed, 14 Oct 2009 10:28 +0200, "David Latorre" <[email protected]>
wrote:
2009/10/13 Niklas Gustavsson <[email protected]>:
On Tue, Oct 13, 2009 at 6:21 AM, Charles Karow <[email protected]> wrote:
I am using ftpserver to provide a standard way for people to upload files to
a "bucket" on Amazon's S3 service. My users will always be uploading files
in binary mode. I am using code from Amazon that takes an InputStream and
uses it to stream the data to Amazon's servers. Amazon's code does not
expose an OutputStream.

transferFromClient takes an OutputStream and I do not have access to an
OutputStream.

Sounds like this could be solved by an adapter stream which gets
written to by DataConnection and is read by S3. Or I might be missing
something?
This is what I first thought, but I think it might imply several risks
in terms of performance, or require storing the whole transferred file
locally (be it in memory or on disk). This is, of course, only if he
cannot use PipedStreams.

I may not be thinking this through completely, but as for a solution:

-  If you are using different threads for the FTP transfer and the
transfer to Amazon, I guess you could use PipedStreams with our current
code (I haven't actually looked at it).

- Otherwise, maybe someone on this list can tell us what their
approach is. I think some of them are using S3.

If no one comes up with a solution for this, I don't think we should
dismiss the possibility of exposing the input stream. What do you
think, Niklas?
