On 23 April 2013 09:00, Steve Loughran <ste...@hortonworks.com> wrote:
> On 22 April 2013 18:32, Eli Collins <e...@cloudera.com> wrote:
>
>> However if a change made FileSystem#close three times slower, this
>> perhaps a smaller semantic change (eg doesn't change what exceptions
>> get thrown) but probably much less tolerable for end users.
>>
>
> You know that the blobstores all buffer their data so that
>
> 1. flush() is a no-op
> 2. the write takes place on close()
>
> #1 changes durability expectations, while #2 means the time to close() is
> O(data)*O(latency); P(fail) scales with time and distance, and as lots of
> code swallows exceptions on close, those failures may even miss.
>

For the curious, there are some tests that I plan to get into Bigtop that
not only generate various large files but also collect stats on the
duration of operations. On a remote blobstore, it's close() that takes most
of the time, even for only a few MB of data:

2013-04-23 11:23:21,911 [main] INFO tools.DataGenerator (?:call(?)) - Generating 100000 lines of data
2013-04-23 11:23:22,122 [main] DEBUG snative.SwiftNativeOutputStream (SwiftNativeOutputStream.java:uploadOnClose(146)) - Closing write of file /tmp/data/massive/csv/data-0014.csv; localfile=target/build/test/output-4786965937321057354.tmp of length 3301583
2013-04-23 11:23:23,437 [main] INFO generate.GenerateManyCSVFilesTest (?:call(?)) - Total time = 0:02:031; create time=0:00:505; write time =0:00:210; close time = 0:01:316 partitions=0
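The buffer-then-upload-on-close behaviour described in points 1 and 2 can be sketched as a minimal, hypothetical Java OutputStream. This is not the Swift or S3 client code: the class name BufferedBlobOutputStream and the Consumer<byte[]> "uploader" standing in for a remote PUT are assumptions for illustration, and the sketch buffers in memory, whereas the DEBUG line above shows SwiftNativeOutputStream buffering to a local temp file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.function.Consumer;

/**
 * Hypothetical blobstore-style output stream (illustrative sketch only):
 * writes are buffered locally, flush() does nothing, and the whole payload
 * is handed to the assumed uploader only when close() is called.
 */
class BufferedBlobOutputStream extends OutputStream {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final Consumer<byte[]> uploader; // hypothetical stand-in for a remote PUT
    private boolean closed = false;

    BufferedBlobOutputStream(Consumer<byte[]> uploader) {
        this.uploader = uploader;
    }

    @Override
    public void write(int b) throws IOException {
        ensureOpen();
        buffer.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        ensureOpen();
        buffer.write(b, off, len);
    }

    @Override
    public void flush() {
        // Deliberately a no-op: nothing reaches the store before close(),
        // so flush() gives no durability guarantee.
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return; // repeated close() is harmless
        }
        closed = true;
        // The entire upload happens here, so close() time grows with the
        // amount of data buffered and with the latency to the store.
        uploader.accept(buffer.toByteArray());
    }

    private void ensureOpen() throws IOException {
        if (closed) {
            throw new IOException("stream closed");
        }
    }
}
```

Because the uploader only runs inside close(), a caller that times create, write and close separately sees close() absorb the network cost, and any upload failure surfaces from close(), which is exactly where code that swallows close() exceptions loses it.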