On 23 April 2013 09:00, Steve Loughran <ste...@hortonworks.com> wrote:

>
>
> On 22 April 2013 18:32, Eli Collins <e...@cloudera.com> wrote:
>
>>
>>
>
>> However, if a change made FileSystem#close three times slower, that would
>> perhaps be a smaller semantic change (e.g. it doesn't change what exceptions
>> get thrown) but probably much less tolerable for end users.
>>
>
> You know that the blobstores all buffer their data so that
>
>    1. flush() is a no-op
>    2. the write takes place on close()
>
> #1 changes durability expectations, while #2 means the time to close() is
> O(data)*O(latency); P(fail) scales with time and distance, and as lots of
> code swallows exceptions on close, those failures may even go unnoticed.
>
>
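To make the two points above concrete, here is a minimal Java sketch of a blobstore-style output stream. It assumes a hypothetical `Uploader` callback standing in for the real PUT to the store, and it is not the SwiftNativeOutputStream code. Writes only fill a local buffer, `flush()` does nothing, and the whole upload happens inside `close()`, which is also the only place an upload failure can surface.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch of a blobstore-style stream: buffer locally, upload on close().
class BufferedBlobOutputStream extends OutputStream {
    /** Hypothetical stand-in for the PUT of the whole object to the remote store. */
    public interface Uploader {
        void upload(byte[] data) throws IOException;
    }

    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final Uploader uploader;
    private boolean closed = false;

    BufferedBlobOutputStream(Uploader uploader) {
        this.uploader = uploader;
    }

    @Override
    public void write(int b) throws IOException {
        ensureOpen();
        buffer.write(b); // local only; nothing reaches the store yet
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        ensureOpen();
        buffer.write(b, off, len);
    }

    /** No-op: data is not durable anywhere remote after flush(). */
    @Override
    public void flush() {
    }

    /** The entire upload runs here, so close() time grows with data size and
     *  link latency, and an upload failure is only reported from this call. */
    @Override
    public void close() throws IOException {
        if (closed) {
            return; // repeated close() is a no-op
        }
        closed = true;
        uploader.upload(buffer.toByteArray());
    }

    private void ensureOpen() throws IOException {
        if (closed) {
            throw new IOException("stream closed");
        }
    }
}
```

Because the upload runs in `close()`, a caller that uses try-with-resources (rather than a close that catches and ignores `IOException`) is what lets an upload failure reach the application.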
For the curious, there are some tests that I plan to get into bigtop that
not only generate various large files, but also collect stats on the duration
of operations. On a remote blobstore, it's close() that takes most of the
time, even for only a few MB of data (below, 1.3 s of the 2.0 s total for a
3.3 MB file):

2013-04-23 11:23:21,911 [main] INFO  tools.DataGenerator (?:call(?)) - Generating 100000 lines of data
2013-04-23 11:23:22,122 [main] DEBUG snative.SwiftNativeOutputStream (SwiftNativeOutputStream.java:uploadOnClose(146)) - Closing write of file /tmp/data/massive/csv/data-0014.csv; localfile=target/build/test/output-4786965937321057354.tmp of length 3301583
2013-04-23 11:23:23,437 [main] INFO  generate.GenerateManyCSVFilesTest (?:call(?)) - Total time = 0:02:031; create time=0:00:505; write time =0:00:210; close time = 0:01:316 partitions=0
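A minimal Java sketch of how such per-phase stats could be collected, assuming a hypothetical `PhaseTimer` helper and a caller-supplied stream factory; it is not the test code destined for bigtop. Each phase is timed separately with `System.nanoTime()`, so a store that defers its upload to `close()` shows up as a large close time, as in the log above.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: times the create, write and close phases of writing one file.
final class PhaseTimer {
    /** Per-phase durations in milliseconds. */
    static final class Timings {
        final long createMs;
        final long writeMs;
        final long closeMs;

        Timings(long createMs, long writeMs, long closeMs) {
            this.createMs = createMs;
            this.writeMs = writeMs;
            this.closeMs = closeMs;
        }

        long totalMs() {
            return createMs + writeMs + closeMs;
        }

        @Override
        public String toString() {
            return String.format("total=%d ms; create=%d ms; write=%d ms; close=%d ms",
                    totalMs(), createMs, writeMs, closeMs);
        }
    }

    private static long millisBetween(long startNanos, long endNanos) {
        return TimeUnit.NANOSECONDS.toMillis(endNanos - startNanos);
    }

    /** Opens a stream via the factory (e.g. a lambda calling FileSystem.create(path)),
     *  writes the data, then closes it, timing each step on its own. */
    static Timings time(Callable<OutputStream> create, byte[] data) throws Exception {
        long t0 = System.nanoTime();
        OutputStream out = create.call();
        long t1 = System.nanoTime();
        try {
            out.write(data); // on a blobstore this usually only buffers locally
        } catch (IOException | RuntimeException e) {
            try {
                out.close(); // release the stream but keep the original error
            } catch (IOException suppressed) {
                e.addSuppressed(suppressed);
            }
            throw e;
        }
        long t2 = System.nanoTime();
        out.close(); // on a blobstore this is where the upload happens
        long t3 = System.nanoTime();
        return new Timings(millisBetween(t0, t1), millisBetween(t1, t2), millisBetween(t2, t3));
    }
}
```

Timing `close()` separately is the point of the exercise: folding it into the write time would hide where a blobstore actually spends its time.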
