[
https://issues.apache.org/jira/browse/HADOOP-13327?focusedWorklogId=550274&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-550274
]
ASF GitHub Bot logged work on HADOOP-13327:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 09/Feb/21 15:01
Start Date: 09/Feb/21 15:01
Worklog Time Spent: 10m
Work Description: steveloughran commented on a change in pull request
#2587:
URL: https://github.com/apache/hadoop/pull/2587#discussion_r572958871
##########
File path:
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/outputstream.md
##########
@@ -0,0 +1,1002 @@
+<!---
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License. See accompanying LICENSE file.
+-->
+
+<!-- MACRO{toc|fromDepth=1|toDepth=3} -->
+
+# Output: `OutputStream`, `Syncable` and `StreamCapabilities`
+
+## Introduction
+
+This document covers the Output Streams within the context of the
+[Hadoop File System Specification](index.html).
+
+It uses the filesystem model defined in [A Model of a Hadoop Filesystem](model.html)
+with the notation defined in [notation](Notation.md).
+
+The target audiences are:
+1. Users of the APIs. While `java.io.OutputStream` is a standard interface,
+this document clarifies how it is implemented in HDFS and elsewhere.
+The Hadoop-specific interfaces `Syncable` and `StreamCapabilities` are new;
+`Syncable` is notable in offering durability and visibility guarantees which
+exceed that of `OutputStream`.
+1. Implementors of File Systems and clients.
+
+## How data is written to a filesystem
+
+The core mechanism to write data to files through the Hadoop FileSystem APIs
+is through `OutputStream` subclasses obtained through calls to
+`FileSystem.create()`, `FileSystem.append()`,
+or `FSDataOutputStreamBuilder.build()`.
+
+These all return instances of `FSDataOutputStream`, through which data
+can be written through various `write()` methods.
+After a stream's `close()` method is called, all data written to the
+stream MUST BE persisted to the filesystem and visible to all other
+clients attempting to read data from that path via `FileSystem.open()`.
+
+As well as operations to write the data, Hadoop's `OutputStream` implementations
+provide methods to flush buffered data back to the filesystem,
+so as to ensure that the data is reliably persisted and/or visible
+to other callers. This is done via the `Syncable` interface. It was
+originally intended that the presence of this interface could be interpreted
+as a guarantee that the stream supported its methods. However, this has proven
+impossible to guarantee as the static nature of the interface is incompatible
+with filesystems whose syncability semantics may vary on a store/path basis.
+As an example, erasure coded files in HDFS do not support the Sync operations,
+even though they are implemented as a subclass of an output stream which is `Syncable`.
+
+A new interface, `StreamCapabilities`, addresses this. It allows callers
+to probe the exact capabilities of a stream, even transitively
+through a chain of streams.
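
To illustrate what probing "transitively through a chain of streams" means, here is a minimal Python sketch. The class names and the snake_case `has_capability` method are hypothetical stand-ins invented for this sketch, not Hadoop APIs; it only models a wrapper forwarding a capability probe to the stream it wraps.

```python
class InnerStream:
    """Hypothetical innermost stream which declares its own capabilities."""

    def __init__(self, capabilities):
        self._capabilities = {c.lower() for c in capabilities}

    def has_capability(self, name: str) -> bool:
        return name.lower() in self._capabilities


class WrapperStream:
    """Hypothetical wrapper: it answers probes by asking the wrapped stream."""

    def __init__(self, wrapped):
        self.wrapped = wrapped

    def has_capability(self, name: str) -> bool:
        probe = getattr(self.wrapped, "has_capability", None)
        # A wrapped stream which cannot be probed is treated as lacking the capability.
        return probe(name) if probe is not None else False


# Two layers of wrapping around a stream which only declares "hflush".
chain = WrapperStream(WrapperStream(InnerStream(["hflush"])))
print(chain.has_capability("hflush"), chain.has_capability("hsync"))
```

The point of the design is that a caller asks the outermost stream and never needs to know how many layers sit beneath it.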
+
+## Output Stream Model
+
+For this specification, an output stream can be viewed as a list of bytes
+stored in the client; the `hsync()` and `hflush()` operations are the actions
+which propagate the data to be visible to other readers of the file and/or
+made durable.
+
+```python
+buffer: List[byte]
+```
+
+A flag, `open` tracks whether the stream is open: after the stream
+is closed no more data may be written to it:
+
+```python
+open: bool
+buffer: List[byte]
+```
+
+The destination path of the stream, `path`, can be tracked to form a triple
+`path, open, buffer`
+
+```python
+Stream = (path: Path, open: Boolean, buffer: byte[])
+```
+
+#### Visibility of Flushed Data
+
+(Immediately) after `Syncable` operations which flush data to the filesystem,
+the data at the stream's destination path MUST match that of
+`buffer`. That is, the following condition MUST hold:
+
+```python
+FS'.Files(path) == buffer
+```
+
+Any client reading the data at the path MUST see the new data.
+The `Syncable` operations differ in their durability
+guarantees, not visibility of data.
+
+### State of Stream and File System after `FileSystem.create()`
+
+The output stream returned by a `FileSystem.create(path)` or
+`FileSystem.createFile(path).build()` within a filesystem `FS`,
+can be modeled as a triple whose buffer is an empty array:
+
+```python
+Stream' = (path, true, [])
+```
+
+The filesystem `FS'` MUST contain a 0-byte file at the path:
+
+```python
+FS' = FS where data(FS', path) == []
+```
+
+Thus, the initial state of `Stream'.buffer` is implicitly
+consistent with the data at the filesystem.
+
+
+*Object Stores*: see caveats in the "Object Stores" section below.
+
+### State of Stream and File System after `FileSystem.append()`
+
+The output stream returned from a call of
+ `FileSystem.append(path, buffersize, progress)` within a filesystem `FS`,
+can be modelled as a stream whose `buffer` is initialized to that of
+the original file:
+
+```python
+Stream' = (path, true, data(FS, path))
+```
+
+#### Persisting data
+
+When the stream writes data back to its store, be it in any
+supported flush operation, in the `close()` operation, or at any other
+time the stream chooses to do so, the contents of the file
+are replaced with the current buffer:
+
+```python
+Stream' = (path, true, buffer)
+FS' = FS where data(FS', path) == buffer
+```
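
As an illustration only, the `(path, open, buffer)` model can be sketched in runnable Python. The dictionary standing in for the filesystem and the function names `create`, `append` and `persist` are assumptions of this sketch; they mirror the model's notation rather than any Hadoop API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Stream:
    """The (path, open, buffer) triple of the model."""
    path: str
    open: bool = True
    buffer: List[int] = field(default_factory=list)


def create(fs: Dict[str, bytes], path: str) -> Stream:
    """After create(): FS' holds a 0-byte file and the buffer is empty."""
    fs[path] = b""
    return Stream(path)


def append(fs: Dict[str, bytes], path: str) -> Stream:
    """After append(): the buffer starts as the existing file data."""
    return Stream(path, True, list(fs[path]))


def persist(fs: Dict[str, bytes], stream: Stream) -> None:
    """Writing back replaces the file contents with the current buffer."""
    fs[stream.path] = bytes(stream.buffer)


fs: Dict[str, bytes] = {}
out = create(fs, "/data/file")
out.buffer.extend([1, 2, 3])
persist(fs, out)
again = append(fs, "/data/file")
```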
+
+After a call to `close()`, the stream is closed for all operations other
+than `close()`; they MAY fail with `IOException` or `RuntimeException`.
+
+```python
+Stream' = (path, false, [])
+```
+
+The `close()` operation MUST be idempotent with the sole attempt to write the
+data made in the first invocation.
+
+1. If `close()` succeeds, subsequent calls are no-ops.
+1. If `close()` fails, again, subsequent calls are no-ops. They MAY rethrow
+the previous exception, but they MUST NOT retry the write.
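
The `close()` rules can be sketched as follows. This is a minimal model under stated assumptions: `ModelStream`, its `sink` callable and the `write_attempts` counter are hypothetical names used only for illustration, and rethrowing the saved exception is just one of the behaviours the text permits.

```python
class ModelStream:
    """Hypothetical stream whose close() makes exactly one write attempt."""

    def __init__(self, sink):
        self.sink = sink            # hypothetical callable which persists the bytes
        self.buffer = bytearray()
        self.open = True
        self.close_error = None
        self.write_attempts = 0

    def close(self):
        if not self.open:
            # Already closed: never retry the write.
            # Rethrowing the earlier failure is the permitted MAY behaviour.
            if self.close_error is not None:
                raise self.close_error
            return
        self.open = False
        self.write_attempts += 1
        try:
            self.sink(bytes(self.buffer))
        except IOError as e:
            self.close_error = e
            raise
        finally:
            # Stream' = (path, false, [])
            self.buffer = bytearray()
```

Marking the stream closed before the write is attempted is what guarantees that a failure cannot trigger a second attempt.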
+
+<!-- ============================================================= -->
+<!-- CLASS: FSDataOutputStream -->
+<!-- ============================================================= -->
+
+## <a name="fsdataoutputstream"></a>Class `FSDataOutputStream`
+
+```java
+public class FSDataOutputStream
+ extends DataOutputStream
+ implements Syncable, CanSetDropBehind, StreamCapabilities {
+ // ...
+}
+```
+
+The `FileSystem.create()`, `FileSystem.append()` and
+`FSDataOutputStreamBuilder.build()` calls return an instance
+of a class `FSDataOutputStream`, a subclass of `java.io.OutputStream`.
+
+The base class wraps an `OutputStream` instance, one which may implement `Syncable`,
+`CanSetDropBehind` and `StreamCapabilities`.
+
+This document covers the requirements of such implementations.
+
+HDFS's `FileSystem` implementation, `DistributedFileSystem`, returns an instance
+of `HdfsDataOutputStream`. This implementation has at least two behaviors
+which are not explicitly declared by the base Java implementation:
+
+1. Writes are synchronized: more than one thread can write to the same
+output stream. This is a use pattern which HBase relies on.
+
+1. `OutputStream.flush()` is a no-op when the file is closed. Apache Druid
+has made such a call on this in the past
+[HADOOP-14346](https://issues.apache.org/jira/browse/HADOOP-14346).
+
+
+As the HDFS implementation is considered the de-facto specification of
+the FileSystem APIs, the fact that `write()` is thread-safe is significant.
+
+For compatibility, not only SHOULD other FS clients be thread-safe,
+but new HDFS features, such as encryption and Erasure Coding SHOULD also
+implement consistent behavior with the core HDFS output stream.
+
+Put differently:
+
+*It isn't enough for Output Streams to implement the core semantics
+of `java.io.OutputStream`: they need to implement the extra semantics
+of `HdfsDataOutputStream`, especially for HBase to work correctly.*
+
+The concurrent `write()` call is the most significant tightening of
+the Java specification.
+
+## <a name="outputstream"></a>Class `java.io.OutputStream`
+
+A Java `OutputStream` allows applications to write a sequence of bytes to a destination.
+In a Hadoop filesystem, that destination is the data under a path in the filesystem.
+
+```java
+public abstract class OutputStream implements Closeable, Flushable {
+ public abstract void write(int b) throws IOException;
+ public void write(byte b[]) throws IOException;
+ public void write(byte b[], int off, int len) throws IOException;
+ public void flush() throws IOException;
+ public void close() throws IOException;
+}
+```
+### <a name="write(data: int)"></a>`write(Stream, data)`
+
+Writes a byte of data to the stream.
+
+#### Preconditions
+
+```python
+Stream.open else raise ClosedChannelException, PathIOException, IOException
+```
+
+The exception `java.nio.channels.ClosedChannelException` is
+raised in the HDFS output streams when trying to write to a closed file.
+This exception does not include the destination path; and
+`Exception.getMessage()` is `null`. It is therefore of limited value in stack
+traces. Implementors may wish to raise exceptions with more detail, such
+as a `PathIOException`.
+
+
+#### Postconditions
+
+The buffer has the lower 8 bits of the data argument appended to it.
+
+```python
+Stream'.buffer = Stream.buffer + [data & 0xff]
+```
+
+There may be an explicit limit on the size of cached data, or an implicit
+limit based on the available capacity of the destination filesystem.
+When a limit is reached, `write()` SHOULD fail with an `IOException`.
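
A small sketch of the single-byte write, assuming a hypothetical helper `write_byte` which takes the model's `open` flag and `buffer` explicitly. Python's `IOError` stands in for the Java exceptions named in the precondition.

```python
def write_byte(is_open: bool, buffer: list, data: int) -> list:
    """Return the new buffer after appending the low 8 bits of data."""
    if not is_open:
        # Precondition: the stream must be open.
        raise IOError("stream is closed")
    # Only the lower 8 bits of the int argument are kept.
    return buffer + [data & 0xFF]


buf = write_byte(True, [], 0x141)   # the high bits are discarded
buf = write_byte(True, buf, -1)     # a negative int still yields one byte
```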
+
+### <a name="write(buffer,offset,len)"></a>`write(Stream, byte[] data, int offset, int len)`
+
+
+#### Preconditions
+
+The preconditions are all defined in `OutputStream.write()`
+
+```python
+Stream.open else raise ClosedChannelException, PathIOException, IOException
+data != null else raise NullPointerException
+offset >= 0 else raise IndexOutOfBoundsException
+len >= 0 else raise IndexOutOfBoundsException
+offset < data.length else raise IndexOutOfBoundsException
+offset + len <= data.length else raise IndexOutOfBoundsException
+```
+
+After the operation has returned, the buffer may be re-used. The outcome
+of updates to the buffer while the `write()` operation is in progress is undefined.
+
+#### Postconditions
+
+```python
+Stream'.buffer = Stream.buffer + data[offset...(offset + len)]
+```
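
The range write can be sketched in the same way. The helper name `write_range` is hypothetical, and Python exceptions stand in for the Java ones (`TypeError` for `NullPointerException`, `IndexError` for `IndexOutOfBoundsException`). The bounds check here follows `java.io.OutputStream.write(byte[], int, int)`, which accepts a range ending exactly at `data.length`.

```python
def write_range(buffer: list, data, offset: int, length: int) -> list:
    """Return the new buffer after appending data[offset : offset + length]."""
    if data is None:
        raise TypeError("data is null")        # stands in for NullPointerException
    if offset < 0 or length < 0 or offset + length > len(data):
        raise IndexError("range outside data")  # stands in for IndexOutOfBoundsException
    # The slice is copied, so the caller may reuse data once this returns.
    return buffer + list(data[offset:offset + length])


result = write_range([], b"hello", 1, 3)   # appends the bytes of "ell"
```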
+
+### <a name="write(buffer)"></a>`write(byte[] data)`
+
+This is defined as the equivalent of:
+
+```python
+write(data, 0, data.length)
+```
+
+### <a name="flush()"></a>`flush()`
+
+Requests that the data is flushed. The specification of `OutputStream.flush()`
+declares that this SHOULD write data to the "intended destination".
+
+It explicitly precludes any guarantees about durability.
+
+For that reason, this document doesn't provide any normative
+specifications of behaviour.
+
+#### Preconditions
+
+None.
+
+#### Postconditions
+
+None.
+
+If the implementation chooses to implement a stream-flushing operation,
+the data may be saved to the file system such that it becomes visible to
+others:
+
+```python
+FS' = FS where data(FS', path) == buffer
+```
+
+When a stream is closed, `flush()` SHOULD downgrade to being a no-op, if it was not
+one already. This is to work with applications and libraries which can invoke
+it in exactly this way.
+
+
+*Issue*: Should `flush()` forward to `hflush()`?
+
+No. Or at least, make it optional.
+
+There's a lot of application code which assumes that `flush()` is low cost
+and should be invoked after writing every single line of output, after
+writing small 4KB blocks or similar.
+
+Forwarding this to a full flush across a distributed filesystem, or worse,
+a distant object store, is very inefficient.
+Filesystem clients which do uprate a `flush()` to an `hflush()` will eventually
Review comment:
I went with "upgrade" in the end
Issue Time Tracking
-------------------
Worklog Id: (was: 550274)
Time Spent: 10h (was: 9h 50m)
> Add OutputStream + Syncable to the Filesystem Specification
> -----------------------------------------------------------
>
> Key: HADOOP-13327
> URL: https://issues.apache.org/jira/browse/HADOOP-13327
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs
> Affects Versions: 2.8.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
> Labels: pull-request-available
> Attachments: HADOOP-13327-002.patch, HADOOP-13327-003.patch,
> HADOOP-13327-branch-2-001.patch
>
> Time Spent: 10h
> Remaining Estimate: 0h
>
> Write down what a Filesystem output stream should do. While the core API is
> defined in Java, that doesn't say what's expected about visibility,
> durability, etc —and Hadoop Syncable interface is entirely ours to define.