[
https://issues.apache.org/jira/browse/HADOOP-19098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830528#comment-17830528
]
ASF GitHub Bot commented on HADOOP-19098:
-----------------------------------------
steveloughran commented on code in PR #6604:
URL: https://github.com/apache/hadoop/pull/6604#discussion_r1537742811
##########
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md:
##########
@@ -459,51 +459,119 @@ The position returned by `getPos()` after
`readVectored()` is undefined.
If a file is changed while the `readVectored()` operation is in progress, the output is
undefined. Some ranges may have old data, some may have new, and some may have
both.
-While a `readVectored()` operation is in progress, normal read api calls may block.
-
-Note: Don't use direct buffers for reading from ChecksumFileSystem as that may
-lead to memory fragmentation explained in HADOOP-18296.
+While a `readVectored()` operation is in progress, normal read API calls MAY block;
+the value of `getPos()` is also undefined. Applications SHOULD NOT make such requests
+while waiting for the results of a vectored read.
+Note: Don't use direct buffers for reading from `ChecksumFileSystem` as that may
+lead to memory fragmentation explained in
+[HADOOP-18296](https://issues.apache.org/jira/browse/HADOOP-18296)
+_Memory fragmentation in ChecksumFileSystem Vectored IO implementation_
#### Preconditions
-For each requested range:
+No empty lists.
+
+```python
+if ranges = null raise NullPointerException
+if ranges.len() = 0 raise IllegalArgumentException
+if allocate = null raise NullPointerException
+```
+
+For each requested range `range[i]` in the list of ranges `range[0..n]` sorted
+on `getOffset()` ascending such that
+
+for all `i where i > 0`:
- range.getOffset >= 0 else raise IllegalArgumentException
- range.getLength >= 0 else raise EOFException
+ range[i].getOffset() > range[i-1].getOffset()
+
+For all ranges `0..i` the preconditions are:
+
+```python
+ranges[i] != null else raise IllegalArgumentException
+ranges[i].getOffset() >= 0 else raise EOFException
+ranges[i].getLength() >= 0 else raise IllegalArgumentException
+if i > 0 and ranges[i].getOffset() < (ranges[i-1].getOffset() + ranges[i-1].getLength()):
+    raise IllegalArgumentException
+```
+If the length of the file is known during the validation phase:
+
+```python
+if ranges[i].getOffset() + ranges[i].getLength() >= data.length() raise EOFException
+```
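The preconditions above can be sketched as a standalone validation function. This is illustrative Python over `(offset, length)` tuples, not the Hadoop implementation; the name `validate_ranges` and the tuple representation are assumptions for the sketch, with Java exception names noted in comments:

```python
def validate_ranges(ranges, file_length=None):
    """Illustrative check of the vectored-read preconditions.

    ranges is a list of (offset, length) tuples, sorted by offset ascending.
    file_length is optional: the spec only requires the EOF check when the
    length of the file is known during validation.
    """
    if ranges is None:
        raise TypeError("ranges is null")           # NullPointerException in Java
    if len(ranges) == 0:
        raise ValueError("empty range list")        # IllegalArgumentException
    prev_end = None
    for offset, length in ranges:
        if offset < 0:
            raise EOFError("negative offset")       # EOFException
        if length < 0:
            raise ValueError("negative length")     # IllegalArgumentException
        if prev_end is not None and offset < prev_end:
            raise ValueError("overlapping ranges")  # IllegalArgumentException
        prev_end = offset + length
        # mirrors the spec text literally: offset + length >= data.length()
        if file_length is not None and offset + length >= file_length:
            raise EOFError("range extends past end of file")
    return True
```

For example, `validate_ranges([(0, 10), (5, 5)])` is rejected because the second range starts inside the first, while `validate_ranges([(0, 10), (10, 5)])` passes: a range beginning exactly where the previous one ends is adjacent, not overlapping.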
#### Postconditions
-For each requested range:
+For each requested range `range[i]` in the list of ranges `range[0..n]`
+
+```
+ranges[i]'.getData() = CompletableFuture<buffer: ByteBuffer>
+```
- range.getData() returns CompletableFuture<ByteBuffer> which will have data
- from range.getOffset to range.getLength.
+ and when `getData().get()` completes:
+```
+let buffer = ranges[i].getData().get()
+let len = ranges[i].getLength()
+let data = new byte[len]
+(buffer.limit() - buffer.position()) = len
+buffer.get(data, 0, len) = readFully(ranges[i].getOffset(), data, 0, len)
+```
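The postcondition — every completed range holds exactly what a `readFully()` at the same offset and length would return — can be shown with a toy in-memory model. This is illustrative Python; `read_fully` and `read_vectored` here are stand-ins for the sketch, not Hadoop APIs:

```python
data = bytes(range(64))  # toy file contents

def read_fully(offset, length):
    # contiguous positioned read of exactly `length` bytes
    if offset + length > len(data):
        raise EOFError("read past end of file")
    return data[offset:offset + length]

def read_vectored(ranges):
    # a trivially "vectored" implementation: one positioned read per range.
    # A real file system may merge, reorder, or parallelise the reads, but
    # each returned buffer must still equal read_fully for that offset/length.
    return [read_fully(offset, length) for offset, length in ranges]

ranges = [(0, 8), (16, 4), (40, 10)]
for (offset, length), buf in zip(ranges, read_vectored(ranges)):
    assert buf == read_fully(offset, length)
    assert len(buf) == length
```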
-### `minSeekForVectorReads()`
+That is: the result of every ranged read is the result of the (possibly asynchronous)
+call to `PositionedReadable.readFully()` for the same offset and length.
+
+#### `minSeekForVectorReads()`
The smallest reasonable seek. Two ranges won't be merged together if the difference
between the end of the first and the start of the next range is more than this value.
-### `maxReadSizeForVectorReads()`
+#### `maxReadSizeForVectorReads()`
Maximum number of bytes which can be read in one go after merging the ranges.
-Two ranges won't be merged if the combined data to be read is more than this value.
+Two ranges won't be merged if the combined data to be read is more than this value.
Review Comment:
oh, speech input, must have been from a conf call.
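The two merge parameters work together: a gap between adjacent ranges is tolerated up to `minSeekForVectorReads()`, and the coalesced span must stay within `maxReadSizeForVectorReads()`. A sketch of that decision (illustrative Python; `merge_ranges` is a hypothetical helper, not the Hadoop implementation):

```python
def merge_ranges(ranges, min_seek, max_read_size):
    """Coalesce sorted (offset, length) ranges for a vectored read.

    Two adjacent ranges merge only when the gap between them is at most
    min_seek AND the merged span does not exceed max_read_size.
    """
    merged = []
    for offset, length in sorted(ranges):
        if merged:
            prev_offset, prev_length = merged[-1]
            gap = offset - (prev_offset + prev_length)
            combined = offset + length - prev_offset
            if gap <= min_seek and combined <= max_read_size:
                merged[-1] = (prev_offset, combined)  # extend previous range
                continue
        merged.append((offset, length))
    return merged
```

For example, with `min_seek=100` and `max_read_size=1000`, ranges `(0, 100)` and `(150, 100)` (a 50-byte gap) coalesce into one `(0, 250)` read; shrink either parameter below the gap or the combined size and they stay separate.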
> Vector IO: consistent specified rejection of overlapping ranges
> ---------------------------------------------------------------
>
> Key: HADOOP-19098
> URL: https://issues.apache.org/jira/browse/HADOOP-19098
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs, fs/s3
> Affects Versions: 3.3.6
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
> Labels: pull-request-available
>
> Related to PARQUET-2171 q: "how do you deal with overlapping ranges?"
> I believe s3a rejects this, but the other impls may not.
> Proposed
> FS spec to say
> * "overlap triggers IllegalArgumentException".
> * special case: 0 byte ranges may be short circuited to return empty buffer
> even without checking file length etc.
> Contract tests to validate this
> (+ common helper code to do this).
> I'll copy the validation stuff into the parquet PR for consistency with older
> releases
--
This message was sent by Atlassian Jira
(v8.20.10#820010)