steveloughran edited a comment on pull request #2584:
URL: https://github.com/apache/hadoop/pull/2584#issuecomment-759582014
I think we should be more ambitious about read policy than just "fadvise",
because we can then use it as a declaration that lets input streams tune all
their parameters, e.g. buffer sizing and whether to do async prefetch.
Then stores could support more than just seek policies: callers could declare
what they are planning to read, e.g. "parquet-bytebuffer", to mean "I'm reading
parquet files through the bytebuffer positioned read API".
```
openFile("s3a://datasets/set1/input.parquet")
  .opt("fs.openfile.policy", "parquet-vectored, impala, parquet, random")
  .build().get()
```
Example: `opt("fs.openfile.read.policy", "parquet-vectored, parquet, random")`
to mean "optimise for impala for vectored IO, then generic vectored IO, then
generic random IO". Store implementors would get to make their own decisions as
to what to set, based on profiling etc. Applications would need to set the policy
in `openFile()`, so they would need to know what names to use. That we can discuss
with them, maybe by predefining some options which *may* be supported.
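
To make the fallback idea concrete, here is a rough sketch of how a store might pick the first policy it recognises from the comma-separated list. Everything in it is hypothetical: `ReadPolicyResolver`, the `supported` set and the `"default"` fallback name are made up for illustration and are not part of this PR or of any Hadoop API.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper for illustration only; not a Hadoop class. */
final class ReadPolicyResolver {

  private ReadPolicyResolver() {
  }

  /**
   * Walk the comma-separated policy list in order and return the first
   * entry the store supports; return the fallback if none match.
   */
  static String resolve(String policyList, Set<String> supported, String fallback) {
    for (String candidate : policyList.split(",")) {
      // Tolerate whitespace and case differences in the option value.
      String name = candidate.trim().toLowerCase(Locale.ROOT);
      if (!name.isEmpty() && supported.contains(name)) {
        return name;
      }
    }
    return fallback;
  }

  public static void main(String[] args) {
    // Assumed set of policies one particular store understands.
    Set<String> supported =
        new LinkedHashSet<>(Arrays.asList("parquet", "random", "sequential"));
    // Unknown names ("parquet-vectored", "impala") are skipped.
    System.out.println(
        resolve("parquet-vectored, impala, parquet,random", supported, "default"));
    // prints "parquet"
  }
}
```

With this shape, an unrecognised name is skipped rather than rejected, which is what lets applications list specific policies first and generic ones last.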