[
https://issues.apache.org/jira/browse/HDDS-12895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ivan Andika updated HDDS-12895:
-------------------------------
Description:
We need to evaluate whether to enable O_DIRECT (ExtendedOpenOption.DIRECT) for
datanode reads and writes. It has been supported since JDK 10
([https://bugs.openjdk.org/browse/JDK-8164900]).
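As a rough illustration of what this involves on the Java side, the sketch below opens a FileChannel with com.sun.nio.file.ExtendedOpenOption.DIRECT and does one block-aligned write and read. The class name DirectIoSketch and its helper methods are hypothetical and are not Ozone code. Direct I/O requires the buffer address, the transfer size, and the file position to be multiples of the file store block size, and some file systems (for example tmpfs) reject O_DIRECT entirely, so main treats that case as unsupported rather than as an error.

```java
import com.sun.nio.file.ExtendedOpenOption;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical sketch, not Ozone code.
final class DirectIoSketch {

  /** Returns a direct buffer whose address and capacity are multiples of blockSize. */
  static ByteBuffer alignedBuffer(int size, int blockSize) {
    int rounded = ((size + blockSize - 1) / blockSize) * blockSize;
    // Over-allocate by blockSize - 1 so an aligned slice of `rounded` bytes always fits.
    ByteBuffer raw = ByteBuffer.allocateDirect(rounded + blockSize - 1);
    ByteBuffer aligned = raw.alignedSlice(blockSize);
    aligned.limit(rounded);
    return aligned;
  }

  /** Writes payload (zero-padded to whole blocks) with O_DIRECT, then reads it back. */
  static byte[] roundTrip(Path dir, byte[] payload) throws IOException {
    int blockSize = Math.toIntExact(Files.getFileStore(dir).getBlockSize());
    Path file = Files.createTempFile(dir, "direct-io-", ".bin");
    try {
      ByteBuffer out = alignedBuffer(payload.length, blockSize);
      out.put(payload);
      out.rewind(); // write whole blocks: position 0, limit = padded size
      try (FileChannel ch = FileChannel.open(file,
          StandardOpenOption.WRITE, ExtendedOpenOption.DIRECT)) {
        ch.write(out, 0); // file position 0 is block-aligned
      }

      ByteBuffer in = alignedBuffer(payload.length, blockSize);
      try (FileChannel ch = FileChannel.open(file,
          StandardOpenOption.READ, ExtendedOpenOption.DIRECT)) {
        ch.read(in, 0);
      }
      in.flip();
      byte[] result = new byte[payload.length];
      in.get(result);
      return result;
    } finally {
      Files.deleteIfExists(file);
    }
  }

  public static void main(String[] args) {
    Path dir = Paths.get(args.length > 0 ? args[0] : ".").toAbsolutePath();
    byte[] payload = "chunk data".getBytes();
    try {
      byte[] back = roundTrip(dir, payload);
      System.out.println("round trip ok: " + Arrays.equals(payload, back));
    } catch (IOException | UnsupportedOperationException e) {
      System.out.println("direct I/O not usable here: " + e);
    }
  }
}
```

The padding to whole blocks is the main practical cost: a chunk whose length is not a multiple of the block size has to be written through a padded buffer, and the logical length has to be tracked separately.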
Resources
* [https://events19.linuxfoundation.org/wp-content/uploads/2017/11/Accelerating-IO-in-Big-Data-%E2%80%93-A-Data-Driven-Approach-and-Case-Studies-Yingqi-Lucy-Lu-Intel-Corporation.pdf]
* https://github.com/facebook/rocksdb/wiki/Direct-IO
On datanodes that are colocated with compute engines (e.g. YARN / Spark /
Presto), we want the DN NOT to use the file system cache, since it can affect
the colocated compute engines.
However, we should expect some performance degradation, since writes are not
buffered and reads are not cached. In that case, we can implement our own data
cache instead of relying on the OS cache. For example, we already have
ContainerStateMachine.stateMachineDataCache, which stores the pending
replicated write chunks.
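To make the idea of an application-level data cache concrete, here is a minimal sketch of a bounded LRU cache for chunk bytes built on java.util.LinkedHashMap. The class name ChunkCacheSketch, the String key, and the entry-count bound are assumptions made for illustration; this is not how ContainerStateMachine.stateMachineDataCache is implemented, and a real cache would more likely be bounded by bytes than by entries.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not Ozone code.
final class ChunkCacheSketch {
  private final Map<String, byte[]> map;

  ChunkCacheSketch(final int maxEntries) {
    // accessOrder = true makes iteration order least-recently-used first.
    this.map = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
        // Evict the least recently used entry once the bound is exceeded.
        return size() > maxEntries;
      }
    };
  }

  /** Returns the cached chunk or null; a hit marks the entry as recently used. */
  synchronized byte[] get(String chunkKey) {
    return map.get(chunkKey);
  }

  synchronized void put(String chunkKey, byte[] data) {
    map.put(chunkKey, data);
  }

  synchronized int size() {
    return map.size();
  }
}
```

With direct I/O, a read path would consult such a cache before issuing an aligned read, since the OS page cache no longer absorbs repeated reads of the same chunk.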
was:
We need to see whether we should enable O_DIRECT (ExtendedOptions.DIRECT) in
datanodes reads and writes. It has been supported since JDK 10
([https://bugs.openjdk.org/browse/JDK-8164900])
Resources
* [https://events19.linuxfoundation.org/wp-content/uploads/2017/11/Accelerating-IO-in-Big-Data-%E2%80%93-A-Data-Driven-Approach-and-Case-Studies-Yingqi-Lucy-Lu-Intel-Corporation.pdf]
* https://github.com/facebook/rocksdb/wiki/Direct-IO
In some datanodes that is colocated with compute engines (e.g. Yarn / Spark /
Presto), we want the DN to NOT use file system cache since it can affect the
colocated machines.
However, there should be expected performance degradations since writes are not
buffered and reads are not cached. In that case, we can implement our own data
cache instead of relying on OS cache. For example, we already have
ContainerStateMachine.stateMachineDataCache that will store the pending written
write chunk.
> Explore O_DIRECT in Datanodes
> -----------------------------
>
> Key: HDDS-12895
> URL: https://issues.apache.org/jira/browse/HDDS-12895
> Project: Apache Ozone
> Issue Type: Improvement
> Reporter: Ivan Andika
> Assignee: Ivan Andika
> Priority: Major