[ https://issues.apache.org/jira/browse/SPARK-24787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16581658#comment-16581658 ]
Steve Loughran commented on SPARK-24787:
----------------------------------------
Yes, hsync updating the file length is the problem; that is:
{code}
DFSOutputStream.hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH))
{code}
will talk to the NN; without that flag, a normal hsync saves the data but only
updates the NN when a block is completed.
It's a PITA, but the other apps which look for changed files (e.g. YARN ATS)
have an algorithm of caching the previous length of the file, re-opening it,
trying to seek() to EOF + 1, and, if that and/or a subsequent read() succeeds,
inferring that the file has changed.
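A rough sketch of that probe, for illustration only. It uses java.io.RandomAccessFile on a local file as a stand-in for an HDFS input stream; the class and method names (LengthProbe, hasGrown) are hypothetical, and this is not the actual ATS code. The caller is assumed to keep the previously observed length and pass it in:

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;

/** Hypothetical local-filesystem stand-in for the "has this file grown?" probe. */
final class LengthProbe {
    private LengthProbe() {}

    /**
     * Returns true if at least one byte is readable at offset previousLength,
     * i.e. the file now extends past the length cached from the previous check.
     */
    static boolean hasGrown(Path file, long previousLength) throws IOException {
        // Re-open on every probe rather than reusing a stream opened earlier.
        try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
            // Seeking past the end of a local file succeeds; other stream
            // implementations may throw EOFException here instead.
            in.seek(previousLength);
            // read() returns -1 when there is nothing past the cached length.
            return in.read() != -1;
        } catch (EOFException e) {
            // Treated the same as "no new data".
            return false;
        }
    }
}
```

On a local file only the read() result matters; the EOFException branch is there for stream implementations that reject a seek past the known end of file.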
See my under-reviewed/uncommitted attempt at specifying & documenting this from
HADOOP-13327:
[outputstream.md|https://github.com/steveloughran/hadoop/blob/s3/HADOOP-13327-outputstream-trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/outputstream.md]
Note some details there on when hsync doesn't sync everything, that is: if
enough data has been written that you've crossed a block boundary, only the
current active block is synced. IMO: bad behaviour.
> Events being dropped at an alarming rate due to hsync being slow for
> eventLogging
> ---------------------------------------------------------------------------------
>
> Key: SPARK-24787
> URL: https://issues.apache.org/jira/browse/SPARK-24787
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, Web UI
> Affects Versions: 2.3.0, 2.3.1
> Reporter: Sanket Reddy
> Priority: Minor
>
> [https://github.com/apache/spark/pull/16924/files] updates the length of the
> inprogress files allowing history server being responsive.
> Although we have a production job that has 60000 tasks per stage and due to
> hsync being slow it starts dropping events and the history server has wrong
> stats due to events being dropped.
> A viable solution is not to make it sync very frequently or make it
> configurable.