[
https://issues.apache.org/jira/browse/HDFS-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669112#comment-16669112
]
Xiao Chen commented on HDFS-14038:
----------------------------------
Thanks for the comments, and sorry if I wasn't clear.
Yes, the goal of this jira is to investigate a reasonable way for downstream
projects to construct the stream. We could add 'Spark' to DFS's LimitedPrivate
annotation. Or, better yet, we could figure out a reasonable way to expose this
on FSDataOutputStreamBuilder. Currently {{replicate}} is purely an HDFS concept,
separate from {{replication}}, which sets the replication factor but only takes
effect if the file uses replication rather than EC. That said, replication
factor is also an HDFS-only concept, so I don't see why we can't move
{{replicate}} up as well. We would need to be very clear, of course, so as not
to confuse users.
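
For illustration only, here is a rough sketch of the kind of reflection
workaround a downstream project ends up with today: look for a no-argument
{{replicate}} method on whatever builder it was handed, call it if present, then
call {{build}}. The classes GenericBuilder and HdfsLikeBuilder are hypothetical
stand-ins so the sketch runs without Hadoop on the classpath; they are not the
real FSDataOutputStreamBuilder / HdfsDataOutputStreamBuilder, and the helper
name buildPreferReplication is made up as well.

```java
import java.lang.reflect.Method;

/**
 * Sketch of the reflection workaround. GenericBuilder and HdfsLikeBuilder are
 * hypothetical stand-ins for illustration, NOT Hadoop classes.
 */
final class ReplicateViaReflection {

    /** Stand-in for a generic stream builder that has no replicate() method. */
    public static class GenericBuilder {
        protected boolean replicated = false;

        /** Returns a label instead of a stream, to keep the sketch small. */
        public String build() {
            return replicated ? "replicated" : "default-policy";
        }
    }

    /** Stand-in for an HDFS-specific builder that adds replicate(). */
    public static class HdfsLikeBuilder extends GenericBuilder {
        public HdfsLikeBuilder replicate() {
            replicated = true;
            return this;
        }
    }

    /**
     * Calls replicate() reflectively when the builder offers it, then build().
     * Builders without replicate() are built with their default settings.
     */
    static Object buildPreferReplication(Object builder)
            throws ReflectiveOperationException {
        Object configured = builder;
        try {
            Method replicate = builder.getClass().getMethod("replicate");
            configured = replicate.invoke(builder);
        } catch (NoSuchMethodException e) {
            // The builder is not HDFS-like; keep its default behaviour.
        }
        Method build = configured.getClass().getMethod("build");
        return build.invoke(configured);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        System.out.println(buildPreferReplication(new HdfsLikeBuilder())); // replicated
        System.out.println(buildPreferReplication(new GenericBuilder()));  // default-policy
    }
}
```

The caller never names the HDFS-specific type, which is what makes this work
against a private API, but it also means a typo or a renamed method only shows
up at runtime. Exposing the option on a public builder would remove the need for
this pattern.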
> Expose HdfsDataOutputStreamBuilder to include Spark in LimitedPrivate
> ---------------------------------------------------------------------
>
> Key: HDFS-14038
> URL: https://issues.apache.org/jira/browse/HDFS-14038
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Xiao Chen
> Priority: Major
>
> In SPARK-25855 /
> https://github.com/apache/spark/pull/22881#issuecomment-434359237, Spark
> prefers to create Spark event log files with replication (instead of EC).
> Currently this requires casting or reflection to get a
> DistributedFileSystem object (or to use the {{HdfsDataOutputStreamBuilder}}
> subclass).
> We should officially expose this for Spark's usage.