[
https://issues.apache.org/jira/browse/HDFS-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669087#comment-16669087
]
Marcelo Vanzin commented on HDFS-14038:
---------------------------------------
No, the problem here is not the use of reflection. That is needed because Spark
still has to build against Hadoop 2, which doesn't have that API.
The issue raised in that comment is that the method Spark uses is in a
LimitedPrivate / Unstable API, which means it can break at any time.
For example, a better approach would be to have a method in
{{FSDataOutputStreamBuilder}}, which is marked as public. In fact, there's
already {{replication()}}, to set the replication factor, but it doesn't seem
related to the {{replicate()}} method in HdfsDataOutputStreamBuilder. Maybe
they should be merged.
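
To illustrate the reflection pattern discussed above, here is a minimal, self-contained sketch. It is not Spark's actual code and does not use the real Hadoop classes: {{PlainBuilder}} and {{HdfsLikeBuilder}} are hypothetical stand-ins for a builder without {{replicate()}} (the Hadoop 2 situation) and one with it, and {{replicateIfSupported}} is an illustrative helper name. The only point is that the caller compiles without a static reference to {{replicate()}}, and breaks only at runtime if the method is renamed or removed.

```java
import java.lang.reflect.Method;

public class ReplicateViaReflection {

    // Hypothetical stand-in for a builder that has no replicate() method.
    public static class PlainBuilder {
        public String build() {
            return "default-policy";
        }
    }

    // Hypothetical stand-in for a builder that exposes replicate(),
    // loosely modeled on the HdfsDataOutputStreamBuilder mentioned above.
    public static class HdfsLikeBuilder extends PlainBuilder {
        private boolean replicated = false;

        public HdfsLikeBuilder replicate() {
            replicated = true;
            return this;
        }

        @Override
        public String build() {
            return replicated ? "replicated" : "default-policy";
        }
    }

    // Invokes replicate() reflectively when the builder's class declares it;
    // otherwise returns the builder unchanged.
    public static Object replicateIfSupported(Object builder) {
        try {
            Method replicate = builder.getClass().getMethod("replicate");
            return replicate.invoke(builder);
        } catch (NoSuchMethodException e) {
            // Method is absent (as on an older API): fall back silently.
            return builder;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("replicate() exists but could not be invoked", e);
        }
    }

    public static void main(String[] args) {
        PlainBuilder oldStyle = (PlainBuilder) replicateIfSupported(new PlainBuilder());
        PlainBuilder newStyle = (PlainBuilder) replicateIfSupported(new HdfsLikeBuilder());
        System.out.println(oldStyle.build()); // prints "default-policy"
        System.out.println(newStyle.build()); // prints "replicated"
    }
}
```

Because the lookup is by method name, nothing at compile time notices if an Unstable API changes. That is the fragility a public, stable builder method would avoid.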
> Expose HdfsDataOutputStreamBuilder to include Spark in LimitedPrivate
> ---------------------------------------------------------------------
>
> Key: HDFS-14038
> URL: https://issues.apache.org/jira/browse/HDFS-14038
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Xiao Chen
> Priority: Major
>
> In SPARK-25855 /
> https://github.com/apache/spark/pull/22881#issuecomment-434359237, Spark
> prefers to create its event log files with replication (instead of EC).
> Currently this requires some casting / reflection to get a
> DistributedFileSystem object (or to use the {{HdfsDataOutputStreamBuilder}}
> subclass of it).
> We should officially expose this for Spark's usage.