[jira] [Commented] (HDFS-14038) Expose HdfsDataOutputStreamBuilder to include Spark in LimitedPrivate

Marcelo Vanzin (JIRA) Tue, 30 Oct 2018 10:29:07 -0700


    [ https://issues.apache.org/jira/browse/HDFS-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669087#comment-16669087 ]


Marcelo Vanzin commented on HDFS-14038:
---------------------------------------

No, the problem here is not the use of reflection. That is needed because Spark 
still has to build against Hadoop 2, which doesn't have that API.

The issue raised in that comment is that the method Spark uses is in a 
LimitedPrivate / Unstable API, which means it can break at any time.

For example, a better approach would be to have a method in 
{{FSDataOutputStreamBuilder}}, which is marked as public. In fact, there's 
already {{replication()}}, which sets the replication factor, but it doesn't 
seem related to the {{replicate()}} method in {{HdfsDataOutputStreamBuilder}}. 
Maybe they should be merged.
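
To make the reflection pattern concrete, here is a minimal, self-contained sketch. It does not use Hadoop or Spark classes: {{PlainBuilder}}, {{HdfsLikeBuilder}} and {{buildPreferringReplication}} are hypothetical stand-ins, and only the {{replicate()}} method name comes from this discussion. It shows the fragile part: the caller looks the method up by name at runtime, so a rename or removal in an Unstable API would only surface when the code runs.

```java
import java.lang.reflect.Method;

// Hypothetical stand-ins only; these are NOT the real Hadoop classes.
class ReplicateViaReflection {

    /** Stand-in for a generic builder that has no replicate() method. */
    public static class PlainBuilder {
        protected boolean replicated = false;

        public String build() {
            return replicated ? "replicated" : "default-policy";
        }
    }

    /** Stand-in for an HDFS-specific builder that adds replicate(). */
    public static class HdfsLikeBuilder extends PlainBuilder {
        public HdfsLikeBuilder replicate() {
            replicated = true;
            return this;
        }
    }

    /**
     * Calls replicate() by name if the builder has it, then build().
     * Nothing here is checked at compile time, so a renamed or removed
     * replicate() silently falls back to the default policy.
     */
    public static String buildPreferringReplication(Object builder) {
        Class<?> cls = builder.getClass();
        try {
            Object configured = builder;
            try {
                Method replicate = cls.getMethod("replicate");
                configured = replicate.invoke(builder);
            } catch (NoSuchMethodException e) {
                // Builder has no replicate(); keep its default policy.
            }
            Method build = cls.getMethod("build");
            return (String) build.invoke(configured);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflective builder call failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildPreferringReplication(new HdfsLikeBuilder()));
        System.out.println(buildPreferringReplication(new PlainBuilder()));
    }
}
```

A method on a public, stable builder type would let the same call be written directly and checked by the compiler, which is the motivation for the suggestion above.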

> Expose HdfsDataOutputStreamBuilder to include Spark in LimitedPrivate
> ---------------------------------------------------------------------
>
>                 Key: HDFS-14038
>                 URL: https://issues.apache.org/jira/browse/HDFS-14038
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Xiao Chen
>            Priority: Major
>
> In SPARK-25855 / 
> https://github.com/apache/spark/pull/22881#issuecomment-434359237, Spark 
> prefers to create its event log files with replication (instead of EC). 
> Currently, this requires some casting / reflection to get a 
> DistributedFileSystem object (or to use the {{HdfsDataOutputStreamBuilder}} 
> subclass).
> We should officially expose this for Spark's usage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
