[ 
https://issues.apache.org/jira/browse/HDFS-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669112#comment-16669112
 ] 

Xiao Chen commented on HDFS-14038:
----------------------------------

Thanks for the comments and sorry if I wasn't clear.

Yes, the goal of this jira is to find a reasonable way for downstream projects 
to construct the stream. We could add 'Spark' to DFS's LimitedPrivate. Or, 
better yet, figure out a reasonable way to expose this on 
FSDataOutputStreamBuilder. Currently {{replicate}} is purely an HDFS concept. 
It is separate from {{replication}}, which sets the replication factor and 
only takes effect if the file uses replication rather than EC. That said, the 
replication factor is also an HDFS-only concept, so I don't see why we can't 
move {{replicate}} up too. Of course, the API would need to be very clear so 
as not to confuse users.
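
To illustrate the reflection workaround that downstream has to use today, here 
is a minimal, self-contained sketch. The Stub* classes and the 
buildPreferringReplication helper are hypothetical stand-ins so the example 
runs without Hadoop on the classpath; only the method names {{replicate}} and 
{{build}} mirror HdfsDataOutputStreamBuilder. It is not the actual Spark or 
HDFS code.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for an output stream; records whether replication was forced.
class StubStream {
    final boolean replicated;
    StubStream(boolean replicated) { this.replicated = replicated; }
}

// Hypothetical stand-in for a generic builder that has no replicate() method.
class StubBuilder {
    public StubStream build() { return new StubStream(false); }
}

// Hypothetical stand-in for an HDFS-specific builder that adds replicate().
class StubHdfsBuilder extends StubBuilder {
    private boolean forceReplicated = false;

    public StubHdfsBuilder replicate() {
        forceReplicated = true;
        return this;
    }

    @Override
    public StubStream build() { return new StubStream(forceReplicated); }
}

class ReplicateViaReflection {
    // Calls replicate() reflectively when the builder offers it, then builds.
    // Falls back to a plain build() for builders without that method.
    static StubStream buildPreferringReplication(StubBuilder builder) {
        try {
            Method replicate = builder.getClass().getMethod("replicate");
            Object configured = replicate.invoke(builder);
            return ((StubBuilder) configured).build();
        } catch (NoSuchMethodException e) {
            // Not an HDFS-style builder: nothing to force, use the defaults.
            return builder.build();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("replicate() could not be invoked", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildPreferringReplication(new StubHdfsBuilder()).replicated);
        System.out.println(buildPreferringReplication(new StubBuilder()).replicated);
    }
}
```

The lookup by method name is what makes this fragile: nothing at compile time 
ties the caller to the builder, which is the motivation for exposing the 
option officially.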

> Expose HdfsDataOutputStreamBuilder to include Spark in LimitedPrivate
> ---------------------------------------------------------------------
>
>                 Key: HDFS-14038
>                 URL: https://issues.apache.org/jira/browse/HDFS-14038
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Xiao Chen
>            Priority: Major
>
> In SPARK-25855 / 
> https://github.com/apache/spark/pull/22881#issuecomment-434359237, Spark 
> prefers to create its event log files with replication (instead of EC). 
> Currently this requires some casting / reflection to get a 
> DistributedFileSystem object (or to use the {{HdfsDataOutputStreamBuilder}} 
> subclass of it).
> We should officially expose this for Spark's usage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)