[ https://issues.apache.org/jira/browse/SPARK-25169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
pin_zhang updated SPARK-25169:
------------------------------
Component/s: SQL (was: Spark Core)
> Multiple DataFrames cannot write to the same folder concurrently
> ----------------------------------------------------------------
>
> Key: SPARK-25169
> URL: https://issues.apache.org/jira/browse/SPARK-25169
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.1
> Reporter: pin_zhang
> Priority: Major
>
>
> It seems the DataFrame writer does not support concurrent writes to the same folder.
> Steps to reproduce:
> import org.apache.spark.SparkContext
> import org.apache.spark.sql.SaveMode
> import org.apache.spark.sql.hive.HiveContext
>
> // `conf` is a SparkConf that the report does not show.
> val sc = new SparkContext(conf)
> val hiveContext = new HiveContext(sc)
> val source = "file:///G:/home/json"
> val target = "file:///G:/home/oad"
> // Three threads append the same source to the same target folder at once.
> new Thread(new Runnable {
>   override def run(): Unit = {
>     hiveContext.jsonFile(source).write.mode(SaveMode.Append).json(target)
>     Thread.sleep(1000L)
>   }
> }).start()
> new Thread(new Runnable {
>   override def run(): Unit = {
>     hiveContext.jsonFile(source).write.mode(SaveMode.Append).json(target)
>     Thread.sleep(1000L)
>   }
> }).start()
> new Thread(new Runnable {
>   override def run(): Unit = {
>     hiveContext.jsonFile(source).write.mode(SaveMode.Append).json(target)
>     Thread.sleep(1000L)
>   }
> }).start()
>
> The following exception is thrown:
> java.io.FileNotFoundException: File file:/G:/home/oad/_temporary/0/task_20180821151921_0004_m_000001/.part-00001-463ee671-0ef0-42ff-8968-1d960bc87996-c000.json.crc does not exist
>   at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
>   at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
>
>
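The missing file in the exception sits under the target's `_temporary/0` directory, which hints that the concurrent jobs share one staging location. Assuming that is the cause, one possible stopgap is to let only one append run at a time. The sketch below is not a Spark API: `SerializedWrites`, `serialized`, `fakeWrite` and `runDemo` are hypothetical names, and `fakeWrite` is a stand-in for the real write so the example runs without Spark.

```scala
object SerializedWrites {
  // Hypothetical helper: one JVM-wide lock guarding the shared target folder.
  private val targetLock = new Object

  // Runs `write` while holding the lock, so at most one write is active at a time.
  def serialized[A](write: => A): A = targetLock.synchronized(write)

  // Bookkeeping for the demo only: tracks how many writes overlap.
  private val statsLock = new Object
  private var active = 0
  private var maxActive = 0

  // Stand-in for the Spark write call (assumption: the real job would go here).
  def fakeWrite(): Unit = {
    statsLock.synchronized {
      active += 1
      if (active > maxActive) maxActive = active
    }
    Thread.sleep(50L) // pretend the job takes a while
    statsLock.synchronized { active -= 1 }
  }

  // Starts three threads, as in the report, but funnels each write through the lock.
  // Returns the highest number of writes observed running at the same time.
  def runDemo(): Int = {
    val threads = (1 to 3).map { _ =>
      new Thread(new Runnable {
        override def run(): Unit = serialized(fakeWrite())
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    statsLock.synchronized(maxActive)
  }
}
```

In the reproduction above, the body of each `run()` would become `SerializedWrites.serialized { hiveContext.jsonFile(source).write.mode(SaveMode.Append).json(target) }`. This only serializes writers inside one driver JVM; separate applications appending to the same folder would not be covered, and the threads no longer write in parallel.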