Yunus Emre Gürses created SPARK-45519:
-----------------------------------------
Summary: cleanSource problem on FileStreamSource for Windows env
Key: SPARK-45519
URL: https://issues.apache.org/jira/browse/SPARK-45519
Project: Spark
Issue Type: Bug
Components: Structured Streaming
Affects Versions: 3.4.1
Reporter: Yunus Emre Gürses
We are using Spark with Scala in a Windows environment. While streaming with
Spark, I set the *{{cleanSource}}* option to "archive" and the
*{{sourceArchiveDir}}* option to "archived", as in the code below.
{code:java}
spark.readStream
  .option("cleanSource", "archive")
  .option("sourceArchiveDir", "archived"){code}
When I tried this in a Linux environment, I realized that the problem is with
the paths: when I set *{{cleanSource}}* to "delete", it works on both Linux
and Windows, but with "archive" it does not work on Windows.
The problem is related to how paths are appended on Windows. There is a method
{code:java}
override protected def cleanTask(entry: FileEntry): Unit{code}
in the FileStreamSource.scala file in the
org.apache.spark.sql.execution.streaming package. On line 569, the
!fileSystem.rename(curPath, newPath) call is supposed to move the source file
to the archive folder. However, when I debugged, I noticed that the curPath
and newPath values were as follows on Windows:
{code:java}
curPath:
file:/C:/dev/be/data-integration-suite/test-data/streaming-folder/patients/patients-success.csv{code}
{code:java}
newPath:
file:/C:/dev/be/data-integration-suite/archived/C:/dev/be/data-integration-suite/test-data/streaming-folder/patients/patients-success.csv{code}
It seems that the absolute path of the CSV file was appended when creating
newPath, because *C:/dev/be/data-integration-suite* appears twice in newPath.
This is probably why Spark archiving does not work. Instead, newPath should
be:
file:/C:/dev/be/data-integration-suite/archived/test-data/streaming-folder/patients/patients-success.csv
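The doubled path can be reproduced outside Spark with plain java.net.URI. This is only a sketch under an assumption: that newPath is built by concatenating the archive directory with the path component of the source file's URI. I have not confirmed that this is exactly what cleanTask does. The object name, both helper methods, and the choice of base directory are hypothetical and are not part of Spark.

```scala
import java.net.URI

object ArchivePathSketch {
  // Hypothetical helper: appends the full path component of the source URI
  // to the archive base. On Windows the path component still starts with
  // the drive letter ("/C:/..."), so the drive ends up inside the result.
  def naiveArchivePath(archiveBase: String, source: String): String =
    archiveBase.stripSuffix("/") + new URI(source).getPath

  // Hypothetical alternative: appends only the part of the source path that
  // is relative to an assumed base directory.
  def relativeArchivePath(archiveBase: String, baseDir: String, source: String): String = {
    val base = new URI(baseDir.stripSuffix("/") + "/")
    val relative = base.relativize(new URI(source))
    archiveBase.stripSuffix("/") + "/" + relative.getPath
  }

  def main(args: Array[String]): Unit = {
    val baseDir = "file:/C:/dev/be/data-integration-suite"
    val archive = baseDir + "/archived"
    val source  = baseDir + "/test-data/streaming-folder/patients/patients-success.csv"

    // Matches the newPath value observed in the debugger.
    println(naiveArchivePath(archive, source))
    // Matches the newPath value expected in this report.
    println(relativeArchivePath(archive, baseDir, source))
  }
}
```

On Linux the same concatenation gives a valid path because there is no drive letter, which would explain why only Windows is affected.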