Siying Dong created SPARK-43343:
-----------------------------------
Summary: Spark Streaming is not able to read a .txt file whose
name has [] special character
Key: SPARK-43343
URL: https://issues.apache.org/jira/browse/SPARK-43343
Project: Spark
Issue Type: Bug
Components: Structured Streaming
Affects Versions: 3.4.0
Reporter: Siying Dong
* For example, if a directory contains the following file:
/path/abc[123]
and a user loads it as stream input with
spark.readStream.format("text").load("/path"), Spark throws an exception saying
that no path matches /path/abc[123]. Spark treats abc[123] as a glob pattern
that matches only files named abc1, abc2, and abc3.
* Upon investigation, this is due to how
[getBatch|https://github.com/databricks/runtime/blob/3af402d23620a0952e151d96c3184d2233217c87/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala#L269]
works in `FileStreamSource`. `FileStreamSource` already performs file pattern
matching and finds all matching file names. However, `DataSource` then checks
for glob characters again and tries to expand them
[here|https://github.com/databricks/runtime/blob/3af402d23620a0952e151d96c3184d2233217c87/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L274].
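The double-globbing effect described above can be illustrated outside Spark. The following is a minimal sketch, not Spark's code: it assumes Python's standard `fnmatch` module as a stand-in for the glob matching Spark performs, and the helper `matches_as_glob` and the sample file list are hypothetical names introduced only for illustration. It shows that a file name which has already been resolved stops matching itself once it is interpreted as a glob a second time, and that escaping the name first avoids this.

```python
import fnmatch
import glob


def matches_as_glob(file_name: str, pattern: str) -> bool:
    """Hypothetical helper: case-sensitive glob match of a single name."""
    return fnmatch.fnmatchcase(file_name, pattern)


# The file that really exists in the directory (already resolved once).
literal_name = "abc[123]"

# Hypothetical directory listing used only for this illustration.
candidates = ["abc1", "abc2", "abc3", literal_name]

# Treating the resolved name as a pattern again: [123] becomes a
# character class, so the literal name no longer matches itself.
reglobbed = [n for n in candidates if matches_as_glob(n, literal_name)]

# Escaping the glob metacharacters first keeps the name literal.
escaped = glob.escape(literal_name)
matches_after_escape = matches_as_glob(literal_name, escaped)

print(reglobbed)
print(escaped, matches_after_escape)
```

Whether the right fix in Spark is to escape the resolved paths or to skip the second glob check in `DataSource` is not stated in this report; the sketch only demonstrates why the second expansion fails to find the file.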