[jira] [Created] (SPARK-43343) Spark Streaming is not able to read a .txt file whose name has [] special character

Siying Dong (Jira) Tue, 02 May 2023 11:29:07 -0700

Siying Dong created SPARK-43343:
-----------------------------------

             Summary: Spark Streaming is not able to read a .txt file whose 
name has [] special character
                 Key: SPARK-43343
                 URL: https://issues.apache.org/jira/browse/SPARK-43343
             Project: Spark
          Issue Type: Bug
          Components: Structured Streaming
    Affects Versions: 3.4.0
            Reporter: Siying Dong



* For example, if a directory contains the following file:
/path/abc[123]
and a user loads it as stream input with 
spark.readStream.format("text").load("/path"), Spark throws an exception 
saying that no path matches /path/abc[123]. Spark treats abc[123] as a glob 
pattern that matches only files named abc1, abc2, or abc3.

 * Upon investigation, this is due to how 
[getBatch|https://github.com/databricks/runtime/blob/3af402d23620a0952e151d96c3184d2233217c87/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala#L269]
 works in `FileStreamSource`. In `FileStreamSource` we already apply the file 
pattern and find all matching file names. However, in DataSource we check for 
glob characters again and try to expand them 
[here|https://github.com/databricks/runtime/blob/3af402d23620a0952e151d96c3184d2233217c87/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L274], 
so a literal file name that contains [ ] is reinterpreted as a pattern and no 
longer matches itself.
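
The two points above can be illustrated without Spark. The following is only a minimal sketch: it uses Python's standard fnmatch and glob modules as a stand-in for the glob handling in Spark, on the assumption that both treat [123] as a character class. The two "steps" are an analogy for the listing in FileStreamSource and the second expansion in DataSource, not the actual Spark code, and the escaping at the end is just one possible way to avoid the second expansion, not necessarily the fix Spark would adopt.

```python
import glob
import os
import tempfile
from fnmatch import fnmatchcase

# Read as a glob, [123] is a character class matching exactly one of 1, 2, 3,
# so the literal name "abc[123]" does not match its own pattern.
names = ["abc1", "abc2", "abc3", "abc[123]"]
class_matches = [n for n in names if fnmatchcase(n, "abc[123]")]

with tempfile.TemporaryDirectory() as root:
    # Create the single file from the report.
    literal_path = os.path.join(root, "abc[123]")
    with open(literal_path, "w") as f:
        f.write("hello\n")

    # Step 1 (analogy for listing the source directory): names come back
    # as literal strings, brackets included.
    listed = [os.path.join(root, n) for n in os.listdir(root)]

    # Step 2 (analogy for expanding globs a second time): the literal path
    # is now treated as a pattern and finds nothing.
    reglobbed = [hit for p in listed for hit in glob.glob(p)]

    # Escaping glob metacharacters before the second pass keeps the file.
    escaped = [hit for p in listed for hit in glob.glob(glob.escape(p))]

print("character-class matches:", class_matches)
print("after second expansion:", reglobbed)
print("after escaping first:", [os.path.basename(p) for p in escaped])
```

In this sketch the second expansion returns an empty list even though the file exists, which mirrors the "no matching path" error described above.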



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
