Jorge Machado created SPARK-30647:
-------------------------------------
Summary: When creating a custom datasource FileNotFoundException happens
Key: SPARK-30647
URL: https://issues.apache.org/jira/browse/SPARK-30647
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 2.3.2
Reporter: Jorge Machado
Hello, I'm creating a datasource based on FileFormat and DataSourceRegister.
When I pass a path or a file name that contains whitespace, the read fails
with the error:
{code:java}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 2.0 failed 1 times, most recent failure: Lost task 1.0 in stage 2.0 (TID 213, localhost, executor driver): java.io.FileNotFoundException: File file:somePath/0019_leftImg8%20bit.png does not exist
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:127)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:177)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
    at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anonfun$1$$anon$1.hasNext(InMemoryRelation.scala:125)
    at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
    at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:299)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1165)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
{code}
I'm happy to fix this if someone tells me where I need to look.
I think it is in org.apache.spark.rdd.InputFileBlockHolder:
{code:java}
inputBlock.set(new FileBlock(UTF8String.fromString(filePath), startOffset, length))
{code}
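For reference, the `%20` in the failing path is what a URI-encoded space looks like. The sketch below uses only the Java standard library (no Spark classes) to show how a file name with a space turns into `%20` when rendered as a URI string, and how parsing that string as a URI recovers the original name. The class and helper names are hypothetical and chosen only for illustration. It does not show where Spark performs or skips this decoding; it only illustrates the mismatch that would produce the error above if an encoded string were used directly as a filesystem path.

```java
import java.io.File;
import java.net.URI;

// Illustration only: not Spark code. All names here are hypothetical.
class PathEncodingSketch {

    // Render a local path as a URI string; spaces are percent-encoded as %20.
    static String toUriString(String rawPath) {
        return new File(rawPath).toURI().toString();
    }

    // Parse a URI string and return its decoded path component (%20 becomes a space).
    static String toDecodedPath(String uriString) {
        return URI.create(uriString).getPath();
    }

    public static void main(String[] args) {
        String raw = "somePath/0019_leftImg8 bit.png";

        String encoded = toUriString(raw);
        System.out.println("URI string:   " + encoded);

        String decoded = toDecodedPath(encoded);
        System.out.println("Decoded path: " + decoded);

        // Treating the encoded string as a literal file name looks for a file
        // whose name really contains the characters "%20".
        System.out.println("Literal name still contains %20: "
                + new File(encoded).getName().contains("%20"));
    }
}
```

If the datasource receives the file path as a URI string, decoding it through `java.net.URI` before opening the file might be a workaround worth trying; whether that is the right fix inside Spark itself is not established here.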