Re: FileStreamSource source checks path eagerly?

Jacek Laskowski Thu, 08 Sep 2016 02:21:07 -0700

On Thu, Sep 8, 2016 at 9:03 AM, Fred Reiss <freiss....@gmail.com> wrote:


> I suppose the type-inference-time check for the presence of the input
> directory could be moved to the FileStreamSource's initialization. But if
> the directory isn't there when the source is being created, it probably
> won't be there when the source is instantiated.

Hi Fred,

Thanks for your prompt response, Fred.

Isn't that the opposite of sc.textFile? The source might not be
available until load, and there's no reason it should be. Yet the eager
check is definitely not against the "contract" of
DataFrameReader.textFile, and perhaps it's implicitly assumed in SQL.

scala> spark.read.textFile("whatever")
org.apache.spark.sql.AnalysisException: Path does not exist: file:/Users/jacek/dev/oss/spark/whatever;
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:371)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:360)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
  at scala.collection.immutable.List.flatMap(List.scala:344)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:360)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
  at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:500)
  at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:536)
  at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:509)
  ... 48 elided
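
To make the contrast with sc.textFile concrete, here is a minimal
sketch of the two calls side by side. It assumes a local Spark 2.x
session and that the relative path "whatever" does not exist in the
working directory; the object name and app name are made up for
illustration.

```scala
import org.apache.spark.sql.SparkSession
import scala.util.Try

// Hypothetical driver object, only for illustrating the two behaviours.
object EagerVsLazyPathCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("eager-vs-lazy-path-check")
      .getOrCreate()

    // Assumption: this path does not exist.
    val missing = "whatever"

    // RDD API: defining the RDD should not touch the file system.
    val rdd = Try(spark.sparkContext.textFile(missing))
    println(s"sc.textFile defined without error: ${rdd.isSuccess}")

    // The missing path is only expected to surface once an action runs.
    val counted = rdd.flatMap(r => Try(r.count()))
    println(s"rdd.count() succeeded: ${counted.isSuccess}")

    // Dataset API: the path is resolved while the Dataset is being
    // defined, so the AnalysisException shows up before any action.
    val ds = Try(spark.read.textFile(missing))
    println(s"spark.read.textFile defined without error: ${ds.isSuccess}")

    spark.stop()
  }
}
```

With the RDD API the failure is deferred to the first action, while
DataFrameReader.textFile fails at definition time, which is the eager
behaviour I'm asking about.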

I thought it might've been due to schema inference but...

scala> spark.read.schema(StructType(Seq())).textFile("whatever")
org.apache.spark.sql.AnalysisException: User specified schema not supported with `textFile`;
  at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:534)
  at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:509)
  ... 50 elided

(which also confuses me, but I don't want to drag this thread in
multiple directions). I'd definitely appreciate some help understanding
the rationale behind this eager behaviour.

Thanks!

Pozdrawiam,
Jacek

