Tathagata Das created SPARK-14832:
-------------------------------------
Summary: Refactor DataSource to ensure schema is inferred only
once when creating a file stream
Key: SPARK-14832
URL: https://issues.apache.org/jira/browse/SPARK-14832
Project: Spark
Issue Type: Sub-task
Components: SQL, Streaming
Reporter: Tathagata Das
Assignee: Tathagata Das
When creating a file stream using sqlContext.read.stream(), existing files are
scanned twice to infer the schema:
- Once, when creating a DataSource + StreamingRelation in the
DataFrameReader.stream()
- Again, when creating streaming Source from the DataSource, in
DataSource.createSource()
Instead, the schema should be inferred only once, when the DataFrame is
created; when the streaming source is created, it should simply reuse that
schema.
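
The proposed behaviour can be illustrated with a minimal sketch. This is not
Spark code: FileDataSource, FileStreamSource, create_source and
scan_existing_files are hypothetical stand-ins, written in Python only to show
the idea of inferring the schema on first use and handing the cached result to
the streaming source.

```python
from typing import Callable, List, Optional

Schema = List[str]  # hypothetical stand-in for a real schema type


class FileStreamSource:
    """Hypothetical streaming source that receives an already-known schema."""

    def __init__(self, schema: Schema) -> None:
        self.schema = schema


class FileDataSource:
    """Hypothetical stand-in for a data source; not Spark's actual class."""

    def __init__(self, infer_schema: Callable[[], Schema]) -> None:
        self._infer_schema = infer_schema
        self._schema: Optional[Schema] = None

    @property
    def schema(self) -> Schema:
        # Scan the files only on first access; later accesses reuse the result.
        if self._schema is None:
            self._schema = self._infer_schema()
        return self._schema

    def create_source(self) -> FileStreamSource:
        # Pass the cached schema along instead of inferring it again.
        return FileStreamSource(self.schema)


scan_count = 0


def scan_existing_files() -> Schema:
    """Pretend file scan that counts how often it runs."""
    global scan_count
    scan_count += 1
    return ["id", "value"]


data_source = FileDataSource(scan_existing_files)
relation_schema = data_source.schema         # when the DataFrame is created
stream_source = data_source.create_source()  # when the streaming source is created
```

In this sketch the scan runs a single time even though the schema is needed
at both steps, which is the outcome the refactoring aims for.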