[jira] [Created] (SPARK-14832) Refactor DataSource to ensure schema is inferred only once when creating a file stream

Tathagata Das (JIRA) Thu, 21 Apr 2016 17:48:07 -0700

Tathagata Das created SPARK-14832:
-------------------------------------

             Summary: Refactor DataSource to ensure schema is inferred only once when creating a file stream
                 Key: SPARK-14832
                 URL: https://issues.apache.org/jira/browse/SPARK-14832
             Project: Spark
          Issue Type: Sub-task
          Components: SQL, Streaming
            Reporter: Tathagata Das
            Assignee: Tathagata Das



When a file stream is created using sqlContext.read.stream(), existing files 
are scanned twice to infer the schema:
- Once, when creating a DataSource + StreamingRelation in 
DataFrameReader.stream()
- Again, when creating the streaming Source from the DataSource, in 
DataSource.createSource()

Instead, the schema should be inferred only once, at the time the DataFrame 
is created; when the streaming source is created, it should simply reuse that 
schema.
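
To illustrate the intended behaviour, here is a minimal sketch in plain Python. 
It is a toy model only, not Spark code: ToyDataSource, create_relation, 
create_source and scan_count are hypothetical names made up for this example. 
It assumes the schema can be cached on the data source object, so that the 
relation and the streaming source share one inference pass.

```python
from functools import cached_property


class ToyDataSource:
    """Hypothetical stand-in for a file-based data source (not Spark's DataSource)."""

    def __init__(self, files, user_schema=None):
        self.files = files              # mapping: file name -> list of column names
        self.user_schema = user_schema  # optional schema supplied by the caller
        self.scan_count = 0             # number of times existing files were scanned

    def _infer_schema(self):
        # Expensive step: visit every existing file and collect its column names.
        self.scan_count += 1
        columns = []
        for name in sorted(self.files):
            for col in self.files[name]:
                if col not in columns:
                    columns.append(col)
        return tuple(columns)

    @cached_property
    def schema(self):
        # Computed on first access, then cached on the instance and reused.
        if self.user_schema is not None:
            return tuple(self.user_schema)
        return self._infer_schema()

    def create_relation(self):
        # Step 1: building the DataFrame-side relation needs the schema.
        return {"kind": "relation", "schema": self.schema}

    def create_source(self):
        # Step 2: the streaming source reuses the cached schema; no second scan.
        return {"kind": "source", "schema": self.schema}


source = ToyDataSource({"a.json": ["id", "name"], "b.json": ["id", "ts"]})
relation = source.create_relation()
stream_source = source.create_source()
```

In this sketch both objects end up with the same schema while the files are 
scanned a single time, which is the property the refactoring is after.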



