Reynold Xin created SPARK-18021:
-----------------------------------
Summary: Refactor file name specification for data sources
Key: SPARK-18021
URL: https://issues.apache.org/jira/browse/SPARK-18021
Project: Spark
Issue Type: Sub-task
Components: SQL
Reporter: Reynold Xin
Assignee: Reynold Xin
Currently each data source's OutputWriter is responsible for specifying the
entire file name of every file it outputs. This is problematic because Spark
SQL relies on the file name for certain behaviors, e.g. identifying the bucket
id. The current approach allows an individual data source to break the
implementation of bucketing.
We also don't want to move file naming entirely out of the data sources,
because different data sources need to specify different file extensions.
A good compromise is for the OutputWriter to take in a prefix for the file
name and append its own suffix (e.g. the file extension).
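
To make the proposed split concrete, here is a rough sketch in Python rather
than Spark's Scala. It is not the actual Spark implementation: the names
build_prefix, ExampleWriter and bucket_id_of, and the
"part-<task>-<job>_<bucket>" layout, are all hypothetical and chosen only for
illustration. The framework owns the prefix (including the bucket id), and the
writer only appends its suffix.

```python
import re
from typing import Optional


def build_prefix(task_id: int, job_id: str, bucket_id: Optional[int] = None) -> str:
    """Framework side: build the part of the name that the engine depends on.

    Hypothetical layout; job_id is assumed to contain no underscores.
    """
    prefix = f"part-{task_id:05d}-{job_id}"
    if bucket_id is not None:
        # The bucket id is encoded by the framework, never by the data source.
        prefix += f"_{bucket_id:05d}"
    return prefix


class ExampleWriter:
    """Data source side: receives the prefix and only appends its own suffix."""

    def __init__(self, prefix: str, suffix: str) -> None:
        self.file_name = prefix + suffix


# Matches a trailing "_<digits>" optionally followed by one or more extensions.
_BUCKET_RE = re.compile(r"_(\d+)(?:\.[^_]*)?$")


def bucket_id_of(file_name: str) -> Optional[int]:
    """Recover the bucket id from a file name, or None if it is not bucketed."""
    match = _BUCKET_RE.search(file_name)
    return int(match.group(1)) if match else None


prefix = build_prefix(task_id=3, job_id="abc", bucket_id=7)
json_writer = ExampleWriter(prefix, ".gz.json")
csv_writer = ExampleWriter(build_prefix(task_id=3, job_id="abc"), ".csv")
```

Because the writer never touches the prefix, whatever extension a data source
picks, the bucket id can still be parsed back out of the resulting file name.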