[ https://issues.apache.org/jira/browse/SPARK-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106200#comment-14106200 ]
Evan Chan commented on SPARK-2360:
----------------------------------

+1 for this feature. I just had to write something for importing tab-delimited CSVs and converting the types of each column.

As for the API, it really needs to do type conversion into the built-in types; otherwise it really affects caching compression efficiency and query speed, as well as which functions can be run on the data. I think this is crucial. Maybe one could pass in a Map[String, ColumnType] or something like that. If a type is not specified for a column, it would be assumed to be String.

> CSV import to SchemaRDDs
> ------------------------
>
> Key: SPARK-2360
> URL: https://issues.apache.org/jira/browse/SPARK-2360
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Reporter: Michael Armbrust
> Assignee: Hossein Falaki
>
> I think the first step is to design the interface that we want to present to
> users. Mostly this is defining options when importing. Off the top of my
> head:
> - What is the separator?
> - Provide column names or infer them from the first row.
> - How to handle multiple files with possibly different schemas?
> - Do we have a method to let users specify the datatypes of the columns, or
> are they just strings?
> - What types of quoting / escaping do we want to support?

--
This message was sent by Atlassian JIRA
(v6.2#6252)
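As a rough illustration of the per-column type map suggested in the comment, plus a few of the import options listed in the issue (separator, header row, column types), here is a minimal plain-Scala sketch that does not use Spark at all. The names `ColumnType`, `StringType`, `IntType`, `DoubleType`, `BooleanType`, `CsvOptions`, `CsvTypedImport.parse`, and the `_c0`, `_c1`, ... naming for headerless files are all hypothetical and are not Spark SQL's API. Quoting and escaping are deliberately not handled: each line is split naively on the delimiter.

```scala
import java.util.regex.Pattern

// Hypothetical column types for this sketch (not Spark SQL's own type classes).
sealed trait ColumnType
case object StringType extends ColumnType
case object IntType extends ColumnType
case object DoubleType extends ColumnType
case object BooleanType extends ColumnType

// Hypothetical import options covering some of the questions raised in the issue.
final case class CsvOptions(
    delimiter: Char = '\t',                          // field separator
    header: Boolean = true,                          // take column names from the first row
    columnTypes: Map[String, ColumnType] = Map.empty // unlisted columns default to String
)

object CsvTypedImport {
  // Converts one raw field; throws NumberFormatException or
  // IllegalArgumentException if the text does not match the requested type.
  def convert(raw: String, t: ColumnType): Any = t match {
    case StringType  => raw
    case IntType     => raw.trim.toInt
    case DoubleType  => raw.trim.toDouble
    case BooleanType => raw.trim.toBoolean
  }

  // Returns the column names and the rows, with each field converted to its column's type.
  def parse(lines: Seq[String], opts: CsvOptions): (Seq[String], Seq[Seq[Any]]) =
    if (lines.isEmpty) (Seq.empty, Seq.empty)
    else {
      // Naive split with no quote/escape handling; limit -1 keeps trailing empty fields.
      val sep = Pattern.quote(opts.delimiter.toString)
      val rows: Seq[Seq[String]] = lines.map(_.split(sep, -1).toSeq)
      val names: Seq[String] =
        if (opts.header) rows.head
        else rows.head.indices.map(i => s"_c$i")
      val body: Seq[Seq[String]] = if (opts.header) rows.tail else rows
      // Any column not named in columnTypes falls back to StringType.
      val types: Seq[ColumnType] =
        names.map(n => opts.columnTypes.getOrElse(n, StringType))
      // zip truncates to the shorter side, so extra fields in ragged rows are dropped.
      val typed = body.map(fields => fields.zip(types).map { case (f, t) => convert(f, t) })
      (names, typed)
    }
}
```

For example, `CsvTypedImport.parse(Seq("name\tage", "alice\t30"), CsvOptions(columnTypes = Map("age" -> IntType)))` yields the column names `name` and `age` and a single row holding the String `"alice"` and the Int `30`. Columns missing from the map stay as String, matching the default proposed in the comment.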