Hi Reynold,

> i'd make this as consistent as to_json / from_json as possible

Sure, the new function from_csv() has the same signature as from_json().

> how would this work in sql? i.e. how would passing options in work?

The options are passed to the function via a map, for example:

select from_csv('26/08/2015', 'time Timestamp', map('timestampFormat', 'dd/MM/yyyy'))

On Sun, Sep 16, 2018 at 7:01 AM Reynold Xin <r...@databricks.com> wrote:

> makes sense - i'd make this as consistent as to_json / from_json as
> possible.
>
> how would this work in sql? i.e. how would passing options in work?
>
> --
> excuse the brevity and lower case due to wrist injury
>
> On Sat, Sep 15, 2018 at 2:58 AM Maxim Gekk <maxim.g...@databricks.com>
> wrote:
>
>> Hi All,
>>
>> I would like to propose a new function from_csv() for parsing columns
>> containing strings in CSV format. Here is my PR:
>> https://github.com/apache/spark/pull/22379
>>
>> A use case is loading a dataset from external storage, a DBMS, or a
>> system like Kafka, where CSV content was dumped as one of the
>> columns/fields. Other columns could contain related information such as
>> timestamps, ids, data sources, etc. The column with CSV strings can be
>> parsed with the existing csv() method of DataFrameReader, but in that
>> case we have to "clean up" the dataset and remove the other columns,
>> since the csv() method requires a Dataset[String]. Joining the parsing
>> result back to the original dataset by position is expensive and not
>> convenient. Instead, users parse CSV columns with string functions. That
>> approach is usually error prone, especially for quoted values and other
>> special cases.
>>
>> The methods proposed in the PR should provide a better user experience
>> for parsing CSV-like columns. Please share your thoughts.
>>
>> --
>>
>> Maxim Gekk
>>
>> Technical Solutions Lead
>>
>> Databricks Inc.
>>
>> maxim.g...@databricks.com
>>
>> databricks.com
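[Editor's note: the point above about string functions being error prone for quoted values can be sketched outside of Spark. The snippet below uses only Python's stdlib csv module and a made-up sample row; it is an illustration of the failure mode, not code from the PR.]

```python
import csv
import io

# Hypothetical record: a column value holding a CSV-formatted string
# whose first field is quoted and contains an embedded comma.
row = '"New York, NY",26/08/2015'

# Naive parsing with string functions splits inside the quoted value:
naive = row.split(",")            # ['"New York', ' NY"', '26/08/2015']

# A real CSV parser honors the quoting and yields two fields:
proper = next(csv.reader(io.StringIO(row)))

print(len(naive))                 # 3 fields -- wrong
print(proper)                     # ['New York, NY', '26/08/2015'] -- correct
```

This is the same class of mistake from_csv() is meant to eliminate for CSV-bearing columns inside a larger dataset.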