Hi Reynold,

> i'd make this as consistent as to_json / from_json as possible

Sure, the new function from_csv() has the same signature as from_json().

> how would this work in sql? i.e. how would passing options in work?

The options are passed to the function via a map, for example:

select from_csv('26/08/2015', 'time Timestamp', map('timestampFormat', 'dd/MM/yyyy'))

On Sun, Sep 16, 2018 at 7:01 AM Reynold Xin <r...@databricks.com> wrote:

> makes sense - i'd make this as consistent as to_json / from_json as
> possible.
>
> how would this work in sql? i.e. how would passing options in work?
>
> --
> excuse the brevity and lower case due to wrist injury
>
> On Sat, Sep 15, 2018 at 2:58 AM Maxim Gekk <maxim.g...@databricks.com>
> wrote:
>
>> Hi All,
>>
>> I would like to propose a new function from_csv() for parsing columns
>> containing strings in CSV format. Here is my PR:
>> https://github.com/apache/spark/pull/22379
>>
>> A use case is loading a dataset from external storage, a DBMS, or a
>> system like Kafka, where CSV content was dumped as one of the
>> columns/fields. Other columns could contain related information such as
>> timestamps, ids, data sources, etc. The column with CSV strings can be
>> parsed with the existing csv() method of DataFrameReader, but in that
>> case we have to "clean up" the dataset and remove the other columns,
>> since the csv() method requires a Dataset[String]. Joining the parsing
>> result back to the original dataset by position is expensive and not
>> convenient. Instead, users parse CSV columns with string functions. That
>> approach is usually error prone, especially for quoted values and other
>> special cases.
>>
>> The methods proposed in the PR should provide a better user experience
>> for parsing CSV-like columns. Please share your thoughts.
>>
>> --
>>
>> Maxim Gekk
>>
>> Technical Solutions Lead
>>
>> Databricks Inc.
>>
>> maxim.g...@databricks.com
>>
>> databricks.com
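[Editor's note: the point above about string functions being error prone for quoted values can be sketched outside of Spark. The snippet below uses only Python's stdlib csv module and a made-up sample row; it is an illustration of the failure mode, not code from the PR.]

```python
import csv
import io

# Hypothetical record: a column value holding a CSV-formatted string
# whose first field is quoted and contains an embedded comma.
row = '"New York, NY",26/08/2015'

# Naive parsing with string functions splits inside the quoted value:
naive = row.split(",")            # ['"New York', ' NY"', '26/08/2015']

# A real CSV parser honors the quoting and yields two fields:
proper = next(csv.reader(io.StringIO(row)))

print(len(naive))                 # 3 fields -- wrong
print(proper)                     # ['New York, NY', '26/08/2015'] -- correct
```

This is the same class of mistake from_csv() is meant to eliminate for CSV-bearing columns inside a larger dataset.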