How do I do just that? I thought we can only use inferSchema when we first read the dataset, or am I wrong?
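(For what it's worth, schema inference is not limited to the first read from disk: since Spark 2.2, `DataFrameReader.csv` also accepts an in-memory `Dataset<String>` of CSV lines, so you can filter, serialize back to lines, and infer again. A minimal sketch of that idea; the column name and sample values are made up for illustration:)

```java
import java.util.Arrays;

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ReInferSchema {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .master("local[*]").appName("re-infer").getOrCreate();

    // Toy stand-in for the original CSV column: the "+" rows are what
    // force the whole column to be read as StringType.
    Dataset<Row> raw = spark.createDataset(
        Arrays.asList("1.5", "+", "2.75"), Encoders.STRING()).toDF("value");

    // 1) Discard the rows holding the "+" placeholder.
    Dataset<Row> filtered = raw.filter(raw.col("value").notEqual("+"));

    // 2) Serialize each remaining row back into a CSV line.
    Dataset<String> csvLines = filtered.map(
        (MapFunction<Row, String>) row -> row.mkString(","),
        Encoders.STRING());

    // 3) Re-run schema inference over the in-memory lines; no round
    //    trip to disk is needed.
    Dataset<Row> typed = spark.read()
        .option("inferSchema", "true")
        .csv(csvLines);

    typed.printSchema();  // the column is now inferred as double
    spark.stop();
  }
}
```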
On Sat, Jun 4, 2022 at 6:10 PM Sean Owen <sro...@gmail.com> wrote:

> It sounds like you want to interpret the input as strings, do some
> processing, then infer the schema. That has nothing to do with construing
> the entire row as a string like "Row[foo=bar, baz=1]".
>
> On Sat, Jun 4, 2022 at 10:32 AM marc nicole <mk1853...@gmail.com> wrote:
>
>> Hi Sean,
>>
>> Thanks. Actually I have a dataset where I want to inferSchema after
>> discarding the specific String value "+". I do this because the column
>> would otherwise be considered StringType, while if I remove that "+" value
>> it would be considered DoubleType, for example, or something else.
>> Basically I want to remove "+" from all dataset rows and then infer the
>> schema.
>> My idea is to filter to the rows not equal to "+" for the target columns
>> (potentially all of them) and then use spark.read().csv() to read the
>> filtered dataset with the inferSchema option, which would then yield the
>> correct column types.
>> What do you think?
>>
>> On Sat, Jun 4, 2022 at 3:56 PM Sean Owen <sro...@gmail.com> wrote:
>>
>>> I don't think you want to do that. At best you get a string
>>> representation of structured data without the structure. This is part of
>>> the reason it doesn't work directly this way.
>>> You can use a UDF to call .toString on the Row, of course, but again:
>>> what are you really trying to do?
>>>
>>> On Sat, Jun 4, 2022 at 7:35 AM marc nicole <mk1853...@gmail.com> wrote:
>>>
>>>> Hi,
>>>> How to convert a Dataset<Row> to a Dataset<String>?
>>>> What I have tried is:
>>>>
>>>> List<String> list = dataset.as(Encoders.STRING()).collectAsList();
>>>> Dataset<String> datasetSt = spark.createDataset(list, Encoders.STRING());
>>>> // But this line raises an org.apache.spark.sql.AnalysisException: Try to
>>>> // map struct... to Tuple1, but failed as the number of fields does not
>>>> // line up
>>>>
>>>> The columns are all of type String.
>>>> How to solve this?
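(The AnalysisException at the bottom of the thread comes from `as(Encoders.STRING())`, which only works when the Dataset has exactly one string column. A hedged sketch of the usual workaround, mapping each multi-column Row to a single String explicitly; the column names and sample data here are invented for illustration:)

```java
import java.util.Arrays;

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class RowToString {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .master("local[*]").appName("row-to-string").getOrCreate();

    // A two-column Dataset<Row>; as(Encoders.STRING()) fails on it
    // because a Row with two fields cannot line up with one String.
    StructType schema = new StructType()
        .add("foo", DataTypes.StringType)
        .add("baz", DataTypes.StringType);
    Dataset<Row> dataset = spark.createDataFrame(
        Arrays.asList(RowFactory.create("bar", "1"),
                      RowFactory.create("qux", "2")),
        schema);

    // Map each Row to one String instead of casting the whole Dataset.
    Dataset<String> datasetSt = dataset.map(
        (MapFunction<Row, String>) row -> row.mkString(","),
        Encoders.STRING());

    datasetSt.show();  // bar,1 / qux,2
    spark.stop();
  }
}
```

(`row.mkString(",")` joins the field values; `row.toString()` would give the bracketed `[bar,1]` form instead, which is harder to re-parse as CSV.)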