How do I do just that? I thought we can only use inferSchema when we first read the dataset, or am I wrong?
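(For what it's worth, schema inference is not limited to the first read from disk: since Spark 2.2, `DataFrameReader.csv` also accepts an in-memory `Dataset<String>` of CSV lines, so you can filter, serialize back to lines, and infer again. A minimal sketch of that idea; the column name and sample values are made up for illustration:)

```java
import java.util.Arrays;

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ReInferSchema {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .master("local[*]").appName("re-infer").getOrCreate();

    // Toy stand-in for the original CSV column: the "+" rows are what
    // force the whole column to be read as StringType.
    Dataset<Row> raw = spark.createDataset(
        Arrays.asList("1.5", "+", "2.75"), Encoders.STRING()).toDF("value");

    // 1) Discard the rows holding the "+" placeholder.
    Dataset<Row> filtered = raw.filter(raw.col("value").notEqual("+"));

    // 2) Serialize each remaining row back into a CSV line.
    Dataset<String> csvLines = filtered.map(
        (MapFunction<Row, String>) row -> row.mkString(","),
        Encoders.STRING());

    // 3) Re-run schema inference over the in-memory lines; no round
    //    trip to disk is needed.
    Dataset<Row> typed = spark.read()
        .option("inferSchema", "true")
        .csv(csvLines);

    typed.printSchema();  // the column is now inferred as double
    spark.stop();
  }
}
```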
On Sat, Jun 4, 2022 at 6:10 PM Sean Owen <sro...@gmail.com> wrote:

> It sounds like you want to interpret the input as strings, do some
> processing, then infer the schema. That has nothing to do with construing
> the entire row as a string like "Row[foo=bar, baz=1]".
>
> On Sat, Jun 4, 2022 at 10:32 AM marc nicole <mk1853...@gmail.com> wrote:
>
>> Hi Sean,
>>
>> Thanks. Actually I have a dataset where I want to inferSchema after
>> discarding the specific String value "+". I do this because the column
>> would otherwise be considered StringType, while if I remove that "+" value
>> it would be considered DoubleType, for example, or something else.
>> Basically I want to remove "+" from all dataset rows and then infer the
>> schema.
>> My idea is to filter to the rows not equal to "+" for the target columns
>> (potentially all of them) and then use spark.read().csv() to read the
>> filtered dataset with the inferSchema option, which would then yield the
>> correct column types.
>> What do you think?
>>
>> On Sat, Jun 4, 2022 at 3:56 PM Sean Owen <sro...@gmail.com> wrote:
>>
>>> I don't think you want to do that. At best you get a string
>>> representation of structured data without the structure. This is part of
>>> the reason it doesn't work directly this way.
>>> You can use a UDF to call .toString on the Row, of course, but again:
>>> what are you really trying to do?
>>>
>>> On Sat, Jun 4, 2022 at 7:35 AM marc nicole <mk1853...@gmail.com> wrote:
>>>
>>>> Hi,
>>>> How to convert a Dataset<Row> to a Dataset<String>?
>>>> What I have tried is:
>>>>
>>>> List<String> list = dataset.as(Encoders.STRING()).collectAsList();
>>>> Dataset<String> datasetSt = spark.createDataset(list, Encoders.STRING());
>>>> // But this line raises an org.apache.spark.sql.AnalysisException: Try to
>>>> // map struct... to Tuple1, but failed as the number of fields does not
>>>> // line up
>>>>
>>>> The columns are all of type String.
>>>> How to solve this?
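(The AnalysisException at the bottom of the thread comes from `as(Encoders.STRING())`, which only works when the Dataset has exactly one string column. A hedged sketch of the usual workaround, mapping each multi-column Row to a single String explicitly; the column names and sample data here are invented for illustration:)

```java
import java.util.Arrays;

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class RowToString {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .master("local[*]").appName("row-to-string").getOrCreate();

    // A two-column Dataset<Row>; as(Encoders.STRING()) fails on it
    // because a Row with two fields cannot line up with one String.
    StructType schema = new StructType()
        .add("foo", DataTypes.StringType)
        .add("baz", DataTypes.StringType);
    Dataset<Row> dataset = spark.createDataFrame(
        Arrays.asList(RowFactory.create("bar", "1"),
                      RowFactory.create("qux", "2")),
        schema);

    // Map each Row to one String instead of casting the whole Dataset.
    Dataset<String> datasetSt = dataset.map(
        (MapFunction<Row, String>) row -> row.mkString(","),
        Encoders.STRING());

    datasetSt.show();  // bar,1 / qux,2
    spark.stop();
  }
}
```

(`row.mkString(",")` joins the field values; `row.toString()` would give the bracketed `[bar,1]` form instead, which is harder to re-parse as CSV.)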