You can pass the schema into the json() reader directly, can't you?
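Something along these lines should do it (an untested sketch against the Spark 1.5 DataFrameReader API; "sqlContext", "parseBatch", and the field names are placeholders, not from your code). Supplying an explicit schema skips inference entirely, so every batch parses the same way:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Placeholder field names -- replace with the fields you expect in your JSON.
val schema = StructType(Seq(
  StructField("userId", StringType, nullable = true),
  StructField("value", StringType, nullable = true)
))

def parseBatch(sqlContext: SQLContext, jsonRdd: RDD[String]): DataFrame = {
  // With an explicit schema there is no inference pass, so a batch whose
  // "value" fields all happen to look numeric is still read as strings.
  val df = sqlContext.read.schema(schema).json(jsonRdd)
  // Cast later, once you know the type you actually want:
  df.select(df("userId"), df("value").cast("long").as("value"))
}

Every batch then comes back with the same StructType, so the per-batch flip-flop between Integer and String goes away.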

On Thu, Oct 1, 2015 at 10:33 AM, Ewan Leith <ewan.le...@realitymine.com> wrote:

> Hi all,
>
> We really like the ability to infer a schema from JSON contained in an
> RDD, but when we're using Spark Streaming on small batches of data, we
> sometimes find that Spark infers a more specific type than it should use.
> For example, if the JSON in a small batch only contains integer values
> for a String field, it'll class the field as an Integer type on one
> streaming batch, then a String on the next one.
>
> Instead, we'd rather read every value as a String type, then handle any
> casting to a desired type later in the process.
>
> I don't think there's currently any simple way to avoid this that I can
> see, but we could add the functionality in the JacksonParser.scala file,
> probably in convertField:
>
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JacksonParser.scala
>
> Does anyone know an easier and cleaner way to do this?
>
> Thanks,
>
> Ewan