You can pass the schema into the json() reader directly, can't you?
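Something along these lines should do it (an untested sketch against the Spark 1.5 DataFrameReader API; "sqlContext", "parseBatch", and the field names are placeholders, not from your code). Supplying an explicit schema skips inference entirely, so every batch parses the same way:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Placeholder field names -- replace with the fields you expect in your JSON.
val schema = StructType(Seq(
  StructField("userId", StringType, nullable = true),
  StructField("value", StringType, nullable = true)
))

def parseBatch(sqlContext: SQLContext, jsonRdd: RDD[String]): DataFrame = {
  // With an explicit schema there is no inference pass, so a batch whose
  // "value" fields all happen to look numeric is still read as strings.
  val df = sqlContext.read.schema(schema).json(jsonRdd)
  // Cast later, once you know the type you actually want:
  df.select(df("userId"), df("value").cast("long").as("value"))
}

Every batch then comes back with the same StructType, so the per-batch flip-flop between Integer and String goes away.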

On Thu, Oct 1, 2015 at 10:33 AM, Ewan Leith <ewan.le...@realitymine.com> wrote:

> Hi all,
>
> We really like the ability to infer a schema from JSON contained in an
> RDD, but when we're using Spark Streaming on small batches of data, we
> sometimes find that Spark infers a more specific type than it should use.
> For example, if the JSON in a small batch only contains integer values
> for a String field, it'll class the field as an Integer type on one
> streaming batch, then a String on the next one.
>
> Instead, we'd rather read every value as a String type, then handle any
> casting to a desired type later in the process.
>
> I don't think there's currently any simple way to avoid this that I can
> see, but we could add the functionality in the JacksonParser.scala file,
> probably in convertField:
>
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JacksonParser.scala
>
> Does anyone know an easier and cleaner way to do this?
>
> Thanks,
>
> Ewan