Your best bet might be to use a map<string,string> in SQL and make the keys longer paths (e.g. params_param1 and params_param2). I don't think you can have a map in some records but not in others.
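A minimal sketch of what that flattening could look like with the Spark 1.3-era types API follows below; the column names, the flatten helper, and the choice to stringify every value are illustrative assumptions, not anything prescribed here:

import java.sql.Timestamp
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Schema with a flat map column; nested "params" keys get folded into
// longer path keys such as "params_arbitrary-param-1".
val schema = StructType(Seq(
  StructField("timestamp", TimestampType, nullable = false),
  StructField("data", MapType(StringType, StringType), nullable = true)))

// Recursively flatten an already-parsed Map[String, Any] into a
// Map[String, String], joining nested keys with "_".
def flatten(prefix: String, value: Any): Map[String, String] = value match {
  case m: Map[_, _] =>
    m.toSeq.flatMap { case (k, v) =>
      val key = if (prefix.isEmpty) k.toString else s"${prefix}_$k"
      flatten(key, v)
    }.toMap
  case other => Map(prefix -> s"$other")
}

// Example record shaped like the "purchase" event below.
val data: Map[String, Any] = Map(
  "event" -> "purchase",
  "sku" -> "123456789",
  "quantity" -> 1,
  "params" -> Map("arbitrary-param-1" -> "blah", "arbitrary-param-2" -> 123456))

val row = Row(Timestamp.valueOf("2015-01-01 08:00:00"), flatten("", data))

// With a SQLContext in scope (Spark 1.3+), rows like this can then be
// turned into a DataFrame and queried with plain SQL, e.g.:
//   val df = sqlContext.createDataFrame(sc.parallelize(Seq(row)), schema)
//   df.registerTempTable("events")
//   sqlContext.sql("SELECT data['params_arbitrary-param-1'] FROM events")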
Matei

> On May 28, 2015, at 3:48 PM, Jeremy Lucas <jeremyalu...@gmail.com> wrote:
>
> Hey Reynold,
>
> Thanks for the suggestion. Maybe a better definition of what I mean by a
> "recursive" data structure is rather what might resemble (in Scala) the type
> Map[String, Any]. With a type like this, the keys are well-defined as strings
> (as this is JSON) but the values can be basically any arbitrary value,
> including another Map[String, Any].
>
> For example, in the below "stream" of JSON records:
>
> {
>   "timestamp": "2015-01-01T00:00:00Z",
>   "data": {
>     "event": "click",
>     "url": "http://mywebsite.com"
>   }
> }
> ...
> {
>   "timestamp": "2015-01-01T08:00:00Z",
>   "data": {
>     "event": "purchase",
>     "sku": "123456789",
>     "quantity": 1,
>     "params": {
>       "arbitrary-param-1": "blah",
>       "arbitrary-param-2": 123456
>     }
>   }
> }
>
> I am trying to figure out a way to run Spark SQL over the above JSON records.
> My inclination would be to define the "timestamp" field as a well-defined
> DateType, but the "data" field is way more free-form.
>
> Also, any pointers on where to look for how data types are evaluated and
> serialized/deserialized would be super helpful as well.
>
> Thanks
>
>
> On Thu, May 28, 2015 at 12:30 AM Reynold Xin <r...@databricks.com> wrote:
>
> I think it is fairly hard to support recursive data types. What I've seen in
> one other proprietary system in the past is to let the user define the depth
> of the nested data types, and then just expand the struct/map/list definition
> to the maximum level of depth.
>
> Would this solve your problem?
>
>
> On Wed, May 20, 2015 at 6:07 PM, Jeremy Lucas <jeremyalu...@gmail.com> wrote:
>
> Hey Rakesh,
>
> To clarify, what I was referring to is when doing something like this:
>
> sqlContext.applySchema(rdd, mySchema)
>
> mySchema must be a well-defined StructType, which presently does not allow
> for a recursive type.
>
>
> On Wed, May 20, 2015 at 5:39 PM Rakesh Chalasani <vnit.rak...@gmail.com> wrote:
>
> Hi Jeremy:
>
> Row is a collection of 'Any', so it can be used as a recursive data type. Is
> this what you were looking for?
>
> Example:
> val x = sc.parallelize(Array.range(0, 10)).map(x => Row(Row(x), Row(x.toString)))
>
> Rakesh
>
>
> On Wed, May 20, 2015 at 7:23 PM Jeremy Lucas <jeremyalu...@gmail.com> wrote:
>
> Spark SQL has proven to be quite useful in applying a partial schema to large
> JSON logs and being able to write plain SQL to perform a wide variety of
> operations over this data. However, one small thing that keeps coming back to
> haunt me is the lack of support for recursive data types, whereby a member of
> a complex/struct value can be of the same type as the complex/struct value
> itself.
>
> I am hoping someone may be able to point me in the right direction of where
> to start to build out such capabilities, as I'd be happy to contribute, but
> am very new to this particular component of the Spark project.
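For reference, Reynold's "expand to the maximum level of depth" suggestion above could be sketched roughly as below when building a StructType by hand. The field names, the depth cap of 3, and the map<string,string> leaf are illustrative assumptions, not anything prescribed in the thread:

import org.apache.spark.sql.types._

// Instead of a truly recursive StructType, expand the nested "params"
// definition to a fixed maximum depth chosen up front; the deepest
// level falls back to a flat map of strings.
def nestedParams(depth: Int): DataType =
  if (depth == 0) MapType(StringType, StringType)
  else StructType(Seq(
    StructField("value", StringType, nullable = true),
    StructField("params", nestedParams(depth - 1), nullable = true)))

val schema = StructType(Seq(
  StructField("timestamp", TimestampType, nullable = false),
  StructField("data", StructType(Seq(
    StructField("event", StringType, nullable = false),
    StructField("params", nestedParams(3), nullable = true)  // depth capped at 3
  )), nullable = true)))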