Yeah, the right solution is to have something like a SchemaDStream, which can store the schema shared by all the SchemaRDDs it generates. Something I would really like to see happen in the future :)
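To make the idea concrete, here is a minimal plain-Scala sketch of what such a wrapper could look like. This is not Spark code and not a real API: `SchemaDStream`, `Row`, and the batch representation below are hypothetical stand-ins. The point is only that the column names live next to the stream of row batches, so they survive transformations.

```scala
// Hypothetical sketch (stand-in types, not Spark): a DStream-like wrapper
// that carries a schema alongside every batch of rows, so column names
// are still known after a transformation.
object SchemaDStreamSketch {
  type Row = Seq[Any]

  case class SchemaDStream(columns: Seq[String], batches: Seq[Seq[Row]]) {
    // A row-level transformation that preserves the schema metadata.
    def mapRows(f: Row => Row): SchemaDStream =
      copy(batches = batches.map(_.map(f)))
  }

  def main(args: Array[String]): Unit = {
    val s = SchemaDStream(
      columns = Seq("name", "age"),
      batches = Seq(Seq(Seq("alice", 30), Seq("bob", 25)))
    )
    val upper = s.mapRows(r => Seq(r.head.toString.toUpperCase) ++ r.tail)
    // The column names are still available on the result:
    println(upper.columns.mkString(","))  // prints name,age
  }
}
```

A real implementation would of course wrap a `DStream[Row]` plus the schema rather than in-memory batches; the sketch only illustrates the shape of the idea.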
TD

On Thu, Jul 10, 2014 at 6:37 PM, Tobias Pfeiffer <t...@preferred.jp> wrote:
> Hi,
>
> I think it would be great if we could do the string parsing only once and
> then just apply the transformation for each interval (reducing the
> processing overhead for short intervals).
>
> Also, one issue with the approach above is that transform() has the
> following signature:
>
>     def transform(transformFunc: RDD[T] => RDD[U]): DStream[U]
>
> and therefore, in my example
>
>     val result = lines.transform((rdd, time) => {
>       // execute statement
>       rdd.registerAsTable("data")
>       sqlc.sql(query)
>     })
>
> the variable `result` is of type DStream[Row]. That is, the
> meta-information from the SchemaRDD is lost and, from what I understand,
> there is then no way to learn about the column names of the returned data,
> as this information is only encoded in the SchemaRDD. I would love to see a
> fix for this.
>
> Thanks
> Tobias