Yeah, the right solution is to have something like SchemaDStream, where the
schema shared by all the SchemaRDDs it generates can be stored. Something I
would really like to see happen in the future :)
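For what it's worth, here is a rough sketch of what such a wrapper could look like. This is purely hypothetical: `SchemaDStream` does not exist in Spark, and the sketch assumes `SQLContext.applySchema(RDD[Row], StructType)` from the 1.x line; the import paths for the catalyst types may differ across 1.x releases.

```scala
// Hypothetical sketch only -- SchemaDStream is not an existing Spark class.
// The idea: pair a DStream[Row] with the one schema that every batch's
// SchemaRDD shares, so downstream code can still recover column names.
import org.apache.spark.sql.{Row, SchemaRDD, SQLContext}
import org.apache.spark.sql.catalyst.types.StructType
import org.apache.spark.streaming.dstream.DStream

class SchemaDStream(
    sqlc: SQLContext,
    rowStream: DStream[Row],
    val schema: StructType) {

  // Re-attach the stored schema to each batch's RDD[Row],
  // recovering a SchemaRDD per interval.
  def foreachSchemaRDD(f: SchemaRDD => Unit): Unit =
    rowStream.foreachRDD { rdd =>
      f(sqlc.applySchema(rdd, schema))
    }
}
```

The schema is captured once at construction time, which also fits the earlier point about parsing the query string only once rather than per interval.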

TD


On Thu, Jul 10, 2014 at 6:37 PM, Tobias Pfeiffer <t...@preferred.jp> wrote:

> Hi,
>
> I think it would be great if we could do the string parsing only once and
> then just apply the transformation for each interval (reducing the
> processing overhead for short intervals).
>
> Also, one issue with the approach above is that transform() has the
> following signature:
>
>   def transform[U](transformFunc: (RDD[T], Time) => RDD[U]): DStream[U]
>
> and therefore, in my example
>
> val result = lines.transform((rdd, time) => {
>   // execute statement
>   rdd.registerAsTable("data")
>   sqlc.sql(query)
> })
>
> the variable `result` is of type DStream[Row]. That is, the
> meta-information from the SchemaRDD is lost and, from what I understand,
> there is then no way to learn about the column names of the returned data,
> as this information is only encoded in the SchemaRDD. I would love to see a
> fix for this.
>
> Thanks
> Tobias
>
>
>
