Hi all,

I figured it out! The "DataFrame and SQL Operations" example in the Spark
Streaming docs was useful.
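
For the archives, here is a minimal sketch of that pattern, assuming Spark 1.3's
SQLContext.jsonRDD and a local master; the "events" table name is just
illustrative. The key point is that jsonRDD needs a plain RDD, so each
micro-batch has to be handed over via foreachRDD rather than passing the
DStream itself:

```python
from pyspark import SparkContext, SparkConf
from pyspark.streaming import StreamingContext
from pyspark.sql import SQLContext

conf = SparkConf().setAppName('MyApp').setMaster('local[4]')
sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, 30)  # 30-second batch interval
sqlContext = SQLContext(sc)

distFile = ssc.textFileStream("s3n://mybucket/")

def process(time, rdd):
    if not rdd.isEmpty():
        # jsonRDD infers the schema from the JSON strings in this batch's RDD
        df = sqlContext.jsonRDD(rdd)
        df.printSchema()
        df.registerTempTable("events")  # illustrative table name

# foreachRDD delivers each micro-batch as a plain RDD, which is what
# jsonRDD expects -- a DStream has no _jrdd attribute, hence the error
distFile.foreachRDD(process)

ssc.start()
ssc.awaitTermination()
```

This won't run standalone (it needs a Spark install and S3 credentials), but it
mirrors the structure of the streaming guide's example.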

Best,
Vadim

On Wed, Apr 8, 2015 at 2:38 PM, Vadim Bichutskiy <vadim.bichuts...@gmail.com
> wrote:

> Hi all,
>
> I am using Spark Streaming to monitor an S3 bucket for objects that
> contain JSON. I want to import that JSON into a Spark SQL DataFrame.
>
> Here's my current code:
>
> from pyspark import SparkContext, SparkConf
> from pyspark.streaming import StreamingContext
> import json
> from pyspark.sql import SQLContext
>
> conf = SparkConf().setAppName('MyApp').setMaster('local[4]')
> sc = SparkContext(conf=conf)
> ssc = StreamingContext(sc, 30)
> sqlContext = SQLContext(sc)
>
> distFile = ssc.textFileStream("s3n://mybucket/")
> json_data = sqlContext.jsonRDD(distFile)
> json_data.printSchema()
>
> ssc.start()
> ssc.awaitTermination()
>
> I am not creating the DataFrame correctly, as I get an error:
>
> 'TransformedDStream' object has no attribute '_jrdd'
>
> Can someone help me out?
>
> Thanks,
> Vadim
>
>
>
