Hi all,

I am using Spark Streaming to monitor an S3 bucket for objects that contain
JSON, and I want to import that JSON into a Spark SQL DataFrame.

Here's my current code:

from pyspark import SparkContext, SparkConf
from pyspark.streaming import StreamingContext
import json
from pyspark.sql import SQLContext

conf = SparkConf().setAppName('MyApp').setMaster('local[4]')
sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, 30)
sqlContext = SQLContext(sc)

distFile = ssc.textFileStream("s3n://mybucket/")
json_data = sqlContext.jsonRDD(distFile)
json_data.printSchema()

ssc.start()
ssc.awaitTermination()

It seems I am not creating the DataFrame correctly, because I get this error:

'TransformedDStream' object has no attribute '_jrdd'
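For context, my understanding is that jsonRDD expects a plain RDD, while textFileStream returns a DStream, so I suspect I need to convert each micro-batch individually. Here is a sketch of what I mean using foreachRDD (the function name process is just my placeholder, and I have not confirmed this is the right approach):

```python
from pyspark import SparkContext, SparkConf
from pyspark.streaming import StreamingContext
from pyspark.sql import SQLContext

conf = SparkConf().setAppName('MyApp').setMaster('local[4]')
sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, 30)
sqlContext = SQLContext(sc)

def process(time, rdd):
    # Each micro-batch arrives as a plain RDD of JSON strings;
    # skip empty batches so schema inference doesn't fail on them.
    if not rdd.isEmpty():
        df = sqlContext.jsonRDD(rdd)  # infer the schema from this batch
        df.printSchema()

# foreachRDD hands every micro-batch RDD to the function above,
# instead of passing the DStream itself to jsonRDD.
lines = ssc.textFileStream("s3n://mybucket/")
lines.foreachRDD(process)

ssc.start()
ssc.awaitTermination()
```

Is something along these lines the intended way to do it?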
Can someone help me out?

Thanks,
Vadim

