Re: Nullpointerexception error when in repartition

2018-04-12 Thread Junfeng Chen
Hi, I know that, but my purpose is to transform the JSON strings in a Dataset into a Dataset, while spark.readStream only supports reading JSON files from a specified path. https://stackoverflow.com/questions/48617474/how-to-convert-json-dataset-to-dataframe-in-spark-structured-streaming gives an essential
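
For readers following the thread: one commonly used way to turn JSON strings in a streaming Dataset into typed columns is the `from_json` function applied to a string column, rather than `spark.read().json(...)`. The sketch below is only an illustration under assumptions, not the poster's code: the broker address, the topic name `events`, the two schema fields, and the HDFS paths are all hypothetical, and it assumes the `spark-sql-kafka-0-10` connector is on the classpath.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.from_json;

public class KafkaJsonToParquet {

    // Hypothetical schema: replace with the fields of the real JSON payload.
    static StructType eventSchema() {
        return new StructType()
            .add("id", DataTypes.StringType)
            .add("value", DataTypes.DoubleType);
    }

    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
            .appName("KafkaJsonToParquet")
            .getOrCreate();

        // Streaming source: readStream(), not read().
        // Broker address and topic name are assumptions.
        Dataset<Row> raw = spark.readStream()
            .format("kafka")
            .option("kafka.bootstrap.servers", "localhost:9092")
            .option("subscribe", "events")
            .load();

        // Kafka delivers the payload as binary in the "value" column:
        // cast it to a string, parse it with from_json, then flatten the struct.
        Dataset<Row> parsed = raw
            .selectExpr("CAST(value AS STRING) AS json")
            .select(from_json(col("json"), eventSchema()).as("data"))
            .select("data.*");

        // A streaming query must be started with writeStream().start().
        // Output and checkpoint paths are assumptions.
        StreamingQuery query = parsed.writeStream()
            .format("parquet")
            .option("path", "hdfs:///tmp/events")
            .option("checkpointLocation", "hdfs:///tmp/events-checkpoint")
            .start();

        query.awaitTermination();
    }
}
```

Because the schema is supplied up front, nothing has to be inferred from the stream, which is the step that `spark.read().json(...)` would otherwise perform as a batch operation.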

Re: Nullpointerexception error when in repartition

2018-04-12 Thread Tathagata Das
Have you read through the documentation of Structured Streaming? https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html One of the basic mistakes you are making is defining the dataset with `spark.read()`. You define a streaming Dataset with `spark.readStream()`. On

Re: Nullpointerexception error when in repartition

2018-04-12 Thread Junfeng Chen
Hi Tathagata, I have tried Structured Streaming, but the line
> Dataset rowDataset = spark.read().json(jsondataset);
always throws
> Queries with streaming sources must be executed with writeStream.start()
But all I need to do in this step is transform the JSON string data into a Dataset.

Re: Nullpointerexception error when in repartition

2018-04-12 Thread Tathagata Das
It's not very surprising that doing this sort of RDD to DF conversion inside DStream.foreachRDD has weird corner cases like this. In fact, you are going to have additional problems with partial parquet files (when there are failures) in this approach. I strongly suggest that you use Structured

Nullpointerexception error when in repartition

2018-04-11 Thread Junfeng Chen
I wrote a program that reads JSON data from Kafka and is meant to save it as Parquet files on HDFS. Here is my code:
> JavaInputDStream stream = ...
> JavaDStream rdd = stream.map...
> rdd.repartition(taskNum).foreachRDD(VoidFunction stringjavardd->{
> Dataset df =