.where("xxx IS NOT NULL") will give you the rows that couldn't be parsed.
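For example, given your read options below (assuming the corrupt-record column is named "xxx" as in your snippet, and that df is the DataFrame you built):

```scala
// Rows that Spark could not parse against the schema carry the raw
// malformed JSON string in the "xxx" column and NULLs in the data columns.
val corrupt = df.where("xxx IS NOT NULL")
corrupt.select("xxx").show(false)

// Conversely, the rows that parsed cleanly have a NULL corrupt-record column:
val good = df.where("xxx IS NULL")
```

This works because PERMISSIVE mode keeps every input row and routes the unparseable ones into the column named by columnNameOfCorruptRecord instead of dropping them.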
On Tue, Dec 6, 2016 at 6:31 AM, Yehuda Finkelstein <yeh...@veracity-group.com> wrote:
> Hi all,
>
> I'm trying to parse JSON using an existing schema and I'm getting rows with NULLs.
>
> // get schema
> val df_schema = spark.sqlContext.sql("select c1,c2,…cn from t1 limit 1")
>
> // read json file
> val f = sc.textFile("/tmp/x")
>
> // load json into data frame using schema
> var df = spark.sqlContext.read
>   .option("columnNameOfCorruptRecord", "xxx")
>   .option("mode", "PERMISSIVE")
>   .schema(df_schema.schema)
>   .json(f)
>
> The documentation says you can query the corrupted rows via the column named by columnNameOfCorruptRecord:
>
> "columnNameOfCorruptRecord (default is the value specified in spark.sql.columnNameOfCorruptRecord): allows renaming the new field having malformed string created by PERMISSIVE mode. This overrides spark.sql.columnNameOfCorruptRecord."
>
> The question is: how do I fetch those corrupted rows?
>
> Thanks,
> Yehuda