RE: JSON Arrays and Spark

Kappaganthu, Sivaram (ES) Tue, 11 Oct 2016 23:05:11 -0700

Hi,

Does this mean that Spark is not a good fit for handling JSON with a schema 
like the one below? I have a requirement to parse JSON of this kind, which 
spans multiple lines. What is the best way to parse such JSON? Please 
suggest.


root
|-- maindate: struct (nullable = true)
|    |-- mainidnId: string (nullable = true)
|-- Entity: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- Profile: struct (nullable = true)
|    |    |    |-- Kind: string (nullable = true)
|    |    |-- Identifier: string (nullable = true)
|    |    |-- Group: array (nullable = true)
|    |    |    |-- element: struct (containsNull = true)
|    |    |    |    |-- Period: struct (nullable = true)
|    |    |    |    |    |-- pid: string (nullable = true)
|    |    |    |    |    |-- pDate: string (nullable = true)
|    |    |    |    |    |-- quarter: long (nullable = true)
|    |    |    |    |    |-- labour: array (nullable = true)
|    |    |    |    |    |    |-- element: struct (containsNull = true)
|    |    |    |    |    |    |    |-- category: string (nullable = true)
|    |    |    |    |    |    |    |-- id: string (nullable = true)
|    |    |    |    |    |    |    |-- person: struct (nullable = true)
|    |    |    |    |    |    |    |    |-- address: array (nullable = true)
|    |    |    |    |    |    |    |    |    |-- element: struct (containsNull = true)
|    |    |    |    |    |    |    |    |    |    |-- city: string (nullable = true)
|    |    |    |    |    |    |    |    |    |    |-- line1: string (nullable = true)
|    |    |    |    |    |    |    |    |    |    |-- line2: string (nullable = true)
|    |    |    |    |    |    |    |    |    |    |-- postalCode: string (nullable = true)
|    |    |    |    |    |    |    |    |    |    |-- state: string (nullable = true)
|    |    |    |    |    |    |    |    |    |    |-- type: string (nullable = true)
|    |    |    |    |    |    |    |    |-- familyName: string (nullable = true)
|    |    |    |    |    |    |    |-- tax: array (nullable = true)
|    |    |    |    |    |    |    |    |-- element: struct (containsNull = true)
|    |    |    |    |    |    |    |    |    |-- code: string (nullable = true)
|    |    |    |    |    |    |    |    |    |-- qwage: double (nullable = true)
|    |    |    |    |    |    |    |    |    |-- qvalue: double (nullable = true)
|    |    |    |    |    |    |    |    |    |-- qSubjectvalue: double (nullable = true)
|    |    |    |    |    |    |    |    |    |-- qfinalvalue: double (nullable = true)
|    |    |    |    |    |    |    |    |    |-- ywage: double (nullable = true)
|    |    |    |    |    |    |    |    |    |-- yalue: double (nullable = true)
|    |    |    |    |    |    |    |    |    |-- ySubjectvalue: double (nullable = true)
|    |    |    |    |    |    |    |    |    |-- yfinalvalue: double (nullable = true)
|    |    |    |    |    |    |    |-- tProfile: array (nullable = true)
|    |    |    |    |    |    |    |    |-- element: struct (containsNull = true)
|    |    |    |    |    |    |    |    |    |-- isExempt: boolean (nullable = true)
|    |    |    |    |    |    |    |    |    |-- jurisdiction: struct (nullable = true)
|    |    |    |    |    |    |    |    |    |    |-- code: string (nullable = true)
|    |    |    |    |    |    |    |    |    |-- maritalStatus: string (nullable = true)
|    |    |    |    |    |    |    |    |    |-- numberOfDeductions: long (nullable = true)
|    |    |    |    |    |    |    |-- wDate: struct (nullable = true)
|    |    |    |    |    |    |    |    |-- originalHireDate: string (nullable = true)
|    |    |    |    |    |-- year: long (nullable = true)


From: Luciano Resende [mailto:luckbr1...@gmail.com]
Sent: Monday, October 10, 2016 11:39 PM
To: Jean Georges Perrin
Cc: user @spark
Subject: Re: JSON Arrays and Spark

Please take a look at
http://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets
In particular, see the note on the required format:

Note that the file that is offered as a json file is not a typical JSON file. 
Each line must contain a separate, self-contained valid JSON object. As a 
consequence, a regular multi-line JSON file will most often fail.
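
The note above implies that a pretty-printed, multi-line document has to be collapsed to one object per line before the JSON reader sees it. A minimal sketch of that preprocessing step, using only the Python standard library; the helper name `to_json_lines` is hypothetical (not part of Spark), and the sketch assumes the whole document fits in memory:

```python
import json

def to_json_lines(text):
    """Collapse a multi-line JSON document into one compact value per line."""
    doc = json.loads(text)
    # A top-level array becomes one line per element;
    # any other top-level value becomes a single line.
    records = doc if isinstance(doc, list) else [doc]
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in records)

# Sample input loosely modeled on the schema in this thread (values invented).
pretty = """{
  "maindate": {"mainidnId": "1"},
  "Entity": [
    {"Identifier": "a"}
  ]
}"""
print(to_json_lines(pretty))
```

The printed result is a single line, which could then be written to a file and read as a line-delimited JSON dataset. Nested arrays inside the object are left untouched; only the outer layout changes.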


On Mon, Oct 10, 2016 at 9:57 AM, Jean Georges Perrin <j...@jgp.net> wrote:
Hi folks,

I am trying to parse JSON arrays and it’s getting a little crazy (for me at 
least)…

1)
If my JSON is:
{"vals":[100,500,600,700,800,200,900,300]}

I get:
+--------------------+
|                vals|
+--------------------+
|[100, 500, 600, 7...|
+--------------------+

root
 |-- vals: array (nullable = true)
 |    |-- element: long (containsNull = true)

and I am :)

2)
If my JSON is:
[100,500,600,700,800,200,900,300]

I get:
+--------------------+
|     _corrupt_record|
+--------------------+
|[100,500,600,700,...|
+--------------------+

root
 |-- _corrupt_record: string (nullable = true)

Both are legit JSON structures… Do you think that #2 is a bug?
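
Cases 1 and 2 above differ only in whether the array is wrapped in an object, so one possible workaround is to wrap a bare array before loading it. A rough sketch in plain Python; `wrap_top_level_array` and the default key `vals` are illustrative choices, not a Spark API:

```python
import json

def wrap_top_level_array(line, key="vals"):
    """Return the line unchanged if it is a JSON object, else wrap it under `key`."""
    value = json.loads(line)
    if isinstance(value, dict):
        # Already an object: pass it through, minus surrounding whitespace.
        return line.strip()
    # Bare array (or other non-object value): wrap it in a one-field object.
    return json.dumps({key: value}, separators=(",", ":"))

print(wrap_top_level_array("[100,500,600,700,800,200,900,300]"))
```

With the default key, the bare array from case 2 comes out in the same shape as the input from case 1.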

jg








--
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/

