To: Daniel Siegmann
Cc: user@spark.apache.org
Subject: Re: Quirk in how Spark DF handles JSON input records?
On Nov 2, 2016, at 2:22 PM, Daniel Siegmann
<dsiegm...@securityscorecard.io> wrote:
Yes, it needs to be on a single line. Spark (or Hadoop really) treats
newlines as a record separator by default. While it is possible to use a
different string as a record separator, what would you use in the case of
JSON?
If you do some Googling I suspect you'll find some possible solutions.
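For example, you can push a custom record delimiter down to the underlying
Hadoop input format. A rough, untested sketch in Scala (the path and the
blank-line delimiter are stand-ins, not something from this thread):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

// Tell TextInputFormat to split records on a custom delimiter
// instead of "\n" -- here, blank-line-separated records.
val conf = new Configuration(sc.hadoopConfiguration)
conf.set("textinputformat.record.delimiter", "\n\n")

val records = sc
  .newAPIHadoopFile("/path/to/records.json", classOf[TextInputFormat],
    classOf[LongWritable], classOf[Text], conf)
  .map { case (_, text) => text.toString } // copy out of Hadoop's reused Text

// Each element is now one (possibly multi-line) record.
val df = spark.read.json(records)

The catch, as noted above, is picking a delimiter string that can never
appear inside a JSON value.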
ARGH!!
Looks like a formatting issue. Spark doesn’t like ‘pretty’ output.
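If the JSON really is pretty-printed and there is one document per file, one
workaround is to slurp each file whole instead of line by line. A rough,
untested sketch (Scala, Spark 2.x; the path is made up):

// wholeTextFiles yields (path, entireFileContents) pairs, so a
// pretty-printed record can span as many lines as it likes.
val wholeFiles = sc.wholeTextFiles("/path/to/json/").values

// Spark 2.x's DataFrameReader accepts an RDD[String] of JSON documents.
val df = spark.read.json(wholeFiles)

(Later Spark releases, 2.2 and up, also added a multiLine option on the JSON
reader that handles pretty-printed input directly.)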
So then the entire record which defines the schema has to be a single line?
Really?
On Nov 2, 2016, at 1:50 PM, Michael Segel wrote:
This may be a silly mistake on my part…
Doing an example using Chicago’s Crime data… (There’s a lot of it going
around. ;-)
The goal is to read a file containing a JSON record that describes the schema
of the crime-data CSV, ingest the CSV into a data frame, and then write the
result out to a Parquet file.
(Pretty
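For reference, once the JSON record is on a single line, the pipeline
described above might look roughly like this (untested sketch; every file
name here is made up):

// Read the one-line JSON record just to recover the schema it defines.
val schema = spark.read.json("/data/crime_schema.json").schema

// Apply that schema while ingesting the CSV, then write out Parquet.
val crimes = spark.read
  .option("header", "true")
  .schema(schema)
  .csv("/data/chicago_crimes.csv")

crimes.write.mode("overwrite").parquet("/data/chicago_crimes.parquet")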