Re: Error when loading json to spark
Thank you very much, Marco. Is your code in Scala? Do you have a Python example? Can anyone give me a Python example of handling JSON data in Spark?

Sincerely yours,

Raymond

On Sun, Jan 1, 2017 at 12:29 PM, Marco Mistroni wrote:
> Hi,
> you will need to pass the schema, like in the snippet below (even though
> the code might have been superseded in Spark 2.0).
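For the Python side of the question, a rough PySpark equivalent of Marco's snippet: a minimal sketch against the Spark 1.6 SQLContext API, reusing the illustrative field names and file path from his Scala example (they are not taken from world_bank.json).

from pyspark.sql import SQLContext
from pyspark.sql.types import StructType, StructField, StringType

sqlContext = SQLContext(sc)  # sc is the SparkContext the pyspark shell provides

# Declare the fields up front instead of letting Spark infer them
schema = StructType([
    StructField("hour", StringType()),
    StructField("month", StringType()),
    StructField("second", StringType()),
    StructField("year", StringType()),
    StructField("timezone", StringType()),
    StructField("day", StringType()),
    StructField("minute", StringType()),
])

df = sqlContext.read.schema(schema).json("file:///tmp/1973-01-11.json")
df.show()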
Re: Error when loading json to spark
Hi,
you will need to pass the schema, like in the snippet below (even though the code might have been superseded in Spark 2.0):

import sqlContext.implicits._

val jsonRdd = sc.textFile("file:///c:/tmp/1973-01-11.json")
val schema = (new StructType).add("hour", StringType).add("month", StringType)
  .add("second", StringType).add("year", StringType)
  .add("timezone", StringType).add("day", StringType)
  .add("minute", StringType)
val jsonContentWithSchema = sqlContext.jsonRDD(jsonRdd, schema)

But somehow I seem to remember that in Spark 2.0 there is a way to have Spark infer the schema for you.

HTH,
Marco

On Sun, Jan 1, 2017 at 12:40 PM, Raymond Xie wrote:
> I found the cause: I need to "put" the json file onto hdfs first before
> it can be used.
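On the Spark 2.0 point: the DataFrame JSON reader infers the schema by default, so no StructType is needed there. A minimal sketch, assuming a Spark 2.x installation and the same illustrative path as above:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("json-demo").getOrCreate()

# read.json samples the input and infers the schema automatically
df = spark.read.json("file:///c:/tmp/1973-01-11.json")
df.printSchema()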
Re: Error when loading json to spark
I found the cause: I need to "put" the json file onto hdfs first before it can be used. Here is what I did:

hdfs dfs -put /root/Downloads/data/json/world_bank.json hdfs://localhost:9000/json

df = sqlContext.read.json("/json/")
df.show(10)

However, there is a new problem: the JSON data needs to be tweaked before it can really be used. Simply calling df = sqlContext.read.json("/json/") makes the DataFrame messy; I need the DataFrame to know the fields in the JSON file.

How?

Thank you.

Sincerely yours,

Raymond

On Sat, Dec 31, 2016 at 11:52 PM, Miguel Morales wrote:
> Looks like it's trying to treat that path as a folder, try omitting
> the file name and just use the folder path.
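On getting the DataFrame to "know the fields": once the load succeeds, printSchema() shows what Spark inferred, nested structs are reachable with dot notation, and array columns can be unrolled with explode. A sketch with hypothetical column names (substitute whatever printSchema() actually reports for world_bank.json). Note also that the Spark 1.6 JSON reader expects one JSON object per line; a pretty-printed multi-line file comes back "messy", with rows collapsed into a _corrupt_record column.

from pyspark.sql.functions import explode

df = sqlContext.read.json("hdfs://localhost:9000/json/")

# Show the full inferred schema, including nested structs and arrays
df.printSchema()

# Select top-level and nested fields (hypothetical names)
df.select("id", "project.name").show(10)

# Turn an array column into one row per element (hypothetical name)
df.select(explode(df.sectors)).show(10)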
Re: Error when loading json to spark
Thank you Miguel, here is the output:

>>> df = sqlContext.read.json("/root/Downloads/data/json")
17/01/01 07:28:19 INFO json.JSONRelation: Listing hdfs://localhost:9000/root/Downloads/data/json on driver
17/01/01 07:28:19 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 212.4 KB, free 444.5 KB)
17/01/01 07:28:19 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 19.6 KB, free 464.1 KB)
17/01/01 07:28:19 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:39844 (size: 19.6 KB, free: 511.1 MB)
17/01/01 07:28:19 INFO spark.SparkContext: Created broadcast 2 from json at NativeMethodAccessorImpl.java:-2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/spark/python/pyspark/sql/readwriter.py", line 176, in json
    return self._df(self._jreader.json(path))
  File "/opt/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
  File "/opt/spark/python/pyspark/sql/utils.py", line 45, in deco
    return f(*a, **kw)
  File "/opt/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o115.json.
: java.io.IOException: No input paths specified in job
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:201)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1.apply(RDD.scala:1129)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
    at org.apache.spark.rdd.RDD.treeAggregate(RDD.scala:1127)
    at org.apache.spark.sql.execution.datasources.json.InferSchema$.infer(InferSchema.scala:65)
    at org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$4.apply(JSONRelation.scala:114)
    at org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$4.apply(JSONRelation.scala:109)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.sql.execution.datasources.json.JSONRelation.dataSchema$lzycompute(JSONRelation.scala:109)
    at org.apache.spark.sql.execution.datasources.json.JSONRelation.dataSchema(JSONRelation.scala:108)
    at org.apache.spark.sql.sources.HadoopFsRelation.schema$lzycompute(interfaces.scala:636)
    at org.apache.spark.sql.sources.HadoopFsRelation.schema(interfaces.scala:635)
    at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:37)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
    at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:244)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
    at py4j.Gateway.invoke(Gateway.java:259)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:209)
    at java.lang.Thread.run(Thread.java:745)
>>>

Sincerely yours,

Raymond

On Sat, Dec 31, 2016 at 11:52 PM, Miguel Morales wrote:
> Looks like it's trying to treat that path as a folder, try omitting
> the file name and just use the folder path.
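The log line "Listing hdfs://localhost:9000/root/Downloads/data/json on driver" shows what is happening: a bare path is resolved against the default filesystem, which is HDFS here, and that folder does not exist on HDFS, hence "No input paths specified in job". A sketch of qualifying the scheme explicitly, assuming the file sits on the local disk and/or was uploaded to /json on HDFS as in the earlier message:

# Bare paths resolve against fs.defaultFS, so spell the scheme out.

# Read from the driver's local disk:
df_local = sqlContext.read.json("file:///root/Downloads/data/json/")

# Read from HDFS after hdfs dfs -put:
df_hdfs = sqlContext.read.json("hdfs://localhost:9000/json")

df_hdfs.show(10)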
Re: Error when loading json to spark
Looks like it's trying to treat that path as a folder, try omitting the file name and just use the folder path.

On Sat, Dec 31, 2016 at 7:58 PM, Raymond Xie wrote:
> Happy new year!!!
>
> I am trying to load a json file into spark, the json file is attached here.
>
> I received the following error, can anyone help me to fix it? Thank you
> very much. I am using Spark 1.6.2 and Python 2.7.5.
>
> from pyspark.sql import SQLContext
> sqlContext = SQLContext(sc)
> df = sqlContext.read.json("/root/Downloads/data/json/world_bank.json")
>
> 16/12/31 22:54:53 INFO json.JSONRelation: Listing
> hdfs://localhost:9000/root/Downloads/data/json/world_bank.json on driver
> 16/12/31 22:54:54 INFO storage.MemoryStore: Block broadcast_0 stored as
> values in memory (estimated size 212.4 KB, free 212.4 KB)
> 16/12/31 22:54:54 INFO storage.MemoryStore: Block broadcast_0_piece0 stored
> as bytes in memory (estimated size 19.6 KB, free 232.0 KB)
> 16/12/31 22:54:54 INFO storage.BlockManagerInfo: Added broadcast_0_piece0
> in memory on localhost:39844 (size: 19.6 KB, free: 511.1 MB)
> 16/12/31 22:54:54 INFO spark.SparkContext: Created broadcast 0 from json
> at NativeMethodAccessorImpl.java:-2
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/opt/spark/python/pyspark/sql/readwriter.py", line 176, in json
>     return self._df(self._jreader.json(path))
>   File "/opt/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py",
>     line 813, in __call__
>   File "/opt/spark/python/pyspark/sql/utils.py", line 45, in deco
>     return f(*a, **kw)
>   File "/opt/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py",
>     line 308, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o19.json.
> : java.io.IOException: No input paths specified in job
>   at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:201)
>   at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
>   at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>   at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>   at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>   at org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1.apply(RDD.scala:1129)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
>   at org.apache.spark.rdd.RDD.treeAggregate(RDD.scala:1127)
>   at org.apache.spark.sql.execution.datasources.json.InferSchema$.infer(InferSchema.scala:65)
>   at org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$4.apply(JSONRelation.scala:114)
>   at org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$4.apply(JSONRelation.scala:109)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.sql.execution.datasources.json.JSONRelation.dataSchema$lzycompute(JSONRelation.scala:109)
>   at org.apache.spark.sql.execution.datasources.json.JSONRelation.dataSchema(JSONRelation.scala:108)
>   at org.apache.spark.sql.sources.HadoopFsRelation.schema$lzycompute(interfaces.scala:636)
>   at org.apache.spark.sql.sources.HadoopFsRelation.schema(interfaces.scala:635)
>   at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:37)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
>   at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:244)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
>   at py4j.Gateway.invoke(Gateway.java:259)
>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>   at py4j.GatewayConnection.run(GatewayConnection.java:209)
>   at java.lang.Thread.run(Thread.java:745)
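Miguel's suggestion, sketched in PySpark: point read.json at the folder (or a glob) rather than at a single file, with the scheme spelled out so the path resolves where the data actually lives. The local folder below is the one from the original post.

from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)  # sc is the shell's SparkContext

# Read every file in the folder; Hadoop glob patterns such as
# "file:///root/Downloads/data/json/*.json" also work
df = sqlContext.read.json("file:///root/Downloads/data/json/")
df.show(10)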