Make every JSON object a single line, then read it as JSON Lines, not as multiline.
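A rough sketch of that conversion in plain Python, in case it helps. It assumes the file is a sequence of top-level JSON objects written one after another (possibly pretty-printed across several lines), which is how I read "not a single array". The function name to_json_lines and the chunk size are only illustrative. It reads the input in chunks, so the whole file never has to fit in memory.

```python
import json


def to_json_lines(src, dst, chunk_size=1 << 20):
    """Copy concatenated JSON objects from src to dst, one object per line.

    src and dst are text-mode file objects. Returns the number of objects written.
    Assumes every top-level value is a JSON object (or array), not a bare number.
    """
    decoder = json.JSONDecoder()
    buf, count = "", 0
    while True:
        chunk = src.read(chunk_size)
        buf += chunk
        pos = 0
        while True:
            # Skip whitespace between objects.
            while pos < len(buf) and buf[pos].isspace():
                pos += 1
            if pos == len(buf):
                break
            try:
                obj, end = decoder.raw_decode(buf, pos)
            except json.JSONDecodeError:
                if not chunk:
                    raise  # end of input and still not parseable: malformed file
                break  # object is incomplete, read another chunk
            dst.write(json.dumps(obj) + "\n")
            count += 1
            pos = end
        buf = buf[pos:]  # keep only the unparsed tail
        if not chunk:
            return count
```

Once the file is line-delimited, spark.read.json(path) in its default mode (multiLine is false) treats each line as one record, so Spark can split the file across executors instead of parsing it as one document. Passing an explicit schema should also spare Spark the extra pass it otherwise makes to infer one.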
> On 19.06.2020 at 14:37, Chetan Khatri <chetan.opensou...@gmail.com> wrote:
>
> All transactions are in JSON; it is not a single array.
>
>> On Thu, Jun 18, 2020 at 12:55 PM Stephan Wehner <step...@buckmaster.ca> wrote:
>> It's an interesting problem. What is the structure of the file? One big
>> array? One hash with many key-value pairs?
>>
>> Stephan
>>
>>> On Thu, Jun 18, 2020 at 6:12 AM Chetan Khatri <chetan.opensou...@gmail.com> wrote:
>>> Hi Spark Users,
>>>
>>> I have a 50 GB JSON file. I would like to read it and persist it to HDFS so it
>>> can be used in the next transformation. I am trying to read it with
>>> spark.read.json(path), but this gives an out-of-memory error on the driver.
>>> Obviously, I can't afford to have 50 GB of driver memory. In general, what
>>> is the best practice for reading a large JSON file like this one?
>>>
>>> Thanks