Ah. Have you tried Jackson? https://github.com/FasterXML/jackson-dataformat-xml/blob/master/README.md
_____________________________
From: Diwakar Dhanuskodi <diwakar.dhanusk...@gmail.com>
Sent: Friday, August 19, 2016 9:41 PM
Subject: Re: Best way to read XML data from RDD
To: Felix Cheung <felixcheun...@hotmail.com>, user <user@spark.apache.org>

Yes. It accepts an XML file as a source, but not an RDD. The XML data, embedded inside JSON, is streamed from a Kafka cluster, so I only have it as an RDD. Right now I am using the scala.xml XML.loadString method inside an RDD map function, but I am not happy with the performance: it takes 4 minutes to parse the XML from 2 million messages on a 3-node cluster with 100 GB and 4 CPUs per node.

-------- Original message --------
From: Felix Cheung <felixcheun...@hotmail.com>
Date: 20/08/2016 09:49 (GMT+05:30)
To: Diwakar Dhanuskodi <diwakar.dhanusk...@gmail.com>, user <user@spark.apache.org>
Subject: Re: Best way to read XML data from RDD

Have you tried https://github.com/databricks/spark-xml ?

On Fri, Aug 19, 2016 at 1:07 PM -0700, "Diwakar Dhanuskodi" <diwakar.dhanusk...@gmail.com> wrote:

Hi,

I have an RDD with JSON data, which I can read using read.json. The JSON data has XML in a couple of its key-value pairs. What is the best way to read and parse that XML from the RDD? Are there any XML libraries specific to Spark? Could anyone help with this?

Thanks.
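For readers landing on this thread, here is a minimal sketch of the per-record approach being discussed: parse each JSON message, pull the XML out of the relevant key, and flatten it. It is written in Python with only the standard library (`json`, `xml.etree.ElementTree`) purely for illustration; the thread itself uses Scala. The key name `payload_xml` and the flattening scheme are hypothetical, since the thread does not say which keys hold the XML or what it looks like. The function is shaped for `RDD.mapPartitions` (iterator in, iterator out), which lets any per-partition setup be done once rather than per message; it is not a benchmarked fix for the 4-minute runtime mentioned above.

```python
import json
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, Iterator, List

# Hypothetical: the thread does not say which JSON keys carry the XML.
XML_KEYS: List[str] = ["payload_xml"]


def parse_record(line: str) -> Dict[str, str]:
    """Parse one JSON message and flatten the XML found under XML_KEYS."""
    record = json.loads(line)
    out: Dict[str, str] = {}
    for key in XML_KEYS:
        xml_text = record.get(key)
        if not xml_text:
            continue
        root = ET.fromstring(xml_text)
        # Keep each direct child's tag and text,
        # e.g. <id>42</id> becomes "payload_xml.id": "42".
        for child in root:
            out[f"{key}.{child.tag}"] = (child.text or "").strip()
    return out


def parse_partition(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Shape expected by RDD.mapPartitions: an iterator in, an iterator out."""
    for line in lines:
        try:
            yield parse_record(line)
        except (json.JSONDecodeError, ET.ParseError):
            # Skip malformed messages instead of failing the whole task.
            continue


if __name__ == "__main__":
    sample = [
        json.dumps({"id": 1, "payload_xml": "<order><id>42</id><qty>3</qty></order>"}),
        "not json",
    ]
    for row in parse_partition(sample):
        print(row)
```

With PySpark, and assuming `rdd` holds the raw JSON strings from Kafka, this would be applied as `rdd.mapPartitions(parse_partition)`. Skipping malformed records is a design choice for streaming input; dropping the `try`/`except` would instead surface bad messages as task failures.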