Hi Jinan,
There are some examples of using spark-xml from Java in its test code here:
https://github.com/databricks/spark-xml/blob/master/src/test/java/com/databricks/spark/xml/JavaXmlSuite.java
Alternatively, see the Java API section of the README:
https://github.com/databricks/spark-xml#java-api
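If you would rather not add the spark-xml package, another common approach is to read each file whole and parse it with the JDK's built-in DOM parser. This is a minimal sketch of that alternative, not something taken from the links above. The class name `XmlTitles` and the `<book>`/`<title>` tags are hypothetical; replace them with the structure of your own files. The parsing part is plain Java, so you can test it without a cluster:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: extracts the <title> text of every <book> element
// from one complete XML document held in a String.
class XmlTitles {
    static List<String> titles(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Reject DOCTYPE declarations to avoid XML external entity (XXE) issues.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        List<String> result = new ArrayList<>();
        NodeList books = doc.getElementsByTagName("book"); // assumed row tag
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            NodeList titleNodes = book.getElementsByTagName("title"); // assumed field
            if (titleNodes.getLength() > 0) {
                // Keep only the first <title> per <book>, with whitespace trimmed.
                result.add(titleNodes.item(0).getTextContent().trim());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalog>"
                + "<book id=\"1\"><title>First</title></book>"
                + "<book id=\"2\"><title> Second </title></book>"
                + "</catalog>";
        System.out.println(titles(xml)); // prints [First, Second]
    }
}
```

In a Spark job (2.x or later) you could then call it from something like `sc.wholeTextFiles("path/to/xml/dir").flatMap(pair -> XmlTitles.titles(pair._2()).iterator())`, where `sc` is a `JavaSparkContext`. Note that `wholeTextFiles` loads each file as a single record, so this suits many small or medium files. A single very large XML file will not be split across tasks, and that is where spark-xml's `rowTag`-based reading is the better fit.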
There are also other basic Java examples here:
https://github.com/apache/spark/tree/master/examples/src/main/java/org/apache/spark/examples
The basic steps are explained well in the book Learning Spark (you can just
Google it).
They are also explained well in the official programming guide:
http://spark.apache.org/docs/latest/programming-guide.html
I hope this helps.
Thanks!
2016-04-18 9:37 GMT+09:00 jinan_alhajjaj :
> Hello,
> I would like to know how to parse XML files using Apache Spark with Java.
> I am doing this for my senior project. I am a beginner in Apache Spark and
> have only a little experience with it.
> Thank you.