Hi Jinan,

There are some examples of reading XML from Java in the spark-xml test code here:
https://github.com/databricks/spark-xml/blob/master/src/test/java/com/databricks/spark/xml/JavaXmlSuite.java

Alternatively, see the Java API section of the spark-xml README.md:
https://github.com/databricks/spark-xml#java-api
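
For reference, a minimal sketch of that Java API looks roughly like the code below. It assumes Spark 2.x or later (SparkSession) with the spark-xml package on the classpath; with Spark 1.x you would use SQLContext and DataFrame instead, as the README shows. The class name XmlExample, the file books.xml and the row tag "book" are only illustrative: rowTag should name the element that wraps one record in your own files.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class XmlExample {
    // Loads every <book> element in the file as one row of a Dataset.
    // Assumes the spark-xml package (com.databricks:spark-xml) is on the classpath.
    public static Dataset<Row> loadBooks(SparkSession spark, String path) {
        return spark.read()
                .format("com.databricks.spark.xml")
                .option("rowTag", "book") // element that marks one record (illustrative)
                .load(path);
    }

    public static void main(String[] args) {
        // local[*] runs Spark in-process, which is handy while learning.
        SparkSession spark = SparkSession.builder()
                .appName("XmlExample")
                .master("local[*]")
                .getOrCreate();
        // "books.xml" is a hypothetical default path.
        Dataset<Row> books = loadBooks(spark, args.length > 0 ? args[0] : "books.xml");
        books.printSchema(); // schema is inferred from the XML
        books.show();
        spark.stop();
    }
}
```

Child elements become columns, and by default attributes become columns prefixed with an underscore, so printSchema() is a quick way to see what spark-xml inferred.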


There are more basic Java examples here:
https://github.com/apache/spark/tree/master/examples/src/main/java/org/apache/spark/examples


The basic steps are explained well in the book Learning Spark (you can just
google it).


I also think they are explained well in the official programming guide:
http://spark.apache.org/docs/latest/programming-guide.html
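
If you would rather not add a dependency and your XML files are small, one common pattern with the basic RDD API from the guide is to read each file whole with JavaSparkContext.wholeTextFiles and parse it with the JDK's built-in DOM parser inside a transformation. The sketch below shows only the plain-Java parsing part; the class XmlRecords, the method titles and the element name "title" are hypothetical, and the Spark wiring in the comments assumes the Spark 2.x+ Java API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XmlRecords {
    // Parses one whole XML document (e.g. one file's contents) with the JDK DOM parser
    // and returns the text of every <title> element. "title" is a hypothetical tag name.
    public static List<String> titles(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList nodes = doc.getElementsByTagName("title");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    // Hypothetical Spark wiring (Spark 2.x+, where flatMap expects an Iterator):
    //   JavaPairRDD<String, String> files = sc.wholeTextFiles("path/to/xml/dir");
    //   JavaRDD<String> allTitles = files.flatMap(f -> XmlRecords.titles(f._2()).iterator());
    public static void main(String[] args) throws Exception {
        String xml = "<books><book><title>A</title></book><book><title>B</title></book></books>";
        System.out.println(titles(xml)); // prints [A, B]
    }
}
```

Because wholeTextFiles hands each file to a task as a single string, this only suits files that fit comfortably in memory; for large XML files spark-xml is the better choice, since it splits the input by rowTag.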


I hope this helps.


Thanks!



2016-04-18 9:37 GMT+09:00 jinan_alhajjaj <j.r.alhaj...@hotmail.com>:

> Hello,
> I would like to know how to parse XML files with Apache Spark in Java.
> I am doing this for my senior project. I am a beginner in Apache Spark
> and have only a little experience with it.
> Thank you.