Thanks everyone for some great ideas. I will give these a try and see which one fits the best into what I am doing.
Have a great day!
Billy

On Thu, Apr 27, 2017 at 10:27 AM, Martin Packer <[email protected]> wrote:

> Presumably it's related to "Beautiful Soup" - which is nice and liberal
> when it comes to parsing HTML and XML.
>
> Cheers, Martin
>
> Martin Packer,
> zChampion, Principal Systems Investigator,
> Worldwide Cloud & Systems Performance, IBM
>
> +44-7802-245-584
>
> email: [email protected]
>
> Twitter / Facebook IDs: MartinPacker
>
> Blog:
> https://www.ibm.com/developerworks/mydeveloperworks/blogs/MartinPacker
>
> Podcast Series (With Marna Walle): https://developer.ibm.com/tv/mpt/ or
> https://itunes.apple.com/gb/podcast/mainframe-performance-topics/id1127943573?mt=2
>
> From: "Barkow, Eileen" <[email protected]>
> To: [email protected]
> Date: 27/04/2017 14:51
> Subject: Re: How to pull webpage into batch job
> Sent by: IBM Mainframe Discussion List <[email protected]>
>
> Thank you Andrew for the info about Jsoup - I had never heard of it.
> the jar files to compile and run can be downloaded from:
>
> https://jsoup.org/download
>
> api is at:
>
> https://jsoup.org/apidocs/
>
> -----Original Message-----
> From: IBM Mainframe Discussion List [mailto:[email protected]] On Behalf Of Andrew Rowley
> Sent: Thursday, April 27, 2017 3:19 AM
> To: [email protected]
> Subject: Re: How to pull webpage into batch job
>
> > I would suggest Java as well. There are open source libraries that can
> > do the HTML parsing too e.g. Jsoup.
> >
> > I just tested this example on z/OS, it worked (fetch the Wikipedia home
> > page and list items from the In the news section):
> >
> > import java.io.IOException;
> > import org.jsoup.Jsoup;
> > import org.jsoup.nodes.Document;
> > import org.jsoup.nodes.Element;
> > import org.jsoup.select.Elements;
> >
> > public class JsoupTest {
> >     public static void main(String[] args) throws IOException {
> >         // Fetch the page and parse it into a DOM-like Document
> >         Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
> >         // CSS selector: list items inside the element with id "mp-itn"
> >         Elements newsHeadlines = doc.select("#mp-itn li");
> >         for (Element e : newsHeadlines) {
> >             System.out.println(e.text());
> >         }
> >     }
> > }
> >
> > --
> > Andrew Rowley
> > Black Hill Software
> > +61 413 302 386
> >
> > ----------------------------------------------------------------------
> > For IBM-MAIN subscribe / signoff / archive access instructions,
> > send email to [email protected] with the message: INFO IBM-MAIN
>
> ________________________________
>
> This e-mail, including any attachments, may be confidential, privileged or
> otherwise legally protected. It is intended only for the addressee. If you
> received this e-mail in error or from someone who was not authorized to
> send it to you, do not disseminate, copy or otherwise use this e-mail or
> its attachments. Please notify the sender immediately by reply e-mail and
> delete the e-mail from your system.
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

--
Thank you and best regards,
*Billy Ashton*
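
For reference, here is a minimal sketch of a JDK-only variant of the fetch step from the quoted JsoupTest example, for cases where the raw page text is enough and the Jsoup jars are not on the classpath. The class name PageFetch and its two methods are hypothetical, not something posted in the thread. UTF-8 is an assumption about the page encoding; the charset is passed explicitly because the platform default may not be UTF-8 (on z/OS it is typically an EBCDIC code page). This sketch does none of the HTML parsing, charset detection, or redirect handling that Jsoup takes care of.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not from the thread): fetch a page using only the JDK.
final class PageFetch {

    // Read the whole stream and decode it with an explicit charset
    // rather than relying on the platform default.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[8192];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    // Open the URL and return its body as text.
    // Assumption: the page is encoded as UTF-8.
    static String fetch(String url) throws IOException {
        try (InputStream in = new URL(url).openStream()) {
            return readAll(in, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PageFetch <url>");
            return;
        }
        System.out.println(fetch(args[0]));
    }
}
```

In a batch job this could be run as "java PageFetch <url>", with standard output routed wherever the job needs the page text. If specific elements have to be picked out of the page, the Jsoup approach in the quoted message is likely the better fit.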
