GitHub user queeniema commented on a diff in the pull request:
https://github.com/apache/incubator-quarks-website/pull/13#discussion_r56918421
--- Diff: site/docs/recipe_source_function.md ---
@@ -0,0 +1,86 @@
+---
+title: Recipe 2. Writing a Source Function
+---
+In the previous [Hello Quarks!](recipe_hello_quarks) example, we created a data source that generates a single Java String and prints it to output. Quarks sources, however, can generate data items of any type, not just built-in Java types such as String. Moreover, because the user supplies the code that generates the data, the user has complete flexibility over *how* the data is generated. This recipe demonstrates how to write such a custom data source.
+
+## Custom Source: Reading the Lines of a Web Page
+{{site.data.alerts.note}} The Quarks API provides convenience methods for performing HTTP requests. For the sake of example, we write an HTTP data source manually here, but in practice those convenience methods are easier to use. {{site.data.alerts.end}}
+
+One example of a custom data source is one that retrieves the contents of a web page and prints each line to output. For instance, the user could query the Yahoo Finance website for the most recent stock price data of Bank of America, Cabot Oil & Gas, and Freeport-McMoRan Inc.:
+
+``` java
+    public static void main(String[] args) throws Exception {
+        DirectProvider dp = new DirectProvider();
+        Topology top = dp.newTopology();
+
+        // Request CSV quote data for the BAC, COG and FCX ticker symbols
+        final URL url = new URL("http://finance.yahoo.com/d/quotes.csv?s=BAC+COG+FCX&f=snabl");
+    }
+```
+
+Given a correctly formatted URL for requesting the data, we can use the *Topology.source* method to emit each line of the page as a data item on the stream. *Topology.source* takes a Supplier function that returns an Iterable. The supplier is invoked once, and the items returned by the Iterable become the stream's data items. For example, the following *queryWebsite* method returns a supplier that queries a URL and returns the page's lines as an Iterable:
+
+``` java
+    private static Supplier<Iterable<String>> queryWebsite(URL url) {
+        return () -> {
+            List<String> lines = new LinkedList<>();
+            // try-with-resources closes the page's input stream when reading ends
+            try (BufferedReader br = new BufferedReader(
+                    new InputStreamReader(url.openStream()))) {
+                // Read the page one line at a time until the end of the stream
+                for (String s = br.readLine(); s != null; s = br.readLine())
+                    lines.add(s);
+            } catch (Exception e) {
+                e.printStackTrace();
+            }
+            return lines;
+        };
+    }
+```
+
+When invoking *Topology.source*, we can call *queryWebsite* with the URL to obtain the required supplier.
+
+``` java
+    public static void main(String[] args) throws Exception {
+        DirectProvider dp = new DirectProvider();
+        Topology top = dp.newTopology();
+
+        final URL url = new URL("http://finance.yahoo.com/d/quotes.csv?s=BAC+COG+FCX&f=snabl");
+
+        // Each line of the web page becomes one data item on the stream
+        TStream<String> linesOfWebsite = top.source(queryWebsite(url));
+    }
--- End diff --
Indentation is a bit off here.
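For comparison, here is a self-contained sketch of the supplier-returns-an-Iterable pattern described in the diff, runnable without Quarks or network access. Assumptions: `java.util.function.Supplier` stands in for the Supplier type that *Topology.source* actually accepts, a `StringReader` stands in for the HTTP response body, and the `LineSourceSketch` class and `readLines` helper are hypothetical names, not part of Quarks. Each call to `get()` opens and reads a fresh reader, which mirrors the work the supplier passed to *Topology.source* does when it is invoked.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class LineSourceSketch {

    // Hypothetical helper (not a Quarks API): returns a supplier that, when
    // invoked, opens a Reader, reads it to the end, and returns its lines.
    // java.util.function.Supplier stands in for Quarks' own Supplier type.
    public static Supplier<Iterable<String>> readLines(Supplier<Reader> opener) {
        return () -> {
            List<String> lines = new ArrayList<>();
            // try-with-resources closes the reader even if readLine() throws
            try (BufferedReader br = new BufferedReader(opener.get())) {
                for (String s = br.readLine(); s != null; s = br.readLine()) {
                    lines.add(s);
                }
            } catch (IOException e) {
                // Surface the failure instead of returning a partial list
                throw new UncheckedIOException(e);
            }
            return lines;
        };
    }

    public static void main(String[] args) {
        // A StringReader stands in for the HTTP response body
        String page = "first line\nsecond line\n";
        Supplier<Iterable<String>> source = readLines(() -> new StringReader(page));
        // Each element of the Iterable would become one item on the stream
        for (String line : source.get()) {
            System.out.println(line);
        }
    }
}
```

Rethrowing as `UncheckedIOException` instead of calling `printStackTrace()` is a deliberate choice here: a failed read surfaces as an error rather than quietly producing an empty or truncated stream. How Quarks reports an exception thrown from a source supplier is not something this sketch checks.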