http://git-wip-us.apache.org/repos/asf/oodt/blob/4066b63b/webapp/fmprod/src/site/xdoc/tutorial/index.xml ---------------------------------------------------------------------- diff --git a/webapp/fmprod/src/site/xdoc/tutorial/index.xml b/webapp/fmprod/src/site/xdoc/tutorial/index.xml new file mode 100755 index 0000000..3cb2a71 --- /dev/null +++ b/webapp/fmprod/src/site/xdoc/tutorial/index.xml @@ -0,0 +1,659 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!-- +Licensed to the Apache Software Foundation (ASF) under one or more contributor +license agreements. See the NOTICE.txt file distributed with this work for +additional information regarding copyright ownership. The ASF licenses this +file to you under the Apache License, Version 2.0 (the "License"); you may not +use this file except in compliance with the License. You may obtain a copy of +the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, WITHOUT +WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the +License for the specific language governing permissions and limitations under +the License. +--> +<document> + <properties> + <title>Understanding the XMLQuery</title> + <author email="[email protected]">Sean Kelly</author> + </properties> + + <body> + <section name="Understanding the XMLQuery"> + <p>Apache OODT's <a href="../../profile/">profile servers</a>, <a + href="../../product/">product servers</a>, and other + components all use the same format for a query. It's + encapsulated by the class + <code>org.apache.oodt.xmlquery.XMLQuery</code>. In this tutorial, + we'll look at this class and see how it represents queries. + You'll need this knowledge both to make queries to OODT + servers, as well as to understand queries coming into OODT + servers. 
+ </p> + </section> + + <section name="Basic Query Concepts"> + <p>Capturing various aspects of a query is difficult to do in + general, and OODT's implementation is not stellar or complete. + But it has proved successful in a variety of applications, so + let's see what concepts it encapsulates. + </p> + + <subsection name="XML?"> + <p>First, forget the fact that the XMLQuery has "XML" in its + name. It doesn't mean you can query only XML resources. + It's called XMLQuery probably because the person who came up + with it thought XML was pretty cool, or because you can + represent an OODT query in XML format. + </p> + + <p>While you <em>can</em> represent an XMLQuery in XML, you + usually only use the Java representation, that is, you + create and manipulate Java objects of the class + <code>org.apache.oodt.xmlquery.XMLQuery</code>. + </p> + </subsection> + + <subsection name="Generic Queries"> + <p>In theory, the XMLQuery can represent <em>any</em> query + for information. It captures generic aspects of a query, + such as the domain of the question being posed, the range in + which the desired response should be formulated, and + constraints on what selects the response. In XMLQuery + parlance, we call these the "from element set" (domain), the + "select element set" (range), and the "where element set" + (constraints). + </p> + + <p>In practice, none of the current OODT implementations use + any but the "where element set." And indeed, for most + problems presented to OODT, that is sufficient. However, + the framework is there to support more aspects of a query, + and you're welcome to use them in your own deployments. + </p> + </subsection> + + <subsection name="Query Metadata"> + <p>The XMLQuery concept captures metadata about a query as + well, such as the title for the query, whether the query + itself is secret or classified, how many results to return + at most, how to propagate the query through a network, and + so forth. 
+ In practice, though, none of these additional + attributes are used in current deployments of OODT. + Moreover, none of the current OODT components obey + settings such as maximum number of results or propagation + types. + </p> + + <p>As a result, you should ignore these aspects of the + XMLQuery and merely use its default values. We'll see these + shortly. + </p> + </subsection> + + <subsection name="XMLQuery Structure"> + <p>The following diagram shows the XMLQuery and related classes (note the diagram is outdated; "jpl.eda.xmlquery" + should read "org.apache.oodt.xmlquery"):</p> + + <p><img src="../images/xmlquery.png" alt="Class diagram of XMLQuery"/></p> + + <p>A single <code>XMLQuery</code> object has three separate + lists of <code>QueryElement</code> objects, representing the + "from", "select", and "where" element sets. In practice, + the "from" and "select" sets are empty, though, as mentioned. + There's also a single <code>QueryHeader</code> object + capturing query metadata. Within the <code>XMLQuery</code> + itself is additional query metadata. Finally, there's + exactly one <code>QueryResult</code> object which captures + the results of the query so far. + </p> + </subsection> + </section> + + <section name="Boolean Expressions"> + <p>The XMLQuery class uses lists of <code>QueryElement</code> + objects to represent its "from", "select", and "where" element + sets. Each list forms a postfix boolean stack, with the + zeroth element of the list being the top of the stack. + Although you can populate these stacks by manipulating their + corresponding <code>java.util.List</code>s, the XMLQuery class + provides a boolean expression language that lets you directly + populate them. + </p> + + <p>The XMLQuery class also recognizes that some queries just + cannot be formulated as a boolean expression. In these cases, + you can pass in a string that the XMLQuery will simply + carry unparsed. 
+ Note that your profile and product servers + will then have the responsibility of handling that string in + some appropriate way. + </p> + + <subsection name="Query Language"> + <p>The query language that XMLQuery uses to generate postfix + boolean stacks consists of a series of infix, not postfix, + element-and-value expressions linked by boolean operators. + Here's an example: + </p> + + <source>temperature &gt; 36 AND latitude &lt; 45</source> + + <p>As you can see, these are <em>triples</em> linked in a + logical expression. Each triple has the form + (<var>element</var>, <var>relation</var>, <var>literal</var>). + For example, the first triple has <var>element</var> = + <code>temperature</code>, <var>relation</var> = GT + (greater-than), and <var>literal</var> = 36. That triple is + linked to the next one with the boolean <code>AND</code> + operator. + </p> + + <p>The full set of relational operators is: <code>=</code> (EQ), + <code>!=</code> (NE), <code>&lt;</code> (LT), + <code>&lt;=</code> (LE), <code>&gt;</code> (GT), + <code>&gt;=</code> (GE), <code>LIKE</code>, and + <code>NOTLIKE</code>. The logical operators include + <code>AND</code>, <code>&amp;</code>, <code>OR</code>, + <code>|</code>, <code>NOT</code>, and <code>!</code>. You can + use parentheses to group things too. + </p> + + <p>Here are a few more examples:</p> + + <source>specimen = Blood +bac &gt; 0.05 AND priors = 3 +surname LIKE 'Simpson%' OR numChildren &lt;= 3 AND RETURN = numEpisodes</source> + </subsection> + + <subsection name="Expression Stacks"> + <p>The "where" element set is actually a + <code>java.util.List</code> of + <code>org.apache.oodt.xmlquery.QueryElement</code> objects, arranged + in a boolean stack with the top of the stack as the zeroth + element in the list. <code>QueryElement</code> objects + themselves have two attributes, a role and a value. + </p> + + <p>The role tells what role the <code>QueryElement</code> is + playing. 
+ It can be <code>elemName</code> for the + <var>element</var> part of a triple, <code>RELOP</code> for + the <var>relation</var> part of a triple, <code>LITERAL</code> + for the <var>literal</var> part of a triple, or + <code>LOGOP</code> for a logical operator linking triples + together. The value tells what the element is, what the + relational operator is, what literal value is being related, + or what the logical operator is. + </p> + + <p>The <code>XMLQuery</code> parses a query expression and + generates a corresponding stack of <code>QueryElement</code>s. + Let's look at a couple of examples. The expression + </p> + <source>latitude &gt; 45</source> + <p>generates the "where" stack</p> + <p><img src="../images/small-stack.png" alt="Stack of three query elements"/></p> + + <p>While the expression</p> + <source>artist = Bach AND NOT album = Poem OR track != Aria</source> + <p>generates the "where" stack</p> + <p><img src="../images/large-stack.png" alt="Stack of a lot of query elements"/></p> + </subsection> + + <subsection name="The RETURN Element"> + <p>A special element is reserved by XMLQuery: + <code>RETURN</code>. It's used to indicate what to select, + and so any value specified with <code>RETURN</code> goes + into the "select" set, not the "where" set. + </p> + + <p>Moreover, the <code>RETURN</code> element doesn't pay + attention to how it's linked with boolean expressions in the + rest of the query, or what relational operator is used with the + literal value being returned. For example, that means + <em>all</em> of the following expressions would generate + <em>identical</em> XMLQueries: + </p> + + <source>specimen = Blood AND RETURN = volume +specimen = Blood OR RETURN = volume +specimen = Blood AND RETURN != volume +specimen = Blood AND RETURN &lt; volume +specimen = Blood AND RETURN LIKE volume</source> + + <p>All <code>QueryElement</code>s from RETURN triples would go + into the "select" set instead of the "where" set. 
+ </p> + </subsection> + </section> + + <section name="Constructing a Query"> + <p>To construct a query, you'll use a Java constructor of the + following form: + </p> + <source>XMLQuery(String keywordQuery, String id, String title, + String desc, String ddId, String resultModeId, String propType, + String propLevels, int maxResults, java.util.List mimeAccept, + boolean parseQuery)</source> + + <p>The parameters are summarized below:</p> + + <table> + <thead> + <tr> + <th>Parameter</th> + <th>Purpose</th> + <th>Sample values</th> + </tr> + </thead> + <tbody> + <tr> + <td>keywordQuery</td><td>A string representing your query + expression, in the query language described above, or in + some other application-specific + language.</td><td><code>numDonuts = 3</code>, + <code>select volume_remaining from specimens where + specimen_type = 4</code></td> + </tr> + <tr> + <td>id</td><td>An identifier for your query</td> + <td>query-1, 1.3.6.1.1316.4.1, myQuery, urn:ibm:sys:0x39ad930a</td> + </tr> + <tr> + <td>title</td><td>A title for your query</td> + <td>My First Query, Query for Blood Specimens, Simpson's Query</td> + </tr> + <tr> + <td>desc</td><td>Description of the query</td> + <td>H.J. Simpson is looking for donut shops</td> + </tr> + <tr> + <td>ddId</td><td>Data dictionary ID. This identifies the + data dictionary that provides definitions for the elements + used in the query like "specimen" or "numDonuts". It's + not used by any current OODT deployment or the OODT + framework.</td><td><code>null</code></td> + </tr> + <tr> + <td>resultModeId</td> <td>Identifies what to return from + the query. Defaults to <code>ATTRIBUTE</code>. Not used + by any current OODT deployment or the OODT + framework.</td><td><code>null</code></td> + </tr> + <tr> + <td>propType</td><td>How to propagate the query, defaults + to <code>BROADCAST</code>. 
+ It's not used by any current + OODT deployment or the OODT framework.</td><td><code>null</code></td> + </tr> + <tr> + <td>propLevels</td> <td>How far to propagate the query, + defaults to <code>N/A</code>. Not used by any current + OODT deployment or the OODT + framework.</td><td><code>null</code></td> + </tr> + <tr> + <td>maxResults</td> + <td>At most how many results to return; not enforced by the OODT framework.</td> + <td>1, 100, <code>Integer.MAX_VALUE</code>, -6</td> + </tr> + <tr> + <td>mimeAccept</td> <td>List of acceptable MIME types for + returned products, defaults to <code>*/*</code></td><td><code>List types = new ArrayList(); types.add("text/xml"); types.add("text/html"); types.add("text/*");</code></td> + </tr> + <tr> + <td>parseQuery</td><td>Should the class parse the query as + a boolean expression? True says to generate the boolean + expression stacks. False says to just save the expression + string.</td> + <td><code>true</code>, <code>false</code></td> + </tr> + </tbody> + </table> + + <p>All of the values above can be set to <code>null</code> to + use a default or non-specific value (except for + <code>maxResults</code> and <code>parseQuery</code>, which are + <code>int</code> and <code>boolean</code> types and can't be + assigned <code>null</code>). For most applications, using + <code>null</code> is perfectly acceptable. Since the OODT + framework doesn't use <code>maxResults</code>, you can use any + value. However, specific profile servers' and product + servers' query handlers may pay attention to the value if so + programmed. + </p> + + <subsection name="Parsed or Unparsed Queries"> + <p>The last parameter, <code>parseQuery</code>, tells whether you + want the <code>XMLQuery</code> class to parse your query and + generate boolean expression stacks (discussed above) or not. 
+ When it's set to <code>true</code>, the class parses the string as + an expression in the XMLQuery language described above and generates + the "from", "select", and "where" element boolean stacks. Set + it to <code>false</code> and the class won't parse the string + or generate the stacks. It will instead store the string for + later use by a profile server's or product server's query + handler. + </p> + + <p>For example, if you pass in the XMLQuery language + expression,</p> + + <source>donutsEaten &gt; 5 AND RETURN = episodeNumber</source> + + <p>then set the <code>parseQuery</code> + flag to <code>true</code>. As another example, suppose the + query expression is + </p> + + <source>select episodeNumber from episodes where donutsEaten &gt; 5</source> + + <p>This is an SQL expression, probably targeted to a product + server that can handle SQL expressions. In this case, set + <code>parseQuery</code> to <code>false</code>. + </p> + + <p>The current OODT deployments for the Planetary Data System + and the Early Detection Research Network both use + <em>parsed</em> queries. + </p> + </subsection> + + <subsection name="Acceptable MIME Types"> + <p>Internet standards for mail, web, and other applications + use <abbr title='Multipurpose Internet Mail + Extensions'>MIME</abbr> types (described in <a + href="ftp://ftp.rfc-editor.org/in-notes/rfc2046.txt">RFC-2046</a> + amongst other documents) to describe the content and media + type of data. So does OODT. When you construct an + <code>XMLQuery</code>, you can also pass in a list of MIME + types that are acceptable to you for the format of any + returned products, much in the same way your web browser + tells a web server what media types it can display. + </p> + + <p>The list of acceptable MIME types is only used for product + queries since products can come in any shape and flavor. + Profile queries ignore the list; profiles are always + returned as a list of Java + <code>org.apache.oodt.profile.Profile</code> objects. 
+ </p> + + <p>You've probably seen MIME types before, but here are some + examples in case you haven't: + </p> + + <ul> + <li><code>text/plain</code> - a plain old text file</li> + <li><code>text/html</code> - a hypertext document</li> + <li><code>image/jpeg</code> - a picture in the JPEG/JFIF format</li> + <li><code>image/gif</code> - a picture in the GIF format</li> + <li><code>audio/mpeg</code> - an audio file, probably in the MP3 format</li> + <li><code>video/mpeg</code> - a video file, probably in the MP2 format</li> + <li><code>application/msword</code> - a Micro$oft Word document</li> + <li><code>application/octet-stream</code> - binary data</li> + </ul> + + <p>In the <code>XMLQuery</code> constructor, you can pass in a + list of MIME types that shows your <em>preference</em> for + returned products. Product servers' query handlers examine + the query to see if they can provide a matching product, + <em>and</em> they examine the list of MIME types to see if + they can provide matching products in the format you desire. + </p> + + <p>As an example, suppose you create a MIME type list as follows:</p> + + <source>List acceptableTypes = new ArrayList(); +acceptableTypes.add("image/tiff"); +acceptableTypes.add("image/png"); +acceptableTypes.add("image/jpeg");</source> + + <p>and you pass <code>acceptableTypes</code> as the + <code>mimeAccept</code> parameter of the + <code>XMLQuery</code> constructor. This tells query + handlers receiving your query that you'd really prefer a + TIFF format image. However, failing that, you'll accept a + PNG format image. And, as a last resort, a JPEG will do. + </p> + + <p>You can also use wildcards in your MIME types. Suppose we + did the following:</p> + + <source>List acceptableTypes = new ArrayList(); +acceptableTypes.add("image/tiff"); +acceptableTypes.add("image/png"); +acceptableTypes.add("image/*");</source> + + <p>Now we tell query handlers in product servers that we + really prefer TIFF format images. 
+ If a query handler can't + do that, then a PNG format image will be OK. And if a query + handler can't do PNG, then <em>any</em> image format will be + fine, even loathsome GIF. + </p> + + <p>If you pass a <code>null</code> or an empty list in the + <code>mimeAccept</code> parameter, the OODT framework will + convert it into a single-item list: <code>*/*</code>, meaning + any format is acceptable. + </p> + </subsection> + </section> + + <section name='"Running" XMLQuery'> + <p>The <code>XMLQuery</code> class is also an executable class. + By running it from the command line, you can see how it + generates its XML representation. It also lets you pass in a + file containing an XML representation of an XMLQuery, which + it then parses to check its validity. + </p> + + <p>Let's try just seeing that XML representation. (In these + examples, we'll be using a Unix <code>csh</code>-like + command environment. Users of other shells and non-Unix systems will + have to adjust.) + </p> + + <subsection name='Collecting the Components'> + <p>First up, we'll need two components:</p> + + <ul> + <li><a href="../../commons/">OODT Common Components</a>. This + is needed by all OODT software; it contains general + utilities for starting servers, parsing XML, logging, and + more. + </li> + <li><a href="../../xmlquery/">OODT Query Expression</a>. This + contains the <code>XMLQuery</code> and related classes. + </li> + </ul> + + <p>Download the binary distribution of each of these packages + and extract their contents. Then, create a single directory + and collect the jar files in it. + </p> + + </subsection> + + <subsection name='Generating the Query'> + <p>To generate the query, pass the command-line argument + <code>-expr</code>. That tells the XMLQuery that the rest + of the command line is the query expression. It will expect + it to be in the XMLQuery language (meaning that it + will create an <code>XMLQuery</code> object with + <code>parseQuery</code> set to <code>true</code>). 
+ </p> + + <p>Here's an example:</p> + + <source>% <b>java -Djava.ext.dirs=. \ + org.apache.oodt.xmlquery.XMLQuery \ + -expr donutsEaten \&gt; 5 AND RETURN = episodeNumber</b> +kwdQueryString: donutsEaten &gt; 5 AND RETURN = episodeNumber +fromElementSet: [] +results: org.apache.oodt.xmlquery.QueryResult[list=[]] +whereElementSet: +[org.apache.oodt.xmlquery.QueryElement[role=elemName,value=donutsEaten], +org.apache.oodt.xmlquery.QueryElement[role=LITERAL,value=5], +org.apache.oodt.xmlquery.QueryElement[role=RELOP,value=GT]] +selectElementSet: +[org.apache.oodt.xmlquery.QueryElement[role=elemName,value=episodeNumber]] +======doc string======= +&lt;?xml version="1.0" encoding="UTF-8"?&gt; +&lt;query&gt; . . .</source> + + <p>The program prints out some fields of the XMLQuery such as + the "from" element set, the current results (which should + always be empty since we haven't passed this query to any + product servers), the "where" element set, and the "select" + element set. It then prints out the XML representation. + </p> + + <p>If you examine the XML representation closely, you'll see + things like the list of acceptable MIME types: + </p> + <source>&lt;queryMimeAccept&gt;*/*&lt;/queryMimeAccept&gt;</source> + <p>This says that any type is acceptable. You'll also see the + passed-in query string:</p> + <source>&lt;queryKWQString&gt;donutsEaten &amp;gt; 5 AND + RETURN = episodeNumber&lt;/queryKWQString&gt;</source> + + <p>Regardless of whether you passed <code>true</code> or + <code>false</code> in the <code>parseQuery</code> parameter, + the <code>XMLQuery</code> always saves the original query + string. For unparsed queries, this is how the string is + packaged on its way to a product server. For parsed + queries, product servers will use the boolean stacks. + (Since this was a parsed query, you'll also see the boolean + stacks in XML format if you look closely. They're there.) 
+ </p> + </subsection> + </section> + + <section name='Getting Results'> + <p>Alert readers will have noticed that the results of a query + have a place in <code>XMLQuery</code> objects. This actually + applies to product queries only. After sending an + <code>XMLQuery</code> to a product server, the query object + comes back adorned with zero or more matching results. You + then call the <code>XMLQuery</code> object's methods to + retrieve those results. + </p> + + <p>The following class diagram demonstrates the relationship (again, the diagram is outdated; + "jpl.eda.xmlquery" should read "org.apache.oodt.xmlquery"):</p> + + <p><img src='../images/results.png' alt='Result class diagram'/></p> + + <p>As you can see, a single query has a single + <code>org.apache.oodt.xmlquery.QueryResult</code>, which contains a + <code>java.util.List</code> of + <code>org.apache.oodt.xmlquery.Result</code> objects. + <code>Result</code> objects may have zero or more + <code>Header</code>s, and <code>Result</code> objects may + actually be <code>LargeResult</code> objects. + </p> + + <p>To retrieve the list of <code>Result</code> objects, call the + <code>XMLQuery</code>'s <code>getResults</code> method, which + returns the <code>java.util.List</code> directly. + </p> + + <p>Each result also includes:</p> + <ul> + <li>An identifier. If there's more than one matching + result, this identifier (a string) should be unique amongst + results. + </li> + <li>A MIME type. This tells you what format the matching product is in.</li> + <li>A profile ID. This is currently unused.</li> + <li>A resource ID. This is also unused.</li> + + <li>A validity period. This is the number of milliseconds for + which the product is considered valid. You can use this + information to decide how long to cache the product within + your own program before having to retrieve it again. + </li> + + <li>A flag indicating whether the product is classified. 
+ Classified or secret products shouldn't be cached or should + otherwise be handled carefully by your application program. + </li> + </ul> + + <subsection name='Result Headers'> + <p>The headers of a result are optional. They're used for + tabular-style results to indicate column headings. Each + <code>Header</code> object captures three strings: a name, a + data type, and units. + </p> + + <p>For example, suppose you retrieved a product that was a + table of temperatures at various locations on the Earth. + There might be three headers in the headers list: + </p> + + <table> + <thead> + <tr> + <th rowspan="2">List Index</th> + <th colspan="3">Header</th> + </tr> + <tr> + <th>Name</th> + <th>Data Type</th> + <th>Units</th> + </tr> + </thead> + <tbody> + <tr><td>0</td><td>latitude</td><td>float</td><td>degrees</td></tr> + <tr><td>1</td><td>longitude</td><td>float</td><td>degrees</td></tr> + <tr><td>2</td><td>temperature</td><td>float</td><td>kelvins</td></tr> + </tbody> + </table> + + <p>Now suppose the product you get back is a picture of a tissue + specimen. In this case, there would be <em>no</em> headers. + </p> + </subsection> + + <subsection name='Getting the Product Data'> + <p>To retrieve the actual data comprising your product, call + the <code>Result</code> object's <code>getInputStream</code> + method. This returns a standard + <code>java.io.InputStream</code> that lets you access the + data. How you interpret that data, though, depends on the + MIME type of the product, which you can get by calling the + <code>Result</code>'s <code>getMIMEType</code> method. + </p> + + <p>For example, if the MIME type were <code>text/plain</code>, + then the byte stream would be a sequence of Unicode + characters. If it were <code>image/jpeg</code>, then the + bytes would be image data in JPEG/JFIF format. 
+ </p> + </subsection> + </section> + + <section name='Conclusion'> + <p>In this tutorial, we learned about the structure of the + standard query component in OODT, the <code>XMLQuery</code>. + We saw the query language that XMLQuery supports and how it + generates postfix boolean expression stacks. We also saw that you can + encode any query expression by using a special constructor + argument that tells XMLQuery not to parse the query string. + We also executed the <code>XMLQuery</code> class directly. + Finally, we saw how product data is embedded in the XMLQuery + and how to deal with such results. + </p> + + <p>As a client of the OODT framework, you can now create + <code>XMLQuery</code> objects to query product servers from + within your Java applications. As a server in the framework, + you know how to deal with incoming query objects. + </p> + </section> + </body> +</document> +
http://git-wip-us.apache.org/repos/asf/oodt/blob/4066b63b/webapp/fmprod/src/site/xdoc/tutorials/index.xml ---------------------------------------------------------------------- diff --git a/webapp/fmprod/src/site/xdoc/tutorials/index.xml b/webapp/fmprod/src/site/xdoc/tutorials/index.xml new file mode 100755 index 0000000..e130ea9 --- /dev/null +++ b/webapp/fmprod/src/site/xdoc/tutorials/index.xml @@ -0,0 +1,36 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!-- + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
+--> +<document> + <properties> + <title>Product Service Tutorials</title> + <author email="[email protected]">Sean Kelly</author> + </properties> + + <body> + <section name="Product Service Tutorials"> + <p>The following tutorials are available:</p> + <ul> + <li><a href="./ps">Your First Product Service</a></li> + <li><a href="./qh">Developing a Query Handler</a></li> + <li><a href="./lh">Serving Large Products</a></li> + </ul> + </section> + </body> +</document> + + http://git-wip-us.apache.org/repos/asf/oodt/blob/4066b63b/webapp/fmprod/src/site/xdoc/tutorials/lh/index.xml ---------------------------------------------------------------------- diff --git a/webapp/fmprod/src/site/xdoc/tutorials/lh/index.xml b/webapp/fmprod/src/site/xdoc/tutorials/lh/index.xml new file mode 100755 index 0000000..cb3de01 --- /dev/null +++ b/webapp/fmprod/src/site/xdoc/tutorials/lh/index.xml @@ -0,0 +1,718 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!-- + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
+--> +<document> + <properties> + <title>Serving Large Products</title> + <author email="[email protected]">Sean Kelly</author> + </properties> + + <body> + <section name="Serving Large Products"> + <p>In the <a href="../qh/">last tutorial</a>, we created a query + handler and "installed" it in a product server. We could + query it for products (mathematical constants) using the + XMLQuery's postfix boolean stacks. The handler would return + results by embedding them in the returned XMLQuery. Now we'll + return larger products that live outside of the XMLQuery. + </p> + </section> + + <section name="What's Large?"> + <p>There's a <a + href="http://www.worldslargestthings.com/california/clam.htm">giant + clam</a> at Pismo Beach, a <a + href="http://www.worldslargestthings.com/kansas/cawkercity.htm">giant + ball of twine</a> in Kansas, and for those who drive SUVs, a + <a + href="http://www.worldslargestthings.com/missouri/gaspump.htm">giant + gas pump</a>. For the OODT framework, large is similarly hard to define. + </p> + + <p>One of the original architects of the OODT framework thought + that putting a product result in with the query meant that + you'd never lose the connection between a product and the query + that generated it. I'm not sure I see the value in that, but + regardless, it posed a practical challenge: an + <code>XMLQuery</code> object in memory with one or two large + results in it can exhaust the Java virtual machine's available + memory. + </p> + + <p>It's even worse when the XMLQuery is expressed as a + textual XML document. In this case, a binary product must be + encoded in a text format (we use <a + href="ftp://ftp.rfc-editor.org/in-notes/rfc2045.txt">Base64</a>), + making the XMLQuery in XML format even larger than as a Java + object. Moreover, those XML documents must be parsed at some + point to reconstitute them as Java objects. We use a DOM-based + parser, which holds the entire document in memory. 
+ Naturally, + things tend to explode at this rate. + </p> + + <p>There is a way out of the quagmire, though. Instead of + writing a <code>QueryHandler</code>, write a + <code>LargeProductQueryHandler</code>. A + <code>QueryHandler</code> puts <code>Result</code> objects, + which hold the entire product, into the <code>XMLQuery</code>. + A <code>LargeProductQueryHandler</code> puts in + <code>LargeResult</code> objects, which hold <em>a reference to + the product</em>. + </p> + </section> + + <section name="Large Handlers and Large Results"> + <p>The OODT framework provides an extension to the + <code>QueryHandler</code> interface called + <code>org.apache.oodt.product.LargeProductQueryHandler</code>. This + interface adds two methods that you must implement: + </p> + + <ul> + <li><code>retrieveChunk</code>. This method returns a byte + array representing a chunk of the product. The OODT + framework calls this method repeatedly to gather chunks of + the product for the product client. It takes a <em>product + ID</em> (a string) that identifies which product is being + retrieved. It also takes a byte offset into the product + data and the size of the chunk to return. You return the + matching chunk. + </li> + + <li><code>close</code>. This method is called by the OODT + framework to tell the query handler it's done getting a + product. It takes a <em>product ID</em> that tells which + product is no longer being retrieved. You use this method + to perform any cleanup necessary. + </li> + </ul> + + <p>Because it extends the <code>QueryHandler</code> interface, + you still have to implement the <code>query</code> method. + However, as a <code>LargeProductQueryHandler</code>, you can + add <code>LargeResult</code> objects to the + <code>XMLQuery</code> passed in. <code>LargeResult</code>s + identify the <em>product ID</em> (string) that the OODT + framework will later use when it calls + <code>retrieveChunk</code> and <code>close</code>. 
+ </p> + + <p>For example, suppose you're serving large images by + generating them from various other data sources: + </p> + + <ol> + <li>The <code>query</code> method would examine the user's + query, consult the various data sources, and generate the + image, storing it in a temporary file. It would also assign + a string <em>product ID</em> to this file, use that product + ID in a <code>LargeResult</code> object, add the + <code>LargeResult</code> to the <code>XMLQuery</code>, and + return the modified <code>XMLQuery</code>. + </li> + + <li>Shortly afterward, the OODT framework will repeatedly call + the <code>retrieveChunk</code> method. This method would + check the <em>product ID</em> passed in and locate the + corresponding temporary file generated earlier by the + <code>query</code> method. It would index into the file by + the offset requested by the framework, read the number of + bytes requested by the framework, package that up into a + byte array, and return it. Eventually, the OODT framework + will have read the entire product this way. + </li> + + <li>Lastly, the OODT framework will call the + <code>close</code> method. This method would check the + <em>product ID</em> and locate and delete the temporary + file. + </li> + </ol> + + <p>To put this into practice, let's create a + <code>LargeProductQueryHandler</code> that serves files out of + the product server's filesystem. + </p> + </section> + + <section name="Writing the Handler"> + <p>We'll develop a <code>FileHandler</code> that will serve + files out of the product server's filesystem. Providing + filesystem access through the OODT framework in this way is + probably not a very good idea (after all, product clients + could request copies of sensitive files), but for a + demonstration it'll do. + </p> + + <p>Because files can be quite large, we'll use a + <code>LargeProductQueryHandler</code>. 
It will serve queries + of the form + </p> + + <p><code>file = <var>path</var></code></p> + + <p>where <var>path</var> is the full path of the file the user + wants. The handler will add <code>LargeResult</code>s to the + XMLQuery, and the <em>product ID</em> will simply be the + <var>path</var> of the requested file. The + <code>retrieveChunk</code> method will open the file with the + given product ID (which is just the path to the file) and + return a block of data out of it. The <code>close</code> + method won't need to do anything, since we're not creating + temporary files or making network connections or anything; + there's just nothing to clean up. + </p> + + <subsection name="Getting the Path"> + <p>First, let's create a utility method that takes the + <code>XMLQuery</code> and returns a <code>java.io.File</code> + that matches the requested file. Because the query takes the form + </p> + + <p><code>file = <var>path</var></code></p> + + <p>there should be three <code>QueryElement</code>s on the "where" stack:</p> + + <ol> + <li>The zeroth (topmost) has role = <code>elemName</code> + and value = <code>file</code>. + </li> + <li>The first (middle) has role = <code>LITERAL</code> and + value = the <var>path</var> of the file the user wants. + </li> + <li>The last (bottom) has role = <code>RELOP</code> and + value = <code>EQ</code>. + </li> + </ol> + + <p>We'll reject any other query by returning <code>null</code> + from this method. Further, if the file named by the + <var>path</var> doesn't exist, or if it's not a file (for + example, it's a directory or a socket), we'll return <code>null</code>.
+ </p> + + <p>Here's the start of our <code>FileHandler.java</code>:</p> + + <source>import java.io.File; +import java.util.List; +import jpl.eda.product.LargeProductQueryHandler; +import jpl.eda.xmlquery.QueryElement; +import jpl.eda.xmlquery.XMLQuery; +public class FileHandler + implements LargeProductQueryHandler { + private static File getFile(XMLQuery q) { + List stack = q.getWhereElementSet(); + if (stack.size() != 3) return null; + QueryElement e = (QueryElement) stack.get(0); + if (!"elemName".equals(e.getRole()) + || !"file".equals(e.getValue())) + return null; + e = (QueryElement) stack.get(2); + if (!"RELOP".equals(e.getRole()) + || !"EQ".equals(e.getValue())) + return null; + e = (QueryElement) stack.get(1); + if (!"LITERAL".equals(e.getRole())) + return null; + File file = new File(e.getValue()); + if (!file.isFile()) return null; + return file; + } +}</source> + </subsection> + <subsection name="Checking the MIME Type"> + <p>Recall that the user can say what MIME types of products + are acceptable by specifying the preference list in the + XMLQuery. This lets a product server that serves, say, + video clips, convert them to <code>video/mpeg</code> + (MPEG-2), <code>video/mpeg4-generic</code> (MPEG-4), + <code>video/quicktime</code> (Apple Quicktime), or some + other format, in order to better serve its clients. + </p> + + <p>Since our product server just serves <em>files of any + format</em>, we won't really bother with the list of + acceptable MIME types. After all, the + <code>/etc/passwd</code> file <em>could</em> be a JPEG + image on some systems. (Yes, we could go through the + extra step of determining the MIME type of a file by + looking at its extension or its contents, but this is an + OODT tutorial, not a something-else-tutorial!) + </p> + + <p>However, we will honor the user's wishes by labeling the + result's MIME type based on what the user specifies in the + acceptable MIME type list. 
So, if the product client says + that <code>image/jpeg</code> is acceptable and the file is + <code>/etc/passwd</code>, we'll call + <code>/etc/passwd</code> a JPEG image. However, we won't + try to read the client's mind: if the user wants + <code>image/*</code>, then we'll just say it's a binary + file, <code>application/octet-stream</code>. + </p> + + <p>Here's the code:</p> + + <source>import java.util.Iterator; +... +public class FileHandler + implements LargeProductQueryHandler { + ... + private static String getMimeType(XMLQuery q) { + for (Iterator i = q.getMimeAccept().iterator(); + i.hasNext();) { + String t = (String) i.next(); + if (t.indexOf('*') == -1) return t; + } + return "application/octet-stream"; + } +}</source> + </subsection> + + <subsection name="Inserting the Result"> + <p>Once we've got the file that the user wants and the MIME + type to call it, all we have to do is insert the + <code>LargeResult</code>. Remember that it's the + <code>LargeResult</code> that tells the OODT framework what + the <em>product ID</em> is for later + <code>retrieveChunk</code> and <code>close</code> calls. + The <em>product ID</em> is passed as the first argument to + the <code>LargeResult</code> constructor. + </p> + + <p>We'll write a utility method to insert the <code>LargeResult</code>:</p> + + <source>import java.io.IOException; +import java.util.Collections; +import jpl.eda.xmlquery.LargeResult; +... +public class FileHandler + implements LargeProductQueryHandler { + ... 
+ private static void insert(File file, String type, + XMLQuery q) throws IOException { + String id = file.getCanonicalPath(); + long size = file.length(); + LargeResult lr = new LargeResult(id, type, + /*profileID*/null, /*resourceID*/null, + /*headers*/Collections.EMPTY_LIST, size); + q.getResults().add(lr); + } +}</source> + + </subsection> + + <subsection name='Handling the Query'> + <p>With our three utility methods in hand, writing the + required <code>query</code> method is a piece of cake. Here + it is: + </p> + + <source>import jpl.eda.product.ProductException; +... +public class FileHandler + implements LargeProductQueryHandler { + ... + public XMLQuery query(XMLQuery q) + throws ProductException { + try { + File file = getFile(q); + if (file == null) return q; + String type = getMimeType(q); + insert(file, type, q); + return q; + } catch (IOException ex) { + throw new ProductException(ex); + } + } +}</source> + + <p>The <code>query</code> method as defined by the + <code>QueryHandler</code> interface (and extended into the + <code>LargeProductQueryHandler</code> interface) is allowed + to throw only one kind of checked exception: + <code>ProductException</code>. So, in case the + <code>insert</code> method throws an + <code>IOException</code>, we transform it into a + <code>ProductException</code>. + </p> + + <p>Now there are just two more required methods to implement, + <code>retrieveChunk</code> and <code>close</code>. + </p> + </subsection> + + <subsection name='Blowing Chunks'> + <p>The OODT framework repeatedly calls the handler's + <code>retrieveChunk</code> method to get chunks of the + product, eventually getting the entire product (unless the + product client decides to abort the transfer). For our file + handler, <code>retrieveChunk</code> just has to + </p> + <ol> + <li>Make sure the file specified by the <em>product ID</em> + still exists (after all, it could be deleted at any time, + even before the first <code>retrieveChunk</code> got + called).
+ </li> + <li>Open the file.</li> + <li>Skip into the file by the requested offset.</li> + <li>Read the requested number of bytes out of the file.</li> + <li>Return those bytes.</li> + <li>Close the file.</li> + </ol> + + <p>We'll write a quick little <code>skip</code> method to skip + into a file's input stream: + </p> + + <source>private static void skip(long offset, + InputStream in) throws IOException { + while (offset > 0) { + long n = in.skip(offset); + // skip() may skip nothing; give up rather than loop forever + if (n == 0) throw new IOException("can't skip to offset"); + offset -= n; + } +}</source> + + <p>And here's another little utility method to read a + specified number of bytes out of a file's input stream: + </p> + + <source>private static byte[] read(int length, + InputStream in) throws IOException { + byte[] buf = new byte[length]; + int numRead; + int index = 0; + int toRead = length; + while (toRead > 0) { + numRead = in.read(buf, index, toRead); + // read() returns -1 at end of file; don't treat it as a byte count + if (numRead == -1) throw new IOException("unexpected end of file"); + index += numRead; + toRead -= numRead; + } + return buf; +}</source> + + <p>(By now, you're probably wondering why we didn't just use + <code>java.io.RandomAccessFile</code>; I'm wondering that + too!)</p> + + <p>Finally, we can implement the required + <code>retrieveChunk</code> method: + </p> + + <source>import java.io.BufferedInputStream; +import java.io.FileInputStream; +... +public class FileHandler + implements LargeProductQueryHandler { + ...
+ public byte[] retrieveChunk(String id, long offset, + int length) throws ProductException { + BufferedInputStream in = null; + try { + File f = new File(id); + if (!f.isFile()) throw new ProductException(id + + " isn't a file (anymore?)"); + in = new BufferedInputStream(new FileInputStream(f)); + skip(offset, in); + byte[] buf = read(length, in); + return buf; + } catch (IOException ex) { + throw new ProductException(ex); + } finally { + if (in != null) try { + in.close(); + } catch (IOException ignore) {} + } + } +}</source> + + </subsection> + + <subsection name='Closing Up'> + <p>Because the OODT framework has no idea what data sources a + <code>LargeProductQueryHandler</code> will eventually + consult, what temporary files it may need to clean up, what + network sockets it might need to shut down, and so forth, it + needs some way to indicate to a query handler that it's + done calling <code>retrieveChunk</code> for a certain + <em>product ID</em>. The <code>close</code> method does this. + </p> + + <p>In our example, <code>close</code> doesn't need to do + anything, but we are obligated to implement it: + </p> + + <source>... +public class FileHandler + implements LargeProductQueryHandler { + ...
+ public void close(String id) {} +}</source> + </subsection> + + <subsection name='Complete Source Code'> + <p>Here's the complete source file, <code>FileHandler.java</code>:</p> + <source>import java.io.BufferedInputStream; +import java.io.File; +import java.io.FileInputStream; +import java.io.InputStream; +import java.io.IOException; +import java.util.Collections; +import java.util.Iterator; +import java.util.List; +import jpl.eda.product.LargeProductQueryHandler; +import jpl.eda.product.ProductException; +import jpl.eda.xmlquery.LargeResult; +import jpl.eda.xmlquery.QueryElement; +import jpl.eda.xmlquery.XMLQuery; + +public class FileHandler + implements LargeProductQueryHandler { + private static File getFile(XMLQuery q) { + List stack = q.getWhereElementSet(); + if (stack.size() != 3) return null; + QueryElement e = (QueryElement) stack.get(0); + if (!"elemName".equals(e.getRole()) + || !"file".equals(e.getValue())) + return null; + e = (QueryElement) stack.get(2); + if (!"RELOP".equals(e.getRole()) + || !"EQ".equals(e.getValue())) + return null; + e = (QueryElement) stack.get(1); + if (!"LITERAL".equals(e.getRole())) + return null; + File file = new File(e.getValue()); + if (!file.isFile()) return null; + return file; + } + private static String getMimeType(XMLQuery q) { + for (Iterator i = q.getMimeAccept().iterator(); + i.hasNext();) { + String t = (String) i.next(); + if (t.indexOf('*') == -1) return t; + } + return "application/octet-stream"; + } + private static void insert(File file, String type, + XMLQuery q) throws IOException { + String id = file.getCanonicalPath(); + long size = file.length(); + LargeResult lr = new LargeResult(id, type, + /*profileID*/null, /*resourceID*/null, + /*headers*/Collections.EMPTY_LIST, size); + q.getResults().add(lr); + } + public XMLQuery query(XMLQuery q) + throws ProductException { + try { + File file = getFile(q); + if (file == null) return q; + String type = getMimeType(q); + insert(file, type, q); + return q; + } 
catch (IOException ex) { + throw new ProductException(ex); + } + } + private static void skip(long offset, + InputStream in) throws IOException { + while (offset > 0) { + long n = in.skip(offset); + // skip() may skip nothing; give up rather than loop forever + if (n == 0) throw new IOException("can't skip to offset"); + offset -= n; + } + } + private static byte[] read(int length, + InputStream in) throws IOException { + byte[] buf = new byte[length]; + int numRead; + int index = 0; + int toRead = length; + while (toRead > 0) { + numRead = in.read(buf, index, toRead); + // read() returns -1 at end of file; don't treat it as a byte count + if (numRead == -1) throw new IOException("unexpected end of file"); + index += numRead; + toRead -= numRead; + } + return buf; + } + public byte[] retrieveChunk(String id, long offset, + int length) throws ProductException { + BufferedInputStream in = null; + try { + File f = new File(id); + if (!f.isFile()) throw new ProductException(id + + " isn't a file (anymore?)"); + in = new BufferedInputStream(new FileInputStream(f)); + skip(offset, in); + byte[] buf = read(length, in); + return buf; + } catch (IOException ex) { + throw new ProductException(ex); + } finally { + if (in != null) try { + in.close(); + } catch (IOException ignore) {} + } + } + public void close(String id) {} +}</source> + </subsection> + </section> + + <section name='Compiling the Code'> + <p>We'll compile this code using the J2SDK command-line tools, + but if you're more comfortable with some kind of Integrated + Development Environment (IDE), adjust as necessary. + </p> + + <p>Let's go back again to the <code>$PS_HOME</code> directory we + made earlier; create the file + <code>$PS_HOME/src/FileHandler.java</code> with the contents + shown above.
Then, compile and update the jar file as follows: + </p> + + <source>% <b>javac -extdirs lib \ + -d classes src/FileHandler.java</b> +% <b>ls -l classes</b> +total 8 +-rw-r--r-- 1 kelly kelly 2524 25 Feb 15:46 ConstantHandler.class +-rw-r--r-- 1 kelly kelly 3163 26 Feb 16:15 FileHandler.class +% <b>jar -uf lib/my-handlers.jar \ + -C classes FileHandler.class</b> +% <b>jar -tf lib/my-handlers.jar</b> +META-INF/ +META-INF/MANIFEST.MF +ConstantHandler.class +FileHandler.class</source> + + <p>We've now got a jar with the <code>ConstantHandler</code> + from the <a href="../qh/">last tutorial</a> and our new + <code>FileHandler</code>. + </p> + </section> + + <section name='Specifying and Running the New Query Handler'> + <p>The <code>$PS_HOME/bin/ps</code> script already has a system + property specifying the <code>ConstantHandler</code>, so we + just need to add the <code>FileHandler</code> to that list. + </p> + + <p>First, stop the product server by hitting CTRL+C (or your + interrupt key) in the window in which it's currently running. + Then, modify the <code>$PS_HOME/bin/ps</code> script to read + as follows: + </p> + + <source>#!/bin/sh +exec java -Djava.ext.dirs=$PS_HOME/lib \ + -Dhandlers=ConstantHandler,FileHandler \ + jpl.eda.ExecServer \ + jpl.eda.product.rmi.ProductServiceImpl \ + urn:eda:rmi:MyProductService</source> + + <p>Then start the server by running + <code>$PS_HOME/bin/ps</code>. If all goes well, the product + server will be ready to answer queries again, this time + passing each incoming <code>XMLQuery</code> to <em>two</em> + different query handlers. + </p> + + <p>Edit the <code>$PS_HOME/bin/pc</code> script once more to + make sure the <code>-out</code> and not the <code>-xml</code> + command-line argument is being used. 
Let's try querying for a + file: + </p> + + <source>% <b>$PS_HOME/bin/pc "file = /etc/passwd"</b> +nobody:*:-2:-2:Unprivileged User:/:/usr/bin/false +root:*:0:0:System Administrator:/var/root:/bin/sh +daemon:*:1:1:System Services:/var/root:/usr/bin/false +...</source> + + <p>If you like, you can change the <code>-out</code> to + <code>-xml</code> again and examine the XML version. This + time, the product data isn't in the XMLQuery object. + </p> + </section> + + <section name="What's the Difference?"> + <p>On the client side, the interface to get product results in + <code>LargeResult</code>s versus regular <code>Result</code>s + is identical. The client calls <code>getInputStream</code> to + get a binary stream to read the product data. + </p> + + <p>There is a speed penalty for large results. What + <code>Result.getInputStream</code> returns is an input stream + to product data already contained in the XMLQuery. It's a + stream to a buffer already in the client's address space, so + it's nice and fast. + </p> + + <p><code>LargeResult</code> overrides the + <code>getInputStream</code> method to instead return an input + stream that repeatedly makes calls back to the product + server's <code>retrieveChunk</code> method. Since the product + is <em>not</em> already in the local address space of the + client, getting large products is a bit slower. To + compensate, the input stream actually starts a background + thread to start retrieving chunks of the product ahead of the + product client, up to a certain point (we don't want to run + out of memory again). + </p> + + <p>On the server side, the difference is in programming + complexity. Creating a <code>LargeProductQueryHandler</code> + requires implementing three methods instead of just one. You + may have to clean up temporary files, close network ports, or + do other cleanup. 
You may even have to guard against clients + that present specially crafted product IDs that try to + circumvent access controls to products. + </p> + + <p><code>LargeResult</code>s are more general, and will work for + any size product, from zero bytes on up. And you can even mix + and match: a <code>LargeProductQueryHandler</code> can add + regular <code>Result</code>s to an XMLQuery as well as + <code>LargeResult</code>s. You might program some logic that + returns regular <code>Result</code>s for products under a + certain size threshold, and + <code>LargeResult</code>s for anything bigger. + </p> + </section> + + <section name='Conclusion'> + <p>In this tutorial, we implemented a + <code>LargeProductQueryHandler</code> that served large + products. In this case, large could mean zero bytes (empty + products) up to gargantuan numbers of bytes. This handler + queried for files in the product server's filesystem, which is + a bit insecure, so you might want to terminate the product + server as soon as possible. We also learned the + advantages and disadvantages of regular product + results versus large product results, and that + <code>LargeProductQueryHandler</code>s can use + <code>LargeResult</code> objects in addition to regular + <code>Result</code> objects. + </p> + + <p>If you've also completed the <a href="../ps">Your First + Product Service</a> tutorial and the <a + href="../qh/">Developing a Query Handler</a> tutorial, you + are now a master of the OODT Product Service. + Congratulations!
+ </p> + </section> + </body> +</document> http://git-wip-us.apache.org/repos/asf/oodt/blob/4066b63b/webapp/fmprod/src/site/xdoc/tutorials/ps/index.xml ---------------------------------------------------------------------- diff --git a/webapp/fmprod/src/site/xdoc/tutorials/ps/index.xml b/webapp/fmprod/src/site/xdoc/tutorials/ps/index.xml new file mode 100755 index 0000000..91bfc00 --- /dev/null +++ b/webapp/fmprod/src/site/xdoc/tutorials/ps/index.xml @@ -0,0 +1,481 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!-- + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--> +<document> + <properties> + <title>Your First Product Service</title> + <author email="[email protected]">Sean Kelly</author> + </properties> + + <body> + <section name="Your First Product Service"> + <p>This tutorial introduces starting a basic product server. + This product server will be capable of accepting queries, but + will not actually respond with any data. By completing this + tutorial, you'll have a working product server in which you + can install more complex logic to actually handle product + requests. 
+ </p> + </section> + + <section name="The Product Service"> + <p>The OODT Product Service is a remotely accessible software + component that enables you to retrieve products, which can be + any kind of data. In OODT, a <em>product client</em> passes a + <em>query</em> into a known product server. The product + server delegates that query to its installed <em>query + handlers</em>; each one gets a chance at satisfying the query + with requested data. Query handlers are the interfaces between + the generic OODT framework and your system-specific data + stores. They have the job of understanding the passed in + query, finding or even synthesizing the matching product, + applying conversions to Internet-standard formats, and + returning results. The product service then collects all the + matching products from the query handlers and returns them to + the product client. + </p> + + <p>To deploy a product server, you need to come up with query + handlers that interface to your site- or discipline-specific + data stores. Then, start a product server and inform it of + what query handlers to use. + </p> + + <subsection name="Delegation Model"> + <p>The OODT product service <em>delegates</em> incoming + queries to zero or more query handlers. In the case of zero + query handlers, the product service always replies to every + query with "zero matches." Otherwise, the query handlers + get a chance to satisfy the query, and may or may not add + matching products to the result. + </p> + + <p>The following class diagram demonstrates this delegation model:</p> + + <img src="../../images/delegation.png" alt="Delegation Class Diagram" /> + + <p>Here, a product client calls a server to process a query + for products. The server delegates to query handlers, which + are Java objects that implement the + <code>QueryHandler</code> interface. 
Two query handlers in + this diagram, <code>MyHandler</code> and + <code>MyOtherHandler</code> can both try to satisfy the + query by adding matching products to the total result. They + can each add more than one matching product, just one, or + none at all. The server then returns the matches, if any, + to the client. + </p> + </subsection> + + <subsection name="Large Products"> + <p>In OODT, a query contains its matching products. When a + client passes a query to a product server, the query object + returns to the client <em>with matching products embedded in + it</em>. This can, however, make query objects too large to + be comfortably passed around a network of OODT services (query + objects must reside completely in memory). In this case, a + special extension of a <code>QueryHandler</code>, a + <code>LargeProductQueryHandler</code>, can instead place a + <em>reference</em> to the product data in the query. + </p> + <p>To product clients, the difference is invisible: the + product data is still accessed from the query object the + same way. As a developer of product services, though, you + may need to decide which kind of query handler to make: + regular or large. + </p> + </subsection> + + <subsection name="Communicating with a Product Service"> + <p>The product service is a remotely accessible object. + Therefore, product clients access it with a remote object + access protocol. Currently, OODT supports RMI and CORBA. You + can also access product services with HTTP; in this case, a + proxy object provides the HTTP interface while internally it + accesses a product service with RMI or CORBA. + </p> + + <p>For this tutorial, we'll use RMI because it's enormously + less complex than CORBA. 
+ </p> + </subsection> + </section> + + <section name="Making the Staging Area"> + <p>To start a product service, we'll create a directory + structure that will hold software components (jar files) as + well as scripts that will simplify the usually over-long Java + command lines. (Note that these examples are for Mac OS X, + Linux, or other Unix-like systems. Windows users will have to + adapt.) + </p> + + <p>Let's start by making a directory hierarchy for our product + service called <code>ps</code> (this example uses a C-style + shell <code>csh</code>, if you're using <code>bash</code> or + another style shell, substitute the appropriate commands). + </p> + + <source>% <b>mkdir ps</b> +% <b>cd ps</b> +% <b>setenv PS_HOME `pwd`</b> +% <b>mkdir bin lib</b> +% <b>ls -RF $PS_HOME</b> +bin/ lib/ + +/Users/kelly/tmp/ps/bin: + +/Users/kelly/tmp/ps/lib:</source> + + <p>Note that we're using an environment variable + <code>PS_HOME</code> to contain the path of the directory + we're using to hold everything. We'll use this environment + variable as we develop the scripts to launch the product service. + </p> + </section> + + <section name="The RMI Registry"> + <p>Since we're using Remote Method Invocation (RMI) for this + tutorial, we'll need to start an RMI Registry. An RMI + Registry serves as a catalog that maps between named objects, + such as your product server, to the appropriate network + address and port where the object can be located. Your + product client will use the RMI registry to locate the product + server so it can connect to the product server and communicate with it. + </p> + + <subsection name="Collecting the RMI Registry Components"> + <p>To start an RMI Registry, you'll need the following components:</p> + + <ul> + <li><a href="/edm-commons/">EDM Common Components</a>. + These are common utilities used by every OODT + service.</li> <li><a href="/grid-product/">Grid Product + Service</a>. 
This is the product service, product client, + query handler interface, and related classes.</li> <li><a + href="/rmi-registry/">OODT RMI Registry</a>. This is the + actual RMI registry.</li> + </ul> + + <p>Download each component's binary distribution, unpack each + one, and collect the jar files into the + <code>lib</code> directory. For example: + </p> + + <source>% <b>cp /tmp/edm-commons-2.2.5/*.jar $PS_HOME/lib</b> +% <b>cp /tmp/grid-product-3.0.3/*.jar $PS_HOME/lib</b> +% <b>cp /tmp/rmi-registry-1.0.0/*.jar $PS_HOME/lib</b> +% <b>ls -l $PS_HOME/lib</b> +total 312 +-rw-r--r-- 1 kelly kelly 149503 24 Feb 14:06 edm-commons-2.2.5.jar +-rw-r--r-- 1 kelly kelly 120844 24 Feb 14:07 grid-product-3.0.3.jar +-rw-r--r-- 1 kelly kelly 8055 24 Feb 14:07 rmi-registry-1.0.0.jar</source> + </subsection> + + <subsection name="Writing the RMI Script"> + <p>To keep from having to type long Java command lines, we'll + create a simple shell script that will start the RMI + registry. We'll call it <code>rmi-reg</code> and stick it + in the <code>bin</code> directory. + </p> + + <p>Here's the <code>rmi-reg</code> script:</p> + <source>#!/bin/sh +exec java -Djava.ext.dirs=$PS_HOME/lib \ + gov.nasa.jpl.oodt.rmi.RMIRegistry</source> + + <p>This script tells the Java virtual machine to find + extension jars in the directory <code>$PS_HOME/lib</code>. It + then says that the main class to execute is + <code>gov.nasa.jpl.oodt.rmi.RMIRegistry</code>. + </p> + + <p>Go ahead and make this script executable and start the RMI + Registry. In another window (with the appropriate setting of + <code>PS_HOME</code>), run + <code>$PS_HOME/bin/rmi-reg</code>. You should see output + similar to the following: + </p> + + <source>% <b>chmod 755 $PS_HOME/bin/rmi-reg</b> +% <b>$PS_HOME/bin/rmi-reg</b> +Thu Feb 24 14:10:25 CST 2005: no objects registered</source> + + <p>The RMI Registry is now running. Every two minutes it will + display an update of all registered objects.
Naturally, we + don't have any product service running right now, so it will + say <code>no objects registered</code>. Go ahead and ignore + this window for now. It's time to start our product server. + </p> + </subsection> + </section> + + <section name="The Product Server"> + <p>With an RMI Registry in place, we're ready to start our + product server. As with the RMI Registry, we'll need the + software components and to make a script to launch it. + </p> + + <subsection name="Collecting the Product Server Components"> + <p>We already have two of the components needed to start the + product server, <code>edm-commons</code> and + <code>grid-product</code>. We need two more: + </p> + + <ul> + <li><a href="/edm-query/">EDM Query Expression</a>. This + component encapsulates the implementation of an OODT query + and also contains some product retrieval utilities.</li> + <li><a href="http://ws.apache.org/xmlrpc">Apache + XML-RPC</a>. This is used internally by OODT services. + Download version 1.1, not a later version! If you prefer, + you can <a + href="http://ibiblio.org/maven/xmlrpc/jars/xmlrpc-1.1.jar">fetch + the jar file directly</a>.</li> + </ul> + + <p>As before, put these jars into the + <code>$PS_HOME/lib</code> directory: + </p> + + <source>% <b>ls -l $PS_HOME/lib</b> +total 376 +-rw-r--r-- 1 kelly kelly 149503 24 Feb 14:06 edm-commons-2.2.5.jar +-rw-r--r-- 1 kelly kelly 43879 24 Feb 14:35 edm-query-2.0.2.jar +-rw-r--r-- 1 kelly kelly 120844 24 Feb 14:07 grid-product-3.0.3.jar +-rw-r--r-- 1 kelly kelly 8055 24 Feb 14:07 rmi-registry-1.0.0.jar +-rw-r--r-- 1 kelly kelly 53978 24 Feb 14:35 xmlrpc-1.1.jar</source> + </subsection> + + <subsection name="Writing the Product Server Script"> + <p>To launch the product server, we'll create a script called + <code>ps</code> in the <code>$PS_HOME/bin</code> directory. 
+ Here are its contents: + </p> + + <source>#!/bin/sh +exec java -Djava.ext.dirs=$PS_HOME/lib \ + jpl.eda.ExecServer \ + jpl.eda.product.rmi.ProductServiceImpl \ + urn:eda:rmi:MyProductService</source> + + <p>As with the RMI Registry script, this tells Java where to find + extension jars (<code>$PS_HOME/lib</code>). The main class + is <code>jpl.eda.ExecServer</code>; this is a framework + class from <code>edm-commons</code> that provides basic + start-up functions for a variety of services. In this case, + the service is + <code>jpl.eda.product.rmi.ProductServiceImpl</code>; this is + the name of the class that provides the RMI version of the + OODT product service. We then pass in one final + command-line argument, + <code>urn:eda:rmi:MyProductService</code>. This names the product service. + </p> + </subsection> + + <subsection name="What's in a Name?"> + <p>The product service registers itself using a name provided + on the command-line, in this case, + <code>urn:eda:rmi:MyProductService</code>. Let's take apart + the name and see how it works. + </p> + + <p>If you're familiar with web standards, you can see that the + name is a Uniform Resource Name (URN), since it starts with + <code>urn:</code>. The OODT Framework uses URNs to identify + services and other objects. The <code>eda:</code> indicates + that the name is part of the Enterprise Data Architecture + (EDA) namespace. (EDA was the name of a project related to + OODT that was merged with OODT. For now, just always use + <code>eda:</code> in your URNs.) + </p> + + <p>Next comes <code>rmi:</code>. This is a special flag for + the OODT services that indicates we're using the name of an + RMI-accessible object. The OODT framework will know to use + an RMI Registry to register the server. + </p> + + <p>Finally comes <code>MyProductService</code>. This is the + actual name used in the RMI Registry. You can call your + product service anything you want.
For example, suppose you + have three product servers: one in the US, one in Canada, + and one in Australia. You might name them: + </p> + + <ul> + <li><code>urn:eda:rmi:US</code></li> + <li><code>urn:eda:rmi:Canada</code></li> + <li><code>urn:eda:rmi:Australia</code></li> + </ul> + + <p>Or you might prefer to use ISO country codes. Or you might + name them according to the kinds of products they serve, + such as <code>urn:eda:rmi:Biomarkers</code> or + <code>urn:eda:rmi:BusinessForecasts</code>. + </p> + + <p>The RMI Registry will happily re-assign a name if one's + already in use, so when deploying your own product servers, + be sure to give each one a unique name. + </p> + </subsection> + + <subsection name="Launching the Product Server"> + <p>Make the <code>ps</code> script executable and start the + product server. Do this in a separate window + with the appropriate setting of <code>PS_HOME</code>: + </p> + + <source>% <b>chmod 755 $PS_HOME/bin/ps</b> +% <b>$PS_HOME/bin/ps</b> +Object context ready; delegating to: [jpl.eda.object.jndi.RMIContext@94257f]</source> + + <p>The product service is now running and ready to accept + product queries. Since we didn't tell it what query + handlers to use, it will always respond with zero matching + products. That may not be interesting, but it's a good test + to see if we can at least launch a product server. Now, + let's launch a product client and query it. + </p> + </subsection> + </section> + + <section name="Querying the Product Server"> + <p>To query the product server, we use a product client. The + Java class <code>jpl.eda.product.ProductClient</code> provides + the API for your own programs to query for and retrieve + products. But it's also an executable class, so we can run it + from the command line to test our product server. + However, let's again make a script to make invoking it a bit + easier. 
+ </p> + + <p>We'll call the script <code>pc</code> for "product client," + and it will take a single command-line argument, which will be + the <em>query expression</em> to pass into the product server. + Query expressions define the constraints on the kinds of + products we want to retrieve. Since the product server we've + set up will always respond with zero products, though, we can + pass in any syntactically valid query expression. + </p> + + <p>Here's the script:</p> + + <source>#!/bin/sh +if [ $# -ne 1 ]; then + echo "Usage: `basename $0` <query-expression>" 1>&2 + exit 1 +fi + +exec java -Djava.ext.dirs=$PS_HOME/lib \ + jpl.eda.product.ProductClient \ + -out \ + urn:eda:rmi:MyProductService \ + "$1"</source> + + + <p>This script checks to make sure there's exactly one + command-line argument, the query expression. If there isn't, + it prints a helpful usage message to the standard error + stream, as is Unix tradition. Otherwise, it will execute the + <code>jpl.eda.product.ProductClient</code> class with Java. + When executed, this class expects three command-line arguments: + </p> + + <ol> + <li><code>-out</code> or <code>-xml</code>. OODT uses XML to + represent the query that it passes to and receives back from + a product server. With <code>-out</code>, the product + client will write the raw product data to the standard + output. With <code>-xml</code>, you'll instead see the XML + representation of the query (with any embedded matching + products). + </li> + + <li>The name of the product service to contact. In this case, + we're using the one we started earlier, registered under the + name <code>urn:eda:rmi:MyProductService</code>. 
+ </li> + + <li>The query expression.</li> + </ol> + + <p>Now we can make this script executable and run it:</p> + + <source>% <b>chmod 755 $PS_HOME/bin/pc</b> +% <b>$PS_HOME/bin/pc "x = 3"</b> +Object context ready; delegating to: [jpl.eda.object.jndi.RMIContext@c79809] +No matching results</source> + + <p>Although not terribly exciting, this is good news. Here's + what happened: + </p> + + <ol> + <li>The product client created a query object (of class + <code>jpl.eda.xmlquery.XMLQuery</code> from the + <code>edm-query</code> component) out of the string query + <code>x = 3</code>. + </li> + + <li>It asked the RMI Registry for the network address + at which it could find the product service named + <code>MyProductService</code>. + </li> + + <li>After getting the response back from the RMI Registry, it + then contacted the product service over a network connection + (even when both run on the same local system) and asked it to handle + the query, passing the query object. + </li> + + <li>The product service, having no query handlers to which to + delegate, merely returned the query object unmodified over + the network connection. + </li> + + <li>The product client, having no product to write to the + standard output (as indicated by the <code>-out</code> + argument), wrote the diagnostic message <code>No matching + results</code>. + </li> + </ol> + + <p>You can make this example slightly more interesting by + changing the <code>-out</code> in the <code>pc</code> script + to <code>-xml</code>. Now, when you run it, you'll see an XML + document describing the query. One of the pertinent sections + to note is: + </p> + + <source>...<queryResultSet/>...</source> + + <p>This empty XML element means that there were no results.</p> + </section> + + <section name="Conclusion"> + <p>By following this tutorial, you've started both an RMI + Registry and a basic product server. You've queried that + product server to ensure that you can communicate with it. 
In + later tutorials, you'll build on this product server by adding + a query handler to it and returning actual product data. + </p> + </section> + + </body> +</document>
