Repository: incubator-reef Updated Branches: refs/heads/master 9defe611d -> fa77cc63c
http://git-wip-us.apache.org/repos/asf/incubator-reef/blob/fa77cc63/website/src/site/markdown/reef-examples.md ---------------------------------------------------------------------- diff --git a/website/src/site/markdown/reef-examples.md b/website/src/site/markdown/reef-examples.md new file mode 100644 index 0000000..74f32e0 --- /dev/null +++ b/website/src/site/markdown/reef-examples.md @@ -0,0 +1,135 @@ +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> +#Further REEF Examples + +- [Running HelloREEF on YARN](#yarn) + - [Prerequisites](#yarn-prerequisites) + - [How to configure REEF on YARN](#yarn-configurations) + - [How to launch HelloReefYarn](#yarn-launch) +- [Running a REEF Webserver: HelloREEFHttp](#http) + - [Prerequisites](#http-prerequisites) + - [HttpServerShellCmdtHandler](#http-server-shell) +- [Task Scheduler: Retaining Evaluators](#task-scheduler) + - [Prerequisites](#task-scheduler-prerequisites) + - [REST API](#task-scheduler-rest-api) + - [Reusing the Evaluators](#task-scheduler-reusing-evaluators) + + +###<a name="yarn"></a>Running HelloREEF on YARN + +REEF applications can be run on multiple runtime environments. Using `HelloReefYarn`, we will see how to configure and launch REEF applications on YARN. 
+ +####<a name="yarn-prerequisites"></a>Prerequisites + +* [You have compiled REEF locally](tutorial.html#install) +* [YARN](http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html) + +####<a name="yarn-configurations"></a>How to configure REEF on YARN + +The only difference between running a REEF application on YARN versus locally is the runtime configuration: + +``` + final LauncherStatus status = DriverLauncher + .getLauncher(YarnClientConfiguration.CONF.build()) + .run(getDriverConfiguration(), JOB_TIMEOUT); +``` + +####<a name="yarn-launch"></a>How to launch HelloReefYarn + +Running `HelloReefYarn` is very similar to running `HelloREEF`: + + yarn jar reef-examples/target/reef-examples-{$REEF_VERSION}-shaded.jar org.apache.reef.examples.hello.HelloREEFYarn + +**Note**: *The path divider differs between operating systems (e.g. Windows uses \\ while Linux uses / for dividers), so adjust the command as needed.* + +You can see how REEF applications work on YARN environments in [Introduction to REEF](introduction.html). + +###<a name="http"></a>Running a REEF Webserver: HelloREEFHttp + +REEF also has a webserver interface to handle HTTP requests. This webserver can be used in many different ways, such as for Interprocess Communication or in conjunction with the REST API. + +To demonstrate a possible use for this interface, `HelloREEFHttp` serves as a simple webserver that executes shell commands requested by the user. The first thing we should do is register a handler to receive the HTTP requests. + +####<a name="http-prerequisites"></a>Prerequisites + +* [You have compiled REEF locally](tutorial.html#install) + +####<a name="http-server-shell"></a>HttpServerShellCmdtHandler + +`HttpServerShellCmdtHandler` implements `HttpHandler`, which requires overriding three methods: `getUriSpecification`, `setUriSpecification`, and `onHttpRequest`. + +- <a name="http-urispecification"></a> +`UriSpecification` defines the URI specification for the handler. 
More than one handler can exist per application, so each handler is distinguished by this specification. Since `HelloREEFHttp` defines `UriSpecification` as `Command`, an HTTP request looks like `http://{host_address}:{host_port}/Command/{request}`. + +- <a name="http-onhttprequest"></a> +`onHttpRequest` defines the hook that is invoked when an HTTP request arrives for this handler. + +###<a name="task-scheduler"></a>Task Scheduler: Retaining Evaluators + +Another example is the Task Scheduler. It receives commands from users through the REST API, allocates multiple evaluators, and submits the tasks. + +It is a basic Task Scheduler example using Reef-webserver. The application receives a list of tasks (shell commands) from the user and executes them in FIFO order. + +####<a name="task-scheduler-prerequisites"></a>Prerequisites + +* [You have compiled REEF locally](tutorial.html#install) +* [Running REEF Webserver : HelloREEFHttp](#http) + +####<a name="task-scheduler-rest-api"></a>REST API + +Users can send HTTP requests to the server via the URL: + + http://{address}:{port}/reef-example-scheduler/v1 + +The possible requests are as follows: + +* `/list`: lists all the tasks' statuses. +* `/clear`: clears all the tasks waiting in the queue and returns how many tasks have been removed. +* `/submit?cmd=COMMAND`: submits a task to execute COMMAND and returns the task id. +* `/status?id=ID`: returns the status of the task with the id, "ID". +* `/cancel?id=ID`: cancels the task with the id, "ID". +* `/max-eval?num={num}`: sets the maximum number of evaluators. + +The result of each task is written to the log files, both the driver's and the evaluators'. + +####<a name="task-scheduler-reusing-evaluators"></a>Reusing the Evaluators + +You can find the method `retainEvaluator()` in SchedulerDriver: + +``` + /** + * Retain the complete evaluators submitting another task + * until there is no need to reuse them. 
+ */ + private synchronized void retainEvaluator(final ActiveContext context) { + if (scheduler.hasPendingTasks()) { + scheduler.submitTask(context); + } else if (nActiveEval > 1) { + nActiveEval--; + context.close(); + } else { + state = State.READY; + waitForCommands(context); + } + } +``` + +When a `Task` completes, the `EventHandler` for the `CompletedTask` event is invoked. The handler receives an instance of `CompletedTask` as its parameter and obtains the `ActiveContext` object from it. We can reuse this `Evaluator` by submitting another `Task` to it if there is a task to launch. + +Using the `-retain false` argument disables this functionality and allocates a new evaluator for every task. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-reef/blob/fa77cc63/website/src/site/markdown/tang.md ---------------------------------------------------------------------- diff --git a/website/src/site/markdown/tang.md b/website/src/site/markdown/tang.md new file mode 100644 index 0000000..a43cd29 --- /dev/null +++ b/website/src/site/markdown/tang.md @@ -0,0 +1,455 @@ +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. 
+--> +#Tang + +Tang is a configuration management and checking framework that emphasizes explicit documentation and automatic checkability of configurations and applications instead of ad-hoc, application-specific configuration and bootstrapping logic. It supports distributed, multi-language applications, but gracefully handles simpler use cases as well. + +Tang makes use of dependency injection to automatically instantiate applications. Dependency injectors can be thought of as "make for objects" -- given a request for some type of object, and information that explains how dependencies between objects should be resolved, dependency injectors automatically instantiate the requested object and all of the objects it depends upon. Tang makes use of a few simple wire formats to support remote and even cross-language dependency injection. + +Outline +------- + + * [Motivation](#motivation) + * [Design principles](#design-principles) + * [Tutorial: Getting started](#tutorial-getting-started) + * [Defining configuration parameters](#configuration-parameters) + * [Configuration Modules](#configuration-modules) + * [Injecting objects with getInstance()](#injecting-objects-with-getinstance) + * [Cyclic Injections](#cyclic-injections) + * [Alternative configuration sources](#alternative-configuration-sources) + * [Raw configuration API](#raw-configuration-api) + * [Looking under the hood](#looking-under-the-hood) + * [InjectionPlan](#injectionPlan) + * [ClassHierarchy](#classHierarchy) + + +<a name="motivation"></a>Motivation +============ + +Distributed systems suffer from problems that arise due to complex compositions of software modules and configuration errors. These problems compound over time: best-practice object oriented design dictates that code be factored into independent reusable modules, and today's distributed applications are increasingly expected to run atop multiple runtime environments. 
This leads application developers to push complexity into configuration settings, to the point where misconfiguration is now a primary cause of unavailability in fault-tolerant systems. + +Tang is our attempt to address these problems. It consists of a dependency injection framework and a set of configuration and debugging tools that automatically and transparently bootstrap applications. We have focused on providing a narrow set of primitives that support the full range of design patterns that arise in distributed system development, and that encourage application developers to build their systems in a maintainable and debuggable way. + +Tang leverages existing language type systems, allowing unmodified IDEs such as Eclipse or NetBeans to surface configuration information in tooltips, provide auto-complete of configuration parameters, and detect a wide range of configuration problems as you edit your code. Since such functionality is surfaced in the tools you are already familiar with, there is no need to install (or learn) additional development software to get started with Tang. Furthermore, we provide a set of sophisticated build time and runtime tools that detect a wide range of common architectural problems and configuration errors. + +This documentation consists of tutorials that present preferred Tang design patterns. By structuring your application according to the patterns we suggest throughout the tutorials, you will allow our static analysis framework, Tint ("Tang Lint"), to detect problematic design patterns and high-level configuration problems as part of your build. These patterns provide the cornerstone for a number of more advanced features, such as interacting with legacy configuration systems, designing for cross-language applications, and multi-tenancy issues, such as secure injections of untrusted application code. 
To the best of our knowledge, implementing such tools and addressing these real-world implementation constraints would be difficult, or even impossible, atop competing frameworks. + +<a name="design-principles"></a>Design principles +================= + +Tang encourages application developers to specify default implementations and constructor parameters in terms of code annotations and configuration modules. This avoids the need for a number of subtle (and often confusing) dependency injection software patterns, though it does lead to a different approach to dependency injection than other frameworks encourage. + +In the process of building complicated systems atop Tang, we found that, as the length of configurations that are passed around at runtime increased, it rapidly became impossible to debug or maintain our higher-level applications. Traditional dependency injection systems attempt to address this problem, but actually compound it. They encourage the developers of each application-level component to implement hand-written "Modules" that are executed at runtime. Hand-written modules introspect on the current runtime configuration, augment and modify it, and then return a new configuration that takes the new application component into account. + +In other systems, developers interact with modules by invoking ad-hoc builder methods, and passing configurations (in the correct order) from module to module. Modules frequently delegate to each other, either via inheritance or wrappers. This makes it difficult for developers and end-users to figure out which value of a given parameter will be used, or even to figure out why it was (or was not) set. + +Tang provides an alternative called `ConfigurationModule`s: + + +- `Configurations` and `ConfigurationModules` are "just data," and can be read and written in human readable formats. 
+- Interfaces and configuration parameters are encouraged to specify defaults, significantly shortening the configurations generated at runtime, and making it easy to see what was "strange" about a given run of the application. +- Tang's static analysis and documentation tools sanity check `ConfigurationModule`s, and document their behavior and any extra parameters they export. +- Configuration options can be set at most once. This avoids (or at least detects) situations in which users and application-level code inadvertently "fight" over the setting of a particular option. + +The last property comes from Tang's use of _monotonic_ set-oriented primitives. This allows us to leverage recent theoretical results in commutative data types, particularly CRDTs and the CALM theorem. Concretely: + +- A large subset of Tang's public API is commutative, which frees application-level configuration and bootstrapping logic from worrying about the order in which configuration sources are processed at runtime. +- Tang can detect configuration and injection problems much earlier than is possible with other approaches. Also, upon detecting a conflict, Tang lists the configuration sources that contributed to the problem. + + +Finally, Tang is divided into a set of "core" primitives, and higher-level configuration "formats". Tang's core focuses on dependency injection and static checking of configurations. The formats provide higher-level configuration language primitives, such as distributed, cross-language injection, configuration files, and `ConfigurationModule`. Each Tang format imports and/or exports standard Tang `Configuration` objects, which can then be composed with other configuration data at runtime. + +Improvements to these formats are planned, such as command-line tab completion, and improved APIs for extremely complex applications that are built by composing multiple Tang configurations to inject arbitrary object graphs. 
+Furthermore, Tang formats include documentation facilities, and automatic command line and configuration file parsing. From an end-user perspective, this takes a lot of the guesswork out of configuration file formats. + +Although Tang surfaces a text-based interface for end-users of the applications built atop it, all configuration options and their types are specified in terms of Java classes and annotations. As with the core Tang primitives, this allows the Java compiler to statically check Tang formats for problems such as inconsistent usage of configuration parameters, naming conflicts and so on. This eliminates broad classes of runtime errors. These checks can be run independently of the application's runtime environment, and can find problems both in the Java-level implementation of the system, and with user-provided configuration files. The tools that perform these checks are designed to run as a post-processing step of projects built atop Tang. Like the Java compiler checks, this prevents such errors from making it to production environments. It also prevents such errors from being exposed to application logic or end-users, greatly simplifying applications built atop Tang. + +Taken together, these properties greatly simplify dependency injection in distributed environments. We expect Tang to be used in environments that are dominated by "plugin"-style APIs with many alternative implementations. Tang cleanly separates concerns over configuration management, dependency injection and object implementations, which hides most of the complexity of dependency injection from plugin implementers. It also prevents plugin implementations from inadvertently conflicting with each other or their runtime environments. Such clean semantics are crucial in distributed, heterogeneous environments. 
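The "set at most once" and commutativity properties from the design principles can be illustrated with a toy model in plain Java. This is only a sketch and does not use Tang's API: `ToyConfig`, `bind`, `merge`, and `asMap` are hypothetical names, and the choice to tolerate re-binding a key to the *same* value is a simplifying assumption of the sketch, not a statement about Tang's behavior.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of monotonic, set-at-most-once configuration (not Tang's API). */
public final class ToyConfig {
  private final Map<String, String> bindings = new TreeMap<>();

  /** Binds a key; rebinding it to a different value is reported as a conflict. */
  public ToyConfig bind(String key, String value) {
    String old = bindings.putIfAbsent(key, value);
    if (old != null && !old.equals(value)) {
      throw new IllegalStateException(
          "Conflict on " + key + ": " + old + " vs " + value);
    }
    return this;
  }

  /** Merges two configurations; the operand order does not affect the result. */
  public static ToyConfig merge(ToyConfig a, ToyConfig b) {
    ToyConfig out = new ToyConfig();
    a.bindings.forEach(out::bind);
    b.bindings.forEach(out::bind);
    return out;
  }

  /** Returns a copy of the bindings for inspection. */
  public Map<String, String> asMap() {
    return new TreeMap<>(bindings);
  }

  public static void main(String[] args) {
    ToyConfig cli = new ToyConfig().bind("Timer.Seconds", "5");
    ToyConfig file = new ToyConfig().bind("Log.Level", "INFO");

    // Either merge order yields the same set of bindings.
    System.out.println(
        merge(cli, file).asMap().equals(merge(file, cli).asMap()));

    // Two sources that disagree on one option are detected, not silently resolved.
    try {
      merge(cli, new ToyConfig().bind("Timer.Seconds", "7"));
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Because bindings only ever accumulate and conflicting values are rejected rather than overwritten, the merged result cannot depend on which source was processed first. That is the property the text attributes to monotonic, set-oriented primitives.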
+ +<a name="tutorial-getting-started"></a>Tutorial: Getting started +========================= + +This tutorial is geared toward people that would like to quickly get started with Tang, or that are modifying an existing Tang application. + +<a name="configuration-parameters"></a>Constructors, @Inject and @Parameter +------------------------ + +Suppose you are implementing a new class, and would like to automatically pass configuration parameters to it at runtime: + + package com.example; + + public class Timer { + private final int seconds; + + public Timer(int seconds) { + if(seconds < 0) { + throw new IllegalArgumentException("Cannot sleep for negative time!"); + } + this.seconds = seconds; + } + + public void sleep() throws Exception { + java.lang.Thread.sleep(seconds * 1000); + } + } + +Tang encourages applications to use Plain Old Java Objects (POJOs), and emphasizes the use of immutable state for configuration parameters. This reduces boiler plate (there is no need for extra setter methods), and does not interfere with encapsulation (the fields and even the constructor can be private). Furthermore, it is trivial for well-written classes to ensure that all objects are completely and properly instantiated: They simply need to check constructor parameters as any other POJO would, except that Tang never passes `null` references into constructors, allowing their implementations to assume that all parameter values are non-null. + +Tang aims to provide end users with error messages as early as possible, and encourages developers to throw exceptions inside of constructors. 
This allows it to automatically provide additional information to end-users when things go wrong: + + Exception in thread "main" org.apache.reef.tang.exceptions.InjectionException: Could not invoke constructor: new Timer(Integer Seconds = -1) + at org.apache.reef.tang.implementation.java.InjectorImpl.injectFromPlan(InjectorImpl.java:585) + at org.apache.reef.tang.implementation.java.InjectorImpl.getInstance(InjectorImpl.java:449) + at org.apache.reef.tang.implementation.java.InjectorImpl.getInstance(InjectorImpl.java:466) + at org.apache.reef.tang.examples.Timer.main(Timer.java:48) + Caused by: java.lang.IllegalArgumentException: Cannot sleep for negative time! + at org.apache.reef.tang.examples.Timer.<init>(Timer.java:25) + at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) + at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source) + at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source) + at java.lang.reflect.Constructor.newInstance(Unknown Source) + at org.apache.reef.tang.implementation.java.InjectorImpl.injectFromPlan(InjectorImpl.java:569) + ... 3 more + +In order for Tang to instantiate an object, we need to annotate the constructor with an `@Inject` annotation. 
While we're at it, we'll define a configuration parameter, allowing us to specify seconds on the command line and in a config file: + + package com.example; + + import javax.inject.Inject; + + import org.apache.reef.tang.annotations.Name; + import org.apache.reef.tang.annotations.NamedParameter; + import org.apache.reef.tang.annotations.Parameter; + + public class Timer { + @NamedParameter(default_value="10", + doc="Number of seconds to sleep", short_name="sec") + class Seconds implements Name<Integer> {} + private final int seconds; + + @Inject + public Timer(@Parameter(Seconds.class) int seconds) { + if(seconds < 0) { + throw new IllegalArgumentException("Cannot sleep for negative time!"); + } + this.seconds = seconds; + } + + public void sleep() throws Exception { + java.lang.Thread.sleep(seconds * 1000); + } + } + +A few things happened here. First, we create the new configuration parameter by declaring a dummy class that implements Tang's `Name` interface. `Name` is a generic type with a single mandatory parameter that specifies the type of object to be passed in. Since `Seconds` implements `Name<Integer>`, it is a parameter called `Seconds` that expects `Integer` values. More precisely, `Seconds` is actually named `com.example.Timer.Seconds`. This reliance on language types to define parameter names exposes parameters to the compiler and IDE. Concretely: + + * `javac` maps from `Seconds` to the full class name in the usual way, preventing parameters with the same name, but in different packages from conflicting. + * The Java classloader ensures that classes are unique at runtime. + * Standard IDE features, such as code navigation, completion and refactoring work as they normally would for class names. + + +All instances of `Name` must be annotated with `@NamedParameter`, which takes the following optional parameters: + + * `default_value`: The default value of the constructor parameter, encoded as a string. 
Tang will parse this value (and ones in config files and on the command line), and pass it into the constructor. For convenience Tang includes a number of helper variants of `default_value`. `default_class` takes a Class (instead of a String), while `default_values` and `default_classes` take sets of values. + * `short_name`: The name of the command line option associated with this parameter. If omitted, no command line option will be created. Short names must be registered by calling `registerShortName()` on the instance of `org.apache.reef.tang.formats.CommandLine` that will process the command line options. + * `doc` (optional): Human readable documentation that describes the purpose of the parameter. + +Tang only invokes constructors that have been annotated with `@Inject`. This allows injectable constructors to coexist with ones that should not be invoked via dependency injection (such as ones with destructive side effects, or that expect `null` references). Constructor parameters must not be ambiguous. If two parameters in the same constructor have the same type, then they must be annotated with `@Parameter`, which associates a named parameter with the argument. Furthermore, two parameters to the same constructor cannot have the same name. This allows Tang to safely invoke constructors without exposing low level details (such as parameter ordering) as configuration options. + +<a name="configuration-modules"></a>Configuration modules +--------- + +Configuration modules allow applications to perform most configuration generation and verification tasks at build time. This allows Tang to automatically generate rich configuration-related documentation, to detect problematic design patterns, and to report errors before the application even begins to run. + +In the example below, we extend the Timer API to include a second implementation that simply outputs the amount of +time a real timer would have slept to stdout. 
In a real unit testing example, it would likely interact with a scheduler based on logical time. Of course, in isolation, having the ability to specify configuration parameters is not particularly useful; this example also adds a `main()` method that invokes Tang, and instantiates an object. + +The process of instantiating an object with Tang is called _injection_. As with configurations, Tang's injection process is designed to catch as many potential runtime errors as possible before application code begins to run. This simplifies debugging and eliminates many types of runtime error handling code, since many configuration errors can be caught before running (or examining) application-specific initialization code. + + package org.apache.reef.tang.examples.timer; + + import javax.inject.Inject; + + import org.apache.reef.tang.Configuration; + import org.apache.reef.tang.Tang; + + import org.apache.reef.tang.annotations.DefaultImplementation; + import org.apache.reef.tang.annotations.Name; + import org.apache.reef.tang.annotations.NamedParameter; + import org.apache.reef.tang.annotations.Parameter; + + import org.apache.reef.tang.exceptions.BindException; + import org.apache.reef.tang.exceptions.InjectionException; + + import org.apache.reef.tang.formats.ConfigurationModule; + import org.apache.reef.tang.formats.ConfigurationModuleBuilder; + import org.apache.reef.tang.formats.OptionalParameter; + + @DefaultImplementation(TimerImpl.class) + public interface Timer { + @NamedParameter(default_value="10", + doc="Number of seconds to sleep", short_name="sec") + public static class Seconds implements Name<Integer> { } + public void sleep() throws Exception; + } + + public class TimerImpl implements Timer { + + private final int seconds; + @Inject + public TimerImpl(@Parameter(Timer.Seconds.class) int seconds) { + if(seconds < 0) { + throw new IllegalArgumentException("Cannot sleep for negative time!"); + } + this.seconds = seconds; + } + @Override + public void sleep() throws 
Exception { + java.lang.Thread.sleep(seconds * 1000); + } + + } + + public class TimerMock implements Timer { + + public static class TimerMockConf extends ConfigurationModuleBuilder { + public static final OptionalParameter<Integer> MOCK_SLEEP_TIME = new OptionalParameter<>(); + } + public static final ConfigurationModule CONF = new TimerMockConf() + .bindImplementation(Timer.class, TimerMock.class) + .bindNamedParameter(Timer.Seconds.class, TimerMockConf.MOCK_SLEEP_TIME) + .build(); + + private final int seconds; + + @Inject + TimerMock(@Parameter(Timer.Seconds.class) int seconds) { + if(seconds < 0) { + throw new IllegalArgumentException("Cannot sleep for negative time!"); + } + this.seconds = seconds; + } + @Override + public void sleep() { + System.out.println("Would have slept for " + seconds + "sec."); + } + + public static void main(String[] args) throws BindException, InjectionException, Exception { + Configuration c = TimerMock.CONF + .set(TimerMockConf.MOCK_SLEEP_TIME, 1) + .build(); + Timer t = Tang.Factory.getTang().newInjector(c).getInstance(Timer.class); + System.out.println("Tick..."); + t.sleep(); + System.out.println("...tock."); + } + } + +Again, there are a few things going on here: + + - First, we push the implementation of `Timer` into a new class, `TimerImpl`. The `@DefaultImplementation` tells Tang to use `TimerImpl` when no other implementation is explicitly provided. + - We leave the `Seconds` class in the Timer interface. This, plus the `@DefaultImplementation` annotation, maintains backward compatibility with code that used Tang to inject the old `Timer` class. + - The `TimerMock` class includes a dummy implementation of Timer, along with a `ConfigurationModule` final static field called `CONF`. + - The main method uses `CONF` to generate a configuration. Rather than set `Timer.Seconds` directly, it sets `MOCK_SLEEP_TIME`. 
In a more complicated example, this would allow `CONF` to route the sleep time to testing infrastructure, or other classes that are specific to the testing environment or implementation of `TimerMock`. + +`ConfigurationModule`s serve a number of purposes: + + - They allow application and library developers to encapsulate the details surrounding their code's instantiation. + - They provide Java APIs that expose `OptionalParameter`, `RequiredParameter`, `OptionalImplementation`, `RequiredImplementation` fields. These fields tell users of the ConfigurationModule which subsystems of the application require which configuration parameters, and allow the author of the ConfigurationModule to use JavaDoc to document the parameters they export. + - Finally, because ConfigurationModule data structures are populated at class load time (before the application begins to run), they can be inspected by Tang's static analysis tools. + +These tools are provided by `org.apache.reef.tang.util.Tint`, which is included by default in all Tang builds. As long as Tang is on the classpath, invoking: + + java org.apache.reef.tang.util.Tint --doc tangdoc.html + +will perform full static analysis of all classes on the class path, and emit a nicely formatted HTML document. The documentation generated by Tint includes cross-references between configuration options, interfaces, classes, and the `ConfigurationModules` that use and set them. + +Here are some sample Tint errors. 
These (and others) can be run by passing `--tang-tests` into Tint, and ensuring that Tang's unit tests are on the class path.: + + interface org.apache.reef.tang.MyEventHandlerIface declares its default implementation to be non-subclass class org.apache.reef.tang.MyEventHandler + class org.apache.reef.tang.WaterBottleName defines a default class org.apache.reef.tang.GasCan with a type that does not extend its target's type org.apache.reef.tang.Bottle<org.apache.reef.tang.Water> + Named parameters org.apache.reef.tang.examples.Timer$Seconds and org.apache.reef.tang.examples.TimerV1$Seconds have the same short name: sec + Named parameter org.apache.reef.tang.implementation.AnnotatedNameMultipleInterfaces implements multiple interfaces. It is only allowed to implement Name<T> + Found illegal @NamedParameter org.apache.reef.tang.implementation.AnnotatedNotName does not implement Name<?> + interface org.apache.reef.tang.implementation.BadIfaceDefault declares its default implementation to be non-subclass class java.lang.String + class org.apache.reef.tang.implementation.BadName defines a default class java.lang.Integer with a raw type that does not extend of its target's raw type class java.lang.String + Named parameter org.apache.reef.tang.implementation.BadParsableDefaultClass defines default implementation for parsable type java.lang.String + Class org.apache.reef.tang.implementation.DanglingUnit has an @Unit annotation, but no non-static inner classes. Such @Unit annotations would have no effect, and are therefore disallowed. + Cannot @Inject non-static member class unless the enclosing class an @Unit. Nested class is:org.apache.reef.tang.implementation.InjectNonStaticLocalType$NonStaticLocal + Named parameter org.apache.reef.tang.implementation.NameWithConstructor has a constructor. Named parameters must not declare any constructors. + Named parameter type mismatch. 
Constructor expects a java.lang.String but Foo is a java.lang.Integer + public org.apache.reef.tang.implementation.NonInjectableParam(int) is not injectable, but it has an @Parameter annotation. + Detected explicit constructor in class enclosed in @Unit org.apache.reef.tang.implementation.OuterUnitBad$InA Such constructors are disallowed. + Repeated constructor parameter detected. Cannot inject constructor org.apache.reef.tang.implementation.RepeatConstructorArg(int,int) + Named parameters org.apache.reef.tang.implementation.ShortNameFooA and org.apache.reef.tang.implementation.ShortNameFooB have the same short name: foo + Named parameter org.apache.reef.tang.implementation.UnannotatedName is missing its @NamedParameter annotation. + Field org.apache.reef.tang.formats.MyMissingBindConfigurationModule.BAD_CONF: Found declared options that were not used in binds: { FOO_NESS } + +<a name="injecting-objects-with-getinstance"></a>Injecting objects with getInstance() +-------------------------------------- + +Above, we explain how to register constructors with Tang, and how to configure Tang to inject the desired objects at runtime. This section explains how Tang actually instantiates objects, and how the primitives it provides can be combined to support sophisticated application architectures. + +In order to instantiate objects with Tang, one must invoke `Tang.Factory.getTang().newInjector(Configuration...)`. This returns a new "empty" injector that will honor the configuration options that were set in the provided configurations, and that will have access to a merged version of the classpath they refer to. + +In a given Tang injector, all classes are treated as singletons: at most one instance of each class may exist. Furthermore, Tang Configuration objects are designed to be built up from trees of related (but non-conflicting) configuration files, command line parameters, and so on. 
At first, this may seem to be overly restrictive, since it prevents applications from creating multiple instances of the same class (or even two classes that require different values of the same named parameter). + +Tang addresses this by providing the runtime environment more explicit control over object and configuration parameter scopes. Taken together, `forkInjector()`, `bindVolatile()` and `InjectionFuture<T>` allow Tang to inject arbitrary sets of objects (including ones with multiple instances of the same class). + +Other injection frameworks take a different approach, and allow class implementations to decide if they should be singletons across a given JVM (e.g., with an `@Singleton` annotation), user session (for web services), user connection, and so on. This approach has at least two problems: + + * It is not general purpose: after all, it encodes deployment scenarios into the injection framework and application APIs! + * It trades one barrier to composability and reuse: _hard-coded constructor invocations_ with another: _hard-coded runtime environments_. The former prevents runtime environments from adapting to application-level changes, while the latter prevents application code from adapting to new runtimes. + +Tang's approach avoids both issues by giving the implementation of the runtime environment explicit control over object scopes and lifetimes. + +`forkInjector()` makes a copy of a given injector, including references to all the objects already instantiated by the original injector. This allows runtime environments to implement scopes. First, a root injector is created. In order to create a child scope, the runtime simply invokes `forkInjector()` on the root context, and optionally passes additional `Configuration` objects in. These additional configurations allow the runtime to specialize the root context. 
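
The copy semantics of `forkInjector()` can be pictured with a small toy model. This is only a sketch under stated assumptions: `ToyInjector`, `bind()`, `lookup()` and `fork()` are hypothetical names invented for illustration and are not part of Tang's API. The model keeps a table of already-built instances and copies that table when a child scope is created.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of injector scopes; hypothetical, not Tang's Injector. */
final class ToyInjector {
  private final Map<Class<?>, Object> instances;

  ToyInjector() {
    this.instances = new HashMap<>();
  }

  private ToyInjector(final Map<Class<?>, Object> parent) {
    // Copy the table: objects that already exist are shared by reference,
    // but bindings added later stay private to each scope.
    this.instances = new HashMap<>(parent);
  }

  /** Registers an already-built instance (at most one per class). */
  <T> void bind(final Class<T> type, final T instance) {
    if (instances.putIfAbsent(type, instance) != null) {
      throw new IllegalStateException(type.getName() + " is already bound");
    }
  }

  /** Returns the instance bound in this scope, or null if there is none. */
  <T> T lookup(final Class<T> type) {
    return type.cast(instances.get(type));
  }

  /** Creates a child scope that starts from a snapshot of this one. */
  ToyInjector fork() {
    return new ToyInjector(instances);
  }
}
```

In this model, a child created with `fork()` sees every instance the root held at fork time, while instances bound in the child afterwards never appear in the root.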
+ +Although the forked injector will have access to any objects and configuration bindings that existed when `forkInjector()` was called, neither the original nor the forked injector will reflect future changes to the other injector. + +The second primitive, `bindVolatile()`, provides an injector with an instance of a class or named parameter. The injector treats this instance as though it had injected the object directly. This: + + * allows passing of information between child scopes + * makes it possible to create (for example) chains of objects of the same type + * and allows objects that cannot be instantiated via Tang to be passed into injectable constructors. + +###<a name="cyclic-injections"></a>Cyclic Injections + +Although the above primitives allow applications to inject arbitrary DAGs (directed acyclic graphs) of objects, they do not support cycles of objects. Tang provides the `InjectionFuture<T>` interface to support such _cyclic injections_. + +When Tang encounters a constructor parameter of type `InjectionFuture<T>`, it injects an object that provides a method `T get()` that returns an injected instance of `T`. + + +This can be used to break cycles: + + A(B b) {...} + B(InjectionFuture<A> a) {...} + +In order to inject an instance of `A`, Tang first injects an instance of `B` by passing it an `InjectionFuture<A>`. Tang then invokes `A`'s constructor, passing in the instance of `B`. Once the constructor returns, the new instance of `A` is passed into `B`'s `InjectionFuture<A>`. At this point, it becomes safe for `B` to invoke `get()`, which establishes the circular reference. + +Therefore, along with `forkInjector()` and `bindVolatile()`, this allows Tang to inject arbitrary graphs of objects. This pattern avoids non-final fields (once set, all fields of all objects are constant), and it also avoids boilerplate error handling code that checks to see if `B`'s instance of `A` has been set.
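
The sequence of steps described above can be traced by hand in plain Java. The sketch below is an illustration only: `SimpleFuture<T>` is a hypothetical stand-in for `InjectionFuture<T>`, and `CycleDemo.injectA()` performs manually what the text attributes to Tang. It is not Tang's implementation.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for InjectionFuture<T>; not Tang's class. */
final class SimpleFuture<T> {
  private final AtomicReference<T> ref = new AtomicReference<>();

  /** Filled exactly once, after the target object has been constructed. */
  void set(final T value) {
    if (!ref.compareAndSet(null, value)) {
      throw new IllegalStateException("future was already set");
    }
  }

  /** Throws if called before the target object is available. */
  T get() {
    final T value = ref.get();
    if (value == null) {
      throw new IllegalStateException("get() called before injection completed");
    }
    return value;
  }
}

final class B {
  private final SimpleFuture<A> a; // final: never reassigned after construction

  B(final SimpleFuture<A> a) {
    this.a = a;
  }

  A getA() {
    return a.get();
  }
}

final class A {
  private final B b;

  A(final B b) {
    this.b = b;
  }

  B getB() {
    return b;
  }
}

final class CycleDemo {
  /** Builds the A <-> B cycle in the order described in the text. */
  static A injectA() {
    final SimpleFuture<A> future = new SimpleFuture<>();
    final B b = new B(future); // B is built first and holds an unset future
    final A a = new A(b);      // A's constructor receives the instance of B
    future.set(a);             // once A exists, the future is filled
    return a;
  }
}
```

After `CycleDemo.injectA()` returns, `a.getB().getA()` yields the same `A` instance, and both classes keep only final fields.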
+ + +When `get()` is called after the application-level call to `getInstance()` returns, it is guaranteed to return a non-null reference to an injected instance of the object. However, if `get()` is called _before_ the constructor it was passed to returns, then it is guaranteed to throw an exception. In between these two points in time, `get()`'s behavior is undefined, but, for the sake of race-detection and forward compatibility, it makes a best-effort attempt to throw an exception. + +Following Tang's singleton semantics, the instance returned by `get()` will be the same instance the injector would pass into other constructors or return from `getInstance()`. + +<a name="alternative-configuration-sources"></a>Alternative configuration sources +================================= + +Tang provides a number of so-called _formats_ that interface with external configuration data. `ConfigurationModule` is one such example (see above). These formats transform configuration data to and from Tang's raw configuration API. The raw API provides an implementation of `ConfigurationBuilder`, which implements most of Tang's configuration checks. It also provides a `JavaConfigurationBuilder` interface, which offers convenience methods that take Java classes and leverages Java's generic type system to push a range of static type checks to Java compilation time. + +<a name="raw-configuration-api"></a>Raw configuration API +--------- +Tang also provides a lower level configuration API for applications that need more dynamic control over their configurations: + + ... + import org.apache.reef.tang.Tang; + import org.apache.reef.tang.ConfigurationBuilder; + import org.apache.reef.tang.Configuration; + import org.apache.reef.tang.Injector; + import org.apache.reef.tang.exceptions.BindException; + import org.apache.reef.tang.exceptions.InjectionException; + + ...
+ public static void main(String[] args) throws BindException, InjectionException { + Tang tang = Tang.Factory.getTang(); + ConfigurationBuilder cb = (ConfigurationBuilder)tang.newConfigurationBuilder(); + cb.bindNamedParameter(Timer.Seconds.class, 5); + Configuration conf = cb.build(); + Injector injector = tang.newInjector(conf); + if(!injector.isInjectable(Timer.class)) { + System.err.println("If isInjectable returns false, the next line will throw an exception"); + } + Timer timer = injector.getInstance(Timer.class); + + try { + System.out.println("Tick..."); + timer.sleep(); + System.out.println("Tock."); + } catch(InterruptedException e) { + e.printStackTrace(); + } + } + +The first step in using Tang is to get a handle to a Tang object by calling `Tang.Factory.getTang()`. Having obtained a handle, we run through each of the phases of a Tang injection: + + * We use `ConfigurationBuilder` objects to tell Tang about the class hierarchy that it will be using to inject objects and (in later examples) to register the contents of configuration files, override default configuration values, and to set default implementations of classes. `ConfigurationBuilder` and `ConfigurationModuleBuilder` export similar APIs. The difference is that `ConfigurationBuilder` produces `Configuration` objects directly, and is designed to be used at runtime. `ConfigurationModuleBuilder` is designed to produce data structures that will be generated and analyzed during the build, and at class load time. + * `bindNamedParameter()` overrides the default value of `Timer.Seconds`, setting it to 5. Tang interprets the 5 as a string, but allows instances of Number to be passed in as syntactic sugar. + * We call `.build()` on the `ConfigurationBuilder`, creating an immutable `Configuration` object. At this point, Tang ensures that all of the classes it has encountered so far are consistent with each other, and that they are suitable for injection.
When Tang encounters conflicting classes or configuration files, it throws a `BindException` to indicate that the problem is due to configuration issues. Note that `ConfigurationBuilder` and `Configuration` do not determine whether or not a particular injection will succeed; that is the business of the _Injector_. + * To obtain an instance of Injector, we pass our Configuration object into `tang.newInjector()`. + * `injector.isInjectable(Timer.class)` checks to see if Timer is injectable without actually performing an injection or running application code. (Note that, in this example, the Java classloader may have run application code. For more information, see the advanced tutorials on cross-language injections and securely building configurations for untrusted code.) + * Finally, we call `injector.getInstance(Timer.class)`. Internally, this method considers all possible injection plans for `Timer`. If there is exactly one such plan, it performs the injection. Otherwise, it throws an `InjectionException`. + +Tang configuration information can be divided into two categories. The first type, _parameters_, pass values such as strings and integers into constructors. Users of Tang encode configuration parameters as strings, allowing them to be stored in configuration files, and passed in on the command line. + +The second type of configuration option, _implementation bindings_, are used to tell Tang which implementation should be used when an instance of an interface is requested. Like configuration parameters, implementation bindings are expressible as strings: Tang configuration files simply contain the raw (without the generic parameters) name of the Java Classes to be bound together. + +New parameters are created and passed into constructors as in the examples above, by creating implementations of `Name<T>`, and adding `@NamedParameter`, `@Parameter` and `@Inject` annotations as necessary. 
Specifying implementations for interfaces is a bit more involved, as a number of subtle use cases arise. + +However, all configuration settings in Tang can be unambiguously represented as a `key=value` pair that can be interpreted either as an `interface=implementation` pair or a `configuration_parameter=value` pair. This maps well to Java-style properties files. For example: + + com.examples.Interface=com.examples.Implementation + + +tells Tang to create a new Implementation each time it wants to invoke a constructor that asks for an instance of Interface. In most circumstances, Implementation extends or implements Interface (`ExternalConstructors` are the exception -- see the next section). In such cases, Tang makes sure that Implementation contains at least one constructor with an `@Inject` annotation, and performs the binding. + +See the `ConfigurationFile` API for more information about processing configuration files in this format. + + + +<a name="looking-under-the-hood"></a>Looking under the hood +---------------------- + +###<a name="injectionPlan"></a>InjectionPlan + +InjectionPlan objects explain what Tang would do to instantiate a new object, but don't actually instantiate anything. +Add the following lines to the Timer example: + + import org.apache.reef.tang.implementation.InjectionPlan; + import org.apache.reef.tang.implementation.InjectorImpl; + ... + InjectorImpl injector = (InjectorImpl)tang.newInjector(conf); + InjectionPlan<Timer> ip = injector.getInjectionPlan(Timer.class); + System.out.println(ip.toPrettyString()); + System.out.println("Number of plans:" + ip.getNumAlternatives()); + + +Running the program now produces a bit of additional output: + + new Timer(Integer Seconds = 10) + Number of plans:1 + + +InjectionPlan objects can be serialized to protocol buffers.
The following file documents their format: + +[https://github.com/apache/incubator-reef/blob/master/reef-tang/tang/src/main/proto/injection_plan.proto](https://github.com/apache/incubator-reef/blob/master/reef-tang/tang/src/main/proto/injection_plan.proto) + +###<a name="classHierarchy"></a>ClassHierarchy + +InjectionPlan explains what would happen if you asked Tang to take some action, but it doesn't provide much insight into Tang's view of the object hierarchy, parameter defaults and so on. ClassHierarchy objects encode the state that Tang gets from .class files, including class inheritance relationships, parameter annotations, and so on. + +Internally, in the example above, TypeHierarchy walks the class definition for Timer, looking for superclasses, interfaces, and classes referenced by its constructors. + +ClassHierarchy objects can be serialized to protocol buffers. The following file documents their format: + +[https://github.com/apache/incubator-reef/blob/master/reef-tang/tang/src/main/proto/class_hierarchy.proto](https://github.com/apache/incubator-reef/blob/master/reef-tang/tang/src/main/proto/class_hierarchy.proto) + +The java interfaces are available in this package: + +[https://github.com/apache/incubator-reef/tree/master/reef-tang/tang/src/main/java/org/apache/reef/tang/types](https://github.com/apache/incubator-reef/tree/master/reef-tang/tang/src/main/java/org/apache/reef/tang/types) \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-reef/blob/fa77cc63/website/src/site/markdown/tutorial.md ---------------------------------------------------------------------- diff --git a/website/src/site/markdown/tutorial.md b/website/src/site/markdown/tutorial.md new file mode 100644 index 0000000..a81729e --- /dev/null +++ b/website/src/site/markdown/tutorial.md @@ -0,0 +1,137 @@ +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. 
See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> +#REEF Tutorial + +- [Installing and Compiling REEF](#install) +- [Running HelloREEF](#running-reef) + - [Local](#local) + - [HelloREEFNoClient](#helloREEFNoClient) + - [YARN](reef-examples.html#yarn) +- [Further Examples](#further-examples) + + +###<a name="install"></a>Installing and Compiling REEF + + +####Requirements + +- [Java](http://www.oracle.com/technetwork/java/index.html) 7 Development Kit +- [Maven 3](http://maven.apache.org/) or newer. Make sure that `mvn` is in your `PATH`. +- [Protocol Buffers Compiler (protoc) 2.5](http://code.google.com/p/protobuf/) Make sure that protoc is in your PATH. **Note**: You need to install version 2.5. Newer versions won't work. + +With these requirements met, the instructions below should work regardless of OS choice and command line interpreter. On Windows, you might find [this](http://cs.markusweimer.com/2013/08/02/how-to-setup-powershell-for-github-maven-and-java-development/) tutorial helpful in setting up PowerShell with Maven, GitHub and Java. You will still have to install the [Protocol Buffers Compiler](https://code.google.com/p/protobuf/), though. 
+ +####Cloning the repository +#####Committers + $ git clone https://git-wip-us.apache.org/repos/asf/incubator-reef.git + +#####Users + $ git clone git://git.apache.org/incubator-reef.git + +####Compiling the code +REEF is built using Maven. Hence, a simple + + $ mvn clean install + +should suffice. Note that we have quite a few integration tests in the default build. Hence, you might be better off using + + $ mvn -TC1 -DskipTests clean install + +This runs one thread per core (`-TC1`) and skips the tests (`-DskipTests`). + +**Note**: You will see many exception printouts during the compilation of REEF with tests. Those are not, in fact, problems with the build: REEF guarantees that exceptions thrown on remote machines get serialized and shipped back to the Driver. We have extensive unit tests for that feature that produce the confusing printouts. + +### <a name="running-reef"></a>Running HelloREEF + +####Prerequisites + +[You have compiled REEF locally](#install). + +####Running your first REEF program: Hello, REEF! + +The module REEF Examples in the folder `reef-examples` contains several simple programs built on REEF to help you get started with development. As always, the simplest of those is our "Hello World": Hello REEF. Upon launch, it grabs a single Evaluator and submits a single Task to it. That Task, fittingly, prints 'Hello, REEF!' to stdout. To launch it, navigate to `REEF_HOME` and use the following command: + + java -cp reef-examples/target/reef-examples-{$REEF_VERSION}-incubating-shaded.jar org.apache.reef.examples.hello.HelloREEF + +**Note**: *The path divider may be different for different OS (e.g. Windows uses \\ while Linux uses / for dividers) and the version number of your version of REEF must be substituted for `{$REEF_VERSION}`, so change the command as needed.* + +This invokes the shaded jar within the target directory and launches HelloREEF on the local runtime of REEF.
During the run, you will see something similar to this output: + + Powered by + ___________ ______ ______ _______ + / ______ / / ___/ / ___/ / ____/ + / _____/ / /__ / /__ / /___ + / /\ \ / ___/ / ___/ / ____/ + / / \ \ / /__ / /__ / / + /__/ \__\ /_____/ /_____/ /__/ + + ... + INFO: REEF Version: 0.10.0-incubating + ... + INFO: The Job HelloREEF is running. + ... + INFO: The Job HelloREEF is done. + ... + INFO: REEF job completed: COMPLETED + +####Where's the output? + +The local runtime simulates a cluster of machines: It executes each Evaluator in a separate process on your machine. Hence, the Evaluator that printed "Hello, REEF!" is not executed in the same process as the program you launched above. So, how do you get to the output of the Evaluator? The local runtime generates one folder per job it executes in `REEF_LOCAL_RUNTIME`: + + > cd REEF_LOCAL_RUNTIME + > ls HelloREEF* + Mode LastWriteTime Length Name + ---- ------------- ------ ---- + d---- 1/26/2015 11:21 AM HelloREEF-1422238904731 + +The job folder names consist of the job's name (here, `HelloREEF`) and the time stamp of its submission (here, `1422238904731`). If you submit the same job multiple times, you will get multiple folders here. Let's move on: + + > cd HelloREEF-1422238904731 + > ls + Mode LastWriteTime Length Name + ---- ------------- ------ ---- + d---- 1/26/2015 11:21 AM driver + d---- 1/26/2015 11:22 AM Node-1-1422238907421 + +Inside of the job's folder, you will find one folder for the job's Driver (named `driver`) and one per Evaluator. Their names consist of the virtual node simulated by the local runtime (here, `Node-1`) followed by the time stamp of when this Evaluator was allocated on that node, here `1422238907421`. As the HelloREEF example program only allocated one Evaluator, we only see one of these folders here.
Let's peek inside: + + > cd Node-1-1422238907421 + > ls evaluator* + Mode LastWriteTime Length Name + ---- ------------- ------ ---- + -a--- 1/26/2015 11:21 AM 1303 evaluator.stderr + -a--- 1/26/2015 11:21 AM 14 evaluator.stdout + +`evaluator.stderr` contains the output on stderr of this Evaluator, which mostly consists of logs helpful in debugging. `evaluator.stdout` contains the output on stdout. And, sure enough, this is where you find the "Hello, REEF!" message. + +####<a name="helloREEFNoClient"></a>The difference between HelloREEF and HelloREEFNoClient + +The HelloREEF application comes in several versions that serve different needs. One of them, `HelloREEFNoClient`, creates the Driver and Evaluators without creating a Client. In many scenarios involving a cluster of machines, one Client accesses multiple Drivers, so not every Driver needs to create a Client, and that is where the `HelloREEFNoClient` application shines. + +Running `HelloREEFNoClient` is nearly identical to running `HelloREEF`: + + java -cp reef-examples/target/reef-examples-{$REEF_VERSION}-incubating-shaded.jar org.apache.reef.examples.hello.HelloREEFNoClient + +**Note**: *The path divider may be different for different OS (e.g. Windows uses \\ while Linux uses / for dividers) and the version number of your version of REEF must be substituted for `{$REEF_VERSION}`, so change the command as needed.* + +The output should be the same, with `evaluator.stdout` containing the "Hello, REEF!" message. + +###<a name="further-examples"></a>Further Examples + +Further examples of using REEF can be found [here](reef-examples.html).
http://git-wip-us.apache.org/repos/asf/incubator-reef/blob/fa77cc63/website/src/site/markdown/wake.md ---------------------------------------------------------------------- diff --git a/website/src/site/markdown/wake.md b/website/src/site/markdown/wake.md new file mode 100644 index 0000000..8ae279d --- /dev/null +++ b/website/src/site/markdown/wake.md @@ -0,0 +1,109 @@ +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> +Wake +==== + +Wake is an event-driven framework based on ideas from SEDA, Click, Akka and Rx. It is *general purpose* in the sense that it is designed to support computationally intensive applications as well as high performance networking, storage, and legacy I/O systems. We implemented Wake to support high-performance, scalable analytical processing systems ("big data" applications), and have used it to implement control plane logic (which requires high fanout and low latency) and the data plane (which requires high-throughput processing as well). + + +Background +---------- +Wake applications consist of asynchronous *event handlers* that run inside of *stages*. Stages provide scheduling primitives such as thread pool sizing and performance isolation between event handlers. 
In addition to event handler and stage APIs, Wake includes profiling tools and a rich standard library of primitives for system builders. + +Event driven processing frameworks improve upon the performance of threaded architectures in two ways: (1) Event handlers often have lower memory and context switching overhead than threaded solutions, and (2) event driven systems allow applications to allocate and monitor computational and I/O resources in an extremely fine-grained fashion. Modern threading packages have done much to address the first concern, and have significantly lowered concurrency control and other implementation overheads in recent years. However, fine grained resource allocation remains a challenge in threaded systems, and is Wake's primary advantage over threading. + +Early event driven systems such as SEDA executed each event handler in a dedicated thread pool called a stage. This isolated low-latency event handlers (such as cache lookups) from expensive high-latency operations, such as disk I/O. With a single thread pool, high-latency I/O operations can easily monopolize the thread pool, causing all of the CPUs to block on disk I/O, even when there is computation to be scheduled. With separate thread pools, the operating system schedules I/O requests and computation separately, guaranteeing that runnable computations will not block on I/O requests. + +This is in contrast to event-driven systems such as the Click modular router that were designed to maximize throughput for predictable, low latency event-handlers. When possible, Click aggressively chains event handlers together, reducing the cost of an event dispatch to that of a function call, and allowing the compiler to perform optimizations such as inlining and constant propagation across event handlers. + +Wake allows developers to trade off between these two extremes by explicitly partitioning their event handlers into stages. 
Within a stage, event handlers engage in *thread-sharing* by simply calling each other directly. When an event crosses a stage boundary, it is placed in a queue of similar events. The queue is then drained by the threads managed by the receiving stage. + +Although event handling systems improve upon threaded performance in theory, they are notoriously difficult to reason about. We kept this in mind while designing Wake, and have gone to great pains to ensure that its APIs are simple and easy to implement without sacrificing our performance goals. + +Other event driven systems provide support for so-called *push-based* and *pull-based* event handlers. In push-based systems, event sources invoke event handlers that are exposed by the events' destinations, while pull-based APIs have the destination code invoke iterators to obtain the next available event from the source. + +Wake is completely push based. This eliminates the need for push and pull based variants of event handling logic, and also allowed us to unify all error handling in Wake into a single API. It is always possible to convert between push and pull based APIs by inserting a queue and a thread boundary between the push and pull based code. Wake supports libraries and applications that use this trick, since operating systems and legacy code sometimes expose pull-based APIs. + +Systems such as Rx allow event handlers to be dynamically registered and torn down at runtime, allowing applications to evolve over time. This leads to complicated setup and teardown protocols, where event handlers need to reason about the state of upstream and downstream handlers, both during setup and teardown, but also when routing messages at runtime. It also encourages design patterns such as dynamic event dispatching that break standard compiler optimizations. In contrast, Wake applications consist of immutable graphs of event handlers that are built up from sink to source. 
This ensures that, once an event handler has been instantiated, all downstream handlers are ready to receive messages. + +Wake is designed to work with [Tang](tang.html), a dependency injection system that focuses on configuration and debuggability. This makes it extremely easy to wire up complicated graphs of event handling logic. In addition to making it easy to build up event-driven applications, Tang provides a range of static analysis tools and provides a simple aspect-style programming facility that supports Wake's latency and throughput profilers. + + +Core API +-------- + +### Event Handlers + +Wake provides two APIs for event handler implementations. The first is the [EventHandler](https://github.com/apache/incubator-reef/blob/master/reef-wake/wake/src/main/java/org/apache/reef/wake/EventHandler.java) interface: + + public interface EventHandler<T> { + void onNext(T value); + } + +Callers of `onNext()` should assume that it is asynchronous, and that it always succeeds. Unrecoverable errors should be reported by throwing a runtime exception (which should not be caught, and will instead take down the process). Recoverable errors are reported by invoking an event handler that contains the appropriate error handling logic. + +The latter approach can be implemented by registering separate event handlers for each type of error. However, for convenience, it is formalized in Wake's simplified version of the Rx [Observer](https://github.com/apache/incubator-reef/blob/master/reef-wake/wake/src/main/java/org/apache/reef/wake/rx/Observer.java) interface: + + public interface Observer<T> { + void onNext(final T value); + void onError(final Exception error); + void onCompleted(); + } + +The `Observer` is designed for stateful event handlers that need to be explicitly torn down at exit, or when errors occur. Such event handlers may maintain open network sockets, write to disk, buffer output, and so on.
As with `onNext()`, neither `onError()` nor `onCompleted()` throws exceptions. Instead, callers should assume that they are asynchronously invoked. + +`EventHandler` and `Observer` implementations should be threadsafe and handle concurrent invocations of `onNext()`. However, it is illegal to call `onCompleted()` or `onError()` in race with any calls to `onNext()`, and the call to `onCompleted()` or `onError()` must be the last call made to the object. Therefore, implementations of `onCompleted()` and `onError()` can assume they have a lock on `this`, and that `this` has not been torn down and is still in a valid state. + +We chose these invariants because they are simple and easy to enforce. In most cases, application logic simply limits calls to `onCompleted()` and `onError()` to other implementations of `onError()` and `onCompleted()`, and relies upon Wake (and any intervening application logic) to obey the same protocol. + +### Stages + +Wake Stages are responsible for resource management. The base [Stage](https://github.com/apache/incubator-reef/blob/master/reef-wake/wake/src/main/java/org/apache/reef/wake/Stage.java) interface is fairly simple: + + public interface Stage extends AutoCloseable { } + +The only method it contains is `close()`, inherited from `AutoCloseable`. This reflects the fact that Wake stages can either contain `EventHandler`s, as [EStage](https://github.com/apache/incubator-reef/blob/master/reef-wake/wake/src/main/java/org/apache/reef/wake/EStage.java) implementations do: + + public interface EStage<T> extends EventHandler<T>, Stage { } + +or they can contain `Observer`s, as [RxStage](https://github.com/apache/incubator-reef/blob/master/reef-wake/wake/src/main/java/org/apache/reef/wake/rx/RxStage.java) implementations do: + + public interface RxStage<T> extends Observer<T>, Stage { } + +In both cases, the stage simply exposes the same API as the event handler that it manages.
This allows code that produces events to treat downstream stages and raw `EventHandlers` / `Observers` interchangeably. Recall that Wake implements thread sharing by allowing EventHandlers and Observers to directly invoke each other. Since Stages implement the same interface as raw EventHandlers and Observers, this pushes the placement of thread boundaries and other scheduling tradeoffs to the code that is instantiating the application. In turn, this simplifies testing and improves the reusability of code written on top of Wake. + +#### close() vs. onCompleted() + +It may seem strange that Wake RxStage exposes two shutdown methods: `close()` and `onCompleted()`. Since `onCompleted()` is part of the Observer API, it may be implemented in an asynchronous fashion. This makes it difficult for applications to cleanly shut down, since, even after `onCompleted()` has returned, resources may still be held by the downstream code. + +In contrast, `close()` is synchronous, and is not allowed to return until all queued events have been processed, and any resources held by the Stage implementation have been released. The upshot is that shutdown sequences in Wake work as follows: Once the upstream event sources are done calling `onNext()` (and all calls to `onNext()` have returned), `onCompleted()` or `onError()` is called exactly once per stage. After the `onCompleted()` or `onError()` call to a given stage has returned, `close()` must be called. Once `close()` returns, all resources have been released, and the JVM may safely exit, or the code that is invoking Wake may proceed under the assumption that no resources or memory have been leaked. Note that, depending on the implementation of the downstream Stage, there may be a delay between the return of calls such as `onNext()` or `onCompleted()` and their execution. Therefore, it is possible that the stage will continue to schedule `onNext()` calls after `close()` has been invoked.
It is illegal for stages to drop events on shutdown, so the stage will execute the requests in its queue before it releases resources and returns from `close()`. + +`Observer` implementations do not expose a `close()` method, and generally do not invoke `close()`. Instead, when `onCompleted()` is invoked, the implementation should arrange for `onCompleted()` to be called on any `Observer` instances that `this` directly invokes, free any resources it is holding, and then return. Since the downstream `onCompleted()` calls are potentially asynchronous, it cannot assume that downstream cleanup completes before it returns. + +In a thread pool `Stage`, the final `close()` call will block until there are no more outstanding events queued in the stage. Once `close()` has been called (and returns) on each stage, no events are left in any queues, and no `Observer` or `EventHandler` objects are holding resources or scheduled on any cores, so shutdown is complete. + +Helper libraries +---------------- + +Wake includes a number of standard library packages: + + - `org.apache.reef.wake.time` allows events to be scheduled in the future, and notifies the application when it starts and when it is being torn down. + - `org.apache.reef.wake.remote` provides networking primitives, including hooks into netty (a high-performance event-based networking library for Java). + - `org.apache.reef.wake.metrics` provides implementations of standard latency and throughput instrumentation. + - `org.apache.reef.wake.profiler` provides a graphical profiler that automatically instruments Tang-based Wake applications. 
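As an illustration of the shutdown protocol described under "close() vs. onCompleted()", the following is a minimal sketch and not Wake's actual implementation. The names `ShutdownSketch`, `SketchObserver`, and `QueueingStage` are hypothetical stand-ins introduced here; only `java.util.concurrent` classes from the JDK are used. The sketch assumes a single-threaded stage whose `onNext()` and `onCompleted()` return before the downstream observer runs, and whose `close()` blocks until the queue has drained.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; these types are stand-ins, not the real Wake interfaces. */
final class ShutdownSketch {

  /** Simplified observer with only the two calls the sketch needs. */
  interface SketchObserver<T> {
    void onNext(T value);
    void onCompleted();
  }

  /** Single-threaded stage: calls return immediately, work runs later in FIFO order. */
  static final class QueueingStage<T> implements SketchObserver<T>, AutoCloseable {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final SketchObserver<T> downstream;

    QueueingStage(final SketchObserver<T> downstream) {
      this.downstream = downstream;
    }

    @Override
    public void onNext(final T value) {
      // Asynchronous: the downstream call may run after this method returns.
      executor.execute(() -> downstream.onNext(value));
    }

    @Override
    public void onCompleted() {
      // Also asynchronous, so returning here does not mean cleanup is done.
      executor.execute(downstream::onCompleted);
    }

    @Override
    public void close() throws InterruptedException {
      // Synchronous: stop accepting new work, then wait for queued events.
      // shutdown() does not drop tasks that were already submitted.
      executor.shutdown();
      while (!executor.awaitTermination(1, TimeUnit.SECONDS)) {
        // Keep waiting until every queued event has been processed.
      }
    }
  }

  /** Runs the sequence: all onNext() calls, then onCompleted(), then close(). */
  static List<String> run() throws Exception {
    final List<String> log = Collections.synchronizedList(new ArrayList<String>());
    final SketchObserver<Integer> sink = new SketchObserver<Integer>() {
      @Override
      public void onNext(final Integer value) {
        log.add("next:" + value);
      }

      @Override
      public void onCompleted() {
        log.add("completed");
      }
    };

    final QueueingStage<Integer> stage = new QueueingStage<>(sink);
    for (int i = 0; i < 3; i++) {
      stage.onNext(i);
    }
    stage.onCompleted(); // called exactly once, after the last onNext() returned
    stage.close();       // blocks until the queue is empty
    return new ArrayList<>(log);
  }
}
```

Because the sketch uses a single worker thread, events are delivered in submission order, and the log is only guaranteed to be complete once `close()` has returned. A thread pool with several workers would need additional coordination to keep `onCompleted()` from racing with `onNext()`.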
\ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-reef/blob/fa77cc63/website/src/site/resources/ApacheIncubator.png ---------------------------------------------------------------------- diff --git a/website/src/site/resources/ApacheIncubator.png b/website/src/site/resources/ApacheIncubator.png new file mode 100644 index 0000000..c04e70d Binary files /dev/null and b/website/src/site/resources/ApacheIncubator.png differ http://git-wip-us.apache.org/repos/asf/incubator-reef/blob/fa77cc63/website/src/site/resources/REEFDiagram.png ---------------------------------------------------------------------- diff --git a/website/src/site/resources/REEFDiagram.png b/website/src/site/resources/REEFDiagram.png new file mode 100644 index 0000000..d7625a9 Binary files /dev/null and b/website/src/site/resources/REEFDiagram.png differ http://git-wip-us.apache.org/repos/asf/incubator-reef/blob/fa77cc63/website/src/site/resources/REEFLogo.png ---------------------------------------------------------------------- diff --git a/website/src/site/resources/REEFLogo.png b/website/src/site/resources/REEFLogo.png new file mode 100644 index 0000000..865dd0c Binary files /dev/null and b/website/src/site/resources/REEFLogo.png differ http://git-wip-us.apache.org/repos/asf/incubator-reef/blob/fa77cc63/website/src/site/resources/reef-architecture.png ---------------------------------------------------------------------- diff --git a/website/src/site/resources/reef-architecture.png b/website/src/site/resources/reef-architecture.png new file mode 100644 index 0000000..321aaae Binary files /dev/null and b/website/src/site/resources/reef-architecture.png differ http://git-wip-us.apache.org/repos/asf/incubator-reef/blob/fa77cc63/website/src/site/resources/states-horizontal.png ---------------------------------------------------------------------- diff --git a/website/src/site/resources/states-horizontal.png b/website/src/site/resources/states-horizontal.png new file mode 
100644 index 0000000..6627815 Binary files /dev/null and b/website/src/site/resources/states-horizontal.png differ http://git-wip-us.apache.org/repos/asf/incubator-reef/blob/fa77cc63/website/src/site/site.xml ---------------------------------------------------------------------- diff --git a/website/src/site/site.xml b/website/src/site/site.xml new file mode 100644 index 0000000..9e8d581 --- /dev/null +++ b/website/src/site/site.xml @@ -0,0 +1,122 @@ +<?xml version="1.0" encoding="ISO-8859-1"?> +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. 
+--> +<project name="Apache REEF" xmlns="http://maven.apache.org/DECORATION/1.0.0" + xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" + xsi:schemaLocation="http://maven.apache.org/DECORATION/1.0.0 http://maven.apache.org/xsd/decoration-1.0.0.xsd"> + <skin> + <groupId>org.apache.maven.skins</groupId> + <artifactId>maven-fluido-skin</artifactId> + <version>1.3.1</version> + + </skin> + <custom> + + + <fluidoSkin> + <breadcrumbDivider>|</breadcrumbDivider> + <topBarEnabled>true</topBarEnabled> + <!-- Should always add up to 12 --> + <leftColumnClass>span2</leftColumnClass> + <bodyColumnClass>span10</bodyColumnClass> + + <sideBarEnabled>true</sideBarEnabled> + <!-- Makes the top bar color = black + <navBarStyle>navbar-inverse</navBarStyle> + --> + + </fluidoSkin> + </custom> + + <bannerLeft> + <src>REEFLogo.png</src> + <href></href> + </bannerLeft> + <bannerRight> + <src>ApacheIncubator.png</src> + <href>http://incubator.apache.org/</href> + </bannerRight> + + <version position="none"/> + <publishDate position="none"/> + + <body> + <breadcrumbs> + <item name="Apache REEF" href="index.html"/> + </breadcrumbs> + <links> + <item name="Apache REEF GitHub" href="https://github.com/apache/incubator-reef"/> + <item name="Apache" href="http://www.apache.org"/> + <item name="Apache Incubator" href="http://incubator.apache.org/"/> + </links> + <menu name="Apache REEF"> + <item name="Overview" href="index.html"/> + <item name="FAQ" href="faq.html"/> + <item name="License" href="license.html"/> + <item name="Downloads" href="downloads.html"/> + <item name="0.10.0-incubating API" href="apidocs/0.10.0-incubating/index.html"/> + </menu> + + <menu name="Documentation"> + <item name="Introduction to REEF" href="introduction.html"/> + <item name="REEF Tutorial" href="tutorial.html"/> + <item name="Further REEF Examples" href="reef-examples.html"/> + <item name="Glossary" href="glossary.html"/> + <item name="Tang" href="tang.html"/> + <item name="Wake" href="wake.html"/> + </menu> + 
<menu name="Contribution"> + <item name="Contributing" href="contributing.html"/> + <item name="Committer Guide" href="committer-guide.html"/> + <item name="Coding Guidelines" href="coding-guideline.html"/> + </menu> + <menu name="Community"> + <item name="Team" href="team.html"/> + <item name="Mailing List" href="mailing-list.html"/> + <item name="Issue Tracker" href="issue-tracker.html"/> + <item name="Powered By" href="powered-by.html"/> + </menu> + <menu name="ASF"> + <item name="Apache Software Foundation" href="http://www.apache.org/foundation/"/> + <item name="How Apache Works" href="http://www.apache.org/foundation/how-it-works.html"/> + <item name="Apache Incubator" href="http://incubator.apache.org/"/> + <item name="Apache License" href="http://www.apache.org/licenses/LICENSE-2.0.html"/> + <item name="Sponsorship" href="http://www.apache.org/foundation/sponsorship.html"/> + <item name="Thanks" href="http://www.apache.org/foundation/thanks.html"/> + </menu> + <footer> + <div class="container-fluid"> + <div class="row-fluid"> + <a href="http://www.apache.org">Apache Software Foundation</a>. + All Rights Reserved. + </div> + </div> + + <div class="row span12"> + Apache REEF, REEF, Apache, the Apache feather logo, and the Apache REEF logo are trademarks + of The Apache Software Foundation. All other marks mentioned may be trademarks or registered + trademarks of their respective owners. + </div> + </footer> + <!-- + this command gives a bunch of information about the Maven site creation project + <menu ref="reports"/> + --> + </body> +</project> \ No newline at end of file
