Github user greghogan commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5045#discussion_r152608576

--- Diff: docs/concepts/programming-model.md ---
@@ -33,53 +33,52 @@ Flink offers different levels of abstraction to develop streaming/batch applicat
 
 <img src="../fig/levels_of_abstraction.svg" alt="Programming levels of abstraction" class="offset" width="80%" />
 
-  - The lowest level abstraction simply offers **stateful streaming**. It is embedded into the [DataStream API](../dev/datastream_api.html)
-    via the [Process Function](../dev/stream/operators/process_function.html). It allows users freely process events from one or more streams,
-    and use consistent fault tolerant *state*. In addition, users can register event time and processing time callbacks,
+  - The lowest level abstraction offers **stateful streaming** and is embedded into the [DataStream API](../dev/datastream_api.html)
+    via the [Process Function](../dev/stream/operators/process_function.html). It allows users to process events from one or more streams,
+    and use consistent fault tolerant *state*. Users can register event time and processing time callbacks,
     allowing programs to realize sophisticated computations.
 
-  - In practice, most applications would not need the above described low level abstraction, but would instead program against the
+  - In practice, most applications would not need the low level abstraction described above, but would instead program against the
     **Core APIs** like the [DataStream API](../dev/datastream_api.html) (bounded/unbounded streams) and the [DataSet API](../dev/batch/index.html)
-    (bounded data sets). These fluent APIs offer the common building blocks for data processing, like various forms of user-specified
+    (bounded data sets). These fluent APIs offer the common building blocks for data processing, like forms of user-specified
     transformations, joins, aggregations, windows, state, etc. Data types processed in these APIs are represented as classes
-    in the respective programming languages.
+    in respective programming languages.
 
-    The low level *Process Function* integrates with the *DataStream API*, making it possible to go the lower level abstraction
-    for certain operations only. The *DataSet API* offers additional primitives on bounded data sets, like loops/iterations.
+    The low level *Process Function* integrates with the *DataStream API*, making it possible to use the lower level abstraction
+    for certain operations. The *DataSet API* offers additional primitives on bounded data sets, like loops or iterations.
 
-  - The **Table API** is a declarative DSL centered around *tables*, which may be dynamically changing tables (when representing streams).
-    The [Table API](../dev/table_api.html) follows the (extended) relational model: Tables have a schema attached (similar to tables in relational databases)
+    The [Table API](../dev/table_api.html) follows the (extended) relational model. Tables have a schema attached (similar to tables in relational databases)
     and the API offers comparable operations, such as select, project, join, group-by, aggregate, etc.
-    Table API programs declaratively define *what logical operation should be done* rather than specifying exactly
-    *how the code for the operation looks*. Though the Table API is extensible by various types of user-defined
+    Table API programs declaratively define *what logical operation to perform* rather than specifying
+    *how the code for the operation looks*. While the Table API is extensible by various types of user-defined
     functions, it is less expressive than the *Core APIs*, but more concise to use (less code to write).
-    In addition, Table API programs also go through an optimizer that applies optimization rules before execution.
+    Table API programs also go through an optimizer that applies optimization rules before execution.
 
-    One can seamlessly convert between tables and *DataStream*/*DataSet*, allowing programs to mix *Table API* and with the *DataStream*
+    You can seamlessly convert between tables and *DataStream*/*DataSet*, allowing programs to mix the *Table API* with the *DataStream*
     and *DataSet* APIs.
 
   - The highest level abstraction offered by Flink is **SQL**. This abstraction is similar to the *Table API* both in semantics and expressiveness, but represents programs as SQL query expressions.
-    The [SQL](../dev/table_api.html#sql) abstraction closely interacts with the Table API, and SQL queries can be executed over tables defined in the *Table API*.
+    The [SQL](../dev/table_api.html#sql) abstraction closely interacts with the Table API, and you can execute SQL queries over tables defined in the *Table API*.
 
 ## Programs and Dataflows
 
-The basic building blocks of Flink programs are **streams** and **transformations**. (Note that the
-DataSets used in Flink's DataSet API are also streams internally -- more about that
-later.) Conceptually a *stream* is a (potentially never-ending) flow of data records, and a *transformation* is an
+The basic building blocks of Flink programs are **streams** and **transformations**. The
+DataSets used in Flink's DataSet API are also streams internally, which this document will cover later. Conceptually a *stream* is a (potentially never-ending) flow of data records, and a *transformation* is an
--- End diff --

Needs line break.
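For readers skimming this thread, the "streams and transformations" idea in the quoted doc section can be sketched with plain JDK streams as a loose analogy. This is not Flink code (real Flink programs build a dataflow via a `StreamExecutionEnvironment` and execute it lazily on a cluster); it only mirrors the source → map → filter shape of a transformation chain:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamsAndTransformations {
    public static void main(String[] args) {
        // A bounded "stream" of records flowing through a chain of
        // transformations, loosely analogous to a Flink pipeline.
        List<Integer> result = Stream.of(1, 2, 3, 4, 5)
                .map(x -> x * 2)     // transformation: map each record
                .filter(x -> x > 4)  // transformation: keep matching records
                .collect(Collectors.toList());
        System.out.println(result);  // [6, 8, 10]
    }
}
```

Unlike JDK streams, a Flink transformation chain describes a distributed dataflow over potentially unbounded input; the analogy holds only for the conceptual shape of the program.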
---