[ https://issues.apache.org/jira/browse/FLINK-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16255075#comment-16255075 ]
Fabian Hueske commented on FLINK-3656:
--------------------------------------

There's still a bit of duplication, but I agree. We can close this parent issue for now.

Rework Table API tests
----------------------

        Key: FLINK-3656
        URL: https://issues.apache.org/jira/browse/FLINK-3656
    Project: Flink
 Issue Type: Improvement
 Components: Table API & SQL
   Reporter: Vasia Kalavri
     Labels: starter

The {{flink-table}} component consists of several APIs:
* Scala-embedded Table API
* String-based Table API (for Java)
* SQL
and compiles to two execution backends:
* DataStream API
* DataSet API

There are many different translation paths involved until a query is executed, all of which need to be thoroughly tested:
# Table API String -> Table API logical plan
# Table API Scala expressions -> Table API logical plan
# Table API logical plan -> Calcite RelNode plan
# SQL -> Calcite RelNode plan (done exclusively by Calcite)
# Calcite RelNodes -> DataSet RelNodes
# DataSet RelNodes -> DataSet program
# Calcite RelNodes -> DataStream RelNodes
# DataStream RelNodes -> DataStream program
# Calcite RexNode expressions -> generated code

Initially, many tests were implemented as end-to-end integration tests with high overhead. However, due to the combinations of APIs and execution backends, this approach causes many redundant tests and long build times. Therefore, I propose the following testing scheme:

1. Table API String -> Table API logical plan
The String-based Table API is tested by comparing the resulting logical plan ({{Table.logicalPlan}}) to the logical plan of an equivalent Table program that uses the Scala-embedded syntax. The logical plan is the Table API's internal representation, which is later converted into a Calcite RelNode plan.
All existing integration tests that check the "Java" Table API should be ported to such unit tests. This also removes duplicated tests: so far, the Java Table API has been tested for both batch and streaming, which is no longer necessary.

2. Table API Scala expressions -> Table API logical plan -> Calcite RelNodes -> DataSet RelNodes / DataStream RelNodes
These tests cover the translation and optimization of Table API queries and verify the Calcite-optimized plan. We need distinct tests for DataSet and DataStream environments since features and translation rules vary. These tests will also show whether added or modified rules or cost functions result in different plans. They should be the main tests for the Table API and very extensive.
These tests should be implemented by extending {{TableTestBase}}, which is a base class for unit tests and hence very lightweight.

3. SQL -> Calcite RelNodes -> DataSet RelNodes / DataStream RelNodes
These are the same tests as described in 2. (Table API Scala expressions -> DataSet / DataStream RelNodes), but for SQL.

4. DataSet RelNodes -> DataSet program
Unfortunately, the DataSet API lacks a good mechanism to test generated programs, i.e., to obtain a traversable plan of all operators with access to all user-defined functions. Until such a testing utility is available, I propose to test the translation to DataSet programs as end-to-end integration tests. However, I think we can run most tests on a collection ExecutionEnvironment, which does not start a Flink cluster but runs all code on Java collections. This makes these tests much more lightweight than cluster-based ITCases.
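For illustration, a minimal sketch of what such a collection-based test could look like. The environment setup is shown inline here; in practice it would come from {{TableProgramsCollectionTestBase}}. The Scala Table API of the current flink-table module is assumed:

{code:scala}
import org.apache.flink.api.scala._
import org.apache.flink.table.api.TableEnvironment
import org.apache.flink.table.api.scala._
import org.apache.flink.types.Row
import org.junit.Assert.assertEquals
import org.junit.Test

class GroupedAggregateCollectionITCase {

  @Test
  def testGroupedAggregate(): Unit = {
    // Collection execution: no MiniCluster is started, all operators
    // run directly on Java collections in the local JVM.
    val env = ExecutionEnvironment.createCollectionsEnvironment
    val tEnv = TableEnvironment.getTableEnvironment(env)

    val table = env
      .fromElements(("hello", 1), ("world", 2), ("hello", 3))
      .toTable(tEnv, 'word, 'cnt)

    val result = table
      .groupBy('word)
      .select('word, 'cnt.sum as 'total)
      .toDataSet[Row]
      .collect()

    // Two groups are expected: hello -> 4, world -> 2.
    assertEquals(Set("hello,4", "world,2"), result.map(_.toString).toSet)
  }
}
{code}

Such a test still exercises the full DataSetRel-to-DataSet translation and the generated functions, but behaves like a plain unit test in terms of build time.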
The goal of these tests should be to cover all translation paths from DataSetRel to DataSet program, i.e., all DataSetRel nodes and their translation logic. These tests should be implemented by extending the {{TableProgramsCollectionTestBase}} (see FLINK-5268).
Moreover, we should have very few cluster-based ITCases in place that check the execution path with the actual operators, serializers, and comparators. However, we should limit these tests to the minimum to keep the build time low. They should be implemented by extending the {{TableProgramsClusterTestBase}} (FLINK-5268) and should all be located in the same class to avoid repeated instantiation of the Flink MiniCluster.

5. DataStream RelNodes -> DataStream program
Basically the same applies here as for the DataSet programs. I'm not aware of a good way to test generated DataStream programs without executing them. A testing utility would be great for all libraries that are built on top of the API. Until then, I propose to use end-to-end integration tests.
Unfortunately, the DataStream API does not feature a collection execution mode, so all tests need to run on a MiniCluster. Therefore, we should again keep these tests to a minimum. They should be implemented by extending the {{StreamingMultipleProgramsTestBase}} and be located in a few classes to avoid repeated instantiations of the Flink MiniCluster.

6. (Scala expressions | String-parsed expressions | SQL expressions) -> RexNode expressions -> generated code
In order to avoid extensive optimization tests for each supported expression or built-in function, we have the {{ExpressionTestBase}}, which compiles expressions into generated code and checks the results for correctness. All supported expressions and built-in functions should be tested by extending the {{ExpressionTestBase}} instead of running a full integration test (a sketch follows at the end of this comment).

I will add a few JIRAs to migrate existing tests to the new testing scheme.
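To illustrate scheme 6, a minimal sketch of such an expression test. The {{testAllApis}}, {{testData}}, and {{typeInfo}} hooks are assumed to match the current {{ExpressionTestBase}}; exact signatures may differ between versions:

{code:scala}
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.api.java.typeutils.RowTypeInfo
import org.apache.flink.table.api.Types
import org.apache.flink.table.api.scala._
import org.apache.flink.table.expressions.utils.ExpressionTestBase
import org.apache.flink.types.Row
import org.junit.Test

class SubstringExpressionTest extends ExpressionTestBase {

  @Test
  def testSubstring(): Unit = {
    // One call checks the Scala-embedded, the String-parsed, and the SQL
    // variant of the same expression. The expressions are compiled into
    // generated code and evaluated on the test row; no optimizer run and
    // no cluster are involved.
    testAllApis(
      'f0.substring(2),    // Scala-embedded Table API
      "f0.substring(2)",   // String-based Table API
      "SUBSTRING(f0, 2)",  // SQL
      "ello")
  }

  // A single test row that the expressions are evaluated against.
  override def testData: Row = {
    val row = new Row(1)
    row.setField(0, "hello")
    row
  }

  override def typeInfo: TypeInformation[Any] =
    new RowTypeInfo(Types.STRING).asInstanceOf[TypeInformation[Any]]
}
{code}

A single {{testAllApis}} call covers all three expression APIs against the code generator, which would otherwise take three separate integration tests.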