[ 
https://issues.apache.org/jira/browse/FLINK-5803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15897616#comment-15897616
 ] 

ASF GitHub Bot commented on FLINK-5803:
---------------------------------------

Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3397#discussion_r104446109
  
    --- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/aggregate/AggregateUtil.scala
 ---
    @@ -93,6 +94,41 @@ object AggregateUtil {
       }
     
       /**
    +    * Creates a [[RichProcessFunction]] to evaluate the final aggregate value.
    +    *
    +    * @param namedAggregates List of calls to aggregate functions and 
their output field names
    +    * @param inputType Input row type
    +    * @param outputType Output row type
    +    * @param forwardedFields All the forwarded fields
    +    * @return [[UnboundedProcessingOverProcessFunction]]
    +    */
    +  private[flink] def CreateUnboundedProcessingOverProcessFunction(
    +    namedAggregates: Seq[CalcitePair[AggregateCall, String]],
    +    inputType: RelDataType,
    +    outputType: RelDataType,
    +    forwardedFields: Array[Int]): UnboundedProcessingOverProcessFunction = 
{
    +
    +    val (aggFields, aggregates) =
    +      transformToAggregateFunctions(
    +        namedAggregates.map(_.getKey),
    +        inputType,
    +        forwardedFields.length)
    +
    +    val rowTypeInfo = new RowTypeInfo(outputType.getFieldList
    +      .map(field => FlinkTypeFactory.toTypeInfo(field.getType)): _*)
    +
    +    val intermediateRowType: RowTypeInfo =
    +      createDataSetAggregateBufferDataType(forwardedFields, aggregates, 
inputType)
    --- End diff --
    
    This will actually cause a serialization error, because the state type in `UnboundedProcessingOverProcessFunction` does not match the type info. The tests only pass because the default state backend keeps all data on the heap and never serializes it. We need to extend the tests to use the `RocksDBStateBackend` to catch such cases. This is the first operator that uses state (besides the built-in windows, which are tested in the DataStream API).
    
    I suggest the following. We add a `StreamingWithStateTestBase` to the Table API test utils, defined as:
    ```
    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend
    import org.apache.flink.streaming.util.StreamingMultipleProgramsTestBase
    import org.junit.Rule
    import org.junit.rules.TemporaryFolder

    class StreamingWithStateTestBase extends StreamingMultipleProgramsTestBase {

      val _tempFolder = new TemporaryFolder

      @Rule
      def tempFolder: TemporaryFolder = _tempFolder

      def getStateBackend: RocksDBStateBackend = {
        // Store the RocksDB files and checkpoints in fresh temporary folders.
        val dbPath = tempFolder.newFolder().getAbsolutePath
        val checkpointPath = tempFolder.newFolder().toURI.toString
        val backend = new RocksDBStateBackend(checkpointPath)
        backend.setDbStoragePath(dbPath)
        backend
      }
    }
    ```
    
    In tests that require state, we extend the ITCase class from `StreamingWithStateTestBase` instead of `StreamingMultipleProgramsTestBase` and set the state backend in each relevant test method via `env.setStateBackend(getStateBackend)`. The tests will then use the `RocksDBStateBackend` and verify that state is properly serialized.
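    As a rough sketch of how an ITCase could then use the base class (the class name, test name, and elided query setup below are illustrative, not taken from the PR):
    
    ```
    import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
    import org.apache.flink.table.api.TableEnvironment
    import org.junit.Test
    
    class OverWindowITCase extends StreamingWithStateTestBase {
    
      @Test
      def testUnboundedProcessingOverRange(): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment
        // Use RocksDB so that state is actually serialized. The default
        // heap backend keeps objects on the heap and hides type mismatches.
        env.setStateBackend(getStateBackend)
        val tEnv = TableEnvironment.getTableEnvironment(env)
    
        // ... register the input stream and run the OVER RANGE query ...
    
        env.execute()
      }
    }
    ```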


> Add [partitioned] processing time OVER RANGE BETWEEN UNBOUNDED PRECEDING 
> aggregation to SQL
> -------------------------------------------------------------------------------------------
>
>                 Key: FLINK-5803
>                 URL: https://issues.apache.org/jira/browse/FLINK-5803
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table API & SQL
>            Reporter: sunjincheng
>            Assignee: sunjincheng
>
> The goal of this issue is to add support for OVER RANGE aggregations on 
> processing time streams to the SQL interface.
> Queries similar to the following should be supported:
> {code}
> SELECT 
>   a, 
>   SUM(b) OVER (PARTITION BY c ORDER BY procTime() RANGE BETWEEN UNBOUNDED 
> PRECEDING AND CURRENT ROW) AS sumB,
>   MIN(b) OVER (PARTITION BY c ORDER BY procTime() RANGE BETWEEN UNBOUNDED 
> PRECEDING AND CURRENT ROW) AS minB
> FROM myStream
> {code}
> The following restrictions should initially apply:
> - All OVER clauses in the same SELECT clause must be exactly the same.
> - The ORDER BY clause may only have procTime() as its parameter. procTime() is 
> a parameterless scalar function that just indicates processing time mode.
> - bounded PRECEDING is not supported (see FLINK-5654)
> - FOLLOWING is not supported.
> The restrictions will be resolved in follow up issues. If we find that some 
> of the restrictions are trivial to address, we can add the functionality in 
> this issue as well.
> This issue includes:
> - Design of the DataStream operator to compute OVER ROW aggregates
> - Translation from Calcite's RelNode representation (LogicalProject with 
> RexOver expression).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)