Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3397#discussion_r104446109
  
    --- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/aggregate/AggregateUtil.scala
 ---
    @@ -93,6 +94,41 @@ object AggregateUtil {
       }
     
       /**
    +    * Create an [[RichProcessFunction]] to evaluate final aggregate value.
    +    *
    +    * @param namedAggregates List of calls to aggregate functions and 
their output field names
    +    * @param inputType Input row type
    +    * @param outputType Output row type
    +    * @param forwardedFields All the forwarded fields
    +    * @return [[UnboundedProcessingOverProcessFunction]]
    +    */
    +  private[flink] def CreateUnboundedProcessingOverProcessFunction(
    +    namedAggregates: Seq[CalcitePair[AggregateCall, String]],
    +    inputType: RelDataType,
    +    outputType: RelDataType,
    +    forwardedFields: Array[Int]): UnboundedProcessingOverProcessFunction = 
{
    +
    +    val (aggFields, aggregates) =
    +      transformToAggregateFunctions(
    +        namedAggregates.map(_.getKey),
    +        inputType,
    +        forwardedFields.length)
    +
    +    val rowTypeInfo = new RowTypeInfo(outputType.getFieldList
    +      .map(field => FlinkTypeFactory.toTypeInfo(field.getType)): _*)
    +
    +    val intermediateRowType: RowTypeInfo =
    +      createDataSetAggregateBufferDataType(forwardedFields, aggregates, 
inputType)
    --- End diff --
    
    This will actually cause a serialization error because the state type in 
`UnboundedProcessingOverProcessFunction` does not match the type info. The 
reason why the tests pass is that the default state backend keeps all data on 
the heap and does not serialize. We need to extend the test to use the 
RocksDBStatebackend to capture these cases. This is the first operator (besides 
the built-in windows which are tested in the DataStream API) which uses state.
    
    I suggest the following. We add a `StreamingWithStateTestBase` to the Table 
API utils which is defined as:
    ```
    class StreamingWithStateTestBase extends StreamingMultipleProgramsTestBase {
    
      val _tempFolder = new TemporaryFolder
    
      @Rule
      def tempFolder: TemporaryFolder = _tempFolder
    
      def getStateBackend: RocksDBStateBackend = {
        val dbPath = tempFolder.newFolder().getAbsolutePath
        val checkpointPath = tempFolder.newFolder().toURI.toString
        val backend = new RocksDBStateBackend(checkpointPath)
        backend.setDbStoragePath(dbPath)
        backend
      }
    }
    ```
    
    In tests which require state, we extend the ITCase class from 
`StreamingWithStateTestBase` instead of `StreamingMultipleProgramsTestBase` and 
set the state backend in each relevant method as 
`env.setStateBackend(getStateBackend)`. The test will then use 
RocksDBStateBackend and check for proper serialization.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to