Github user fhueske commented on a diff in the pull request:
https://github.com/apache/flink/pull/3397#discussion_r104446109
--- Diff:
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/aggregate/AggregateUtil.scala
---
@@ -93,6 +94,41 @@ object AggregateUtil {
}
/**
+ * Create an [[RichProcessFunction]] to evaluate final aggregate value.
+ *
+ * @param namedAggregates List of calls to aggregate functions and
their output field names
+ * @param inputType Input row type
+ * @param outputType Output row type
+ * @param forwardedFields All the forwarded fields
+ * @return [[UnboundedProcessingOverProcessFunction]]
+ */
+ private[flink] def CreateUnboundedProcessingOverProcessFunction(
+ namedAggregates: Seq[CalcitePair[AggregateCall, String]],
+ inputType: RelDataType,
+ outputType: RelDataType,
+ forwardedFields: Array[Int]): UnboundedProcessingOverProcessFunction =
{
+
+ val (aggFields, aggregates) =
+ transformToAggregateFunctions(
+ namedAggregates.map(_.getKey),
+ inputType,
+ forwardedFields.length)
+
+ val rowTypeInfo = new RowTypeInfo(outputType.getFieldList
+ .map(field => FlinkTypeFactory.toTypeInfo(field.getType)): _*)
+
+ val intermediateRowType: RowTypeInfo =
+ createDataSetAggregateBufferDataType(forwardedFields, aggregates,
inputType)
--- End diff --
This will actually cause a serialization error because the state type in
`UnboundedProcessingOverProcessFunction` does not match the type info. The
reason why the tests pass is that the default state backend keeps all data on
the heap and does not serialize. We need to extend the test to use the
RocksDBStatebackend to capture these cases. This is the first operator (besides
the built-in windows which are tested in the DataStream API) which uses state.
I suggest the following. We add a `StreamingWithStateTestBase` to the Table
API utils which is defined as:
```
class StreamingWithStateTestBase extends StreamingMultipleProgramsTestBase {
val _tempFolder = new TemporaryFolder
@Rule
def tempFolder: TemporaryFolder = _tempFolder
def getStateBackend: RocksDBStateBackend = {
val dbPath = tempFolder.newFolder().getAbsolutePath
val checkpointPath = tempFolder.newFolder().toURI.toString
val backend = new RocksDBStateBackend(checkpointPath)
backend.setDbStoragePath(dbPath)
backend
}
}
```
In tests which require state, we extend the ITCase class from
`StreamingWithStateTestBase` instead of `StreamingMultipleProgramsTestBase` and
set the state backend in each relevant method as
`env.setStateBackend(getStateBackend)`. The test will then use
RocksDBStateBackend and check for proper serialization.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---