张皓然 created FLINK-38846:
---------------------------
Summary: Binlog client uses LinkedList to store binlog content
Key: FLINK-38846
URL: https://issues.apache.org/jira/browse/FLINK-38846
Project: Flink
Issue Type: Improvement
Components: Flink CDC
Reporter: 张皓然
shyiko's BinaryLogClient uses WriteRowsEventDataDeserializer and
DeleteRowsEventDataDeserializer to deserialize binlog events produced by INSERT
or DELETE statements.
For some reason, both deserializers use a *LinkedList* to store the
deserialized rows:
{code:java}
private List<Serializable[]> deserializeRows(long tableId, BitSet includedColumns, ByteArrayInputStream inputStream)
        throws IOException {
    List<Serializable[]> result = new LinkedList<Serializable[]>();
    while (inputStream.available() > 0) {
        result.add(deserializeRow(tableId, includedColumns, inputStream));
    }
    return result;
} {code}
In Debezium's MySqlStreamingChangeEventSource, handleChange() then uses an
*index-based for loop* to visit every row produced by the
EventDataDeserializer:
{code:java}
for (int row = startingRowNumber; row != numRows; ++row) {
    offsetContext.setRowNumber(row, numRows);
    offsetContext.event(tableId, eventTimestamp);
    changeEmitter.emit(tableId, rows.get(row));
    count++;
} {code}
Because rows.get(row) is O(row) on a LinkedList, the time complexity of this
traversal is O(n^2), which makes the CDC source slower and slower when a big
transaction arrives.
It is recommended to replace the LinkedList with an ArrayList; I'd like to
work on this.
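A minimal standalone sketch of the problem (class and method names here are made up for illustration, not Flink or Debezium code): the same index-based loop as handleChange() run against a LinkedList does a linear scan from the head on every rows.get(row) call, while an ArrayList answers each call in constant time.

{code:java}
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class RowTraversal {

    // Index-based traversal like Debezium's handleChange(): rows.get(row) is
    // O(1) on an ArrayList but O(row) on a LinkedList, so the whole loop
    // degrades to O(n^2) on a LinkedList. Returns the total number of
    // "column values" visited, just so the traversal is observable.
    static long visitByIndex(List<int[]> rows) {
        long count = 0;
        for (int row = 0; row != rows.size(); ++row) {
            int[] r = rows.get(row); // linear scan from the head on a LinkedList
            count += r.length;
        }
        return count;
    }

    public static void main(String[] args) {
        List<int[]> linked = new LinkedList<>();
        List<int[]> array = new ArrayList<>();
        for (int i = 0; i < 20_000; i++) {
            int[] row = {i};
            linked.add(row);
            array.add(row);
        }

        long t0 = System.nanoTime();
        visitByIndex(linked);
        long linkedNs = System.nanoTime() - t0;

        t0 = System.nanoTime();
        visitByIndex(array);
        long arrayNs = System.nanoTime() - t0;

        // The gap widens with list size, mirroring a big transaction.
        System.out.println("LinkedList: " + linkedNs + " ns, ArrayList: " + arrayNs + " ns");
    }
} {code}

Alternatively, keeping the LinkedList but iterating with an enhanced for loop (i.e. via Iterator) would also be linear; swapping in ArrayList is the smaller change because the index-based caller stays untouched.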
--
This message was sent by Atlassian Jira
(v8.20.10#820010)