[
https://issues.apache.org/jira/browse/FLINK-38846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated FLINK-38846:
-----------------------------------
Labels: improvement pull-request-available (was: improvement)
> Binlog client use LinkedList to store binlog content
> ----------------------------------------------------
>
> Key: FLINK-38846
> URL: https://issues.apache.org/jira/browse/FLINK-38846
> Project: Flink
> Issue Type: Improvement
> Components: Flink CDC
> Reporter: 张皓然
> Priority: Minor
> Labels: improvement, pull-request-available
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> The shyiko BinaryLogClient uses WriteRowsEventDataDeserializer and
> DeleteRowsEventDataDeserializer to deserialize binlog events produced by
> INSERT and DELETE statements.
> For no documented reason, both deserializers use a *LinkedList* to store the
> deserialized rows:
>
> {code:java}
> private List<Serializable[]> deserializeRows(long tableId, BitSet includedColumns, ByteArrayInputStream inputStream)
>         throws IOException {
>     List<Serializable[]> result = new LinkedList<Serializable[]>();
>     while (inputStream.available() > 0) {
>         result.add(deserializeRow(tableId, includedColumns, inputStream));
>     }
>     return result;
> }
> {code}
> In Debezium's MySqlStreamingChangeEventSource, handleChange() then visits
> every row returned by the EventDataDeserializer with an *index-based for
> loop*:
> {code:java}
> for (int row = startingRowNumber; row != numRows; ++row) {
>     offsetContext.setRowNumber(row, numRows);
>     offsetContext.event(tableId, eventTimestamp);
>     changeEmitter.emit(tableId, rows.get(row));
>     count++;
> }
> {code}
> Since rows.get(row) is O(n) on a LinkedList, the overall traversal is
> O(n^2), which makes the CDC pipeline slower and slower as the size of a
> transaction grows.
> It is recommended to replace the LinkedList with an ArrayList; I'd like to
> do this.
>
>
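The quadratic cost described above can be sketched with plain collections, outside of any Flink/Debezium code. The class and method names below are illustrative only (not taken from the shyiko or Debezium code bases); the loop mirrors the index-based traversal shown in the issue:

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class RowTraversal {
    // Index-based loop in the style of Debezium's handleChange().
    // rows.get(row) is O(row) on a LinkedList, so the whole loop is O(n^2);
    // on an ArrayList each get() is O(1) and the loop is O(n).
    static long visitAll(List<int[]> rows) {
        long count = 0;
        int numRows = rows.size();
        for (int row = 0; row != numRows; ++row) {
            int[] r = rows.get(row); // linear scan from the head on LinkedList
            count += r.length;
        }
        return count;
    }

    public static void main(String[] args) {
        List<int[]> linked = new LinkedList<>();
        List<int[]> array = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            int[] r = {i, i + 1};
            linked.add(r);
            array.add(r);
        }
        // Same result either way; only the cost profile differs for a big
        // transaction, which is why switching the backing list is safe.
        System.out.println(visitAll(linked) == visitAll(array));
    }
}
```

Switching the deserializers to ArrayList keeps the List<Serializable[]> return type unchanged, so callers that only append and read by index are unaffected.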
--
This message was sent by Atlassian Jira
(v8.20.10#820010)