[
https://issues.apache.org/jira/browse/FLINK-13479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ronak Thakrar updated FLINK-13479:
----------------------------------
Description:
While using Cassandra POJO Sink as part of Flink Jobs - prepared statements
query string which is automatically generated while inserting the data(using
Mapper.saveQuery method), Cassandra entity does not have deterministic ordering
enforced-so every time column position is changed a new prepared statement is
generated and used. As an effect of that prepared statement query cache is
overflown because every time when insert statement query string is generated by
- columns are in random order.
Following is the detailed explanation for what happens inside the Datastax java
driver([https://datastax-oss.atlassian.net/browse/JAVA-1587]):
The current Mapper uses random ordering of columns when it creates prepared
queries. This is fine when only 1 java client is accessing a cluster (and
assuming the application developer does the correct thing by re-using a
Mapper), since each Mapper will reused prepared statement. However when you
have many java clients accessing a cluster, they will each create their own
permutations of column ordering, and can thrash the prepared statement cache on
the cluster.
I propose that the Mapper uses a TreeMap instead of a HashMap when it builds
its set of AliasedMappedProperty - sorted by the column name
(col.mappedProperty.getMappedName()). This would create a deterministic
ordering of columns, and all java processes accessing the same cluster would
end up with the same prepared queries for the same entities.
This issue is already fixed in the Datastax java driver update version(3.3.1)
which is not used by Flink Cassandra connector (using 3.0.0).
I upgraded the driver version to 3.3.1 locally in Flink Cassandra connector and
tested, it stopped creating new prepared statements with different ordering of
column for the same entity. I have the fix for this issue and would like to
contribute the change and will raise the PR request for the same.
Flink Cassandra Connector Version: flink-connector-cassandra_2.11
Flink Version: 1.7.1
I am creating PR request for the same and which can be merged accordingly and
re released in new minor release or patch release as required.
was:
While using Cassandra POJO Sink as part of Flink Jobs - prepared statements
query string which is automatically generated while inserting the data(using
Mapper.saveQuery method), Cassandra entity does not have deterministic ordering
enforced-so every time column position is changed a new prepared statement is
generated and used. As an effect of that prepared statement query cache is
overflown because every time when insert statement query string is generated by
- columns are in random order.
Following is the detailed explanation for what happens inside the Datastax java
driver([https://datastax-oss.atlassian.net/browse/JAVA-1587]):
The current Mapper uses random ordering of columns when it creates prepared
queries. This is fine when only 1 java client is accessing a cluster (and
assuming the application developer does the correct thing by re-using a
Mapper), since each Mapper will reused prepared statement. However when you
have many java clients accessing a cluster, they will each create their own
permutations of column ordering, and can thrash the prepared statement cache on
the cluster.
I propose that the Mapper uses a TreeMap instead of a HashMap when it builds
its set of AliasedMappedProperty - sorted by the column name
(col.mappedProperty.getMappedName()). This would create a deterministic
ordering of columns, and all java processes accessing the same cluster would
end up with the same prepared queries for the same entities.
This issue is already fixed in the Datastax java driver update version(3.3.1)
which is not used by Flink Cassandra connector (using 3.0.0).
I upgraded the driver version to 3.3.1 locally in Flink Cassandra connector and
tested, it stopped creating new prepared statements with different ordering of
column for the same entity. I have the fix for this issue and would like to
contribute the change and will raise the PR request for the same.
Flink Cassandra Connector Version: flink-connector-cassandra_2.11
Flink Version: 1.7.1
> Cassandra POJO Sink - Prepared Statement query does not have deterministic
> ordering of columns - causing prepared statement cache overflow
> ------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-13479
> URL: https://issues.apache.org/jira/browse/FLINK-13479
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / Cassandra
> Reporter: Ronak Thakrar
> Priority: Major
>
> While using Cassandra POJO Sink as part of Flink Jobs - prepared statements
> query string which is automatically generated while inserting the data(using
> Mapper.saveQuery method), Cassandra entity does not have deterministic
> ordering enforced-so every time column position is changed a new prepared
> statement is generated and used. As an effect of that prepared statement
> query cache is overflown because every time when insert statement query
> string is generated by - columns are in random order.
> Following is the detailed explanation for what happens inside the Datastax
> java driver([https://datastax-oss.atlassian.net/browse/JAVA-1587]):
> The current Mapper uses random ordering of columns when it creates prepared
> queries. This is fine when only 1 java client is accessing a cluster (and
> assuming the application developer does the correct thing by re-using a
> Mapper), since each Mapper will reused prepared statement. However when you
> have many java clients accessing a cluster, they will each create their own
> permutations of column ordering, and can thrash the prepared statement cache
> on the cluster.
> I propose that the Mapper uses a TreeMap instead of a HashMap when it builds
> its set of AliasedMappedProperty - sorted by the column name
> (col.mappedProperty.getMappedName()). This would create a deterministic
> ordering of columns, and all java processes accessing the same cluster would
> end up with the same prepared queries for the same entities.
> This issue is already fixed in the Datastax java driver update version(3.3.1)
> which is not used by Flink Cassandra connector (using 3.0.0).
> I upgraded the driver version to 3.3.1 locally in Flink Cassandra connector
> and tested, it stopped creating new prepared statements with different
> ordering of column for the same entity. I have the fix for this issue and
> would like to contribute the change and will raise the PR request for the
> same.
> Flink Cassandra Connector Version: flink-connector-cassandra_2.11
> Flink Version: 1.7.1
> I am creating PR request for the same and which can be merged accordingly and
> re released in new minor release or patch release as required.
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)