[
https://issues.apache.org/jira/browse/FLINK-13479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Flink Jira Bot updated FLINK-13479:
-----------------------------------
Labels: auto-deprioritized-major auto-unassigned pull-request-available
(was: auto-unassigned pull-request-available stale-major)
Priority: Minor (was: Major)
This issue was labeled "stale-major" 7 days ago and has not received any updates so
it is being deprioritized. If this ticket is actually Major, please raise the
priority and ask a committer to assign you the issue or revive the public
discussion.
> Cassandra POJO Sink - Prepared Statement query does not have deterministic
> ordering of columns - causing prepared statement cache overflow
> ------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-13479
> URL: https://issues.apache.org/jira/browse/FLINK-13479
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / Cassandra
> Affects Versions: 1.7.2
> Reporter: Ronak Thakrar
> Priority: Minor
> Labels: auto-deprioritized-major, auto-unassigned,
> pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> When the Cassandra POJO Sink is used in a Flink job, the INSERT query string
> for each prepared statement is generated automatically (via the
> Mapper.saveQuery method). The Cassandra entity has no deterministic column
> ordering enforced, so every time the column positions change, a new prepared
> statement is generated and used. As a result, the prepared statement query
> cache overflows, because each generated insert query string lists the
> columns in random order.
> The following is a detailed explanation of what happens inside the Datastax
> java driver ([https://datastax-oss.atlassian.net/browse/JAVA-1587]):
> The current Mapper uses random ordering of columns when it creates prepared
> queries. This is fine when only 1 java client is accessing a cluster (and
> assuming the application developer does the correct thing by re-using a
> Mapper), since each Mapper will reuse its prepared statement. However, when you
> have many java clients accessing a cluster, they will each create their own
> permutations of column ordering, and can thrash the prepared statement cache
> on the cluster.
> I propose that the Mapper uses a TreeMap instead of a HashMap when it builds
> its set of AliasedMappedProperty - sorted by the column name
> (col.mappedProperty.getMappedName()). This would create a deterministic
> ordering of columns, and all java processes accessing the same cluster would
> end up with the same prepared queries for the same entities.
> This issue is already fixed in a newer version of the Datastax java driver
> (3.3.1), which the Flink Cassandra connector does not use (it uses 3.0.0).
> I upgraded the driver to 3.3.1 locally in the Flink Cassandra connector
> and tested it: it stopped creating new prepared statements with different
> column orderings for the same entity. I have a fix for this issue and
> would like to contribute it, and I will raise a PR for it.
> Flink Cassandra Connector Version: flink-connector-cassandra_2.11
> Flink Version: 1.7.1
> I am creating a PR for this, which can be merged and released in a new minor
> or patch release as required.
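
For illustration only, the sorted-column idea from the quoted description can
be sketched as below. This is not the Datastax Mapper code: the class
DeterministicInsertQuery, the method buildInsert, and the "users" table with
its columns are hypothetical names chosen for the example. The sketch assumes
that sorting column names before building the query text is enough to make
every client produce the same string.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class DeterministicInsertQuery {

    // Hypothetical helper, NOT the Datastax Mapper API: builds an INSERT
    // statement whose column list is always sorted by column name.
    public static String buildInsert(String table, Collection<String> columns) {
        // A TreeSet iterates in lexicographic order regardless of input order.
        TreeSet<String> sorted = new TreeSet<>(columns);
        String names = String.join(", ", sorted);
        // One bind marker per column.
        String markers = sorted.stream().map(c -> "?").collect(Collectors.joining(", "));
        return "INSERT INTO " + table + " (" + names + ") VALUES (" + markers + ")";
    }

    public static void main(String[] args) {
        // Two clients that happen to enumerate the same columns differently.
        List<String> clientA = Arrays.asList("name", "id", "email");
        List<String> clientB = Arrays.asList("email", "name", "id");
        // Both lines print: INSERT INTO users (email, id, name) VALUES (?, ?, ?)
        System.out.println(buildInsert("users", clientA));
        System.out.println(buildInsert("users", clientB));
    }
}
```

Because both clients end up with an identical query string, the cluster would
need to cache only one prepared statement for the entity instead of one per
column permutation.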
--
This message was sent by Atlassian Jira
(v8.3.4#803005)