[jira] [Updated] (FLINK-13479) Cassandra POJO Sink - Prepared Statement query does not have deterministic ordering of columns - causing prepared statement cache overflow

Flink Jira Bot (Jira) Sun, 06 Jun 2021 15:49:30 -0700


     [ 
https://issues.apache.org/jira/browse/FLINK-13479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Flink Jira Bot updated FLINK-13479:
-----------------------------------
    Labels: auto-unassigned pull-request-available stale-major  (was: 
auto-unassigned pull-request-available)

I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help 
the community manage its development. I see this issue has been marked as 
Major but is unassigned and neither it nor its Sub-Tasks have been updated 
for 30 days. I have gone ahead and added a "stale-major" label to the issue. 
If this ticket is a Major, please either assign yourself or give an update. 
Afterwards, please remove the label or in 7 days the issue will be deprioritized.


> Cassandra POJO Sink - Prepared Statement query does not have deterministic 
> ordering of columns - causing prepared statement cache overflow
> ------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-13479
>                 URL: https://issues.apache.org/jira/browse/FLINK-13479
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / Cassandra
>    Affects Versions: 1.7.2
>            Reporter: Ronak Thakrar
>            Priority: Major
>              Labels: auto-unassigned, pull-request-available, stale-major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> When the Cassandra POJO Sink is used in Flink jobs, the INSERT query string 
> for each prepared statement is generated automatically (through the 
> Mapper.saveQuery method). No deterministic column ordering is enforced for 
> the Cassandra entity, so every time the column positions change, a new 
> prepared statement is generated and used. As a result, the prepared 
> statement query cache overflows, because each generated insert query string 
> lists the columns in a random order. 
> The following is the detailed explanation of what happens inside the 
> DataStax Java driver ([https://datastax-oss.atlassian.net/browse/JAVA-1587]):
> The current Mapper uses random ordering of columns when it creates prepared 
> queries. This is fine when only 1 Java client is accessing a cluster (and 
> assuming the application developer does the correct thing by re-using a 
> Mapper), since each Mapper will reuse its prepared statements. However, when 
> you have many Java clients accessing a cluster, they will each create their 
> own permutations of column ordering, and can thrash the prepared statement 
> cache on the cluster.
> I propose that the Mapper uses a TreeMap instead of a HashMap when it builds 
> its set of AliasedMappedProperty - sorted by the column name 
> (col.mappedProperty.getMappedName()). This would create a deterministic 
> ordering of columns, and all java processes accessing the same cluster would 
> end up with the same prepared queries for the same entities.
> This issue is already fixed in a newer version of the DataStax Java driver 
> (3.3.1), which the Flink Cassandra connector does not use (it uses 3.0.0). 
> I upgraded the driver to 3.3.1 locally in the Flink Cassandra connector and 
> tested it: it stopped creating new prepared statements with different 
> column orderings for the same entity. I have a fix for this issue and would 
> like to contribute it; I will raise a PR for it. 
> Flink Cassandra Connector Version: flink-connector-cassandra_2.11
> Flink Version: 1.7.1
> I am creating a PR for this, which can be merged and released in a new 
> minor or patch release as required.
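
The TreeMap proposal quoted in the description can be illustrated with a minimal sketch. This is not the DataStax Mapper code: the class `DeterministicInsertQuery`, the method `buildInsert`, and the `users` table are hypothetical names chosen for illustration. It only shows that sorting columns by name yields the same query string regardless of the order in which the columns were discovered.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class DeterministicInsertQuery {

    // Hypothetical helper (not part of the DataStax driver): builds an
    // INSERT statement whose columns are always sorted by column name.
    public static String buildInsert(String table, Map<String, Object> columns) {
        // TreeMap orders its keys by natural String order, so the column
        // list no longer depends on the iteration order of the input map.
        Map<String, Object> sorted = new TreeMap<>(columns);
        String names = String.join(", ", sorted.keySet());
        String markers = String.join(", ", Collections.nCopies(sorted.size(), "?"));
        return "INSERT INTO " + table + " (" + names + ") VALUES (" + markers + ")";
    }

    public static void main(String[] args) {
        // Two clients that happen to see the same columns in different orders.
        Map<String, Object> clientA = new LinkedHashMap<>();
        clientA.put("name", "alice");
        clientA.put("id", 1);
        clientA.put("email", "a@example.com");

        Map<String, Object> clientB = new LinkedHashMap<>();
        clientB.put("email", "a@example.com");
        clientB.put("name", "alice");
        clientB.put("id", 1);

        String queryA = buildInsert("users", clientA);
        String queryB = buildInsert("users", clientB);

        System.out.println(queryA);
        // Identical query strings map to a single prepared statement.
        System.out.println(queryA.equals(queryB));
    }
}
```

Because both clients produce an identical query string, the cluster would need to cache only one prepared statement for the entity instead of one per column permutation.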



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
