[ 
https://issues.apache.org/jira/browse/FLINK-13479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flink Jira Bot updated FLINK-13479:
-----------------------------------
      Labels: auto-deprioritized-major auto-unassigned pull-request-available  
(was: auto-unassigned pull-request-available stale-major)
    Priority: Minor  (was: Major)

This issue was labeled "stale-major" 7 days ago and has not received any updates so 
it is being deprioritized. If this ticket is actually Major, please raise the 
priority and ask a committer to assign you the issue or revive the public 
discussion.


> Cassandra POJO Sink - Prepared Statement query does not have deterministic 
> ordering of columns - causing prepared statement cache overflow
> ------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-13479
>                 URL: https://issues.apache.org/jira/browse/FLINK-13479
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / Cassandra
>    Affects Versions: 1.7.2
>            Reporter: Ronak Thakrar
>            Priority: Minor
>              Labels: auto-deprioritized-major, auto-unassigned, 
> pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> While using the Cassandra POJO Sink in Flink jobs, the prepared statement 
> query string that is generated automatically when inserting data (via the 
> Mapper.saveQuery method) does not enforce a deterministic column ordering 
> for the Cassandra entity, so every time the column positions change, a new 
> prepared statement is generated and used. As a result, the prepared 
> statement query cache overflows, because each generated insert query 
> string lists the columns in a random order. 
> The following is the detailed explanation of what happens inside the Datastax 
> java driver ([https://datastax-oss.atlassian.net/browse/JAVA-1587]):
> The current Mapper uses random ordering of columns when it creates prepared 
> queries. This is fine when only 1 java client is accessing a cluster (and 
> assuming the application developer does the correct thing by re-using a 
> Mapper), since each Mapper will reuse its prepared statement. However, when you 
> have many java clients accessing a cluster, they will each create their own 
> permutations of column ordering, and can thrash the prepared statement cache 
> on the cluster.
> I propose that the Mapper uses a TreeMap instead of a HashMap when it builds 
> its set of AliasedMappedProperty - sorted by the column name 
> (col.mappedProperty.getMappedName()). This would create a deterministic 
> ordering of columns, and all java processes accessing the same cluster would 
> end up with the same prepared queries for the same entities.
> This issue is already fixed in a newer version of the Datastax java driver 
> (3.3.1), which the Flink Cassandra connector does not use (it uses 3.0.0).
> I upgraded the driver version to 3.3.1 locally in the Flink Cassandra 
> connector and tested it: it stopped creating new prepared statements with a 
> different column ordering for the same entity. I have a fix for this issue 
> and would like to contribute the change, and I will raise a PR for it. 
> Flink Cassandra Connector Version: flink-connector-cassandra_2.11
> Flink Version: 1.7.1
> I am creating a PR for this, which can be merged and released in a new minor 
> release or patch release as required.
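
The effect of the TreeMap proposal quoted above can be illustrated with a small, self-contained sketch. It is not the Datastax Mapper code: the class ColumnOrderingSketch, its methods, the table name "users", and the column names are all hypothetical, and two LinkedHashMaps with different insertion orders stand in for two clients whose column ordering happens to differ. The point is only that sorting by column name makes both clients produce the same query string, and therefore the same prepared statement.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Hypothetical illustration only; not taken from the Datastax driver or Flink.
public class ColumnOrderingSketch {

    // Builds an INSERT string whose column list follows the map's iteration order.
    public static String buildInsertQuery(String table, Map<String, String> columnToField) {
        StringJoiner columns = new StringJoiner(",");
        StringJoiner markers = new StringJoiner(",");
        for (String column : columnToField.keySet()) {
            columns.add(column);
            markers.add("?");
        }
        return "INSERT INTO " + table + " (" + columns + ") VALUES (" + markers + ");";
    }

    // Copies the mapping into a TreeMap first, so columns are always sorted by name.
    public static String buildSortedInsertQuery(String table, Map<String, String> columnToField) {
        return buildInsertQuery(table, new TreeMap<>(columnToField));
    }

    public static void main(String[] args) {
        // Two "clients" that discovered the same columns in a different order.
        Map<String, String> clientA = new LinkedHashMap<>();
        clientA.put("user_id", "userId");
        clientA.put("name", "name");
        clientA.put("email", "email");

        Map<String, String> clientB = new LinkedHashMap<>();
        clientB.put("email", "email");
        clientB.put("user_id", "userId");
        clientB.put("name", "name");

        // Unsorted: two distinct query strings, hence two prepared statements.
        System.out.println(buildInsertQuery("users", clientA));
        System.out.println(buildInsertQuery("users", clientB));

        // Sorted: both clients yield
        // INSERT INTO users (email,name,user_id) VALUES (?,?,?);
        System.out.println(buildSortedInsertQuery("users", clientA));
        System.out.println(buildSortedInsertQuery("users", clientB));
    }
}
```

With n columns, unsorted ordering allows up to n! distinct query strings for a single entity, which is how many clients can fill the server-side prepared statement cache; sorting reduces that to one.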



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
