[ https://issues.apache.org/jira/browse/CASSANDRA-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462097#comment-13462097 ]

Sylvain Lebresne commented on CASSANDRA-4449:
---------------------------------------------

bq. Actually I don't see a reason to use something as heavyweight as MD5.

The advantage of using a hash of the query string as the ID is that you only ever
store one prepared statement for a given query. That does save memory in
practice, because a node will be connected to many clients that will usually all
prepare the same set of queries. It also gives you some protection against
buggy/crappy clients that re-prepare the same query again and again, though
that's a more minor point. As for MD5 being heavyweight, I don't think that
matters for prepared statements: the hash is only computed when a statement is
prepared, not on every execution.
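A minimal sketch of the scheme described above, assuming a single in-memory map per node. The class name `GlobalPreparedStatements`, the `Prepared` placeholder and the `prepare`/`lookup` methods are hypothetical illustrations, not Cassandra's code. Because the ID is derived from the query text alone, repeated PREPAREs of the same query from different connections map to one stored entry.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch (not Cassandra's actual code): a node-wide registry of
// prepared statements keyed by the MD5 digest of the query string.
public class GlobalPreparedStatements {

    // Placeholder for a parsed CQL3 statement; a real server would store the parse result.
    static final class Prepared {
        final String query;
        Prepared(String query) { this.query = query; }
    }

    private final ConcurrentMap<String, Prepared> byId = new ConcurrentHashMap<>();

    // The ID depends only on the query text, so every client and every
    // connection preparing the same query gets the same ID.
    static String idFor(String query) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(query.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available on this JVM", e);
        }
    }

    // Hashes the query once per PREPARE; in this sketch, executions only look up the returned ID.
    String prepare(String query) {
        String id = idFor(query);
        // Store the statement only if no statement with this ID exists yet.
        byId.computeIfAbsent(id, k -> new Prepared(query));
        return id;
    }

    Prepared lookup(String id) {
        return byId.get(id);
    }

    int size() {
        return byId.size();
    }

    public static void main(String[] args) {
        GlobalPreparedStatements registry = new GlobalPreparedStatements();
        String query = "SELECT * FROM users WHERE id = ?";
        // Simulates three connections (or one client re-preparing) sending the same query.
        String a = registry.prepare(query);
        String b = registry.prepare(query);
        String c = registry.prepare(query);
        System.out.println(a.equals(b) && b.equals(c)); // true
        System.out.println(registry.size());            // 1
        System.out.println(registry.lookup(a).query);   // SELECT * FROM users WHERE id = ?
    }
}
```

`computeIfAbsent` keeps the first stored entry, so a client that re-prepares the same query over and over adds nothing new to the map.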

> Make prepared statement global rather than connection based
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-4449
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4449
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>              Labels: binary_protocol
>             Fix For: 1.2.0 beta 2
>
>         Attachments: 4449.txt, 4449-v2.txt
>
>
> Currently, prepared statements are connection based. A client can only use a
> prepared statement on the connection it prepared it on, and if you prepare
> the same statement on multiple connections, we'll store multiple copies of
> the same prepared statement. This is potentially inefficient but can also be
> fairly painful for client libraries with a pool of connections (a.k.a. every
> reasonable client library ever), as this means you need to make sure you
> prepare the statement on every connection in the pool, including connections
> that don't exist yet but might be created later.
> This ticket suggests making prepared statements global (at least for CQL3),
> i.e. moving them out of ClientState. This will likely reduce the number of
> stored statements on a given node quite a bit, since it's very likely that all
> clients of a given node will prepare the same statements (and potentially on
> all of their connections to the node). And given that prepared statement
> identifiers are the hashCode() of the query string, this should be fairly trivial.
> I will note that while I think using a hash of the string as the identifier is a
> very good idea, I don't know if the default Java hashCode() is good enough.
> If that's a concern, maybe we should use a safer (but longer) hash like MD5
> or SHA-1. But if so, we'd better do it now.
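The concern about `hashCode()` can be made concrete. `String.hashCode()` is a 32-bit polynomial hash, and collisions are easy to construct: "Aa" and "BB" both hash to 2112, and substituting one for the other inside a longer string leaves its hash unchanged. The class below is a hypothetical illustration (not part of Cassandra) comparing that with a 128-bit MD5 digest of the same strings.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration: String.hashCode() is a weak statement identifier,
// because distinct query strings can produce the same 32-bit value.
public class HashCodeIdCollision {

    // 128-bit MD5 digest rendered as hex, for comparison with hashCode().
    static String md5Hex(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(d.length * 2);
            for (byte b : d) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available on this JVM", e);
        }
    }

    public static void main(String[] args) {
        // 'A' * 31 + 'a' == 'B' * 31 + 'B' == 2112, so these two strings collide.
        System.out.println("Aa".hashCode()); // 2112
        System.out.println("BB".hashCode()); // 2112

        // Swapping one colliding pair for the other at the same position keeps the
        // hash of a longer string unchanged, so two different queries share an ID.
        String q1 = "SELECT * FROM t WHERE k = 'Aa'";
        String q2 = "SELECT * FROM t WHERE k = 'BB'";
        System.out.println(q1.hashCode() == q2.hashCode()); // true
        System.out.println(md5Hex(q1).equals(md5Hex(q2)));  // false
    }
}
```

With a single global map keyed this way, such a collision would let one query's ID resolve to a different statement. Beyond deliberate collisions, a 32-bit ID space also makes accidental ones plausible: by the birthday bound, the chance of at least one collision among n distinct statements reaches about 50% near n ≈ 77,000, whereas with a 128-bit digest it stays negligible for any realistic number of statements.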
