[ 
https://issues.apache.org/jira/browse/PHOENIX-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301559#comment-15301559
 ] 

James Taylor commented on PHOENIX-2941:
---------------------------------------

The new UPDATE_CACHE_FREQUENCY property[1] available in 4.7 greatly reduces the 
RPC traffic, but I think there are some simple changes we can make that'll 
propagate schema changes in an acceptable manner:
- Set a reasonable default UPDATE_CACHE_FREQUENCY value for tables (~5 
minutes).
- When any MetaDataEntityNotFoundException is thrown, make an RPC to 
get the latest table definition. This would cover the case of one client adding 
a table, column, view, schema, sequence, or hinted index while another client 
does not have this info available in its client-side cache.
- This would not cover one client dropping a column or view and another client 
accessing it (if the second client has the information cached) until the 
metadata expires from the client-side cache. It would cover the case of a table 
being dropped, since we get an HBase exception in this case. In my experience, 
dropping metadata is not that common, as it causes backward-compatibility 
issues (often these operations are disallowed on production systems). Even if 
writes to deleted columns occur, it typically doesn't cause harm as the data 
won't be retrievable once the cache expires. FWIW, transactional tables have a 
different means of propagating metadata changes, relying on the transaction 
manager and some read/write fences, so we can handle the drop through this 
mechanism in this case.
- Get rid of the server-side metadata cache completely
- Make the SYSTEM.CATALOG table transactional
- Have a separate SYSTEM.VIEW table that stores views

The last two items are somewhat orthogonal, but they're enabled by getting rid 
of the server-side metadata cache.
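
To make the first three bullets concrete, here is a rough sketch of the 
client-side behavior I have in mind. It is a standalone Python model, not 
Phoenix code: SchemaCache, MetaDataEntityNotFound, the fetch callable (standing 
in for the RPC that returns the latest table definition), and rpc_count are all 
hypothetical names made up for illustration.

```python
import time


class MetaDataEntityNotFound(Exception):
    """Hypothetical stand-in for MetaDataEntityNotFoundException."""


class SchemaCache:
    """Toy client-side cache of table definitions (sets of column names)."""

    def __init__(self, fetch, update_frequency_s=300.0, clock=time.monotonic):
        self._fetch = fetch              # hypothetical RPC: table -> column names
        self._ttl = update_frequency_s   # ~5 minutes, as in the first bullet
        self._clock = clock
        self._entries = {}               # table -> (columns, fetched_at)
        self.rpc_count = 0               # bookkeeping for illustration only

    def _refresh(self, table):
        # One round trip to get the latest table definition.
        self.rpc_count += 1
        columns = frozenset(self._fetch(table))
        self._entries[table] = (columns, self._clock())
        return columns

    def resolve(self, table, column):
        entry = self._entries.get(table)
        fresh = entry is not None and self._clock() - entry[1] < self._ttl
        columns = entry[0] if fresh else self._refresh(table)
        if column not in columns and fresh:
            # Not found in an unexpired cache entry: another client may have
            # added the column, so re-fetch once before giving up.
            columns = self._refresh(table)
        if column not in columns:
            raise MetaDataEntityNotFound(f"{table}.{column}")
        return column
```

In this model an added column is picked up immediately at the cost of one 
extra RPC, while a dropped column keeps resolving from the cached definition 
until the entry expires, which matches the limitation described in the third 
bullet.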

Thoughts?

[1] https://phoenix.apache.org/#Altering

> Alternative means of propagating schema changes
> -----------------------------------------------
>
>                 Key: PHOENIX-2941
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2941
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Nick Dimiduk
>
> The current approach to propagating schema changes (i.e., add column) involves 
> maintaining a 
> [GlobalCache|https://github.com/apache/phoenix/blob/10909ae502095bac775d98e6d92288c5cad9b9a6/phoenix-core/src/main/java/org/apache/phoenix/cache/GlobalCache.java]
>  of table schema on both clients and in RS coprocessors. This schema 
> information is versioned, and query timestamp is used to determine when the 
> cache is considered stale and needs to be updated. This causes problems for 
> users who specify a timestamp either via connection settings (i.e., 
> PHOENIX-2607) or 
> using the ROW_TIMESTAMP feature. Presumably this will also negatively impact 
> users of the Tephra transaction system as it uses the cell timestamp to store 
> transaction id.
> We need some other means of propagating schema changes throughout the 
> cluster. One approach might be a ZK node for each table that can notify 
> coprocessors (and clients?) that their cache is stale.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
