[ 
https://issues.apache.org/jira/browse/PHOENIX-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471710#comment-16471710
 ] 

Chinmay Kulkarni edited comment on PHOENIX-3955 at 5/11/18 9:58 AM:
--------------------------------------------------------------------

Hey [~jamestaylor], [~samarthjain], [~tdsilva]
 Here are some points on achieving this, along with some questions I have.
 Let's take a simple example. Say I create the base data table with the 
following statement:
{code:sql}
CREATE TABLE IF NOT EXISTS z_base_table (
    id INTEGER NOT NULL PRIMARY KEY,
    CF1.host VARCHAR(10),
    flag BOOLEAN
) TTL=120000, CF1.KEEP_DELETED_CELLS='true', REPLICATION_SCOPE='1';
{code}
We have the following paths to consider:

1. Create Index code path:
 * *Case 1: The data table is created with specific column families and has no 
default CF*:
 In this case, the global index table's default CF and the CFs corresponding to 
all local indexes should keep the default values for REPLICATION_SCOPE and 
KEEP_DELETED_CELLS, as they do now, but they should inherit the TTL from the 
data table's non-local-index CFs. It should be sufficient to check the TTL of 
any one non-local-index CF, since all of them are required to have the same TTL.

 * *Case 2: The data table has a default CF*:
 In this case, the global index table's default CF and the CFs corresponding to 
all local indexes should inherit REPLICATION_SCOPE, KEEP_DELETED_CELLS and the 
TTL property from the data table's default CF.

 * *Question 1*: If we create an index with its own properties, say something 
like:
{code:sql}
CREATE INDEX diff_properties_z_index ON z_base_table(host) 
TTL=5000,KEEP_DELETED_CELLS='true';
{code}
This overrides the data table's properties, leaving the index table and data 
table properties out of sync. This JIRA might set the expectation that these 
properties are always in sync between the index tables and the data table, so 
should we disallow this from now on? At the very least, we may want to log that 
the index table and data table properties will be out of sync after executing 
this statement.

 * *Question 1.1*: Given the above situation, if we later alter the data 
table, should we unconditionally alter the properties of the index tables too 
(given that we want them to be in sync), or only alter an index table's 
properties when they currently match the data table's properties?

 * The "Create index code path" changes should be achievable in 
_CQSI.generateTableDescriptor_ (CQSI being _ConnectionQueryServicesImpl_), 
before we apply the properties specified for the index tables themselves.
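
To make Case 1 and Case 2 above concrete, here is a minimal, standalone sketch of the inheritance rule. It is not Phoenix code: the class, the method names, and the string-valued property maps are hypothetical stand-ins for the real table and column descriptors, and the default CF name "0", the "L#" local index prefix, and the listed default values are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: which TTL / REPLICATION_SCOPE / KEEP_DELETED_CELLS
// values a new index CF would start with, before index-specific properties
// from the CREATE INDEX statement are applied on top.
class IndexPropertyInheritanceSketch {
    // Assumed names: "0" as the default CF, "L#" as the local index CF prefix.
    static final String DEFAULT_CF = "0";
    static final String LOCAL_INDEX_CF_PREFIX = "L#";
    static final List<String> SYNCED_PROPS =
            Arrays.asList("TTL", "REPLICATION_SCOPE", "KEEP_DELETED_CELLS");

    // Assumed defaults, kept as strings to avoid depending on HBase classes.
    static Map<String, String> defaults() {
        Map<String, String> d = new LinkedHashMap<>();
        d.put("TTL", "2147483647"); // i.e. never expire
        d.put("REPLICATION_SCOPE", "0");
        d.put("KEEP_DELETED_CELLS", "FALSE");
        return d;
    }

    // Small helper to build a property map from key/value pairs.
    static Map<String, String> props(String... kv) {
        Map<String, String> m = new LinkedHashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            m.put(kv[i], kv[i + 1]);
        }
        return m;
    }

    // dataTableCfs maps each CF name of the data table to its properties.
    static Map<String, String> inheritedIndexProps(
            Map<String, Map<String, String>> dataTableCfs) {
        Map<String, String> result = defaults();
        Map<String, String> defaultCf = dataTableCfs.get(DEFAULT_CF);
        if (defaultCf != null) {
            // Case 2: inherit all three properties from the default CF.
            for (String prop : SYNCED_PROPS) {
                String value = defaultCf.get(prop);
                if (value != null) {
                    result.put(prop, value);
                }
            }
            return result;
        }
        // Case 1: no default CF, so keep the defaults and inherit only the
        // TTL from any one CF that is not a local index CF.
        for (Map.Entry<String, Map<String, String>> cf : dataTableCfs.entrySet()) {
            String ttl = cf.getValue().get("TTL");
            if (!cf.getKey().startsWith(LOCAL_INDEX_CF_PREFIX) && ttl != null) {
                result.put("TTL", ttl);
                break;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // z_base_table as I read the DDL above: "flag" lands in the default CF.
        Map<String, Map<String, String>> cfs = new LinkedHashMap<>();
        cfs.put("0", props("TTL", "120000", "REPLICATION_SCOPE", "1"));
        cfs.put("CF1", props("TTL", "120000", "REPLICATION_SCOPE", "1",
                "KEEP_DELETED_CELLS", "true"));
        System.out.println(inheritedIndexProps(cfs));
    }
}
```

For the example table this takes the Case 2 branch, so the index would start with TTL=120000 and REPLICATION_SCOPE=1, while KEEP_DELETED_CELLS stays at the assumed default because it was only set on CF1. Whether properties given in the CREATE INDEX statement should still be allowed to override these values is exactly Question 1.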

2. Alter table set <TTL/REPLICATION_SCOPE/KEEP_DELETED_CELLS> code path:
 * Here we can keep track of the properties that are applied to 
_QueryConstants.ALL_FAMILY_PROPERTIES_KEY_ rather than to specific CFs. When 
TTL, REPLICATION_SCOPE or KEEP_DELETED_CELLS is changed for all families, we 
would alter the properties of the index table CFs as well.

 * *Case 1: Global Index Tables:*
 We can have _CQSI.separateAndValidateProperties_ return a 
_Map<table name/desc, Pair<orig table desc, new table desc>>_, then collect all 
of the table descriptors and call _sendHBaseMetaData()_ with this list of 
changes (which will now include the GLOBAL index table changes as well).

 * *Case 2: Local Indexes:*
 Can we simply change the column descriptor of the local index CF on the data 
table? I'm not sure whether this makes sense, so please feel free to shed some 
light on this case.

 * *Question 2:* If I create a local index on a column that does not belong to 
the default CF, like:
{code:sql}
CREATE LOCAL INDEX cf_specific_z_index ON z_base_table(host);
{code}
then shouldn't the local index be using a CF of "L#CF1" instead of the default 
"L#0"? In sqlline, when I run _select * from cf_specific_z_index;_, I see the 
column as _CF1:Host_, but when I run _desc 'z_base_table'_ in the HBase shell, 
I see the CF name as "L#0".

 * *Question 3:* How do we handle the case of multiple local indexes created on 
the same table? If I run the following:
{code:sql}
CREATE LOCAL INDEX local_z_index1 ON z_base_table(host) 
TTL=9999,KEEP_DELETED_CELLS='true';
CREATE LOCAL INDEX local_z_index2 ON z_base_table(flag) 
TTL=8888,KEEP_DELETED_CELLS='false';
{code}
The actual HBase metadata change only reflects the last statement, since both 
local indexes map to the 'L#0' column family. Please let me know if this is 
handled at the Phoenix layer and I'm missing something.
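
As a rough illustration of the _Map<table name, Pair<orig table desc, new table desc>>_ idea from the ALTER TABLE path above, here is a standalone sketch. Nothing in it is an existing Phoenix or HBase API: _TableDesc_, _Change_ and _planChanges_ are hypothetical stand-ins, and the _onlyIfInSync_ flag exists only to show the two policies from Question 1.1 side by side. The sketch updates every CF on the data table, including any "L#" CF, which is one possible answer to the local index case and not a statement of current behavior.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of collecting per-table descriptor changes when an
// all-families property (e.g. TTL) is altered on the data table.
class AlterPropagationSketch {

    // Stand-in for a table descriptor: CF name -> (property -> value).
    static final class TableDesc {
        final Map<String, Map<String, String>> families = new LinkedHashMap<>();

        TableDesc family(String cf, String... kv) {
            Map<String, String> props = new LinkedHashMap<>();
            for (int i = 0; i + 1 < kv.length; i += 2) {
                props.put(kv[i], kv[i + 1]);
            }
            families.put(cf, props);
            return this;
        }

        TableDesc copy() {
            TableDesc c = new TableDesc();
            families.forEach((cf, p) -> c.families.put(cf, new LinkedHashMap<>(p)));
            return c;
        }
    }

    // Plays the role of Pair<orig table desc, new table desc>.
    static final class Change {
        final TableDesc orig;
        final TableDesc updated;

        Change(TableDesc orig, TableDesc updated) {
            this.orig = orig;
            this.updated = updated;
        }
    }

    static Map<String, Change> planChanges(String dataTable,
                                           List<String> globalIndexes,
                                           Map<String, TableDesc> current,
                                           Map<String, String> allFamilyProps,
                                           boolean onlyIfInSync) {
        Map<String, Change> plan = new LinkedHashMap<>();
        TableDesc dataOrig = current.get(dataTable);
        TableDesc dataNew = dataOrig.copy();
        // Every CF on the data table gets the new values, local index CFs too.
        dataNew.families.values().forEach(p -> p.putAll(allFamilyProps));
        plan.put(dataTable, new Change(dataOrig, dataNew));

        // Old data-table values, taken from the first CF that is not "L#...".
        Map<String, String> reference = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, String>> cf : dataOrig.families.entrySet()) {
            if (!cf.getKey().startsWith("L#")) {
                reference = cf.getValue();
                break;
            }
        }

        for (String index : globalIndexes) {
            TableDesc idxOrig = current.get(index);
            TableDesc idxNew = idxOrig.copy();
            for (Map<String, String> idxProps : idxNew.families.values()) {
                for (Map.Entry<String, String> e : allFamilyProps.entrySet()) {
                    boolean inSync = Objects.equals(
                            idxProps.get(e.getKey()), reference.get(e.getKey()));
                    if (!onlyIfInSync || inSync) {
                        idxProps.put(e.getKey(), e.getValue());
                    }
                }
            }
            // Only tables whose descriptor actually changes go into the plan.
            if (!idxNew.families.equals(idxOrig.families)) {
                plan.put(index, new Change(idxOrig, idxNew));
            }
        }
        return plan;
    }
}
```

With _onlyIfInSync_ set to true, an index created with its own TTL (as in Question 1) is left alone, whereas with false it is overwritten. The resulting map is what would then be handed to something like _sendHBaseMetaData()_ in a single pass.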


was (Author: ckulkarni):
Hey [~jamestaylor], [~samarthjain] [~tdsilva]
Here are some points on achieving this along with some questions I have:
Let's take a simple example. Say I create the base data table with the 
following query:

{code:sql}
CREATE TABLE IF NOT EXISTS z_base_table (
id INTEGER not null primary key, CF1.host VARCHAR(10),flag BOOLEAN) 
TTL=120000,CF1.KEEP_DELETED_CELLS='true',REPLICATION_SCOPE='1';
{code}

We have the following paths to consider:

1. Create Index code path:
* *Case1: We create the data table with specific column families and there is 
no default CF*:
In this case, the global index table's default CF and the CFs corresponding to 
all local indexes should have default values for REPLICATION_SCOPE and 
KEEP_DELETED_CELLS as they do now, BUT they should inherit the TTL property 
from the non-local index CFs. In this case, it should be sufficient to check 
any non-local index CF's TTL since they are enforced to all be the same. 

* *Case2: The data table has a default CF*:
In this case, the global index table's default CF and the CFs corresponding to 
all local indexes should inherit REPLICATION_SCOPE, KEEP_DELETED_CELLS and the 
TTL property from the data table's default CF.

* *Question 1*: If we create an index with its own properties, say something 
like:
{code:sql}
CREATE INDEX diff_properties_z_index ON z_base_table(host) 
TTL=5000,KEEP_DELETED_CELLS='true';
{code}
We override the data table properties making the index tables and data table 
properties out of sync. This JIRA might set expectations that these properties 
are always in sync between index tables and the data table, so should we 
disallow this henceforth? At the very least we may want to log that the index 
table and data table properties will be out of sync after executing this query.

* *Question 1.1*: Given the above situation, if we later on alter the data 
table, should we blindly also alter the properties of the index tables (given 
that we want them to be in sync), or only alter index table properties in case 
they are equivalent to the data table properties?

* "Create index code path" changes should be achievable by changes in 
_CQSI.generateTableDescriptor_ before we apply specific properties of the index 
tables themselves.

2. Alter table set <TTL/REPLICATION_SCOPE/KEEP_DELETED_CELLS> code path:
* Here we can keep track of properties to be applied to 
_QueryConstants.ALL_FAMILY_PROPERTIES_KEY_ and not to specific CFs. In case we 
are changing TTL, REPLICATION_SCOPE or KEEP_DELETED_CELLS for all families, we 
will alter the properties for index table CFs as well.

* *Case 1: Global Index Tables:*
We can have _CQSI.separateAndValidateProperties_ return a _Map<table name/desc, 
Pair<orig table desc, new table desc>>_ and then later store all tabledescs and 
call _sendHBaseMetaData_() with this list of changes (which will now include 
GLOBAL index table changes as well). 

* *Case 2: Local Indexes:*
Can we simply change the column descriptor for the local index CF for the data 
table? I'm not sure if this makes sense, but feel free to throw some light on 
this case.

* *Question 2:*: If I create a local index on a CF specific column like:
{code:sql}
CREATE LOCAL INDEX cf_specific_z_index ON z_base_table(host);
{code}
 then shouldn't the local index be using a CF of "L#CF1" instead of the default 
"L#0"? In sqlline, when I do _select * from cf_specific_z_index;_, I see the 
column as _CF1:Host_, but when I _desc 'z_base_table'_ in HBase shell, I see 
the cf name to be "L#0". 

* *Question 3:* How do we handle the case of multiple local indexes created on 
the same table? If I run the following:
{code:sql}
CREATE LOCAL INDEX local_z_index1 ON z_base_table(host) 
TTL=9999,KEEP_DELETED_CELLS='true';
CREATE LOCAL INDEX local_z_index2 ON z_base_table(flag) 
TTL=8888,KEEP_DELETED_CELLS='false';
{code}
The actual HBase metadata change only reflects the last statement, since both 
local indexes map to the 'L#0' column family. Please let me know if this is 
handled at the Phoenix layer and I'm missing something.

> Ensure KEEP_DELETED_CELLS, REPLICATION_SCOPE, and TTL properties stay in sync 
> between the physical data table and index tables
> ------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-3955
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3955
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Samarth Jain
>            Assignee: Chinmay Kulkarni
>            Priority: Major
>
> We need to make sure that indexes inherit the REPLICATION_SCOPE, 
> KEEP_DELETED_CELLS and TTL properties from the base table. Otherwise we can 
> run into situations where data was removed from the data table but not from 
> the index, or vice versa. We also need to make sure that any ALTER TABLE SET 
> TTL or ALTER TABLE SET KEEP_DELETED_CELLS statements propagate the properties 
> to the indexes too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
