[ 
https://issues.apache.org/jira/browse/PHOENIX-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237063#comment-14237063
 ] 

James Taylor commented on PHOENIX-1498:
---------------------------------------

Thanks for the patch, [~jeffreyz]. Couple of comments/questions:
- We need to set KEEP_DELETED_CELLS=true for SYSTEM.CATALOG, SYSTEM.SEQUENCE, 
and SYSTEM.STATS - just add it to the DDL statements in QueryConstants. 
Otherwise, tables that want to use KEEP_DELETED_CELLS will no longer maintain 
the proper versions for schema changes.
- Tell me more about the test changes, in particular why the system tables need 
to be deleted after each test suite? If you do the above, is that no longer 
necessary?
{code}
+++ 
phoenix-core/src/it/java/org/apache/phoenix/end2end/BaseClientManagedTimeIT.java
@@ -74,6 +74,6 @@ public abstract class BaseClientManagedTimeIT extends 
BaseTest {
     
     @AfterClass
     public static void doTeardown() throws Exception {
-        dropNonSystemTables();
+        dropAllTables();
     }
 }
{code}
- No need to initialize props using getDefaultProps(), as the overrides are 
combined with the defaults. Also, any reason why these are all added here? I 
suppose they were getting set on the super class before?
{code}
 public class QueryIT extends BaseQueryIT {
     
+    @BeforeClass
+    @Shadower(classBeingShadowed = BaseQueryIT.class)
+    public static void doSetup() throws Exception {
+        Map<String,String> props = getDefaultProps();
+        props.put(QueryServices.QUEUE_SIZE_ATTRIB, Integer.toString(5000));
+        props.put(IndexWriterUtils.HTABLE_THREAD_KEY, Integer.toString(100));
+        // Make a small batch size to test multiple calls to reserve sequences
+        props.put(QueryServices.SEQUENCE_CACHE_SIZE_ATTRIB, 
Long.toString(BATCH_SIZE));
+        props.put(QueryServices.DEFAULT_KEEP_DELETED_CELLS_ATTRIB, "true");
+        // Must update config before starting server
+        setUpTestDriver(new ReadOnlyProps(props.entrySet().iterator()));
+    }
+
{code}
- The code here hardcodes that by default KEEP_DELETED_CELLS is not set. IMO, 
it'd be a little cleaner to declare in QueryServicesOptions a public static 
final boolean DEFAULT_KEEP_DELETED_CELLS = false and then just set the value 
like this: 
columnDesc.setKeepDeletedCells(props.getBoolean(QueryServices.DEFAULT_KEEP_DELETED_CELLS_ATTRIB,
 QueryServicesOptions.DEFAULT_KEEP_DELETED_CELLS));
{code}
diff --git 
phoenix-core/src/main/java/org/apache/phoenix/query/ConnectionQueryServicesImpl.java
 
phoenix-core/src/main/java/org/apache/phoenix/query/ConnectionQueryServicesImpl.java
     private HColumnDescriptor 
generateColumnFamilyDescriptor(Pair<byte[],Map<String,Object>> family, 
PTableType tableType) throws SQLException {
         HColumnDescriptor columnDesc = new 
HColumnDescriptor(family.getFirst());
         if (tableType != PTableType.VIEW) {
-            columnDesc.setKeepDeletedCells(true);
+            if(props.get(QueryServices.DEFAULT_KEEP_DELETED_CELLS_ATTRIB) != 
null){
+                columnDesc.setKeepDeletedCells(props.getBoolean(
+                    QueryServices.DEFAULT_KEEP_DELETED_CELLS_ATTRIB, false));
+            }
{code}
- Is it necessary to set the queue size depth here?
{code}
public class GroupByIT extends BaseQueryIT {
 
+    @BeforeClass
+    @Shadower(classBeingShadowed = BaseQueryIT.class)
+    public static void doSetup() throws Exception {
+        Map<String,String> props = getDefaultProps();
+        props.put(QueryServices.QUEUE_SIZE_ATTRIB, Integer.toString(5000));
{code}

> Turn KEEP_DELETED_CELLS off by default
> --------------------------------------
>
>                 Key: PHOENIX-1498
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1498
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.0.0, 5.0.0
>            Reporter: Jeffrey Zhong
>            Assignee: Jeffrey Zhong
>         Attachments: PHOENIX-1498-v2.patch, PHOENIX-1498.patch
>
>
> Phoenix table is created with "KEEP_DELETED_CELLS" enabled by default, this 
> is only used to allow for flashback queries to work correctly. While 
> flashback query isn't used often in field and we found that query performance 
> degraded with the option on. This is likely a hbase scan issue though(will 
> create a JIRA once having more info). 
> Anyway Keeping deleted cells will add performance penalty and it's not used 
> often. Therefore, I'm suggesting to set it off by default. 
> We have a test where a table is loaded with > 5m rows and then some are 
> deleted/reinserted. The count ( * ) performance became worse & worse:
> {code}
> +------------+
> |  COUNT(1)  |
> +------------+
> | 5078242    |
> +------------+
> 1 row selected (33.273 seconds)
> +------------+
> |  COUNT(1)  |
> +------------+
> | 5078242    |
> +------------+
> 1 row selected (174.771 seconds)
> +------------+
> |  COUNT(1)  |
> +------------+
> | 5078242    |
> +------------+
> 1 row selected (458.251 seconds)
> {code}
> I think we can provide a table property in CREATE TABLE & ALTER TABLE 
> statement for people to enable KEEP_DELETED_CELLS if there is a need but by 
> default it should be turned off.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to