[ https://issues.apache.org/jira/browse/PHOENIX-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237063#comment-14237063 ]
James Taylor commented on PHOENIX-1498: --------------------------------------- Thanks for the patch, [~jeffreyz]. Couple of comments/questions: - We need to set KEEP_DELETED_CELLS=true for SYSTEM.CATALOG, SYSTEM.SEQUENCE, and SYSTEM.STATS - just add it to the DDL statements in QueryConstants. Otherwise, tables that want to use KEEP_DELETED_CELLS will no longer maintain the proper versions for schema changes. - Tell me more about the test changes, in particular why the system tables need to be deleted after each test suite? If you do the above, is that no longer necessary? {code} +++ phoenix-core/src/it/java/org/apache/phoenix/end2end/BaseClientManagedTimeIT.java @@ -74,6 +74,6 @@ public abstract class BaseClientManagedTimeIT extends BaseTest { @AfterClass public static void doTeardown() throws Exception { - dropNonSystemTables(); + dropAllTables(); } } {code} - No need to initialize props using getDefaultProps(), as the overrides are combined with the defaults. Also, any reason why these are all added here? I suppose they were getting set on the super class before? {code} public class QueryIT extends BaseQueryIT { + @BeforeClass + @Shadower(classBeingShadowed = BaseQueryIT.class) + public static void doSetup() throws Exception { + Map<String,String> props = getDefaultProps(); + props.put(QueryServices.QUEUE_SIZE_ATTRIB, Integer.toString(5000)); + props.put(IndexWriterUtils.HTABLE_THREAD_KEY, Integer.toString(100)); + // Make a small batch size to test multiple calls to reserve sequences + props.put(QueryServices.SEQUENCE_CACHE_SIZE_ATTRIB, Long.toString(BATCH_SIZE)); + props.put(QueryServices.DEFAULT_KEEP_DELETED_CELLS_ATTRIB, "true"); + // Must update config before starting server + setUpTestDriver(new ReadOnlyProps(props.entrySet().iterator())); + } + {code} - The code here hardcodes that by default KEEP_DELETED_CELLS is not set. IMO, it'd be a little cleaner to declare in QueryServicesOptions a public static final boolean DEFAULT_KEEP_DELETED_CELLS = false and then just set the value like this: columnDesc.setKeepDeletedCells(props.getBoolean(QueryServices.DEFAULT_KEEP_DELETED_CELLS_ATTRIB, QueryServicesOptions.DEFAULT_KEEP_DELETED_CELLS)); {code} diff --git phoenix-core/src/main/java/org/apache/phoenix/query/ConnectionQueryServicesImpl.java phoenix-core/src/main/java/org/apache/phoenix/query/ConnectionQueryServicesImpl.java private HColumnDescriptor generateColumnFamilyDescriptor(Pair<byte[],Map<String,Object>> family, PTableType tableType) throws SQLException { HColumnDescriptor columnDesc = new HColumnDescriptor(family.getFirst()); if (tableType != PTableType.VIEW) { - columnDesc.setKeepDeletedCells(true); + if(props.get(QueryServices.DEFAULT_KEEP_DELETED_CELLS_ATTRIB) != null){ + columnDesc.setKeepDeletedCells(props.getBoolean( + QueryServices.DEFAULT_KEEP_DELETED_CELLS_ATTRIB, false)); + } {code} - Is it necessary to set the queue size depth here? {code} public class GroupByIT extends BaseQueryIT { + @BeforeClass + @Shadower(classBeingShadowed = BaseQueryIT.class) + public static void doSetup() throws Exception { + Map<String,String> props = getDefaultProps(); + props.put(QueryServices.QUEUE_SIZE_ATTRIB, Integer.toString(5000)); {code} > Turn KEEP_DELETED_CELLS off by default > -------------------------------------- > > Key: PHOENIX-1498 > URL: https://issues.apache.org/jira/browse/PHOENIX-1498 > Project: Phoenix > Issue Type: Bug > Affects Versions: 4.0.0, 5.0.0 > Reporter: Jeffrey Zhong > Assignee: Jeffrey Zhong > Attachments: PHOENIX-1498-v2.patch, PHOENIX-1498.patch > > > Phoenix table is created with "KEEP_DELETED_CELLS" enabled by default, this > is only used to allow for flashback queries to work correctly. While > flashback query isn't used often in field and we found that query performance > degraded with the option on. This is likely a hbase scan issue though(will > create a JIRA once having more info). > Anyway Keeping deleted cells will add performance penalty and it's not used > often. Therefore, I'm suggesting to set it off by default. > We have a test where a table is loaded with > 5m rows and then some are > deleted/reinserted. The count ( * ) performance became worse & worse: > {code} > +------------+ > | COUNT(1) | > +------------+ > | 5078242 | > +------------+ > 1 row selected (33.273 seconds) > +------------+ > | COUNT(1) | > +------------+ > | 5078242 | > +------------+ > 1 row selected (174.771 seconds) > +------------+ > | COUNT(1) | > +------------+ > | 5078242 | > +------------+ > 1 row selected (458.251 seconds) > {code} > I think we can provide a table property in CREATE TABLE & ALTER TABLE > statement for people to enable KEEP_DELETED_CELLS if there is a need but by > default it should be turned off. -- This message was sent by Atlassian JIRA (v6.3.4#6332)