[ https://issues.apache.org/jira/browse/PHOENIX-654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
James Taylor updated PHOENIX-654:
---------------------------------
    Description:
When you create a TABLE, we insert an empty key value into the first column family that we can count on being there for every row. For a VIEW, we don't do that, so we just fall back on projecting everything into the scan. If there are lots of columns (for example, 60,000 in [this](https://groups.google.com/forum/_!topic/phoenix-hbase-user/JgQjlqC4-uw) case), the scan is very slow.

Instead, we should only project everything when absolutely necessary, in these cases:
* IS NULL expression
* CASE WHEN with an ELSE expression
* Usages of row value constructor
* When a column in the primary key is used
* When there is no where clause
* When there is a group by of a nullable expression

We could potentially do the same for a TABLE, but the empty key value seems like a better trade-off as far as performance goes. In addition, we need the empty key value because a row cannot exist without at least one key value; without it, use cases that define only a primary key could not be supported.

  was:
When you create a TABLE, we insert an empty key value into the first column family that we can count on being there for every row. For a VIEW, we don't do that, so we just fall back on projecting everything into the scan. If there are lots of columns (for example, 60,000 in [this](https://groups.google.com/forum/_!topic/phoenix-hbase-user/JgQjlqC4-uw) case), the scan is very slow.

Instead, we should only project everything when absolutely necessary, in these cases:
* When the EvaluateOnCompletionVisitor, run over the where clause expression, returns true for visitor.evaluateOnCompletion(). This captures cases such as:
  * IS NULL check
  * CASE WHEN ELSE
  * Usages of row value constructor
* When there is no where clause
* When there is a group by of a nullable expression

We could potentially do the same for a TABLE, but the empty key value seems like a better trade-off as far as performance goes.
In addition, we need the empty key value because a row cannot exist without at least one key value; without it, use cases that define only a primary key could not be supported.

> Minimize projection into scan for VIEW
> --------------------------------------
>
>                 Key: PHOENIX-654
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-654
>             Project: Phoenix
>          Issue Type: Task
>            Reporter: James Taylor
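
The rule in the updated description can be sketched as a single predicate over the shape of a query against a VIEW. This is only an illustration: `ViewProjectionSketch`, `QueryShape`, its fields, and `mustProjectEverything` are hypothetical names made up for this sketch and are not part of the Phoenix code base, and the issue does not say how Phoenix detects each case.

```java
/** Hypothetical sketch of the decision rule; none of these names exist in Phoenix. */
final class ViewProjectionSketch {

    /** Assumed summary of a query against a VIEW, one flag per case listed in the issue. */
    static final class QueryShape {
        boolean hasWhereClause = true;
        boolean usesIsNull;
        boolean usesCaseWhenWithElse;
        boolean usesRowValueConstructor;
        boolean usesPrimaryKeyColumn;
        boolean groupsByNullableExpression;
    }

    /** Returns true when every column must be projected into the scan. */
    static boolean mustProjectEverything(QueryShape q) {
        // Any single case from the issue's list is enough to force a full projection.
        return !q.hasWhereClause
                || q.usesIsNull
                || q.usesCaseWhenWithElse
                || q.usesRowValueConstructor
                || q.usesPrimaryKeyColumn
                || q.groupsByNullableExpression;
    }

    public static void main(String[] args) {
        // A query with a where clause and none of the listed cases: a minimal projection suffices.
        QueryShape simpleFilter = new QueryShape();
        System.out.println(mustProjectEverything(simpleFilter)); // prints false

        // The same query with an IS NULL expression: everything must be projected.
        QueryShape withIsNull = new QueryShape();
        withIsNull.usesIsNull = true;
        System.out.println(mustProjectEverything(withIsNull)); // prints true
    }
}
```

When the predicate returns false, only the columns the query actually references would need to go into the scan, which is the saving the issue is after for views with very many columns.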