[jira] [Commented] (PHOENIX-2724) Query with large number of guideposts is slower compared to no stats

Ankit Singhal (JIRA) Wed, 20 Apr 2016 00:29:12 -0700

    [ https://issues.apache.org/jira/browse/PHOENIX-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15249413#comment-15249413 ]


Ankit Singhal commented on PHOENIX-2724:
----------------------------------------

[~samarthjain],

It seems we are flattening the nested scans every time, but this will hurt the
performance of "non-ORDER BY" queries on salted and local index tables, where
we have to run the scans in parallel per region for performance when
phoenix.query.force.rowkeyorder is set to true or the query has an ORDER BY on
the row key. Alternatively, if we want to keep SerialIterators completely
serial, then we should change isSerial to exclude salted and local index tables.

{code}
+        final List<Scan> flattenedScans = Lists.newArrayListWithExpectedSize(expectedListSize);
+        for (List<Scan> list : nestedScans) {
+            flattenedScans.addAll(list);
+        }
{code}
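
To make the concern concrete, here is a minimal sketch, assuming scans can be
modelled as plain strings grouped per region (the real code works with HBase
Scan objects). flatten() mirrors the loop in the patch; isSerial() is a
hypothetical version of the alternative suggested above, not Phoenix's actual
method. Once the per-region groups are flattened, only one list is left, so
there is nothing to hand out to parallel per-region tasks.

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FlattenScansSketch {

    // Mirrors the patch: concatenate every per-region group of scans into one list.
    // Scans are modelled as Strings here; the real code uses HBase Scan objects.
    static List<String> flatten(List<List<String>> nestedScans) {
        int expectedListSize = 0;
        for (List<String> list : nestedScans) {
            expectedListSize += list.size();
        }
        List<String> flattenedScans = new ArrayList<>(expectedListSize);
        for (List<String> list : nestedScans) {
            flattenedScans.addAll(list);
        }
        return flattenedScans;
    }

    // Hypothetical isSerial variant: salted and local index tables are never run
    // fully serially, so their per-region groups can still be scanned in parallel.
    static boolean isSerial(boolean serialCandidate, boolean salted, boolean localIndex) {
        return serialCandidate && !salted && !localIndex;
    }

    public static void main(String[] args) {
        List<List<String>> nested = Arrays.asList(
                Arrays.asList("region1-scan1", "region1-scan2"),
                Arrays.asList("region2-scan1"));
        // With nesting, each inner list could be submitted as its own parallel task.
        System.out.println("parallel tasks: " + nested.size()); // parallel tasks: 2
        // After flattening, a single serial list remains.
        System.out.println("flattened: " + flatten(nested));
        System.out.println("serial for salted table: " + isSerial(true, true, false)); // false
    }
}
{code}

Keeping the nested structure lets the caller decide per table whether to run
one task per inner list or a single serial pass.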

Is there any reason why we changed this?
{code}
-            }, "Serial scanner for table: " + tableRef.getTable().getPhysicalName().getString()));
+            }, "Serial scanner for table: " + tableRef.getTable().getName().getString()));
{code}

If you are renaming this method, can you please also rename
QueryUtil.getUnusedOffset to match?
{code}
-    public Integer getUnusedOffset() {
+    public Integer getRemainingOffset() {
         return (offset - rowCount) > 0 ? (offset - rowCount) : 0;
     }
{code}
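
The renamed method returns how much of the query's OFFSET is still left to
skip, clamped at zero. A minimal sketch, assuming a hypothetical OffsetTracker
that counts rows as they are read (this is not Phoenix's class), shows the
value shrinking to zero once enough rows have been consumed:

{code:java}
public class OffsetTracker {
    private final int offset; // OFFSET requested by the query
    private int rowCount;     // rows read so far

    public OffsetTracker(int offset) {
        this.offset = offset;
    }

    // Record that one more row has been read.
    public void onRow() {
        rowCount++;
    }

    // Same arithmetic as getRemainingOffset() in the diff: never negative.
    public Integer getRemainingOffset() {
        return (offset - rowCount) > 0 ? (offset - rowCount) : 0;
    }

    public static void main(String[] args) {
        OffsetTracker tracker = new OffsetTracker(5);
        for (int i = 0; i < 3; i++) {
            tracker.onRow();
        }
        System.out.println(tracker.getRemainingOffset()); // 2
        for (int i = 0; i < 4; i++) {
            tracker.onRow();
        }
        System.out.println(tracker.getRemainingOffset()); // 0
    }
}
{code}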


Is this just a formatting change?
{code}
-    private void initTableValues(Connection conn) throws SQLException {
-        for (int i = 0; i < 26; i++) {
-            conn.createStatement().execute("UPSERT INTO " + tableName + " values('" + strings[i] + "'," + i + ","
-                    + (i + 1) + "," + (i + 2) + ",'" + strings[25 - i] + "')");
-        }
-        conn.commit();
-    }
-
-    private void updateStatistics(Connection conn) throws SQLException {
-        String query = "UPDATE STATISTICS " + tableName + " SET \"" + QueryServices.STATS_GUIDEPOST_WIDTH_BYTES_ATTRIB
-                + "\"=" + Long.toString(500);
-        conn.createStatement().execute(query);
-    }
-
     @Test
     public void testMetaDataWithOffset() throws SQLException {
         Connection conn;
@@ -207,5 +194,19 @@ public class QueryWithOffsetIT extends BaseOwnClusterHBaseManagedTimeIT {
         ResultSetMetaData md = rs.getMetaData();
         assertEquals(5, md.getColumnCount());
     }
+    
+    private void initTableValues(Connection conn) throws SQLException {
+        for (int i = 0; i < 26; i++) {
+            conn.createStatement().execute("UPSERT INTO " + tableName + " values('" + strings[i] + "'," + i + ","
+                    + (i + 1) + "," + (i + 2) + ",'" + strings[25 - i] + "')");
+        }
+        conn.commit();
+    }
+
+    private void updateStatistics(Connection conn) throws SQLException {
+        String query = "UPDATE STATISTICS " + tableName + " SET \"" + QueryServices.STATS_GUIDEPOST_WIDTH_BYTES_ATTRIB
+                + "\"=" + Long.toString(500);
+        conn.createStatement().execute(query);
+    }
{code}
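
For reference, the moved helpers load 26 rows and then rebuild statistics with
a 500-byte guidepost width, presumably so that even this small table is split
into several guideposts. The sketch below only builds the same SQL strings, so
no connection is needed. TestSqlSketch, its methods and the sample letters
array are hypothetical, and the attribute name is passed in as a parameter
instead of being read from QueryServices.STATS_GUIDEPOST_WIDTH_BYTES_ATTRIB.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class TestSqlSketch {

    // Builds the same 26 UPSERT statements as initTableValues(), without executing them.
    static List<String> upsertStatements(String tableName, String[] strings) {
        List<String> statements = new ArrayList<>(26);
        for (int i = 0; i < 26; i++) {
            statements.add("UPSERT INTO " + tableName + " values('" + strings[i] + "'," + i + ","
                    + (i + 1) + "," + (i + 2) + ",'" + strings[25 - i] + "')");
        }
        return statements;
    }

    // Builds the statement issued by updateStatistics(); guidepostAttrib stands in
    // for QueryServices.STATS_GUIDEPOST_WIDTH_BYTES_ATTRIB.
    static String updateStatisticsSql(String tableName, String guidepostAttrib, long widthBytes) {
        return "UPDATE STATISTICS " + tableName + " SET \"" + guidepostAttrib + "\"="
                + Long.toString(widthBytes);
    }

    public static void main(String[] args) {
        // Hypothetical sample data: the 26 lowercase letters.
        String[] strings = new String[26];
        for (int i = 0; i < 26; i++) {
            strings[i] = String.valueOf((char) ('a' + i));
        }
        List<String> upserts = upsertStatements("T", strings);
        System.out.println(upserts.get(0)); // UPSERT INTO T values('a',0,1,2,'z')
        System.out.println(updateStatisticsSql("T", "GUIDEPOST_WIDTH_ATTRIB", 500));
    }
}
{code}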

> Query with large number of guideposts is slower compared to no stats
> --------------------------------------------------------------------
>
>                 Key: PHOENIX-2724
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2724
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.7.0
>         Environment: Phoenix 4.7.0-RC4, HBase-0.98.17 on an 8-node cluster
>            Reporter: Mujtaba Chohan
>            Assignee: Samarth Jain
>             Fix For: 4.8.0
>
>         Attachments: PHOENIX-2724.patch
>
>
> With a 1MB guidepost width on a ~900GB/500M-row table, queries with a short
> scan range get significantly slower.
> Without stats:
> {code}
> select * from T limit 10; // query execution time <100 msec
> {code}
> With stats:
> {code}
> select * from T limit 10; // query execution time >20 seconds
> Explain plan:
> CLIENT 876085-CHUNK 476569382 ROWS 876060986727 BYTES SERIAL 1-WAY FULL SCAN OVER T
>     SERVER 10 ROW LIMIT
> CLIENT 10 ROW LIMIT
> {code}
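
The explain plan above reports 876085 chunks for a query that needs only 10
rows. One way to keep such a serial LIMIT query cheap regardless of the number
of guideposts is to open chunks lazily and stop as soon as the limit is met.
The sketch below is hypothetical (it is not Phoenix's iterator code and not
necessarily what the attached patch does); each chunk is modelled only by its
row count, and the 3 rows per chunk are made up for illustration.

{code:java}
import java.util.Iterator;
import java.util.NoSuchElementException;

public class LazySerialLimitSketch {

    // Reads chunks one after another and stops once 'limit' rows have been returned.
    // Each chunk is modelled as the number of rows it holds; returns how many chunks were opened.
    static int chunksOpenedForLimit(Iterator<Integer> chunkRowCounts, int limit) {
        int rowsReturned = 0;
        int chunksOpened = 0;
        while (rowsReturned < limit && chunkRowCounts.hasNext()) {
            int rowsInChunk = chunkRowCounts.next(); // "open" the next chunk only when needed
            chunksOpened++;
            rowsReturned += Math.min(rowsInChunk, limit - rowsReturned);
        }
        return chunksOpened;
    }

    // A lazily generated sequence of 'count' chunks, each holding rowsPerChunk rows.
    static Iterator<Integer> chunks(final int count, final int rowsPerChunk) {
        return new Iterator<Integer>() {
            private int produced = 0;

            @Override
            public boolean hasNext() {
                return produced < count;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                produced++;
                return rowsPerChunk;
            }
        };
    }

    public static void main(String[] args) {
        // 876085 chunks as in the explain plan; only the first few are touched for LIMIT 10.
        System.out.println(chunksOpenedForLimit(chunks(876085, 3), 10)); // 4
    }
}
{code}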



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
