[ https://issues.apache.org/jira/browse/PHOENIX-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15763940#comment-15763940 ]

Hadoop QA commented on PHOENIX-3271:
------------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12844043/PHOENIX-3271.patch
  against master branch at commit e45b5a706107e31bc6e5b8289725db097b2820eb.
  ATTACHMENT ID: 12844043

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 44 warning messages.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:red}-1 lineLengths{color}.  The patch introduces the following lines longer than 100:
    +                        runOnServer = isAutoCommit && !table.isTransactional() && !(table.isImmutableRows() && !table.getIndexes().isEmpty()) && table.getRowTimestampColPos() == -1;
    +                    // If the row ends up living in a different region, we'll get an error otherwise.
    +                                .equals(new ColumnRef(tableRef, column.getPosition()).newColumnExpression())) {
    +                            // TODO: we could check the region boundaries to see if the pk will still be in it.
    +                            runOnServer = false; // bail on running server side, since PK may be changing
    +                    scan.setAttribute(BaseScannerRegionObserver.UPSERT_SELECT_TARGET_TABLE, tableRef.getTable().getPhysicalName().getBytes());
    +                    final QueryPlan aggPlan = new AggregatePlan(context, select, statementContext.getCurrentTable(), aggProjector, null,null, OrderBy.EMPTY_ORDER_BY, null, GroupBy.EMPTY_GROUP_BY, null);
    +    private void commitBatchWithHTable(HTable table, Region region, List<Mutation> mutations, byte[] indexUUID,
    +            long blockingMemstoreSize, byte[] indexMaintainersPtr, byte[] txState) throws IOException {
    +            //Need to add indexMaintainers for each mutation as table.batch can be distributed across servers

     {color:red}-1 core tests{color}.  The patch failed these unit tests:
                       org.apache.phoenix.compile.QueryOptimizerTest
                       org.apache.phoenix.util.PhoenixRuntimeTest
                       org.apache.phoenix.index.IndexMaintainerTest

Test results: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/705//testReport/
Javadoc warnings: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/705//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/705//console

This message is automatically generated.

> Distribute UPSERT SELECT across cluster
> ---------------------------------------
>
>                 Key: PHOENIX-3271
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3271
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: James Taylor
>            Assignee: Ankit Singhal
>             Fix For: 4.10.0
>
>         Attachments: PHOENIX-3271.patch
>
>
> Based on some informal testing we've done, it seems that creation of a local 
> index is orders of magnitude faster than creation of global indexes (17 
> seconds versus 10-20 minutes, though more data is written in the global 
> index case). Under the covers, a global index is created through the running 
> of an UPSERT SELECT. Also, UPSERT SELECT provides an easy way of copying a 
> table. In both of these cases, the data being upserted must all flow back to 
> the same client which can become a bottleneck for a large table. Instead, 
> what can be done is to push each separate, chunked UPSERT SELECT call out to 
> a different region server for execution there. One way we could implement 
> this would be to have an endpoint coprocessor push the chunked UPSERT SELECT 
> out to each region server and return the number of rows that were upserted 
> back to the client.
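
A minimal client-side sketch of the idea described above (not the attached patch): split the UPSERT SELECT into per-chunk statements keyed off the primary key and run them in parallel, summing the upserted row counts. The connection URL, table names, PK column, and split points below are all hypothetical; the actual proposal would derive the chunks from region boundaries and run each chunk server side via an endpoint coprocessor rather than through client connections.

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public final class ChunkedUpsertSelect {

    // Hypothetical PK ranges; the proposed design would derive these from
    // the source table's region boundaries instead of hard-coding them.
    private static final long[][] CHUNKS = {
            {0L, 1_000_000L}, {1_000_000L, 2_000_000L}, {2_000_000L, 3_000_000L}};

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(CHUNKS.length);
        List<Future<Integer>> counts = new ArrayList<>();
        for (long[] chunk : CHUNKS) {
            final long lo = chunk[0];
            final long hi = chunk[1];
            counts.add(pool.submit(() -> {
                // One connection per worker: Phoenix connections are cheap
                // to create and should not be shared across threads.
                try (Connection conn =
                             DriverManager.getConnection("jdbc:phoenix:localhost");
                     PreparedStatement stmt = conn.prepareStatement(
                             "UPSERT INTO TARGET_TABLE SELECT * FROM SOURCE_TABLE"
                                     + " WHERE PK >= ? AND PK < ?")) {
                    stmt.setLong(1, lo);
                    stmt.setLong(2, hi);
                    int upserted = stmt.executeUpdate();
                    conn.commit();
                    return upserted;
                }
            }));
        }
        long total = 0;
        for (Future<Integer> f : counts) {
            total += f.get(); // sum the per-chunk upserted row counts
        }
        pool.shutdown();
        System.out.println("Upserted " + total + " rows");
    }
}
{code}

Summing the executeUpdate() counts mirrors the "return the number of rows that were upserted back to the client" part of the proposal; pushing each chunk into a coprocessor, as suggested, would additionally keep the upserted data from round-tripping through the client.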



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
