[ 
https://issues.apache.org/jira/browse/PHOENIX-2607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135733#comment-15135733
 ] 

Hadoop QA commented on PHOENIX-2607:
------------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12786655/PHOENIX-2607.patch
  against master branch at commit fa58fc5fccd11cbeb14e701a59122b3cba77d67e.
  ATTACHMENT ID: 12786655

    {color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

    {color:green}+0 tests included{color}.  The patch appears to be a 
documentation, build,
                        or dev patch that doesn't require tests.

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
20 warning messages.

    {color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

    {color:red}-1 lineLengths{color}.  The patch introduces the following lines longer than 100:
    +        return PTableImpl.makePTable(tenantId, schemaName, tableName, tableType, indexState, timeStamp, createdTimestamp,
    +                region.mutateRowsWithLocks(tableMetadata, Collections.<byte[]>emptySet(), HConstants.NO_NONCE, HConstants.NO_NONCE);
    +    private boolean tableTimestampCheck(long clientTimeStamp, PTable table, RegionCoprocessorEnvironment env) {
    +                table.getTenantId(), table.getSchemaName(), table.getTableName(), table.getType(), table.getIndexState(), timeStamp, table.getCreatedTimestamp(),
    +                table.getTenantId(), table.getSchemaName(), table.getTableName(), table.getType(), table.getIndexState(), table.getTimeStamp(), table.getCreatedTimestamp(),
    +                table.getTenantId(), table.getSchemaName(), table.getTableName(), table.getType(), table.getIndexState(), timeStamp, table.getCreatedTimestamp(),
    +                table.getTenantId(), table.getSchemaName(), table.getTableName(), table.getType(), table.getIndexState(), timeStamp, table.getCreatedTimestamp(),
    +                table.getTenantId(), table.getSchemaName(), table.getTableName(), table.getType(), table.getIndexState(), timeStamp, table.getCreatedTimestamp(),
    +                table.getTenantId(), table.getSchemaName(), table.getTableName(), table.getType(), state, table.getTimeStamp(), table.getCreatedTimestamp(),
    +                table.getTenantId(), table.getSchemaName(), table.getTableName(), table.getType(), table.getIndexState(), table.getTimeStamp(), table.getCreatedTimestamp(),

     {color:red}-1 core tests{color}.  The patch failed these unit tests:
     

     {color:red}-1 core zombie tests{color}.  There are 3 zombie test(s):
        at org.apache.maven.surefire.booter.ForkingRunListener.testSucceeded(ForkingRunListener.java:127)
        at org.apache.maven.surefire.common.junit4.JUnit4RunListener.testFinished(JUnit4RunListener.java:143)
        at org.junit.runner.notification.SynchronizedRunListener.testFinished(SynchronizedRunListener.java:56)
        at org.apache.phoenix.jdbc.PhoenixResultSetMetadataTest.testColumnDisplaySize(PhoenixResultSetMetadataTest.java:38)
        at org.apache.phoenix.util.PhoenixRuntimeTest.testGetTenantIdExpression(PhoenixRuntimeTest.java:210)

Test results: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/245//testReport/
Javadoc warnings: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/245//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/245//console

This message is automatically generated.

> PhoenixMapReduceUtil Upserts with earlier ts (relative to latest data ts) 
> slower by 25x after stats collection
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-2607
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2607
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.4.0
>            Reporter: Arun Thangamani
>              Labels: patch
>             Fix For: 4.4.0
>
>         Attachments: PHOENIX-2607.patch, hbase-master-fast-upload.log, 
> hbase-master-slow-upload.log, hbase-rs01-fast-upload.log, 
> hbase-rs01-slow-upload.log, hbase-rs02-fast-upload.log, 
> hbase-rs02-slow-upload.log, hbase-rs03-fast-upload.log, 
> hbase-rs03-slow-upload.log, hbase-rs04-fast-upload.log, 
> hbase-rs04-slow-upload.log, phoenix_slow_map_process_jstack.txt, 
> region_server_2_jstack.txt
>
>
> Description of the problem:
> 1) We face a 25x slowdown when we go back in time to load data into a 
> table (i.e., when specific timestamps are set on the connections during 
> upserts).
> 2) We set phoenix.stats.useCurrentTime=false (and 
> phoenix.stats.guidepost.per.region=1), which at least makes the 
> forward-timestamp upserts perform correctly.
> 3) From what I can tell from the Phoenix source code, the attached logs, 
> and the jstacks from the region servers, we continuously try to look up 
> the uncached definition of the table whenever the client timestamp is 
> earlier than the last-modified timestamp of the table in stats (see the 
> sketch after this list).
> 4) To reproduce: create a table with timestamp=100, load 10M rows with 
> PhoenixMapReduceUtil at timestamps 1447574400000 and 1448092800000, and 
> wait 20 minutes (phoenix.stats.updateFrequency is 15 minutes, so 15+ 
> minutes is enough). After 20 minutes, load 10M rows with a timestamp 
> earlier than the latest data (timestamp=1447660800000) and observe the 
> 25x slowness; after that, load at a forward timestamp (1448179200000) 
> again and observe the normal speed.
> 5) I was not able to reproduce this issue with simple multi-threaded 
> upserts from a JDBC connection; with those, the stats table never gets 
> populated, unlike with PhoenixMapReduceUtil.
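> From the jstacks, the hot path appears to behave like the sketch below. 
> This is an illustration of the described cache-miss behavior, not the 
> actual server-side code:
> {code:java}
> import org.apache.phoenix.schema.PTable;
> 
> public class BackInTimeCacheMiss {
>     // A cached PTable whose last-modified timestamp is at or after the
>     // client's SCN cannot be served to that client, so every upsert from a
>     // "back in time" client misses the cache and re-reads the table
>     // definition from SYSTEM.CATALOG.
>     static boolean canServeFromCache(PTable cached, long clientTimeStamp) {
>         return cached != null && cached.getTimeStamp() < clientTimeStamp;
>     }
> }
> {code}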
> We are trying to use Phoenix as a cache store to do analytics on the last 
> 60 days of data, a total of about 1.5 billion rows.
> The table has a composite key, and the data arrives at different times 
> from different sources, so it is easier to maintain the timestamps of the 
> data and expire it automatically. This performance issue makes the 
> difference between inserting the data in 10 minutes versus 2 hours, and 
> 2 hours of data inserts blocks up the cluster that we have.
> We are even talking about our use cases at the upcoming Strata conference 
> in March (thanks to the excellent community).
> Steps to reproduce:
> The source code is available at 
> https://github.com/athangamani/phoenix_mapreduce_timestamp_upsert, and 
> the jar it produces is attached and readily runnable.
> 1) We use the following params to keep the stats collection consistent, 
> in order to isolate the specific issue (an hbase-site.xml snippet with 
> these values follows this step):
>      phoenix.stats.useCurrentTime false
>      phoenix.stats.guidepost.per.region 1
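> For reference, a minimal hbase-site.xml fragment carrying the two 
> settings above, with the values used throughout this report:
> {code:xml}
> <property>
>   <name>phoenix.stats.useCurrentTime</name>
>   <value>false</value>
> </property>
> <property>
>   <name>phoenix.stats.guidepost.per.region</name>
>   <value>1</value>
> </property>
> {code}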
> 2) Create a table in Phoenix: run the main class 
> StatPhoenixTableCreationTest from the project. It will create the 
> following table with timestamp=100 (a JDBC sketch follows the DDL):
>               CREATE TABLE stat_table ( 
>                       pk1 VARCHAR NOT NULL, 
>                       pk2 VARCHAR NOT NULL, 
>                       pk3 UNSIGNED_LONG NOT NULL, 
>                        stat1 UNSIGNED_LONG, 
>                        stat2 UNSIGNED_LONG, 
>                        stat3 UNSIGNED_LONG, 
>                        CONSTRAINT pk PRIMARY KEY (pk1, pk2, pk3) 
>               ) SALT_BUCKETS=32, COMPRESSION='LZ4'
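> A minimal sketch of what StatPhoenixTableCreationTest presumably does 
> (the class name here and the JDBC URL are placeholders, not the project's 
> actual code): pin the connection at SCN 100 via the CurrentSCN property, 
> then run the DDL above so the table is created with timestamp=100.
> {code:java}
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.util.Properties;
> import org.apache.phoenix.util.PhoenixRuntime;
> 
> public class StatTableAtScn100 {
>     public static void main(String[] args) throws Exception {
>         Properties props = new Properties();
>         // Pin the DDL at client timestamp 100, as described in step 2.
>         props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, "100");
>         try (Connection conn = DriverManager.getConnection(
>                 "jdbc:phoenix:localhost:2181", props)) {
>             conn.createStatement().execute(
>                     "CREATE TABLE stat_table ("
>                     + " pk1 VARCHAR NOT NULL, pk2 VARCHAR NOT NULL,"
>                     + " pk3 UNSIGNED_LONG NOT NULL,"
>                     + " stat1 UNSIGNED_LONG, stat2 UNSIGNED_LONG,"
>                     + " stat3 UNSIGNED_LONG,"
>                     + " CONSTRAINT pk PRIMARY KEY (pk1, pk2, pk3))"
>                     + " SALT_BUCKETS=32, COMPRESSION='LZ4'");
>         }
>     }
> }
> {code}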
> 3) Open the codebase and look at the sample usage of PhoenixMapReduceUtil 
> with DBWritable.
> 4) Within the codebase, we get the Phoenix connection for the mappers 
> using the following setting, in order to have a fixed client timestamp 
> (a driver-side sketch follows this step):
>      conf.set(PhoenixRuntime.CURRENT_SCN_ATTRIB, ""+(timestamp));
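> A hedged sketch of the driver-side setup this implies (the class name, 
> job name, and column list are assumptions based on the STAT_TABLE DDL, 
> not the project's actual code):
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.mapreduce.Job;
> import org.apache.phoenix.mapreduce.util.PhoenixMapReduceUtil;
> import org.apache.phoenix.util.PhoenixRuntime;
> 
> public class FixedScnDriver {
>     public static Job newJob(long timestamp) throws Exception {
>         Configuration conf = HBaseConfiguration.create();
>         // Pin the client timestamp before wiring up the job; every mapper's
>         // Phoenix connection then opens at CurrentSCN=timestamp.
>         conf.set(PhoenixRuntime.CURRENT_SCN_ATTRIB, String.valueOf(timestamp));
>         Job job = Job.getInstance(conf, "stat-table-upsert");
>         // Output table and columns match the STAT_TABLE DDL from step 2.
>         PhoenixMapReduceUtil.setOutput(job, "STAT_TABLE",
>                 "PK1,PK2,PK3,STAT1,STAT2,STAT3");
>         return job;
>     }
> }
> {code}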
> 5) Fix the hbase-site.xml in the codebase with your ZooKeeper quorum and 
> HBase parent znode info.
> 6) Run StatDataCreatorTest to create the data for the run (10M records) 
> and load it into HDFS.
> 7) To run the ready-made jar attached, use the following commands:
>    hadoop jar phoenix_mr_ts_upsert-jar-with-dependencies.jar 
> statPhoenixLoader hdfs:///user/*****/stat-data-1.txt STAT_TABLE 1447574400000
>    hadoop jar phoenix_mr_ts_upsert-jar-with-dependencies.jar 
> statPhoenixLoader hdfs:///user/*****/stat-data-1.txt STAT_TABLE 1448092800000 
>     After 20 mins… 
>    hadoop jar phoenix_mr_ts_upsert-jar-with-dependencies.jar 
> statPhoenixLoader hdfs:///user/*****/stat-data-1.txt STAT_TABLE 1447660800000
>    hadoop jar phoenix_mr_ts_upsert-jar-with-dependencies.jar 
> statPhoenixLoader hdfs:///user/*****/stat-data-1.txt STAT_TABLE 1449000000000
> 8) Observe the 25x slowness in the 3rd run and the normal pace in the 4th 
> run (a quick SYSTEM.STATS check is sketched after this list).
> 9) Attached are the region server logs for the fast runs at forward 
> timestamps versus the slow run at the reverse timestamp.
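> To confirm whether stats were collected for the table (the difference 
> called out in point 5 of the description), a quick check can be run from 
> sqlline. PHYSICAL_NAME and LAST_STATS_UPDATE_TIME are SYSTEM.STATS columns 
> in the 4.x line, but verify the schema against your version:
> {code:sql}
> SELECT PHYSICAL_NAME, COLUMN_FAMILY, LAST_STATS_UPDATE_TIME
> FROM SYSTEM.STATS
> WHERE PHYSICAL_NAME = 'STAT_TABLE';
> {code}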



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
