[jira] [Updated] (PHOENIX-1215) some phoenix transactions are hanging indefinitely on ps.execute() call

Vikas Vishwakarma (JIRA) Wed, 27 Aug 2014 22:05:31 -0700

     [ 
https://issues.apache.org/jira/browse/PHOENIX-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Vikas Vishwakarma updated PHOENIX-1215:
---------------------------------------

    Description: 
Test Setup:
==========
4 load client with 16 threads each running against a HBase cluster with 6 
RegionServers. 
Each load client spawns a thread and runs a upsert query in batch mode

Pseudo code:
==========
SQL_UPSERT_HBASE_DATA = UPSERT INTO TABLE_NAME  (ROW_KEY_ID, ROW_VAL) VALUES 
(?, ?)

    for (int i=0; i<6400; i++) {

      upsertSql.append(SQL_UPSERT_HBASE_DATA);
      ps = conn.prepareStatement(preparedSqlTableStmt.toString());
      ps.setString(1, rowKey);
      ps.setString(2, rowVal);
      ps.execute();

    }
    conn.commit();
    tryClose(conn);


Observation:
==============
Observing some issues where a small number of transactions (approx 40 out of 
80,000) are hanging indefinitely on preparedStatement.execute() call. 

I tried to put a timeout on the transaction but looks like phoenix still does 
not support preparedStatement.setQueryTimeout().

Log Analysis:
=============
On the client side there are no errors or exceptions except some threads in 
hung state 

On the service side, the observation is as follows:
Also after some log analysis,  it looks like all the batches that got stuck, 
are during the start of the test. From the RegionServer logs, the phoenix 
co-processor loading and index updates is in progress during this time (which 
involves some failures and retries). Once this phase is over, remaining write 
batches that started later have succeeded.

RegionServer log snapshots during the time period this issue was observed on 
the client side:
---------
2014-08-27 03:28:00,114 INFO 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Loaded coprocessor 
org.apache.phoenix.coprocessor.ScanRegionObserver from HTD of HT_42 
successfully.
2014-08-27 03:28:00,114 INFO 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Loaded coprocessor 
org.apache.phoenix.coprocessor.ScanRegionObserver from HTD of HT_42 
successfully.
2014-08-27 03:28:00,252 INFO org.apache.phoenix.hbase.index.Indexer: Found some 
outstanding index updates that didn't succeed during WAL replay - attempting to 
replay now.
2014-08-27 03:28:00,252 INFO org.apache.phoenix.hbase.index.Indexer: Found some 
outstanding index updates that didn't succeed during WAL replay - attempting to 
replay now.
2014-08-27 03:28:00,252 INFO org.apache.phoenix.hbase.index.Indexer: Found some 
outstanding index updates that didn't succeed during WAL replay - attempting to 
replay now.
2014-08-27 03:28:00,931 INFO 
org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver: Starting 
ungrouped coprocessor scan 
{"timeRange":[0,1409110080691],"batch":-1,"startRow":"\\x0E","stopRow":"\\x0F","loadColumnFamiliesOnDemand":true,"totalColumns":1,"cacheBlocks":true,"families":{"0":["ALL"]},"maxResultSize":-1,"maxVersions":1,"caching":-1}
2014-08-27 03:28:00,931 INFO 
org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver: Starting 
ungrouped coprocessor scan 
{"timeRange":[0,1409110080691],"batch":-1,"startRow":"\\x07","stopRow":"\\x08","loadColumnFamiliesOnDemand":true,"totalColumns":1,"cacheBlocks":true,"families":{"0":["ALL"]},"maxResultSize":-1,"maxVersions":1,"caching":-1}
....
2014-08-27 03:28:06,507 DEBUG 
org.apache.hadoop.hbase.coprocessor.CoprocessorHost: Loading coprocessor class 
org.apache.phoenix.coprocessor.ServerCachingEndpointImpl with path null and 
priority 1
2014-08-27 03:28:06,507 INFO 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Loaded coprocessor 
org.apache.phoenix.coprocessor.ServerCachingEndpointImpl from HTD of HT_44 
successfully.
....
2014-08-27 03:28:06,508 DEBUG 
org.apache.hadoop.hbase.coprocessor.CoprocessorHost: Loading coprocessor class 
org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver with path null 
and priority 1
2014-08-27 03:28:06,508 INFO 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Loaded coprocessor 
org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver from HTD of HT_44 
successfully.
...
2014-08-27 03:28:19,525 INFO org.apache.phoenix.hbase.index.Indexer: Found some 
outstanding index updates that didn't succeed during WAL replay - attempting to 
replay now.
2014-08-27 03:28:19,547 INFO org.apache.phoenix.hbase.index.Indexer: Found some 
outstanding index updates that didn't succeed during WAL replay - attempting to 
replay now.
..
2014-08-27 03:28:20,660 INFO 
org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver: Starting 
ungrouped coprocessor scan 
{"timeRange":[0,1409110100554],"batch":-1,"startRow":"\\x0C","stopRow":"\\x0D","loadColumnFamiliesOnDemand":true,"totalColumns":1,"cacheBlocks":true,"families":{"0":["ALL"]},"maxResultSize":-1,"maxVersions":1,"caching":-1}
2014-08-27 03:28:20,660 INFO 
org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver: Finished 
scanning 0 rows for ungrouped coprocessor scan 
{"timeRange":[0,1409110100554],"batch":-1,"startRow":"\\x0C","stopRow":"\\x0D","loadColumnFamiliesOnDemand":true,"totalColumns":1,"cacheBlocks":true,"families":{"0":["ALL"]},"maxResultSize":-1,"maxVersions":1,"caching":-1}

  was:
Test Setup:
==========
4 load client with 16 threads each running against a HBase cluster with 6 
RegionServers. 
Each load client spawns a thread and runs a upsert query in batch mode

Pseudo code:
==========
SQL_UPSERT_HBASE_DATA = UPSERT INTO TABLE_NAME  (ROW_KEY_ID, ROW_VAL) VALUES 
(?, ?)

    for (int i=0; i<6400; i++) {
      upsertSql.append(SQL_UPSERT_HBASE_DATA);
      ps = conn.prepareStatement(preparedSqlTableStmt.toString());
      ps.setString(1, rowKey);
      ps.setString(2, rowVal);
      ps.execute();
    }
    conn.commit();
    tryClose(conn);

Observation:
==============
Observing some issues where a small number of transactions (approx 40 out of 
80,000) are hanging indefinitely on preparedStatement.execute() call. 

I tried to put a timeout on the transaction but looks like phoenix still does 
not support preparedStatement.setQueryTimeout().

Log Analysis:
=============
On the client side there are no errors or exceptions except some threads in 
hung state 

On the service side, the observation is as follows:
Also after some log analysis,  it looks like all the batches that got stuck, 
are during the start of the test. From the RegionServer logs, the phoenix 
co-processor loading and index updates is in progress during this time (which 
involves some failures and retries). Once this phase is over, remaining write 
batches that started later have succeeded.

RegionServer log snapshots during the time period this issue was observed on 
the client side:
---------
2014-08-27 03:28:00,114 INFO 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Loaded coprocessor 
org.apache.phoenix.coprocessor.ScanRegionObserver from HTD of HT_42 
successfully.
2014-08-27 03:28:00,114 INFO 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Loaded coprocessor 
org.apache.phoenix.coprocessor.ScanRegionObserver from HTD of HT_42 
successfully.
2014-08-27 03:28:00,252 INFO org.apache.phoenix.hbase.index.Indexer: Found some 
outstanding index updates that didn't succeed during WAL replay - attempting to 
replay now.
2014-08-27 03:28:00,252 INFO org.apache.phoenix.hbase.index.Indexer: Found some 
outstanding index updates that didn't succeed during WAL replay - attempting to 
replay now.
2014-08-27 03:28:00,252 INFO org.apache.phoenix.hbase.index.Indexer: Found some 
outstanding index updates that didn't succeed during WAL replay - attempting to 
replay now.
2014-08-27 03:28:00,931 INFO 
org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver: Starting 
ungrouped coprocessor scan 
{"timeRange":[0,1409110080691],"batch":-1,"startRow":"\\x0E","stopRow":"\\x0F","loadColumnFamiliesOnDemand":true,"totalColumns":1,"cacheBlocks":true,"families":{"0":["ALL"]},"maxResultSize":-1,"maxVersions":1,"caching":-1}
2014-08-27 03:28:00,931 INFO 
org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver: Starting 
ungrouped coprocessor scan 
{"timeRange":[0,1409110080691],"batch":-1,"startRow":"\\x07","stopRow":"\\x08","loadColumnFamiliesOnDemand":true,"totalColumns":1,"cacheBlocks":true,"families":{"0":["ALL"]},"maxResultSize":-1,"maxVersions":1,"caching":-1}
....
2014-08-27 03:28:06,507 DEBUG 
org.apache.hadoop.hbase.coprocessor.CoprocessorHost: Loading coprocessor class 
org.apache.phoenix.coprocessor.ServerCachingEndpointImpl with path null and 
priority 1
2014-08-27 03:28:06,507 INFO 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Loaded coprocessor 
org.apache.phoenix.coprocessor.ServerCachingEndpointImpl from HTD of HT_44 
successfully.
....
2014-08-27 03:28:06,508 DEBUG 
org.apache.hadoop.hbase.coprocessor.CoprocessorHost: Loading coprocessor class 
org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver with path null 
and priority 1
2014-08-27 03:28:06,508 INFO 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Loaded coprocessor 
org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver from HTD of HT_44 
successfully.
...
2014-08-27 03:28:19,525 INFO org.apache.phoenix.hbase.index.Indexer: Found some 
outstanding index updates that didn't succeed during WAL replay - attempting to 
replay now.
2014-08-27 03:28:19,547 INFO org.apache.phoenix.hbase.index.Indexer: Found some 
outstanding index updates that didn't succeed during WAL replay - attempting to 
replay now.
..
2014-08-27 03:28:20,660 INFO 
org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver: Starting 
ungrouped coprocessor scan 
{"timeRange":[0,1409110100554],"batch":-1,"startRow":"\\x0C","stopRow":"\\x0D","loadColumnFamiliesOnDemand":true,"totalColumns":1,"cacheBlocks":true,"families":{"0":["ALL"]},"maxResultSize":-1,"maxVersions":1,"caching":-1}
2014-08-27 03:28:20,660 INFO 
org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver: Finished 
scanning 0 rows for ungrouped coprocessor scan 
{"timeRange":[0,1409110100554],"batch":-1,"startRow":"\\x0C","stopRow":"\\x0D","loadColumnFamiliesOnDemand":true,"totalColumns":1,"cacheBlocks":true,"families":{"0":["ALL"]},"maxResultSize":-1,"maxVersions":1,"caching":-1}


> some phoenix transactions are hanging indefinitely on ps.execute() call
> -----------------------------------------------------------------------
>
>                 Key: PHOENIX-1215
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1215
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.0.0
>         Environment: Num Cores: 24, Model : Intel(R) Xeon(R) CPU E5-2420 0 @ 
> 1.90GHz
> MemTotal:       65801264 kB, 1 Gbps Network
> Red Hat Enterprise Linux Server release 6.2
>            Reporter: Vikas Vishwakarma
>
> Test Setup:
> ==========
> 4 load client with 16 threads each running against a HBase cluster with 6 
> RegionServers. 
> Each load client spawns a thread and runs a upsert query in batch mode
> Pseudo code:
> ==========
> SQL_UPSERT_HBASE_DATA = UPSERT INTO TABLE_NAME  (ROW_KEY_ID, ROW_VAL) VALUES 
> (?, ?)
>     for (int i=0; i<6400; i++) {
>       upsertSql.append(SQL_UPSERT_HBASE_DATA);
>       ps = conn.prepareStatement(preparedSqlTableStmt.toString());
>       ps.setString(1, rowKey);
>       ps.setString(2, rowVal);
>       ps.execute();
>     }
>     conn.commit();
>     tryClose(conn);
> Observation:
> ==============
> Observing some issues where a small number of transactions (approx 40 out of 
> 80,000) are hanging indefinitely on preparedStatement.execute() call. 
> I tried to put a timeout on the transaction but looks like phoenix still does 
> not support preparedStatement.setQueryTimeout().
> Log Analysis:
> =============
> On the client side there are no errors or exceptions except some threads in 
> hung state 
> On the service side, the observation is as follows:
> Also after some log analysis,  it looks like all the batches that got stuck, 
> are during the start of the test. From the RegionServer logs, the phoenix 
> co-processor loading and index updates is in progress during this time (which 
> involves some failures and retries). Once this phase is over, remaining write 
> batches that started later have succeeded.
> RegionServer log snapshots during the time period this issue was observed on 
> the client side:
> ---------
> 2014-08-27 03:28:00,114 INFO 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Loaded 
> coprocessor org.apache.phoenix.coprocessor.ScanRegionObserver from HTD of 
> HT_42 successfully.
> 2014-08-27 03:28:00,114 INFO 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Loaded 
> coprocessor org.apache.phoenix.coprocessor.ScanRegionObserver from HTD of 
> HT_42 successfully.
> 2014-08-27 03:28:00,252 INFO org.apache.phoenix.hbase.index.Indexer: Found 
> some outstanding index updates that didn't succeed during WAL replay - 
> attempting to replay now.
> 2014-08-27 03:28:00,252 INFO org.apache.phoenix.hbase.index.Indexer: Found 
> some outstanding index updates that didn't succeed during WAL replay - 
> attempting to replay now.
> 2014-08-27 03:28:00,252 INFO org.apache.phoenix.hbase.index.Indexer: Found 
> some outstanding index updates that didn't succeed during WAL replay - 
> attempting to replay now.
> 2014-08-27 03:28:00,931 INFO 
> org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver: Starting 
> ungrouped coprocessor scan 
> {"timeRange":[0,1409110080691],"batch":-1,"startRow":"\\x0E","stopRow":"\\x0F","loadColumnFamiliesOnDemand":true,"totalColumns":1,"cacheBlocks":true,"families":{"0":["ALL"]},"maxResultSize":-1,"maxVersions":1,"caching":-1}
> 2014-08-27 03:28:00,931 INFO 
> org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver: Starting 
> ungrouped coprocessor scan 
> {"timeRange":[0,1409110080691],"batch":-1,"startRow":"\\x07","stopRow":"\\x08","loadColumnFamiliesOnDemand":true,"totalColumns":1,"cacheBlocks":true,"families":{"0":["ALL"]},"maxResultSize":-1,"maxVersions":1,"caching":-1}
> ....
> 2014-08-27 03:28:06,507 DEBUG 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost: Loading coprocessor 
> class org.apache.phoenix.coprocessor.ServerCachingEndpointImpl with path null 
> and priority 1
> 2014-08-27 03:28:06,507 INFO 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Loaded 
> coprocessor org.apache.phoenix.coprocessor.ServerCachingEndpointImpl from HTD 
> of HT_44 successfully.
> ....
> 2014-08-27 03:28:06,508 DEBUG 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost: Loading coprocessor 
> class org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver with path 
> null and priority 1
> 2014-08-27 03:28:06,508 INFO 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Loaded 
> coprocessor org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver 
> from HTD of HT_44 successfully.
> ...
> 2014-08-27 03:28:19,525 INFO org.apache.phoenix.hbase.index.Indexer: Found 
> some outstanding index updates that didn't succeed during WAL replay - 
> attempting to replay now.
> 2014-08-27 03:28:19,547 INFO org.apache.phoenix.hbase.index.Indexer: Found 
> some outstanding index updates that didn't succeed during WAL replay - 
> attempting to replay now.
> ..
> 2014-08-27 03:28:20,660 INFO 
> org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver: Starting 
> ungrouped coprocessor scan 
> {"timeRange":[0,1409110100554],"batch":-1,"startRow":"\\x0C","stopRow":"\\x0D","loadColumnFamiliesOnDemand":true,"totalColumns":1,"cacheBlocks":true,"families":{"0":["ALL"]},"maxResultSize":-1,"maxVersions":1,"caching":-1}
> 2014-08-27 03:28:20,660 INFO 
> org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver: Finished 
> scanning 0 rows for ungrouped coprocessor scan 
> {"timeRange":[0,1409110100554],"batch":-1,"startRow":"\\x0C","stopRow":"\\x0D","loadColumnFamiliesOnDemand":true,"totalColumns":1,"cacheBlocks":true,"families":{"0":["ALL"]},"maxResultSize":-1,"maxVersions":1,"caching":-1}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (PHOENIX-1215) some phoenix transactions are hanging indefinitely on ps.execute() call

Reply via email to