Vikas Vishwakarma created PHOENIX-1215:
------------------------------------------

             Summary: some transactions are hanging indefinitely on 
                 Key: PHOENIX-1215
                 URL: https://issues.apache.org/jira/browse/PHOENIX-1215
             Project: Phoenix
          Issue Type: Bug
    Affects Versions: 4.0.0
         Environment: Num Cores: 24, Model : Intel(R) Xeon(R) CPU E5-2420 0 @ 
1.90GHz
MemTotal:       65801264 kB, 1 Gbps Network
Red Hat Enterprise Linux Server release 6.2

            Reporter: Vikas Vishwakarma


Test Setup:
==========
4 load client with 16 threads each running against a HBase cluster with 6 
RegionServers. 
Each load client spawns a thread and runs a upsert query in batch mode

Pseudo code:
==========
SQL_UPSERT_HBASE_DATA = UPSERT INTO TABLE_NAME  (ROW_KEY_ID, ROW_VAL) VALUES 
(?, ?)

    for (int i=0; i<6400; i++) {
      upsertSql.append(SQL_UPSERT_HBASE_DATA);
      ps = conn.prepareStatement(preparedSqlTableStmt.toString());
      ps.setString(1, rowKey);
      ps.setString(2, rowVal);
      ps.execute();
    }
    conn.commit();
    tryClose(conn);

Observation:
==============
Observing some issues where a small number of transactions (approx 40 out of 
80,000) are hanging indefinitely on preparedStatement.execute() call. 

I tried to put a timeout on the transaction but looks like phoenix still does 
not support preparedStatement.setQueryTimeout().

Log Analysis:
=============
On the client side there are no errors or exceptions except some threads in 
hung state 

On the service side, the observation is as follows:
Also after some log analysis,  it looks like all the batches that got stuck, 
are during the start of the test. From the RegionServer logs, the phoenix 
co-processor loading and index updates is in progress during this time (which 
involves some failures and retries). Once this phase is over, remaining write 
batches that started later have succeeded.

RegionServer log snapshots during the time period this issue was observed on 
the client side:
---------
2014-08-27 03:28:00,114 INFO 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Loaded coprocessor 
org.apache.phoenix.coprocessor.ScanRegionObserver from HTD of HT_42 
successfully.
2014-08-27 03:28:00,114 INFO 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Loaded coprocessor 
org.apache.phoenix.coprocessor.ScanRegionObserver from HTD of HT_42 
successfully.
2014-08-27 03:28:00,252 INFO org.apache.phoenix.hbase.index.Indexer: Found some 
outstanding index updates that didn't succeed during WAL replay - attempting to 
replay now.
2014-08-27 03:28:00,252 INFO org.apache.phoenix.hbase.index.Indexer: Found some 
outstanding index updates that didn't succeed during WAL replay - attempting to 
replay now.
2014-08-27 03:28:00,252 INFO org.apache.phoenix.hbase.index.Indexer: Found some 
outstanding index updates that didn't succeed during WAL replay - attempting to 
replay now.
2014-08-27 03:28:00,931 INFO 
org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver: Starting 
ungrouped coprocessor scan 
{"timeRange":[0,1409110080691],"batch":-1,"startRow":"\\x0E","stopRow":"\\x0F","loadColumnFamiliesOnDemand":true,"totalColumns":1,"cacheBlocks":true,"families":{"0":["ALL"]},"maxResultSize":-1,"maxVersions":1,"caching":-1}
2014-08-27 03:28:00,931 INFO 
org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver: Starting 
ungrouped coprocessor scan 
{"timeRange":[0,1409110080691],"batch":-1,"startRow":"\\x07","stopRow":"\\x08","loadColumnFamiliesOnDemand":true,"totalColumns":1,"cacheBlocks":true,"families":{"0":["ALL"]},"maxResultSize":-1,"maxVersions":1,"caching":-1}
....
2014-08-27 03:28:06,507 DEBUG 
org.apache.hadoop.hbase.coprocessor.CoprocessorHost: Loading coprocessor class 
org.apache.phoenix.coprocessor.ServerCachingEndpointImpl with path null and 
priority 1
2014-08-27 03:28:06,507 INFO 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Loaded coprocessor 
org.apache.phoenix.coprocessor.ServerCachingEndpointImpl from HTD of HT_44 
successfully.
....
2014-08-27 03:28:06,508 DEBUG 
org.apache.hadoop.hbase.coprocessor.CoprocessorHost: Loading coprocessor class 
org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver with path null 
and priority 1
2014-08-27 03:28:06,508 INFO 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Loaded coprocessor 
org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver from HTD of HT_44 
successfully.
...
2014-08-27 03:28:19,525 INFO org.apache.phoenix.hbase.index.Indexer: Found some 
outstanding index updates that didn't succeed during WAL replay - attempting to 
replay now.
2014-08-27 03:28:19,547 INFO org.apache.phoenix.hbase.index.Indexer: Found some 
outstanding index updates that didn't succeed during WAL replay - attempting to 
replay now.
..
2014-08-27 03:28:20,660 INFO 
org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver: Starting 
ungrouped coprocessor scan 
{"timeRange":[0,1409110100554],"batch":-1,"startRow":"\\x0C","stopRow":"\\x0D","loadColumnFamiliesOnDemand":true,"totalColumns":1,"cacheBlocks":true,"families":{"0":["ALL"]},"maxResultSize":-1,"maxVersions":1,"caching":-1}
2014-08-27 03:28:20,660 INFO 
org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver: Finished 
scanning 0 rows for ungrouped coprocessor scan 
{"timeRange":[0,1409110100554],"batch":-1,"startRow":"\\x0C","stopRow":"\\x0D","loadColumnFamiliesOnDemand":true,"totalColumns":1,"cacheBlocks":true,"families":{"0":["ALL"]},"maxResultSize":-1,"maxVersions":1,"caching":-1}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to