Hi all, I am trying to automate my workflow and build cubes for historic data, but I get the following behavior:
The workflow works fine until the loading of the HFile to HBase:
- When I create a small cube (1 week of data or less) it works fine
- When I create a large cube (1 month or more) it fails after 35 attempts, see the full stack trace below
- When I create a medium cube (2 weeks of data) sometimes it works, sometimes it fails

Do you have any pointers on why we are seeing this issue?

My first guess was that it is related to the HBase configuration, but I can't get it to work, and at the exact time I see this issue the HBase cluster appears to be available:

$ telnet xxx 60020
Trying xxx...
Connected to xxx.
Escape character is '^]'.

I can also create tables, read, and write on the HBase cluster without any problem.

The full stack trace on the job failure:

java.io.IOException: BulkLoad encountered an unrecoverable problem
    at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.bulkLoadPhase(LoadIncrementalHFiles.java:371)
    at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:295)
    at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:831)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.kylin.job.hadoop.hbase.BulkLoadJob.run(BulkLoadJob.java:83)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
    at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
    at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:132)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=35, exceptions:
Tue Jun 16 09:20:43 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@72ed4b94, java.io.IOException: Call to xxx failed on local exception: org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=70, waitTime=60004, rpcTimeout=60000
Tue Jun 16 09:21:44 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@72ed4b94, java.io.IOException: Call to xxx failed on local exception: org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=75, waitTime=60010, rpcTimeout=60000
Tue Jun 16 09:22:44 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@72ed4b94, java.io.IOException: Call to xxx failed on local exception: org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=78, waitTime=60014, rpcTimeout=60000
Tue Jun 16 09:23:45 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@72ed4b94, java.io.IOException: Call to xxx failed on local exception: org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=81, waitTime=60052, rpcTimeout=60000
Tue Jun 16 09:24:47 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@72ed4b94, java.io.IOException: Call to xxx failed on local exception: org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=86, waitTime=60009, rpcTimeout=60000
Tue Jun 16 09:26:09 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@72ed4b94, java.io.IOException: Call to xxx failed on local exception: org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=89, waitTime=77658, rpcTimeout=60000
Tue Jun 16 09:27:19 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@72ed4b94, java.io.IOException: Call to xxx failed on local exception: org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=95, waitTime=60054, rpcTimeout=60000
Tue Jun 16 09:28:29 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@72ed4b94, java.io.IOException: Call to xxx failed on local exception: org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=99, waitTime=60017, rpcTimeout=60000
Tue Jun 16 09:29:44 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@72ed4b94, java.io.IOException: Call to xxx failed on local exception: org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=103, waitTime=64938, rpcTimeout=60000
Tue Jun 16 09:30:54 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@72ed4b94, java.io.IOException: Call to xxx failed on local exception: org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=107, waitTime=60055, rpcTimeout=60000
Tue Jun 16 09:32:14 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@72ed4b94, java.io.IOException: Call to xxx failed on local exception: org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=109, waitTime=60011, rpcTimeout=60000
Tue Jun 16 09:33:34 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@72ed4b94, java.io.IOException: Call to xxx failed on local exception: org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=113, waitTime=60020, rpcTimeout=60000
Tue Jun 16 09:34:54 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@72ed4b94, java.io.IOException: Call to xxx failed on local exception: org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=117, waitTime=60036, rpcTimeout=60000
Tue Jun 16 09:36:15 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@72ed4b94, java.io.IOException: Call to xxx failed on local exception: org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=119, waitTime=60039, rpcTimeout=60000
Tue Jun 16 09:37:43 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@72ed4b94, java.io.IOException: Call to xxx failed on local exception: org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=123, waitTime=68398, rpcTimeout=60000
Tue Jun 16 09:39:25 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@72ed4b94, java.io.IOException: Call to xxx failed on local exception: org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=127, waitTime=82129, rpcTimeout=60000
Tue Jun 16 09:40:45 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@72ed4b94, java.io.IOException: Call to xxx failed on local exception: org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=133, waitTime=60034, rpcTimeout=60000
Tue Jun 16 09:42:06 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@72ed4b94, java.io.IOException: Call to xxx failed on local exception: org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=137, waitTime=60025, rpcTimeout=60000
Tue Jun 16 09:43:26 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@72ed4b94, java.io.IOException: Call to xxx failed on local exception: org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=141, waitTime=60014, rpcTimeout=60000
Tue Jun 16 09:44:46 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@72ed4b94, java.io.IOException: Call to xxx failed on local exception: org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=145, waitTime=60037, rpcTimeout=60000
Tue Jun 16 09:46:52 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@72ed4b94, java.io.IOException: Call to xxx failed on local exception: org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=147, waitTime=105914, rpcTimeout=60000
Tue Jun 16 09:48:42 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@72ed4b94, java.io.IOException: Call to xxx failed on local exception: org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=153, waitTime=90331, rpcTimeout=60000
Tue Jun 16 09:50:03 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@72ed4b94, java.io.IOException: Call to xxx failed on local exception: org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=155, waitTime=60023, rpcTimeout=60000
Tue Jun 16 09:51:23 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@72ed4b94, java.io.IOException: Call to xxx failed on local exception: org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=159, waitTime=60046, rpcTimeout=60000
Tue Jun 16 09:53:13 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@72ed4b94, java.io.IOException: Call to xxx failed on local exception: org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=163, waitTime=90254, rpcTimeout=60000
Tue Jun 16 09:54:33 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@72ed4b94, java.io.IOException: Call to xxx failed on local exception: org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=169, waitTime=60007, rpcTimeout=60000
Tue Jun 16 09:55:54 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@72ed4b94, java.io.IOException: Call to xxx failed on local exception: org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=171, waitTime=60017, rpcTimeout=60000
Tue Jun 16 09:57:32 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@72ed4b94, java.io.IOException: Call to xxx failed on local exception: org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=176, waitTime=78087, rpcTimeout=60000
Tue Jun 16 09:58:52 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@72ed4b94, java.io.IOException: Call to xxx failed on local exception: org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=181, waitTime=60030, rpcTimeout=60000
Tue Jun 16 10:00:12 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@72ed4b94, java.io.IOException: Call to xxx failed on local exception: org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=183, waitTime=60014, rpcTimeout=60000
Tue Jun 16 10:01:32 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@72ed4b94, java.io.IOException: Call to xxx failed on local exception: org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=187, waitTime=60006, rpcTimeout=60000
Tue Jun 16 10:03:02 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@72ed4b94, java.io.IOException: Call to xxx failed on local exception: org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=191, waitTime=69658, rpcTimeout=60000
Tue Jun 16 10:04:22 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@72ed4b94, java.io.IOException: Call to xxx failed on local exception: org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=195, waitTime=60030, rpcTimeout=60000
Tue Jun 16 10:05:42 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@72ed4b94, java.io.IOException: Call to xxx failed on local exception: org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=199, waitTime=60050, rpcTimeout=60000
Tue Jun 16 10:07:35 PDT 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@72ed4b94, java.io.IOException: Call to xxx failed on local exception: org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=203, waitTime=92703, rpcTimeout=60000
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:136)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:97)
    at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.tryAtomicRegionLoad(LoadIncrementalHFiles.java:612)
    at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$1.call(LoadIncrementalHFiles.java:350)
    at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$1.call(LoadIncrementalHFiles.java:348)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    ... 3 more
Caused by: java.io.IOException: Call to xxx failed on local exception: org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=203, waitTime=92703, rpcTimeout=60000
    at org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1532)
    at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1502)
    at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1684)
    at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1737)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.bulkLoadHFile(ClientProtos.java:29276)
    at org.apache.hadoop.hbase.protobuf.ProtobufUtil.bulkLoadHFile(ProtobufUtil.java:1548)
    at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$3.call(LoadIncrementalHFiles.java:572)
    at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$3.call(LoadIncrementalHFiles.java:561)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:121)
    ... 8 more
Caused by: org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=203, waitTime=92703, rpcTimeout=60000
    at org.apache.hadoop.hbase.ipc.RpcClient$Connection.cleanupCalls(RpcClient.java:1234)
    at org.apache.hadoop.hbase.ipc.RpcClient$Connection.readResponse(RpcClient.java:1171)
    at org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:751)

result code:2

In the Kylin logs I also see entries like these:

2015-06-22 01:49:54,176 INFO [LoadIncrementalHFiles-1] mapreduce.LoadIncrementalHFiles: HFile at hdfs://xxx:8020/tmp/kylin-b68bac23-ea82-4471-bc70-c144991fbbe0/smallCube/hfile/F1/e6ed300e4d9c41938d1ee474536c4fbf no longer fits inside a single region. Splitting...
2015-06-22 01:49:54,182 INFO [LoadIncrementalHFiles-3] mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://xxx:8020/tmp/kylin-b68bac23-ea82-4471-bc70-c144991fbbe0/smallCube/hfile/F1/_tmp/KYLIN_N8QFNVVCSY,2.bottom and hdfs://xxx:8020/tmp/kylin-b68bac23-ea82-4471-bc70-c144991fbbe0/smallCube/hfile/F1/_tmp/KYLIN_N8QFNVVCSY,2.top
2015-06-22 01:49:54,425 INFO [LoadIncrementalHFiles-3] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://xxx:8020/tmp/kylin-b68bac23-ea82-4471-bc70-c144991fbbe0/smallCube/hfile/F1/f002ed0786894ddfb83f4a6311fca985 first=\x00\x00\x00\x00\x00\x00\x00\x07\x90\xA8\x03\xDB\xDBC last=\x00\x00\x00\x00\x00\x00\x00\x07\xBC\xF3\x03`F+
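
In case it helps anyone reading the trace, the timeout entries above can be tallied with a short script rather than by eye. This is only a rough sketch in Python: the function name `summarize_timeouts` and the sample string are my own illustration, not part of Kylin or HBase, and it assumes each entry has the `Call id=..., waitTime=..., rpcTimeout=...` shape shown in the stack trace. It summarizes the client-side log only and says nothing about the cause.

```python
import re

# One timeout entry, in the shape it has in the stack trace above.
# \s* tolerates entries that were wrapped across lines when pasted.
ENTRY = re.compile(r"Call id=(\d+),\s*waitTime=(\d+),\s*rpcTimeout=(\d+)")


def summarize_timeouts(trace: str) -> dict:
    """Tally the distinct timed-out RPC calls found in a pasted trace."""
    calls = {}
    for call_id, wait_ms, timeout_ms in ENTRY.findall(trace):
        # The last call is repeated in the "Caused by" sections,
        # so key on the call id to count it only once.
        calls[int(call_id)] = (int(wait_ms), int(timeout_ms))
    waits = [wait for wait, _ in calls.values()]
    return {
        "attempts": len(calls),
        "all_exceeded_timeout": all(w >= t for w, t in calls.values()),
        "max_wait_ms": max(waits, default=0),
    }


if __name__ == "__main__":
    # Three entries copied from the trace; id=203 appears twice on purpose.
    sample = (
        "Call id=70, waitTime=60004, rpcTimeout=60000 "
        "Call id=203, waitTime=92703, rpcTimeout=60000 "
        "Call id=203, waitTime=92703, rpcTimeout=60000"
    )
    print(summarize_timeouts(sample))
```

Run against the full trace, this should show whether every attempt waited at least the 60000 ms rpcTimeout, which is how the entries look to me: none of them fail quickly, which fits with the port being reachable over telnet.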
