[ 
https://issues.apache.org/jira/browse/TAJO-257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyunsik Choi updated TAJO-257:
------------------------------

    Attachment: TAJO-257.patch

This issue is caused by several problems as follows:
 - Before workers are stopped completely, TajoTestingClient tried to shutdown 
HDFS.
 - Jetty server in each worker was hanging occasionally.
 - Some query execution query didn't close query results. So, it caused 
resource leak.

This patch solves the above problems. I've tested unit tests ten times 
repeatedly. All unit tests are passed without any fail and hanging.

> Unit tests occassionally fail.
> ------------------------------
>
>                 Key: TAJO-257
>                 URL: https://issues.apache.org/jira/browse/TAJO-257
>             Project: Tajo
>          Issue Type: Bug
>            Reporter: Hyunsik Choi
>            Priority: Blocker
>             Fix For: 0.2-incubating
>
>         Attachments: TAJO-257.patch
>
>
> {noformat}
> 2013-10-16 14:55:43,110 INFO  tajo.TajoTestingCluster 
> (TajoTestingCluster.java:startMiniTajoCluster(292)) - Mini Tajo cluster is up
> 2013-10-16 14:55:43,110 INFO  rm.TajoWorkerResourceManager 
> (TajoWorkerResourceManager.java:workerHeartbeat(395)) - 
> TajoWorker:host:local05.gruter.com, port=44238,44237,47304, slots=2048:4:8, 
> used=0:0:0 added in live TajoWorker list
> 2013-10-16 14:55:43,149 INFO  BlockStateChange 
> (BlockManager.java:logAddStoredBlock(2174)) - BLOCK* addStoredBlock: blockMap 
> updated: 127.0.0.1:60398 is added to 
> blk_-2382303275850279660_1004{blockUCState=COMMITTED, primaryNodeIndex=-1, 
> replicas=[ReplicaUnderConstruction[127.0.0.1:60398|RBW]]} size 117894
> 2013-10-16 14:55:43,600 INFO  tajo.TajoTestingCluster 
> (TajoTestingCluster.java:shutdownMiniCluster(447)) - 
> ========================================
> 2013-10-16 14:55:43,600 INFO  tajo.TajoTestingCluster 
> (TajoTestingCluster.java:shutdownMiniCluster(448)) - Shutdown minicluster
> 2013-10-16 14:55:43,600 INFO  tajo.TajoTestingCluster 
> (TajoTestingCluster.java:shutdownMiniCluster(449)) - 
> ========================================
> 2013-10-16 14:55:43,612 INFO  rpc.NettyServerBase 
> (NettyServerBase.java:shutdown(120)) - Rpc (TajoMasterProtocol) listened on 
> 127.0.0.1:44236) shutdown
> 2013-10-16 14:55:43,613 INFO  rpc.NettyServerBase 
> (NettyServerBase.java:shutdown(120)) - Rpc (TajoMasterClientProtocol) 
> listened on 127.0.0.1:44235) shutdown
> 2013-10-16 14:55:43,614 INFO  rpc.NettyServerBase 
> (NettyServerBase.java:shutdown(120)) - Rpc (CatalogProtocol) listened on 
> 127.0.0.1:44234) shutdown
> 2013-10-16 14:55:43,614 INFO  catalog.CatalogServer 
> (CatalogServer.java:stop(161)) - Catalog Server (127.0.0.1:44234) shutdown
> 2013-10-16 14:55:43,614 INFO  master.TajoMaster (TajoMaster.java:stop(436)) - 
> Tajo Master main thread exiting
> 2013-10-16 14:55:43,615 INFO  worker.TajoWorker (TajoWorker.java:run(451)) - 
> Worker Resource Heartbeat Thread stopped.
> 2013-10-16 14:55:43,616 INFO  rpc.NettyClientBase 
> (NettyClientBase.java:close(87)) - Proxy is disconnected from 127.0.0.1:44234
> 2013-10-16 14:55:43,618 INFO  rpc.NettyClientBase 
> (NettyClientBase.java:close(87)) - Proxy is disconnected from 127.0.0.1:44236
> 2013-10-16 14:55:43,619 INFO  rpc.NettyServerBase 
> (NettyServerBase.java:shutdown(120)) - Rpc (TajoWorkerProtocol) listened on 
> 0:0:0:0:0:0:0:0:44238) shutdown
> 2013-10-16 14:55:43,619 INFO  worker.TajoWorkerManagerService 
> (TajoWorkerManagerService.java:stop(109)) - TajoWorkerManagerService stopped
> 2013-10-16 14:55:43,619 INFO  querymaster.QueryMaster 
> (QueryMaster.java:run(346)) - QueryMaster heartbeat thread stopped
> 2013-10-16 14:55:43,620 INFO  master.TajoAsyncDispatcher 
> (TajoAsyncDispatcher.java:stop(122)) - AsyncDispatcher 
> stopped:querymaster_1381902943098
> 2013-10-16 14:55:43,620 INFO  querymaster.QueryMaster 
> (QueryMaster.java:stop(151)) - QueryMaster stop
> 2013-10-16 14:55:43,620 INFO  worker.TajoWorkerClientService 
> (TajoWorkerClientService.java:stop(103)) - TajoWorkerClientService stopping
> 2013-10-16 14:55:43,621 INFO  rpc.NettyServerBase 
> (NettyServerBase.java:shutdown(120)) - Rpc (QueryMasterClientProtocol) 
> listened on 0:0:0:0:0:0:0:0:44237) shutdown
> 2013-10-16 14:55:43,621 INFO  worker.TajoWorkerClientService 
> (TajoWorkerClientService.java:stop(107)) - TajoWorkerClientService stopped
> 2013-10-16 14:55:43,628 INFO  worker.TajoWorker (TajoWorker.java:stop(211)) - 
> TajoWorker main thread exiting
> 2013-10-16 14:55:43,628 WARN  datanode.DirectoryScanner 
> (DirectoryScanner.java:shutdown(289)) - DirectoryScanner: shutdown has been 
> called
> 2013-10-16 14:55:43,745 WARN  datanode.DataNode 
> (BPServiceActor.java:offerService(575)) - BPOfferService for Block pool 
> BP-1571979412-192.168.0.205-1381902941942 (storage id 
> DS-977793291-192.168.0.205-60398-1381902942381) service to 
> localhost/127.0.0.1:56994 interrupted
> 2013-10-16 14:55:43,745 WARN  datanode.DataNode 
> (BPServiceActor.java:run(685)) - Ending block pool service for: Block pool 
> BP-1571979412-192.168.0.205-1381902941942 (storage id 
> DS-977793291-192.168.0.205-60398-1381902942381) service to 
> localhost/127.0.0.1:56994
> 2013-10-16 14:55:43,747 WARN  datanode.DirectoryScanner 
> (DirectoryScanner.java:shutdown(289)) - DirectoryScanner: shutdown has been 
> called
> 2013-10-16 14:55:45,043 INFO  BlockStateChange 
> (BlockManager.java:computeReplicationWorkForBlocks(1308)) - BLOCK* ask 
> 127.0.0.1:41909 to replicate blk_8623857380696071962_1002 to datanode(s) 
> 127.0.0.1:60398 127.0.0.1:51428
> 2013-10-16 14:55:45,279 WARN  datanode.DataNode (DataNode.java:run(1485)) - 
> DatanodeRegistration(127.0.0.1, 
> storageID=DS-1668021296-192.168.0.205-41909-1381902942201, infoPort=60807, 
> ipcPort=52563, 
> storageInfo=lv=-40;cid=testClusterID;nsid=1161821716;c=0):Failed to transfer 
> BP-1571979412-192.168.0.205-1381902941942:blk_8623857380696071962_1002 to 
> 127.0.0.1:60398 got 
> java.net.ConnectException: Connection refused
>       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>       at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
>       at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>       at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:526)
>       at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:490)
>       at 
> org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:1423)
>       at java.lang.Thread.run(Thread.java:724)
> 2013-10-16 14:55:45,361 WARN  datanode.DataNode 
> (BPServiceActor.java:run(685)) - Ending block pool service for: Block pool 
> BP-1571979412-192.168.0.205-1381902941942 (storage id 
> DS-617348020-192.168.0.205-51428-1381902942289) service to 
> localhost/127.0.0.1:56994
> 2013-10-16 14:55:51,045 INFO  BlockStateChange 
> (BlockManager.java:computeReplicationWorkForBlocks(1308)) - BLOCK* ask 
> 127.0.0.1:41909 to replicate blk_8623857380696071962_1002 to datanode(s) 
> 127.0.0.1:51428 127.0.0.1:60398
> 2013-10-16 14:55:51,282 WARN  datanode.DataNode 
> (BlockReceiver.java:<init>(233)) - IOException in BlockReceiver constructor. 
> Cause is 
> 2013-10-16 14:55:51,282 ERROR datanode.DataNode (DataXceiver.java:run(223)) - 
> 127.0.0.1:51428:DataXceiver error processing WRITE_BLOCK operation  src: 
> /127.0.0.1:46346 dest: /127.0.0.1:51428
> java.io.IOException: block pool BP-1571979412-192.168.0.205-1381902941942 is 
> not found
>       at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getBlockPoolSlice(FsVolumeImpl.java:122)
>       at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createTmpFile(FsVolumeImpl.java:150)
>       at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:818)
>       at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:91)
>       at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:155)
>       at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:393)
>       at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98)
>       at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:65)
>       at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219)
>       at java.lang.Thread.run(Thread.java:724)
> ^C2013-10-16 14:55:56,252 INFO  worker.TajoWorker (TajoWorker.java:run(459)) 
> - ============================================
> 2013-10-16 14:55:56,252 INFO  worker.TajoWorker (TajoWorker.java:run(460)) - 
> TajoWorker received SIGINT Signal
> 2013-10-16 14:55:56,252 INFO  worker.TajoWorker (TajoWorker.java:run(461)) - 
> ============================================
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Reply via email to