We might need to change log method by adding  

if(LOG.isInfoEnabled()){
  ...
}

at least it can prevent string concatenation for performance optimization. 
(debug can be changed to if(LOG.isDebugEnabled()){} for performance 
optimization, too.)

In addition, can you help check if enterBarrier() contains the following code 
snippet?

   ...
   zk.exists(pathToSuperstepZnode+"/ready", new Watcher() {
      @Override
      public void process(WatchedEvent event) {
          // check if /ready znode exists, then delete it. 
          ... 
          } catch(KeeperException.NoNodeException nne) {
            LOG.warn("Ignore because znode may be deleted.", nne);
          }...
      }
    });
    zk.create(getNodeName(), null, Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
    ...

It looks like bsp peer is trying to remove /ready znode which may have already 
been removed by other bsp peer. Or stack trace in log would be helpful. 


-----Original message-----
From:Thomas Jungblut <[email protected]>
To:[email protected]
Date:Thu, 22 Sep 2011 10:05:52 +0200
Subject:Re: Awesome bench results after removing Thread.sleep in sync() method.

You're going to laugh, but we spend 80% of the time, logging the messages.
Let's change the log level to debug or remove the logging in the bench
example.

Sadly I still receive

org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode =
> NoNode for /bsp/job_201109220959_0001/224/ready
>

and it hangs forever. Current version is after you committed ChiaHung's
patch.
I'm in pseudo-distributed mode with 3 tasks.

Are you going to bench this without the logging? That would be interesting
though ;D

2011/9/22 Thomas Jungblut <[email protected]>

> That is great. I think we can push this under 200s.
> I attach a profiler and send you a list of hotspots.
>
> lg.
>
> 2011/9/22 Edward J. Yoon <[email protected]>
>
> By ChiaHung's HAMA-387.patch, hang problem is fixed.
>>
>> And also, on same environment (1 rack, 256 cores), a bench example
>> result is dramatically improved. (184.076 seconds from 307.129
>> seconds)
>>
>> ----
>> # core/bin/hama jar
>> examples/target/hama-examples-0.4.0-incubating-SNAPSHOT.jar bench 16
>> 1000 512
>> ..
>> 11/09/22 10:27:32 INFO bsp.BSPJobClient: Current supersteps number: 504
>> 11/09/22 10:27:35 INFO bsp.BSPJobClient: Current supersteps number: 508
>> 11/09/22 10:27:38 INFO bsp.BSPJobClient: Current supersteps number: 512
>> 11/09/22 10:27:38 INFO bsp.BSPJobClient: The total number of supersteps:
>> 512
>> Job Finished in 184.076 seconds
>>
>> Hama 0.4 (r.1163903) was:
>>
>> 16 bytes | 1000 | 512 | 307.129 seconds
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>>
>
>
>
> --
> Thomas Jungblut
> Berlin
>
> mobile: 0170-3081070
>
> business: [email protected]
> private: [email protected]
>



-- 
Thomas Jungblut
Berlin

mobile: 0170-3081070

business: [email protected]
private: [email protected]


--
ChiaHung Lin
Department of Information Management
National University of Kaohsiung
Taiwan

Reply via email to