[
https://issues.apache.org/jira/browse/HDFS-4130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13497800#comment-13497800
]
Uma Maheswara Rao G commented on HDFS-4130:
-------------------------------------------
Thanks for the update.
{code}
    try {
      BookKeeperEditLogInputStream elis = null;
      for (EditLogLedgerMetadata l : currentLedgerList) {
        long lastTxId = l.getLastTxId();
        if (l.isInProgress()) {
          lastTxId = recoverLastTxId(l, false);
        }
{code}
{code}
}
+ else {
+ return;
+ }
{code}
Your patch has tab characters; please check. Please correct them and use the Hadoop
formatter.
I think selectInputStreams now throws IOException as part of its API semantics.
I feel it is correct to throw it out when there is an exception due to the storage,
e.g. ZK or BK exceptions.
@Ivan, what do you say?
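For illustration, here is a rough sketch of the shape being discussed: read the ledger
list from ZooKeeper once up front and let storage (ZK/BK) failures surface as
IOException. The getLedgerList(inProgressOk) and openStreamForLedger(...) helpers are
assumed/hypothetical signatures, so this is only a sketch, not the actual patch code.
{code}
@Override
public void selectInputStreams(Collection<EditLogInputStream> streams,
    long fromTxId, boolean inProgressOk) throws IOException {
  // Read the ledger metadata from ZooKeeper once, instead of once per
  // getInputStream() call.
  List<EditLogLedgerMetadata> currentLedgerList = getLedgerList(inProgressOk);
  for (EditLogLedgerMetadata l : currentLedgerList) {
    long lastTxId = l.getLastTxId();
    if (l.isInProgress()) {
      // May throw IOException if the last txid cannot be recovered.
      lastTxId = recoverLastTxId(l, false);
    }
    if (fromTxId >= l.getFirstTxId() && fromTxId <= lastTxId) {
      // openStreamForLedger is a hypothetical helper; any ZK/BK failure
      // inside it is propagated as an IOException to the caller instead
      // of being logged and swallowed.
      BookKeeperEditLogInputStream elis = openStreamForLedger(l, fromTxId);
      streams.add(elis);
      if (elis.getLastTxId() == HdfsConstants.INVALID_TXID) {
        return;
      }
      fromTxId = elis.getLastTxId() + 1;
    }
  }
}
{code}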
> The reading for editlog at NN starting using bkjm is not efficient
> -------------------------------------------------------------------
>
> Key: HDFS-4130
> URL: https://issues.apache.org/jira/browse/HDFS-4130
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha, performance
> Affects Versions: 3.0.0, 2.0.2-alpha
> Reporter: Han Xiao
> Attachments: HDFS-4130.patch, HDFS-4130-v2.patch
>
>
> Now the BookKeeperJournalManager.selectInputStreams method is written like this:
>     while (true) {
>       EditLogInputStream elis;
>       try {
>         elis = getInputStream(fromTxId, inProgressOk);
>       } catch (IOException e) {
>         LOG.error(e);
>         return;
>       }
>       if (elis == null) {
>         return;
>       }
>       streams.add(elis);
>       if (elis.getLastTxId() == HdfsConstants.INVALID_TXID) {
>         return;
>       }
>       fromTxId = elis.getLastTxId() + 1;
>     }
>
> Each EditLogInputStream is obtained from getInputStream(), which reads the
> ledgers from ZooKeeper on every call.
> This becomes a large time cost when the number of ledgers grows.
> Reading the ledgers from ZK is not necessary on every call to
> getInputStream().
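> For context, a simplified sketch of what each getInputStream() call pays for
> (variable and helper names here are illustrative, not the actual bkjournal code):
> private List<EditLogLedgerMetadata> getLedgerList() throws IOException {
>   List<EditLogLedgerMetadata> ledgers = new ArrayList<EditLogLedgerMetadata>();
>   try {
>     // One ZooKeeper round trip to list the ledger znodes ...
>     for (String child : zkc.getChildren(ledgerPath, false)) {
>       // ... plus one getData() round trip per ledger to read its metadata
>       // (the ZooKeeper.getData frame visible in the stack trace below).
>       ledgers.add(EditLogLedgerMetadata.read(zkc, ledgerPath + "/" + child));
>     }
>   } catch (KeeperException e) {
>     throw new IOException("Error reading ledger list from ZooKeeper", e);
>   } catch (InterruptedException ie) {
>     Thread.currentThread().interrupt();
>     throw new IOException("Interrupted reading ledger list", ie);
>   }
>   return ledgers;
> }
> Since selectInputStreams() calls getInputStream() once per stream, and each call
> repeats this walk, selecting M streams over N ledgers costs roughly M * N
> ZooKeeper reads instead of N if the list were read once and reused.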
> The log showing the time wasted here is as follows:
> 2012-10-30 16:44:52,995 INFO org.apache.hadoop.hdfs.server.namenode.NameNode:
> Caching file names occuring more than 10 times
> 2012-10-30 16:49:24,643 INFO
> hidden.bkjournal.org.apache.bookkeeper.proto.PerChannelBookieClient:
> Successfully connected to bookie: /167.52.1.121:318
> The stack of the process while it is blocked between those two log lines is:
> "main" prio=10 tid=0x000000004011f000 nid=0x39ba in Object.wait()
> \[0x00007fca020fe000\]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at java.lang.Object.wait(Object.java:485)
> at
> hidden.bkjournal.org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1253)
> \- locked <0x00000006fb8495a8> (a
> hidden.bkjournal.org.apache.zookeeper.ClientCnxn$Packet)
> at
> hidden.bkjournal.org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1129)
> at
> org.apache.hadoop.contrib.bkjournal.utils.RetryableZookeeper.getData(RetryableZookeeper.java:501)
> at
> hidden.bkjournal.org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1160)
> at
> org.apache.hadoop.contrib.bkjournal.EditLogLedgerMetadata.read(EditLogLedgerMetadata.java:113)
> at
> org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager.getLedgerList(BookKeeperJournalManager.java:725)
> at
> org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager.getInputStream(BookKeeperJournalManager.java:442)
> at
> org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager.selectInputStreams(BookKeeperJournalManager.java:480)
>
> Between different times, the diff of the stacks is:
> diff stack stack2
> 1c1
> < 2012-10-30 16:44:53
> ---
> > 2012-10-30 16:46:17
> 106c106
> < - locked <0x00000006fb8495a8> (a
> hidden.bkjournal.org.apache.zookeeper.ClientCnxn$Packet)
> ---
> > - locked <0x00000006fae58468> (a
> > hidden.bkjournal.org.apache.zookeeper.ClientCnxn$Packet)
> In our environment, the waiting time can even reach tens of minutes.