Comments inline ...

On Mon, Feb 2, 2009 at 4:23 PM, Konstantin Shvachko <s...@yahoo-inc.com> wrote:
> > What do you recommend?
>
> In general. There may be people/organizations, which will not compromise
> on the reduced functionality in favor of the stability, this is
> understandable.
> I would propose to create a separate (unofficial experimental) branch,
> which would track changes like HADOOP-4379. The branch may later either
> die when the main stream is fixed or be merged with the trunk if the
> changes proved to be stable.

Sure, that sounds reasonable. One thing I would caution against is spending
a lot of time doing incremental patchwork on something that needs a
ground-up overhaul. I would much rather wait a couple of months longer and
get software that is based on a well-thought-out, fundamentally sound
design. Ultimately that will be the fastest path to stability.

> >1. the file length (as returned by getFileStatus) is incorrect
>
> Maybe the following workaround will be useful.
> If you read from a file you always try to read more data than the length
> reported by the name-node. How much more? The size of one block would be
> enough, or even to the next (ceiling) block boundary.

I could certainly implement a workaround; however, from an API standpoint,
the filesystem should (IMHO) always give you a way to obtain the real length
of the file. The semantics of the current getFileStatus() make it difficult
to reason about the state of your filesystem: it basically returns a
"possibly stale" version of the length. I would rather wait for an
implementation that gives an accurate answer and spend my time and energy
helping to test that one than spend a lot of time implementing a workaround
for the current version.

> >2. When an application comes up after a crash, it seems to hang for about 60
>
> Don't have enough context on that, sorry.

I spoke too soon on this.
HDFS was hanging on lease recovery because I was opening the file in append
mode to force lease recovery (at Dhruba's suggestion), so that the NameNode
would be updated with the proper length. If I had a way to obtain the
accurate length of the file, I wouldn't need to do this, so I didn't bother
filing an issue for it.

- Doug

> Thanks,
> --Konstantin
>
> Doug Judd wrote:
>
>> Sounds good. I would much rather wait and have fsync() done correctly in
>> 0.20 than get some sort of hacked version in 0.19. I'll create a couple
>> of issues and mark them for 0.20. Thanks.
>>
>> - Doug
>>
>> On Mon, Feb 2, 2009 at 1:51 PM, Owen O'Malley <omal...@apache.org> wrote:
>>
>>> On Feb 2, 2009, at 12:51 PM, Doug Judd wrote:
>>>
>>>> What do you recommend? Is there any way we could get these two issues
>>>> fixed for 0.19.1, or should I file issues for them and get them on the
>>>> schedule for 0.19.2?
>>>
>>> Given the outstanding problems and general level of uncertainty, I'd
>>> favor releasing a 0.19.1 with the equivalent of the 0.18.3 disable on
>>> fsync and append. Let's get them fixed in 0.20 first and then we can
>>> debate whether the rewards of pushing them back into an 0.19.2 would make
>>> sense. I'm pretty uncomfortable at the moment with how the entire
>>> functional complex seems to cause a continuous stream of problems.
>>>
>>> -- Owen
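For reference, Konstantin's suggested workaround for the stale file length (read past the length reported by the name-node, either by one extra block or up to the next block boundary, and treat whatever arrives before EOF as the real length) might be sketched as below. This is a hypothetical, self-contained illustration, not Hadoop code: the class and method names (StaleLengthRead, limitPlusOneBlock, limitCeilBlock, readUpTo) are invented here, it reads from a plain java.io.InputStream rather than any HDFS stream, and the 64-byte block size is chosen only to keep the numbers small.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical helpers illustrating the suggested workaround: read past the
 * (possibly stale) length reported by the name-node, up to a block-based
 * bound, and take the number of bytes actually returned before EOF as the
 * real length. Not part of any Hadoop API.
 */
public class StaleLengthRead {

    /** Option 1: the reported length plus one full block. */
    static long limitPlusOneBlock(long reportedLen, long blockSize) {
        return reportedLen + blockSize;
    }

    /** Option 2: the reported length rounded up to the next (ceiling) block boundary. */
    static long limitCeilBlock(long reportedLen, long blockSize) {
        return ((reportedLen + blockSize - 1) / blockSize) * blockSize;
    }

    /** Reads until EOF or until {@code limit} bytes; returns how many bytes were read. */
    static long readUpTo(InputStream in, long limit) throws IOException {
        byte[] buf = new byte[4096];
        long total = 0;
        while (total < limit) {
            int want = (int) Math.min(buf.length, limit - total);
            int n = in.read(buf, 0, want);
            if (n < 0) {
                break; // EOF: the data really ends here
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        long blockSize = 64;         // tiny block size, for illustration only
        long reported = 100;         // stale length from the name-node (assumed)
        byte[] data = new byte[150]; // bytes actually present in the file (assumed)

        long l1 = limitPlusOneBlock(reported, blockSize); // 164
        long l2 = limitCeilBlock(reported, blockSize);    // 128
        System.out.println(readUpTo(new ByteArrayInputStream(data), l1)); // prints 150
        System.out.println(readUpTo(new ByteArrayInputStream(data), l2)); // prints 128
    }
}
```

Note that the ceiling variant reads nothing extra when the reported length already falls exactly on a block boundary, so the plus-one-block bound is the more conservative of the two.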