Ok, thank you, Sean.
Ok, I think we are in agreement.

On Wed, Mar 2, 2011 at 9:22 AM, Sean Owen <[email protected]> wrote:
> On Wed, Mar 2, 2011 at 5:05 PM, Dmitriy Lyubimov <[email protected]> wrote:
> > I am not saying that. I actually was just affirming that in the case of
> > HDFS (albeit not in general per the close() contract -- it's not true
> > for files, for example).
>
> I hope there's no such case, or else things like file writes may
> silently never happen in Java. That shouldn't be true in general, I hope.
>
> > OK, what I am trying to argue is that using quietClose(...) from
> > o.a.m.common.ioutils is a bad practice for HDFS writes such as side
> > files and multiple outputs. I can't seem to get you to agree on this.
>
> Please go ahead as you have proposed already. My only suggestion is to
> try to make the naming consistent ("close()"?), and ideally to have a
> similar API and implementation. You can alter the existing implementation.
>
> I actually don't even mind if all the quiet closes are made noisy.
> Wherever they exist, it should be the case that the close()s don't "do"
> anything and shouldn't fail anyway. And if one does, well, sure, the
> caller can't do anything meaningful, but maybe it's better to fall over
> when something very weird happened, rather than just log it.
>
> The worst thing that happens with a "noisy" close in your situation is
> that an M/R task fails and is restarted when it may or may not have been
> needed -- and that's no big deal at all. You're right that that is
> actually a case of the caller being able to recover, so it makes sense.
>
> And come to think of it, I bet there are certain things
> SequenceFile.Writer must do to finish writing a file, given that HDFS
> files are write-once, and it has no finish() or flush() method like,
> for instance, GZIPOutputStream. So yes, there is a case where in
> practice you do have to make sure close() succeeds, rather than
> explicitly finishing the write. That's a good example indeed.
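[Editor's note: the point being argued -- that swallowing a close() failure can silently lose a commit-on-close write -- can be illustrated with a minimal, self-contained sketch. The CommittingStream class and the quietClose() helper below are hypothetical stand-ins for an HDFS side-file writer and for Mahout's quiet-close utility; they are not the actual Mahout or Hadoop code.]

```java
import java.io.IOException;
import java.io.OutputStream;

public class CloseDemo {
    // Hypothetical stand-in for a writer that buffers everything and only
    // commits durably on close() -- and whose commit-on-close fails.
    static class CommittingStream extends OutputStream {
        boolean committed = false;
        @Override public void write(int b) { /* buffered; nothing durable yet */ }
        @Override public void close() throws IOException {
            // The commit happens here and fails, as it can on HDFS.
            throw new IOException("commit-on-close failed");
        }
    }

    // Hypothetical equivalent of a quietClose() utility: swallows the error.
    static void quietClose(OutputStream out) {
        try { out.close(); } catch (IOException ignored) { /* lost */ }
    }

    public static void main(String[] args) {
        // Quiet close: the caller never learns the commit failed.
        CommittingStream s1 = new CommittingStream();
        s1.write(42);
        quietClose(s1);
        System.out.println("quiet close: committed=" + s1.committed);

        // "Noisy" close: the failure propagates, so the M/R task can
        // fail and be retried instead of silently losing output.
        CommittingStream s2 = new CommittingStream();
        boolean sawFailure = false;
        try {
            s2.write(42);
            s2.close();
        } catch (IOException e) {
            sawFailure = true;
        }
        System.out.println("noisy close saw failure: " + sawFailure);
    }
}
```

The quiet variant reports success while nothing was ever committed; the noisy variant surfaces the failure so the caller (here, the M/R framework) can recover by retrying -- which is exactly the trade-off discussed above.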
> > But perhaps I can hope to get you to agree that checking for close
> > errors at least doesn't hurt, so I can continue checking for them and
> > not using quietClose() during MR task commits?
> >
> > -d
> >
> > On Wed, Mar 2, 2011 at 8:20 AM, Sean Owen (JIRA) <[email protected]> wrote:
> >>
> >> [ https://issues.apache.org/jira/browse/MAHOUT-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001470#comment-13001470 ]
> >>
> >> Sean Owen commented on MAHOUT-593:
> >> ----------------------------------
> >>
> >> I think we're just talking about semantics now. You are welcome to commit as you like.
> >>
> >> I do agree that a stream could buffer writes, that it could commit as a way to clean up before close, and that the commit could fail during close. If the caller is trying to commit by closing, that's really the problem. But it's a real problem. Yes, I don't think that in any Java API I can think of, or in HDFS, you would have a successful write and a successful close but fail to write.
> >>
> >>> Backport of Stochastic SVD patch (MAHOUT-376) to Hadoop 0.20 to ensure compatibility with current Mahout dependencies.
> >>> ----------------------------------------------------------------------------------------------------------------------
> >>>
> >>>                 Key: MAHOUT-593
> >>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-593
> >>>             Project: Mahout
> >>>          Issue Type: New Feature
> >>>          Components: Math
> >>>    Affects Versions: 0.4
> >>>            Reporter: Dmitriy Lyubimov
> >>>             Fix For: 0.5
> >>>
> >>>         Attachments: MAHOUT-593.patch.gz, MAHOUT-593.patch.gz, MAHOUT-593.patch.gz, SSVD-givens-CLI.pdf, ssvdclassdiag.png
> >>>
> >>>
> >>> The current MAHOUT-376 patch requires the 'new' Hadoop API. Certain elements of that API (namely, multiple outputs) are not available in the standard Hadoop 0.20.2 release. As such, it may work only with either CDH or 0.21 distributions.
> >>> In order to bring it into sync with current Mahout dependencies, a backport of the patch to the 'old' API is needed.
> >>> Also, some work is needed to resolve math dependencies. The existing patch relies on Apache commons-math 2.1 for eigendecomposition of small matrices. This dependency is not currently set up in the Mahout core. So certain snippets of code are required either to move to mahout-math or to use the Colt eigendecomposition (the last time I tried, my results with that one were mixed: it seems to produce results inconsistent with those from the mahout-math eigensolver; at the very least, it doesn't produce singular values in sorted order).
> >>> So this patch is mainly moving some MAHOUT-376 code around.
> >>
> >> --
> >> This message is automatically generated by JIRA.
> >> -
> >> For more information on JIRA, see: http://www.atlassian.com/software/jira
