On Wed, Mar 2, 2011 at 5:05 PM, Dmitriy Lyubimov <[email protected]> wrote:
> I am not saying that. I was actually just affirming that it holds in the
> case of HDFS (albeit not in general per the close() contract -- it's not
> true for files, for example).

I hope there's no such case; otherwise things like file writes could
silently never happen in Java. That shouldn't be true in general, I
hope.
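To illustrate the concern with a minimal plain-Java sketch (not HDFS, just java.io): a BufferedWriter holds data in its own buffer until flush() or close(), so the "actual" write can happen inside close() itself. If a failure there were swallowed quietly, the data would be silently lost.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

public class CloseFlushDemo {
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("close-demo", ".txt");
        f.deleteOnExit();

        BufferedWriter w = new BufferedWriter(new FileWriter(f));
        w.write("hello");   // sits in the writer's buffer, not yet on disk

        // The file on disk is still empty at this point:
        System.out.println("before close: " + f.length() + " bytes"); // prints 0
        w.close();          // the actual write to disk happens here
        System.out.println("after close:  " + f.length() + " bytes"); // prints 5
    }
}
```

So a write() that "succeeded" tells you nothing until close() has also succeeded -- which is exactly why swallowing close() failures on a write path is risky.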


> OK, what I am trying to argue is that using quietClose(...) in
> o.a.m.common.ioutils is a bad practice for HDFS writes such as side
> files and multiple outputs. I can't seem to get you to agree on this.

Please go ahead as you have already proposed. My only suggestion is to
try to make the naming consistent ("close()"?), and ideally we'd have a
similar API and implementation. You can alter the existing
implementation.

I actually don't even mind if all the quiet closes are made noisy.
Wherever they exist, it should be the case that the close()s don't "do"
anything and shouldn't fail anyway. And if one does fail, sure, the
caller can't do anything meaningful, but maybe it's better to fall over
when something very weird has happened, rather than just log it.
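As a sketch of the distinction being argued here (the names are illustrative; Mahout's actual helper class may differ in signature and location):

```java
import java.io.Closeable;
import java.io.IOException;

public final class IOUtilsSketch {
    private IOUtilsSketch() {}

    /**
     * Swallows failures. Acceptable for read streams, where close() has no
     * essential work left, but risky for HDFS write paths: a failed final
     * flush disappears here without the caller ever knowing.
     */
    public static void quietClose(Closeable c) {
        if (c == null) {
            return;
        }
        try {
            c.close();
        } catch (IOException ioe) {
            // logged and ignored in real implementations
        }
    }

    /**
     * Propagates failures, so a failed close() on a write path can fail the
     * M/R task and let the framework retry it.
     */
    public static void close(Closeable c) throws IOException {
        if (c != null) {
            c.close();
        }
    }
}
```

The choice between them is exactly the question of whether the caller can recover: for a task writing side files, it can (the task attempt is re-run), so propagating is the safer default there.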

The worst thing that happens with a "noisy" close in your situation is
that an M/R task fails and is restarted when it may or may not have
needed to be -- and that's no big deal at all. You're right that this is
actually a case of the caller being able to recover, so it makes sense.

And come to think of it, I bet there are certain things
SequenceFile.Writer must do to finish writing a file, given that HDFS
files are write-once, and it has no finish() or flush() method like,
for instance, GZIPOutputStream. So yes, there is a case where in
practice you have to make sure close() succeeds, rather than being able
to explicitly finish the write. That's a good example indeed.
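To make that contrast concrete, here's a small sketch using java.util.zip.GZIPOutputStream, whose finish() completes the compressed stream independently of close(). A writer that lacks such a method (as the message above suggests SequenceFile.Writer does) ties the completion of the file to close() succeeding, so its close() cannot safely be swallowed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class FinishVsCloseDemo {

    /** Compresses a payload, completing the stream via finish(), not close(). */
    static byte[] compress(byte[] payload) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(bytes);
        gz.write(payload);
        gz.finish(); // the gzip trailer is written here; the data is complete
        gz.close();  // nothing essential remains for close() to do
        return bytes.toByteArray();
    }

    /** Decompresses, to verify the stream written above is well-formed. */
    static byte[] decompress(byte[] compressed) throws IOException {
        GZIPInputStream in =
            new GZIPInputStream(new ByteArrayInputStream(compressed));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        in.close();
        return out.toByteArray();
    }
}
```

With finish() available, a caller can complete the write explicitly and then treat close() failures as non-fatal; without it, close() *is* the completion step and its result must be checked.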


>
> But perhaps I can hope to get you to agree that checking for close
> errors at least doesn't hurt, so I can continue checking for them and
> not using quietClose() during MR task commits?
>
>
>
>
> -d
>
> On Wed, Mar 2, 2011 at 8:20 AM, Sean Owen (JIRA) <[email protected]> wrote:
>>
>>    [ 
>> https://issues.apache.org/jira/browse/MAHOUT-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001470#comment-13001470
>>  ]
>>
>> Sean Owen commented on MAHOUT-593:
>> ----------------------------------
>>
>> I think we're just talking about semantics now. You are welcome to commit as 
>> you like.
>>
>> I do agree that a stream could buffer writes, that it could commit them as a 
>> way to clean up before close, and that the commit could fail during close. If 
>> the caller is trying to commit by closing, that's really the problem. But it's 
>> a real problem. Yes, I don't think that in any Java API I can think of, or in 
>> HDFS, you would have a successful write and a successful close but still fail 
>> to write.
>>
>>> Backport of Stochastic SVD patch (Mahout-376) to hadoop 0.20 to ensure 
>>> compatibility with current Mahout dependencies.
>>> ----------------------------------------------------------------------------------------------------------------------
>>>
>>>                 Key: MAHOUT-593
>>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-593
>>>             Project: Mahout
>>>          Issue Type: New Feature
>>>          Components: Math
>>>    Affects Versions: 0.4
>>>            Reporter: Dmitriy Lyubimov
>>>             Fix For: 0.5
>>>
>>>         Attachments: MAHOUT-593.patch.gz, MAHOUT-593.patch.gz, 
>>> MAHOUT-593.patch.gz, SSVD-givens-CLI.pdf, ssvdclassdiag.png
>>>
>>>
>>> The current Mahout-376 patch requires the 'new' hadoop API. Certain elements 
>>> of that API (namely, multiple outputs) are not available in the standard 
>>> hadoop 0.20.2 release. As such, it may work only with either the CDH or 0.21 
>>> distributions.
>>>  In order to bring it into sync with current Mahout dependencies, a 
>>> backport of the patch to the 'old' API is needed.
>>> Also, some work is needed to resolve math dependencies. The existing patch 
>>> relies on apache commons-math 2.1 for eigen decomposition of small 
>>> matrices. This dependency is not currently set up in the mahout core. So, 
>>> certain snippets of code are required either to go to mahout-math or to use 
>>> the Colt eigen decomposition (last time I tried, my results were mixed with 
>>> that one: it seems to produce results inconsistent with those from the 
>>> mahout-math eigensolver; at the very least, it doesn't produce singular 
>>> values in sorted order).
>>> So this patch is mainly moving some Mahout-376 code around.
>>
>> --
>> This message is automatically generated by JIRA.
>> -
>> For more information on JIRA, see: http://www.atlassian.com/software/jira
>>
>>
>>
>
