[
https://issues.apache.org/jira/browse/MAHOUT-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001464#comment-13001464
]
Dmitriy Lyubimov commented on MAHOUT-593:
-----------------------------------------
Sean, sorry, I can't agree with the above. Without going into specific
implementation details, in general the close contract is twofold: it is a
handle release, but it is also an I/O operation. In particular, a
failure to close an output stream means that some of your previous write
operations and/or housekeeping I/O failed. As such, a failure to close
an output stream is equivalent to a write error from the client's
perspective.
Truth be told, close is not the same as commit in the sense of
durability: in the general case, you can't say that writes are durable if
close was successful -- but you know they weren't if you couldn't
close cleanly. It so happens, though, that in the case of HDFS, closes (and
syncs in the hadoop-append branch) are said to be durable by replication.
Hence if you can't close side files or MultipleOutputs cleanly, you
must not allow the task to commit -- you must let the framework schedule
another attempt, or accept the results of an opportunistic (speculative)
task attempt, but not this one. Ignoring close errors in this case may
result in invalid task output.
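The point above can be sketched in plain Java (a minimal illustration, not
Mahout or Hadoop API; the names here are hypothetical). Every output is
closed, but the first close failure is rethrown so the caller -- e.g. a
task's cleanup path -- fails the attempt instead of silently committing:

```java
import java.io.Closeable;
import java.io.IOException;

public class CloseAll {

    // Hypothetical helper: close every output, releasing all handles,
    // but rethrow the first IOException so a close failure is treated
    // exactly like a write failure and the task attempt cannot commit.
    static void closeAll(Closeable... outputs) throws IOException {
        IOException firstFailure = null;
        for (Closeable c : outputs) {
            try {
                c.close();
            } catch (IOException e) {
                if (firstFailure == null) {
                    firstFailure = e; // remember, keep closing the rest
                }
            }
        }
        if (firstFailure != null) {
            throw firstFailure; // fail the attempt; do not commit
        }
    }

    public static void main(String[] args) {
        Closeable ok = () -> { };
        Closeable bad = () -> {
            throw new IOException("flush of buffered writes failed");
        };
        try {
            closeAll(ok, bad);
            System.out.println("committed");
        } catch (IOException e) {
            System.out.println("attempt failed: " + e.getMessage());
        }
    }
}
```

The key design choice is that the exception is deferred, not swallowed:
all handles still get released, yet the error surfaces to whoever decides
whether this attempt's output is valid.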
-d
On Wed, Mar 2, 2011 at 12:45 AM, Sean Owen (JIRA) <[email protected]> wrote:
> Backport of Stochastic SVD patch (Mahout-376) to hadoop 0.20 to ensure
> compatibility with current Mahout dependencies.
> ----------------------------------------------------------------------------------------------------------------------
>
> Key: MAHOUT-593
> URL: https://issues.apache.org/jira/browse/MAHOUT-593
> Project: Mahout
> Issue Type: New Feature
> Components: Math
> Affects Versions: 0.4
> Reporter: Dmitriy Lyubimov
> Fix For: 0.5
>
> Attachments: MAHOUT-593.patch.gz, MAHOUT-593.patch.gz,
> MAHOUT-593.patch.gz, SSVD-givens-CLI.pdf, ssvdclassdiag.png
>
>
> The current Mahout-376 patch requires the 'new' hadoop API. Certain elements
> of that API (namely, multiple outputs) are not available in the standard
> hadoop 0.20.2 release. As such, it may work only with either CDH or 0.21
> distributions. In order to bring it into sync with current Mahout
> dependencies, a backport of the patch to the 'old' API is needed.
> Also, some work is needed to resolve math dependencies. The existing patch
> relies on apache commons-math 2.1 for eigen decomposition of small matrices.
> This dependency is not currently set up in the mahout core. So, certain
> snippets of code are required either to go to mahout-math or to use the Colt
> eigen decomposition (the last time I tried, my results with that one were
> mixed: it seems to produce results inconsistent with those from the
> mahout-math eigensolver, and at the very least it doesn't produce singular
> values in sorted order).
> So this patch is mainly moving some Mahout-376 code around.
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira