[ 
https://issues.apache.org/jira/browse/MAHOUT-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13000453#comment-13000453
 ] 

Sean Owen commented on MAHOUT-593:
----------------------------------

OK, I don't use git though IntelliJ figures out these patches reasonably well 
automatically, so no problem. I thought it had made an error since I saw a src/ 
directory, but that's my error, since that path is still fine even after 
MAHOUT-608. Nevermind there, I have the patch now.

IOUtils has a method to quietly close and log Closeables already. It doesn't 
need two, but, feel free to merge them. It's picking nits but my theory is that 
there's not much to do to "recover" from failing to close a stream. You log it, 
and you move on. So I don't know if it needs to throw an exception in the end 
-- the caller doesn't even know which stream didn't close?

DeleteOnClose is a cute way to stick in file deletion into a close method; if 
it really works nicely for you OK. There's File.deleteOnExit() if it's a 
question of temp file cleanup.

Yes good move on standardizing commons-math dependency. That makes sense.

If the existing eigen solver is perhaps wrong, and still deprecated, just 
delete it. And if that means you don't need the wrapper layer I think you're 
fine to clear that too.

My only structural complaint is that this is not really using AbstractJob. I 
see the idea is to make several phases of the job runnable independently. Is 
that a realistic use case? Usually you run all the phases to get meaningful 
work done (perhaps with the option of restarting from phase n, to recover from 
failure -- but that's handled already in AbstractJob).

It's a lot of duplication of Hadoop glue code at a time when we already have a 
problem with inconsistent and duplicative interaction with Hadoop. (Which is 
not this patch's "fault".) And I know we say, well, we can fix that later, but 
I don't think it will ever come.

We can just give up on standardization.

... or I wonder if a simple halfway solution is expose the 
AbstractJob.prepareJob() as a static method so at least it can be reused by 
folks that still would prefer a different structure. At least the Hadoop Job 
configuration is centralized.

I think there are still a number of tiny style issues, but, not worth holding 
things up over. It can be adjusted later.

> Backport of Stochastic SVD patch (Mahout-376) to hadoop 0.20 to ensure 
> compatibility with current Mahout dependencies.
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-593
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-593
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Math
>    Affects Versions: 0.4
>            Reporter: Dmitriy Lyubimov
>             Fix For: 0.5
>
>         Attachments: MAHOUT-593.patch.gz, MAHOUT-593.patch.gz, 
> MAHOUT-593.patch.gz, SSVD-givens-CLI.pdf
>
>
> Current Mahout-376 patch requries 'new' hadoop API.  Certain elements of that 
> API (namely, multiple outputs) are not available in standard hadoop 0.20.2 
> release. As such, that may work only with either CDH or 0.21 distributions. 
>  In order to bring it into sync with current Mahout dependencies, a backport 
> of the patch to 'old' API is needed. 
> Also, some work is needed to resolve math dependencies. Existing patch relies 
> on apache commons-math 2.1 for eigen decomposition of small matrices. This 
> dependency is not currently set up in the mahout core. So, certain snippets 
> of code are either required to go to mahout-math or use Colt eigen 
> decompositon (last time i tried, my results were mixed with that one. It 
> seems to produce results inconsistent with those from mahout-math 
> eigensolver, at the very least, it doesn't produce singular values in sorted 
> order).
> So this patch is mainly moing some Mahout-376 code around.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to