[jira] [Commented] (MAHOUT-1604) Create a RowSimilarity for Spark

Dmitriy Lyubimov (JIRA) Sun, 21 Dec 2014 21:21:04 -0800

    [ 
https://issues.apache.org/jira/browse/MAHOUT-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255443#comment-14255443
 ]


Dmitriy Lyubimov commented on MAHOUT-1604:
------------------------------------------

I can't develop a solution for a problem I don't understand. I have no idea
what you are talking about, and I have never encountered any of the problems
you are referring to.

But suppose I had an idea. I already said I am not big on CLIs; that's not
how I use this product at all, and honestly I don't know why one would want
to use a CLI when both embedding and Scala scripting are available options.
R and scikit follow similar logic: they couldn't care less about creating a
CLI for every new capability they get. All of this CLI work runs 180 degrees
contrary to a computing environment product philosophy, which was the whole
idea. I would pay attention to the shell effort and to scripting it instead.

And I am pretty sure that a verbatim job jar format is not a solution to
whatever the problem is. Even Hadoop itself doesn't support it. Maybe a
shaded jar. But even that is fundamentally flawed, since there are plenty of
jars, such as bouncycastle, that do not survive the shade plugin and go
defunct.

Projects like Tomcat collect dependencies in lib or something similar, and
manage further classpath issues in startup scripts.

> Create a RowSimilarity for Spark
> --------------------------------
>
>                 Key: MAHOUT-1604
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1604
>             Project: Mahout
>          Issue Type: Bug
>          Components: CLI
>    Affects Versions: 1.0
>         Environment: Spark
>            Reporter: Pat Ferrel
>            Assignee: Pat Ferrel
>
> Using CooccurrenceAnalysis.cooccurrence create a driver that reads a text DRM 
> or two and produces LLR similarity/cross-similarity matrices.
> This will produce the same results as ItemSimilarity but take a Drm as input 
> instead of individual cells.
> The first version will only support LLR; other similarity measures will need 
> to be in separate Jiras.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
