[
https://issues.apache.org/jira/browse/MAHOUT-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387984#comment-14387984
]
Saikat Kanjilal edited comment on MAHOUT-1539 at 3/31/15 4:47 AM:
------------------------------------------------------------------
So I did some more research and have some questions:
1) Are we going to deal with images or text data to start?
2) What do we really mean by a data point? In my mind it's represented by an
(x, y) pair.
3) I think the similarity measure used for locality-sensitive hashing should
be configurable; namely, we should be able to plug in Jaccard, Euclidean, or
cosine similarities as functions to be computed.
I have a sample locality-sensitive hashing scheme coded up in Scala, but I
want to get further clarification on the above before I proceed.
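To make question 3 concrete, here is a minimal sketch of what "pluggable"
similarity functions could look like in plain Scala. It is not the scheme
mentioned above and it uses no Mahout API: the object name SimilaritySketch,
the Similarity type alias, and the byName lookup are all hypothetical, and
vectors are assumed to be dense Array[Double] values of equal length. Note
that in locality-sensitive hashing the hash family is normally tied to the
measure (for example MinHash for Jaccard, random hyperplanes for cosine), so
a real design would likely need to make the hash family configurable too.

```scala
// Hypothetical names throughout; none of this is Mahout API.
object SimilaritySketch {
  // A similarity measure is a function over two dense, equal-length vectors.
  type Similarity = (Array[Double], Array[Double]) => Double

  // Cosine of the angle between the vectors; 0.0 if either vector is all zeros.
  val cosine: Similarity = (a, b) => {
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val norms = math.sqrt(a.map(x => x * x).sum) * math.sqrt(b.map(x => x * x).sum)
    if (norms == 0.0) 0.0 else dot / norms
  }

  // Euclidean distance mapped into (0, 1] so that larger means more similar.
  val euclidean: Similarity = (a, b) => {
    val dist = math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)
    1.0 / (1.0 + dist)
  }

  // Jaccard over the sets of indices that hold non-zero entries.
  val jaccard: Similarity = (a, b) => {
    val sa = a.indices.filter(i => a(i) != 0.0).toSet
    val sb = b.indices.filter(i => b(i) != 0.0).toSet
    val unionSize = (sa | sb).size
    if (unionSize == 0) 0.0 else (sa & sb).size.toDouble / unionSize
  }

  // Select a measure from a configuration string.
  def byName(name: String): Similarity = name.toLowerCase match {
    case "cosine"    => cosine
    case "euclidean" => euclidean
    case "jaccard"   => jaccard
    case other       => throw new IllegalArgumentException(s"Unknown similarity: $other")
  }
}
```

A caller would pick the measure once, e.g.
val sim = SimilaritySketch.byName("jaccard"), and then pass sim to whatever
code compares candidate points.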
Thanks for your help
> Implement affinity matrix computation in Mahout DSL
> ---------------------------------------------------
>
> Key: MAHOUT-1539
> URL: https://issues.apache.org/jira/browse/MAHOUT-1539
> Project: Mahout
> Issue Type: Improvement
> Components: Clustering
> Affects Versions: 0.9
> Reporter: Shannon Quinn
> Assignee: Shannon Quinn
> Labels: DSL, scala, spark
> Fix For: 0.10.1
>
> Attachments: ComputeAffinities.scala
>
>
> This has the same goal as MAHOUT-1506, but rather than code the pairwise
> computations in MapReduce, this will be done in the Mahout DSL.
> An orthogonal issue is the format of the raw input (vectors, text, images,
> SequenceFiles), and how the user specifies the distance equation and any
> associated parameters.