Sure, you are welcome. Let me fix the issues you have pointed out. I'll update you by this weekend.
_Ashutosh

________________________________
From: slcclimber [via Apache Spark Developers List] <ml-node+s1001551n9287...@n3.nabble.com>
Sent: Tuesday, November 11, 2014 11:46 PM
To: Ashutosh Trivedi (MT2013030)
Subject: Re: [MLlib] Contributing Algorithm for Outlier Detection

Mayur,
LibSVM format sounds good to me. I could work on writing the tests if that helps you?
Anant

On Nov 11, 2014 11:06 AM, "Ashutosh [via Apache Spark Developers List]" <[hidden email]> wrote:

Hi Mayur,
Vector data types are implemented using the Breeze library; they can be found at .../org/apache/spark/mllib/linalg

Anant,
One restriction I found is that a vector can only hold 'Double' values, so it actually restricts the user.
What are your thoughts on the LibSVM format?

Thanks for the comments. I was just trying to get away from those increment/decrement functions; they look ugly. Points are noted. I'll try to fix them soon. Tests are also required for the code.

Regards,
Ashutosh

________________________________
From: Mayur Rustagi [via Apache Spark Developers List] <ml-node+[hidden email]>
Sent: Saturday, November 8, 2014 12:52 PM
To: Ashutosh Trivedi (MT2013030)
Subject: Re: [MLlib] Contributing Algorithm for Outlier Detection

> We should take a vector instead giving the user flexibility to decide
> data source/ type

What do you mean by vector datatype exactly?

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>

On Wed, Nov 5, 2014 at 6:45 AM, slcclimber <[hidden email]> wrote:

> Ashutosh,
> I still see a few issues.
> 1. On line 112 you are counting using a counter. Since this will happen in
> an RDD, the counter will cause issues. Also, it is not good functional style
> to use a filter function with a side effect.
> You could use randomSplit instead. This does the same thing without the
> side effect.
> 2. Similar shared usage of j in line 102 is going to be an issue as well.
> Also, the hash seed does not need to be sequential; it could be randomly
> generated or hashed on the values.
> 3. The compute function and trim scores still run on a comma-separated
> RDD. We should take a vector instead, giving the user flexibility to decide
> the data source/type. What if we want data from Hive tables, or Parquet,
> JSON, or Avro formats? This is a very restrictive format. With vectors the
> user has the choice of taking in whatever data format and converting it to
> vectors, instead of reading JSON files, creating a CSV file, and then
> working on that.
> 4. Similar use of counters in lines 54 and 65 is an issue.
> Basically, the shared-state counters are a huge issue that does not scale,
> since the processing of RDDs is distributed and the value j lives on the
> master.
>
> Anant
>
> On Tue, Nov 4, 2014 at 7:22 AM, Ashutosh [via Apache Spark Developers List]
> <[hidden email]> wrote:
>
> > Anant,
> >
> > I got rid of those increment/decrement functions and now the code is much
> > cleaner. Please check. All your comments have been addressed.
> >
> > https://github.com/codeAshu/Outlier-Detection-with-AVF-Spark/blob/master/OutlierWithAVFModel.scala
> >
> > _Ashu
> >
> > ------------------------------
> > *From:* slcclimber [via Apache Spark Developers List] <ml-node+[hidden email]>
> > *Sent:* Friday, October 31, 2014 10:09 AM
> > *To:* Ashutosh Trivedi (MT2013030)
> > *Subject:* Re: [MLlib] Contributing Algorithm for Outlier Detection
> >
> > You should create a JIRA ticket to go with it as well.
> > Thanks
> > On Oct 30, 2014 10:38 PM, "Ashutosh [via Apache Spark Developers List]"
> > <[hidden email]> wrote:
> >
> >> Okay. I'll try it and post it soon with a test case. After that I think
> >> we can go ahead with the PR.
> >> ------------------------------
> >> *From:* slcclimber [via Apache Spark Developers List] <ml-node+[hidden email]>
> >> *Sent:* Friday, October 31, 2014 10:03 AM
> >> *To:* Ashutosh Trivedi (MT2013030)
> >> *Subject:* Re: [MLlib] Contributing Algorithm for Outlier Detection
> >>
> >> Ashutosh,
> >> A vector would be a good idea; vectors are used very frequently.
> >> Test data is usually stored in the spark/data/mllib folder.
> >> On Oct 30, 2014 10:31 PM, "Ashutosh [via Apache Spark Developers List]"
> >> <[hidden email]> wrote:
> >>
> >>> Hi Anant,
> >>> Sorry for my late reply. Thank you for taking the time to review it.
> >>>
> >>> I have a few comments on the first issue.
> >>>
> >>> You are correct on the string (CSV) part. But we cannot take input of
> >>> the type you mentioned. We calculate frequency in our function.
> >>> Otherwise the user has to do all this computation. I realize that
> >>> taking an RDD[Vector] would be general enough for all. What do you say?
> >>>
> >>> I agree on all the remaining issues. I will correct them soon and post it.
> >>> I have a question about test cases. Where should I put data when
> >>> submitting test scripts? Or should I generate synthetic data for
> >>> testing within the scripts? How does this work?
> >>>
> >>> Regards,
> >>> Ashutosh
> >>>
> >>> ------------------------------
> >>> If you reply to this email, your message will be added to the
> >>> discussion below:
> >>> http://apache-spark-developers-list.1001551.n3.nabble.com/MLlib-Contributing-Algorithm-for-Outlier-Detection-tp8880p9034.html
> >>> To unsubscribe from [MLlib] Contributing Algorithm for Outlier
> >>> Detection, click here.
> --
> View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/MLlib-Contributing-Algorithm-for-Outlier-Detection-tp8880p9095.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/MLlib-Contributing-Algorithm-for-Outlier-Detection-tp8880p9289.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
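
Anant's review points 1 through 4 come down to replacing counters that a closure mutates with transformations that carry their own keys. The following is a minimal sketch of that idea in plain Python rather than Spark, so it can be run without a cluster. It assumes AVF means attribute value frequency scoring (the thread only says the function calculates frequency), and the names avf_scores and trim_outliers are hypothetical; they are not taken from OutlierWithAVFModel.scala.

```python
from collections import Counter
from typing import List, Sequence


def avf_scores(rows: Sequence[Sequence[str]]) -> List[float]:
    """Score each row by the mean frequency of its attribute values.

    Assumption: a low score marks a row whose values are rare, which is
    how this sketch interprets AVF. The column index comes from
    enumerate, so no counter is shared between records.
    """
    if not rows:
        return []
    n_cols = len(rows[0])
    # Count every (column index, value) pair in a single pass.
    freq = Counter((j, v) for row in rows for j, v in enumerate(row))
    return [
        sum(freq[(j, v)] for j, v in enumerate(row)) / n_cols
        for row in rows
    ]


def trim_outliers(rows: Sequence[Sequence[str]], fraction: float) -> List[int]:
    """Return the indices of the lowest-scoring rows (candidate outliers)."""
    scores = avf_scores(rows)
    k = int(len(rows) * fraction)
    # sorted is stable, so rows with equal scores keep their input order.
    ranked = sorted(range(len(rows)), key=lambda i: scores[i])
    return ranked[:k]


if __name__ == "__main__":
    data = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "z")]
    print(avf_scores(data))
    print(trim_outliers(data, 0.25))
```

In Spark the same shape would usually be a keyed count over (column, value) pairs, which is why a variable such as j would not need to live outside the transformation. Taking already-parsed rows as input, rather than comma-separated strings, also leaves the choice of data source to the caller, as point 3 suggests.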