Re: Spark-ML : Streaming library for Factorization Machine (FM/FFM)
Hi,

Unfortunately no. I just used this lib for FM and FFM as-is. I thought it could be a good baseline for your need.

Regards,
Maximilien

On 16/04/18 15:43, Sundeep Kumar Mehta wrote:
> Hi Maximilien,
> Thanks for your response. Did you convert this repo into a DStream for continuous/incremental training?
> Regards,
> Sundeep
Re: Spark-ML : Streaming library for Factorization Machine (FM/FFM)
Hi Maximilien,

Thanks for your response. Did you convert this repo into a DStream for continuous/incremental training?

Regards,
Sundeep

On Mon, Apr 16, 2018 at 4:17 PM, Maximilien DEFOURNE <maximilien.defou...@s4m.io> wrote:
> Hi,
> I used this repo for FM/FFM: https://github.com/Intel-bigdata/imllib-spark
> Regards,
> Maximilien DEFOURNE
Re: Spark-ML : Streaming library for Factorization Machine (FM/FFM)
Hi,

I used this repo for FM/FFM: https://github.com/Intel-bigdata/imllib-spark

Regards,
Maximilien DEFOURNE

On 15/04/18 05:14, Sundeep Kumar Mehta wrote:
> Hi All,
> Any library/GitHub project to use a factorization machine or field-aware factorization machine via online learning for continuous training?
> Request you to please share your thoughts on this.
> Regards,
> Sundeep
Re: Spark 2.1 ml library scalability
It's true that CrossValidator is not parallel currently; see https://issues.apache.org/jira/browse/SPARK-19357 and feel free to help review.

On Fri, 7 Apr 2017 at 14:18, Aseem Bansal wrote:
> I was thinking that as CrossValidator is finding the best parameters it
> should be able to run them independently. That sounds like something which
> could be run in parallel.
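As a side note, the grid search in this thread decomposes into independent fits: each regularization value is trained and evaluated on each fold separately, which is exactly the parallelism SPARK-19357 is about. A plain-Scala sketch (no Spark APIs involved) of the fit combinations a parallel CrossValidator could schedule concurrently:

```scala
// Enumerate the independent (regParam, fold) fits behind the 3-fold
// grid search described in this thread. Each pair is a self-contained
// model fit, so nothing forces them to run sequentially.
object GridFits {
  def main(args: Array[String]): Unit = {
    val regParams = Seq(0.0001, 0.001, 0.005, 0.01, 0.05, 0.1)
    val numFolds  = 3
    val fits = for {
      reg  <- regParams
      fold <- 0 until numFolds
    } yield (reg, fold)
    // 6 parameter values x 3 folds = 18 independent fits
    println(fits.size)
  }
}
```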
Re: Spark 2.1 ml library scalability
- Limited the data to 100,000 records.
- 6 categorical features which go through imputation, string indexing, and one-hot encoding. The maximum number of classes for a feature is 100. As the data is imputed it becomes dense.
- 1 numerical feature.
- Training logistic regression through CrossValidator with a grid to optimize its regularization parameter over the values 0.0001, 0.001, 0.005, 0.01, 0.05, 0.1.
- Using Spark's launcher API to launch it on a YARN cluster in Amazon AWS.

I was thinking that as CrossValidator is finding the best parameters it should be able to run them independently. That sounds like something which could be run in parallel.

On Fri, Apr 7, 2017 at 5:20 PM, Nick Pentreath wrote:
> What is the size of the training data (number of examples, number of features)?
> Dense or sparse features? How many classes?
>
> What commands are you using to submit your job via spark-submit?
Re: Spark 2.1 ml library scalability
What is the size of the training data (number of examples, number of features)? Dense or sparse features? How many classes?

What commands are you using to submit your job via spark-submit?

On Fri, 7 Apr 2017 at 13:12, Aseem Bansal wrote:
> When using Spark ML's LogisticRegression, RandomForest, CrossValidator,
> etc., do we need to give any consideration while coding to make it scale
> with more CPUs, or does it scale automatically?
>
> I am reading some data from S3 and using a pipeline to train a model. I am
> running the job on a Spark cluster with 36 cores and 60 GB RAM and I cannot
> see much usage. It is running, but I was expecting Spark to use all the
> available RAM and make it faster. So I was wondering whether we need to
> take something particular into consideration, or whether my expectations
> are wrong.
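To make Nick's question concrete, a YARN submission typically looks something like the following. The class name, jar name, and resource sizing are all placeholders for illustration; the point is that executor count, cores, and memory are requested explicitly, and a submission that omits them may request only a small default allocation, which would be consistent with the low cluster utilization described above.

```shell
# Illustrative only: placeholder class/jar names, resources sized so
# that a 36-core / 60 GB cluster is actually occupied.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.TrainPipeline \
  --num-executors 6 \
  --executor-cores 6 \
  --executor-memory 8g \
  pipeline-assembly.jar
```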
Re: Spark as a Library
If you want to run the computation on just one machine (using Spark's local mode), it can probably run in a container. Otherwise you can create a SparkContext there and connect it to a cluster outside. Note that I haven't tried this though, so the security policies of the container might be too restrictive. In that case you'd have to run the app outside and expose an RPC interface between them.

Matei

On September 16, 2014 at 8:17:08 AM, Ruebenacker, Oliver A (oliver.ruebenac...@altisource.com) wrote:
> Hello,
>
> Suppose I want to use Spark from an application that I already submit to run in another container (e.g. Tomcat). Is this at all possible? Or do I have to split the app into two components, and submit one to Spark and one to the other container? In that case, what is the preferred way for the two components to communicate with each other? Thanks!
>
> Best, Oliver
>
> Oliver Ruebenacker | Solutions Architect
> Altisource™
> 290 Congress St, 7th Floor | Boston, Massachusetts 02210
> P: (617) 728-5582 | ext: 275585
> oliver.ruebenac...@altisource.com | www.Altisource.com
>
> *** This email message and any attachments are intended solely for the use of the addressee. If you are not the intended recipient, you are prohibited from reading, disclosing, reproducing, distributing, disseminating or otherwise using this transmission. If you have received this message in error, please promptly notify the sender by reply email and immediately delete this message from your system. This message and any attachments may contain information that is confidential, privileged or exempt from disclosure. Delivery of this message to any person other than the intended recipient is not intended to waive any right or privilege. Message transmission is not guaranteed to be secure or free of software viruses. ***
Re: Spark as a Library
It depends on what you want to do with Spark. The following has worked for me: let the container handle the HTTP request and then talk to Spark using another HTTP/REST interface. You can use the Spark Job Server for this. Embedding Spark inside the container is not a great long-term solution IMO, because you may see issues when you want to connect with a Spark cluster.

On Tue, Sep 16, 2014 at 11:16 AM, Ruebenacker, Oliver A <oliver.ruebenac...@altisource.com> wrote:
> Suppose I want to use Spark from an application that I already submit to run in another container (e.g. Tomcat). Is this at all possible? Or do I have to split the app into two components, and submit one to Spark and one to the other container? In that case, what is the preferred way for the two components to communicate with each other?
RE: Spark as a Library
Hello,

Thanks for the response, and great to hear it is possible. But how do I connect to Spark without using the submit script?

I know how to start up a master and some workers, and then connect to the master by packaging the app that contains the SparkContext and submitting the package with the spark-submit script in standalone mode. But I don't want to submit the app that contains the SparkContext via the script, because I want that app to be running on a web server.

So, what are other ways to connect to Spark? I can't find anything in the docs other than using the script. Thanks!

Best, Oliver

From: Matei Zaharia [mailto:matei.zaha...@gmail.com]
Sent: Tuesday, September 16, 2014 1:31 PM
To: Ruebenacker, Oliver A; user@spark.apache.org
Subject: Re: Spark as a Library

> If you want to run the computation on just one machine (using Spark's local mode), it can probably run in a container. Otherwise you can create a SparkContext there and connect it to a cluster outside. Note that I haven't tried this though, so the security policies of the container might be too restrictive. In that case you'd have to run the app outside and expose an RPC interface between them.
> Matei
Re: Spark as a Library
You can create a new SparkContext inside your container, pointed to your master. However, for your script to run you must call addJars to put the code on your workers' classpaths (except when running locally). Hopefully your webapp has some lib folder which you can point to as a source for the jars. In the Play Framework you can use play.api.Play.application.getFile("lib") to get a path to the lib directory and get its contents. Of course, that only works on the packaged web app.

On Tue, Sep 16, 2014 at 3:17 PM, Ruebenacker, Oliver A <oliver.ruebenac...@altisource.com> wrote:
> Thanks for the response and great to hear it is possible. But how do I connect to Spark without using the submit script? I don't want to submit the app that contains the SparkContext via the script, because I want that app to be running on a web server. So, what are other ways to connect to Spark? I can't find anything in the docs other than using the script.

--
Daniel Siegmann, Software Developer
Velos
Accelerating Machine Learning
440 NINTH AVENUE, 11TH FLOOR, NEW YORK, NY 10001
E: daniel.siegm...@velos.io  W: www.velos.io
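The approach Daniel describes (a SparkContext created inside the webapp, with the app's own jars shipped to the workers) can be sketched as below. This is an untested configuration sketch: the master URL and jar path are placeholders, and the app name is made up.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Connect to a standalone master from inside the webapp, without
// spark-submit. The master URL and jar path are placeholders.
object EmbeddedSpark {
  def createContext(): SparkContext = {
    val conf = new SparkConf()
      .setAppName("webapp-embedded-spark")
      .setMaster("spark://master-host:7077")
      // Ship the webapp's own classes to the workers' classpaths
      // (the "addJars" step; e.g. jars found under the webapp's lib/):
      .setJars(Seq("/path/to/webapp/lib/app-code.jar"))
    new SparkContext(conf)
  }
}
```

For local testing inside the container, `setMaster("local[*]")` avoids the jar-shipping step entirely, which matches Matei's earlier suggestion.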