Re: Building a ML pipeline with no training

2022-07-20 Thread Sean Owen
The data transformation is all the same. Sure, linear regression is easy: https://spark.apache.org/docs/latest/ml-classification-regression.html#linear-regression These are components that operate on DataFrames. You'll want to look at VectorAssembler to prepare data into an array column. There
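The two components Sean points at combine in a few lines. A minimal sketch, assuming a DataFrame with numeric columns; the column names `x1`, `x2`, `label` are placeholders, not from the thread:

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()
import spark.implicits._

// Hypothetical numeric data; in practice this is your own DataFrame.
val df = Seq((1.0, 2.0, 3.0), (2.0, 4.0, 7.0)).toDF("x1", "x2", "label")

// VectorAssembler packs the feature columns into the single vector
// column that the ML estimators expect as input.
val assembler = new VectorAssembler()
  .setInputCols(Array("x1", "x2"))
  .setOutputCol("features")

val lr = new LinearRegression().setLabelCol("label")

val model = new Pipeline().setStages(Array(assembler, lr)).fit(df)
model.transform(df).select("features", "prediction").show()
```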

Building a ML pipeline with no training

2022-07-20 Thread Edgar H
Morning everyone, The question may seem too broad but I will try to synthesize as much as possible: I'm used to working with Spark SQL, DFs and such on a daily basis, easily grouping, getting extra counters and using functions or UDFs. However, I've come to a scenario where I need to make some predictions

RE: Re: [Spark ML Pipeline]: Error Loading Pipeline Model with Custom Transformer

2022-01-12 Thread Alana Young
I have updated the gist (https://gist.github.com/ally1221/5acddd9650de3dc67f6399a4687893aa ). Please let me know if there are any additional questions.

Re: [Spark ML Pipeline]: Error Loading Pipeline Model with Custom Transformer

2022-01-12 Thread Gourav Sengupta
Hi, may be I have less time, but can you please add some inline comments in your code to explain what you are trying to do? Regards, Gourav Sengupta On Tue, Jan 11, 2022 at 5:29 PM Alana Young wrote: > I am experimenting with creating and persisting ML pipelines using custom > transformers

[Spark ML Pipeline]: Error Loading Pipeline Model with Custom Transformer

2022-01-11 Thread Alana Young
I am experimenting with creating and persisting ML pipelines using custom transformers (I am using Spark 3.1.2). I was able to create a transformer class (for testing purposes, I modeled the code off the SQLTransformer class) and save the pipeline model. When I attempt to load the saved
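For reference (not necessarily the cause of the error in this thread), the usual requirement for a custom transformer that survives a save/load round trip is the `DefaultParamsWritable` mixin plus a companion object extending `DefaultParamsReadable`, so `PipelineModel.load` can locate a reader for the class. A minimal sketch:

```scala
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.util.{DefaultParamsReadable, DefaultParamsWritable, Identifiable}
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.types.StructType

// A no-op transformer; only the persistence plumbing is the point here.
class NoOpTransformer(override val uid: String)
    extends Transformer with DefaultParamsWritable {
  def this() = this(Identifiable.randomUID("noop"))
  override def transform(ds: Dataset[_]): DataFrame = ds.toDF()
  override def transformSchema(schema: StructType): StructType = schema
  override def copy(extra: ParamMap): NoOpTransformer = defaultCopy(extra)
}

// Without this companion object, loading a saved pipeline containing the
// transformer typically fails with a reflection error.
object NoOpTransformer extends DefaultParamsReadable[NoOpTransformer]
```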

Re: Problem in Restoring ML Pipeline with UDF

2021-06-08 Thread Sean Owen
DataTypes.createStructType(schema.fields :+ DataTypes.createStructField($(outputCol), IntegerType, false)) } override def copy(extra: ParamMap): Transformer = copy(extra) } This was included in an ML pipeline, fitted into a model and persisted to a disk

Problem in Restoring ML Pipeline with UDF

2021-06-08 Thread Artemis User
actualType.equals(IntegerType) || actualType.equals(DoubleType), s"Input column must be of numeric type") DataTypes.createStructType(schema.fields :+ DataTypes.createStructField($(outputCol), IntegerType, false)) } override def copy(extra: ParamMap): Transformer = copy(extra) } This was

[Spark MLlib]: Multiple input dataframes and non-linear ML pipeline

2020-04-09 Thread Qingsheng Ren
Hi all, I'm using ML Pipeline to construct a flow of transformation. I'm wondering if it is possible to set multiple dataframes as the input of a transformer? For example I need to join two dataframes together in a transformer, then feed into the estimator for training. If not, is there any plan
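`Transformer.transform` accepts a single Dataset, so the common workarounds are to join the two DataFrames before entering the pipeline, or to capture the second DataFrame when constructing a custom transformer. A sketch of the latter (the key/column names are illustrative; note that a transformer holding a DataFrame reference cannot be persisted with the default mechanisms):

```scala
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.types.StructType

// Joins the captured `other` DataFrame onto the pipeline's input inside
// transform, so downstream stages see a single combined DataFrame.
class JoinTransformer(other: DataFrame, key: String,
                      override val uid: String) extends Transformer {
  def this(other: DataFrame, key: String) =
    this(other, key, Identifiable.randomUID("join"))
  override def transform(ds: Dataset[_]): DataFrame =
    ds.toDF().join(other, key)
  override def transformSchema(schema: StructType): StructType =
    StructType(schema.fields ++ other.schema.fields.filterNot(_.name == key))
  override def copy(extra: ParamMap): JoinTransformer =
    new JoinTransformer(other, key, uid)
}
```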

Re: ml Pipeline read write

2019-05-10 Thread Koert Kuipers
i guess it simply is never set, in which case it is created in: protected final def sparkSession: SparkSession = { if (optionSparkSession.isEmpty) { optionSparkSession = Some(SparkSession.builder().getOrCreate()) } optionSparkSession.get } On Fri, May 10, 2019 at 4:31 PM

ml Pipeline read write

2019-05-10 Thread Koert Kuipers
i am trying to understand how ml persists pipelines. it seems a SparkSession or SparkContext is needed for this, to write to hdfs. MLWriter and MLReader both extend BaseReadWrite to have access to a SparkSession. but this is where it gets confusing... the only way to set the SparkSession seems to

Re: [Spark Streaming] [ML]: Exception handling for the transform method of Spark ML pipeline model

2018-08-17 Thread sudododo
Hi, Any help on this? Thanks, -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/ - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

[Spark Streaming] [ML]: Exception handling for the transform method of Spark ML pipeline model

2018-08-16 Thread sudododo
ML pipeline model. The code sample explains how the feature DStream interacts with the pipeline model. prediction_stream = feature_stream.transform(lambda rdd: predict_rdd(rdd, pipeline_model)) def predict_rdd(rdd, pipeline_model): if (rdd is not None) and (not rdd.isEmpty()): try

Re: Deploying ML Pipeline Model

2016-07-05 Thread Nick Pentreath
>> serving layer. >> >> There is (very initial) movement towards improving the local serving >> possibilities (see https://issues.apache.org/jira/browse/SPARK-13944 which >> was the "first step" in this process). >> >> On Fri, 1 Jul 2016 at 19:24 Ja

Re: Deploying ML Pipeline Model

2016-07-05 Thread Nick Pentreath
Sean is correct - we now use jpmml-model (which is actually BSD 3-clause, where old jpmml was A2L, but either work) On Fri, 1 Jul 2016 at 21:40 Sean Owen wrote: > (The more core JPMML libs are Apache 2; OpenScoring is AGPL. We use > JPMML in Spark and couldn't otherwise

Re: Deploying ML Pipeline Model

2016-07-01 Thread Saurabh Sardeshpande
issues.apache.org/jira/browse/SPARK-13944 which > was the "first step" in this process). > > On Fri, 1 Jul 2016 at 19:24 Jacek Laskowski <ja...@japila.pl> wrote: > >> Hi Rishabh, >> >> I've just today had similar conversation about how to do a ML Pipeline >

Re: Deploying ML Pipeline Model

2016-07-01 Thread Sean Owen
(The more core JPMML libs are Apache 2; OpenScoring is AGPL. We use JPMML in Spark and couldn't otherwise because the Affero license is not Apache compatible.) On Fri, Jul 1, 2016 at 8:16 PM, Nick Pentreath wrote: > I believe open-scoring is one of the well-known PMML

Re: Deploying ML Pipeline Model

2016-07-01 Thread Nick Pentreath
mentioned on this thread is one option. The other option at this > > point is to write your own export functionality and your own serving > layer. > > > > There is (very initial) movement towards improving the local serving > > possibilities (see https://issues.apache.

Re: Deploying ML Pipeline Model

2016-07-01 Thread Jacek Laskowski
> On Fri, 1 Jul 2016 at 19:24 Jacek Laskowski <ja...@japila.pl> wrote: >> >> Hi Rishabh, >> >> I've just today had similar conversation about how to do a ML Pipeline >> deployment and couldn't really answer this question and more because I >> don't r

Re: Deploying ML Pipeline Model

2016-07-01 Thread Nick Pentreath
https://issues.apache.org/jira/browse/SPARK-13944 which was the "first step" in this process). On Fri, 1 Jul 2016 at 19:24 Jacek Laskowski <ja...@japila.pl> wrote: > Hi Rishabh, > > I've just today had similar conversation about how to do a ML Pipeline > deployment and could

Re: Deploying ML Pipeline Model

2016-07-01 Thread Jacek Laskowski
Hi Rishabh, I've just today had similar conversation about how to do a ML Pipeline deployment and couldn't really answer this question and more because I don't really understand the use case. What would you expect from ML Pipeline model deployment? You can save your model to a file

Re: Deploying ML Pipeline Model

2016-07-01 Thread Silvio Fiorito
, Silvio From: Rishabh Bhardwaj <rbnex...@gmail.com> Date: Friday, July 1, 2016 at 7:54 AM To: user <user@spark.apache.org> Cc: "d...@spark.apache.org" <d...@spark.apache.org> Subject: Deploying ML Pipeline Model Hi All, I am looking for ways to deploy a ML Pipeline mod

Re: Deploying ML Pipeline Model

2016-07-01 Thread Steve Goodman
or 7 different model types, although I have not yet used it myself. Looking forward to hearing better suggestions? Steve On Fri, Jul 1, 2016 at 12:54 PM, Rishabh Bhardwaj <rbnex...@gmail.com> wrote: > Hi All, > > I am looking for ways to deploy a ML Pipeline model in productio

Deploying ML Pipeline Model

2016-07-01 Thread Rishabh Bhardwaj
Hi All, I am looking for ways to deploy a ML Pipeline model in production. Spark has already proved to be one of the best frameworks for model training and creation, but once the ml pipeline model is ready how can I deploy it outside the spark context? MLlib model has toPMML method but today

Clear Threshold in Logistic Regression ML Pipeline

2016-05-03 Thread Abhishek Anand
Hi All, I am trying to build a logistic regression pipeline in ML. How can I clear the threshold, which by default is 0.5? In mllib I am able to clear the threshold to get the raw predictions using the model.clearThreshold() function. Regards, Abhi
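In spark.ml there is no clearThreshold() on the model; instead the un-thresholded scores are always emitted as extra output columns of transform. A sketch, assuming labeled DataFrames named `train` and `test` (names not from the thread):

```scala
import org.apache.spark.ml.classification.LogisticRegression

val lr = new LogisticRegression()
val model = lr.fit(train) // `train` is an assumed labeled DataFrame

// transform always produces rawPrediction and probability columns next
// to the thresholded prediction, so the raw scores can be read directly
// rather than clearing a threshold as in mllib.
model.transform(test)
  .select("rawPrediction", "probability", "prediction")
  .show()
```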

Re: Strange ML pipeline errors from HashingTF using v1.6.1

2016-03-29 Thread Timothy Potter
Follow me at https://twitter.com/jaceklaskowski >> >> >> On Mon, Mar 28, 2016 at 7:11 PM, Timothy Potter <thelabd...@gmail.com> wrote: >>> I'm seeing the following error when trying to generate a prediction >>> from a very simple ML pipeline based model. I

Re: Strange ML pipeline errors from HashingTF using v1.6.1

2016-03-29 Thread Timothy Potter
Follow me at https://twitter.com/jaceklaskowski > > On Mon, Mar 28, 2016 at 7:11 PM, Timothy Potter <thelabd...@gmail.com> wrote: >> I'm seeing the following error when trying to generate a prediction >> from a very simple ML pipeline based model. I've verified that the

Re: Strange ML pipeline errors from HashingTF using v1.6.1

2016-03-28 Thread Jacek Laskowski
https://twitter.com/jaceklaskowski On Mon, Mar 28, 2016 at 7:11 PM, Timothy Potter <thelabd...@gmail.com> wrote: > I'm seeing the following error when trying to generate a prediction > from a very simple ML pipeline based model. I've verified that the raw > data sent to the tokenizer is

Strange ML pipeline errors from HashingTF using v1.6.1

2016-03-28 Thread Timothy Potter
I'm seeing the following error when trying to generate a prediction from a very simple ML pipeline based model. I've verified that the raw data sent to the tokenizer is valid (not null). It seems like this is some sort of weird classpath or class loading type issue. Any help you can provide

Trying to serialize/deserialize Spark ML Pipeline (RandomForest) Spark 1.6

2016-03-13 Thread Mario Lazaro
Hi! I have a pipelineModel (using RandomForestClassifier) that I am trying to save locally. I can save it using: //save locally val fileOut = new FileOutputStream("file:///home/user/forest.model") val out = new ObjectOutputStream(fileOut) out.writeObject(model) out.close() fileOut.close() Then

Re: Logistic Regression using ML Pipeline

2016-02-19 Thread Ajinkya Kale
Please take a look at the example here http://spark.apache.org/docs/latest/ml-guide.html#example-pipeline On Thu, Feb 18, 2016 at 9:27 PM Arunkumar Pillai <arunkumar1...@gmail.com> wrote: > Hi > > I'm trying to build logistic regression using ML Pipeline > > val lr = n

Logistic Regression using ML Pipeline

2016-02-18 Thread Arunkumar Pillai
Hi I'm trying to build logistic regression using ML Pipeline val lr = new LogisticRegression() lr.setFitIntercept(true) lr.setMaxIter(100) val model = lr.fit(data) println(model.summary) I'm getting coefficients but not able to get the predicted and probability values
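The per-row values are available in two places: the training summary carries a scored copy of the training data, and transform scores any DataFrame. A sketch using the same `data` DataFrame as the question:

```scala
import org.apache.spark.ml.classification.LogisticRegression

val lr = new LogisticRegression().setFitIntercept(true).setMaxIter(100)
val model = lr.fit(data) // `data` is the DataFrame from the question

// The training summary already holds a predictions DataFrame with the
// probability and prediction columns populated for the training rows.
model.summary.predictions
  .select("label", "probability", "prediction")
  .show()

// Any other DataFrame can be scored the same way via transform.
model.transform(data).select("probability", "prediction").show()
```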

Re: AIC in Linear Regression in ml pipeline

2016-01-15 Thread Yanbo Liang
value in Linear Regression using ml pipeline ? > If so please help me > > -- > Thanks and Regards > Arun >

AIC in Linear Regression in ml pipeline

2016-01-15 Thread Arunkumar Pillai
Hi Is it possible to get the AIC value in Linear Regression using the ml pipeline? If so please help me -- Thanks and Regards Arun

Re: LogisticsRegression in ML pipeline help page

2016-01-06 Thread Wen Pei Yu
<arunkumar1...@gmail.com> To: user@spark.apache.org Date: 01/07/2016 12:54 PM Subject: LogisticsRegression in ML pipeline help page Hi I need help page for Logistics Regression in ML pipeline. when i browsed I'm getting the 1.6 help please help me. -- Thanks and Regards Arun

LogisticsRegression in ML pipeline help page

2016-01-06 Thread Arunkumar Pillai
Hi I need the help page for Logistics Regression in the ML pipeline. When I browse, I'm getting the 1.6 help. Please help me. -- Thanks and Regards Arun

Re: GLM I'm ml pipeline

2016-01-03 Thread Yanbo Liang
AFAIK, Spark MLlib will improve and support most GLM functions in the next release (Spark 2.0). 2016-01-03 23:02 GMT+08:00 : > keyStoneML could be an alternative. > > Ardo. > > On 03 Jan 2016, at 15:50, Arunkumar Pillai > wrote: > > Is there any road

Re: GLM I'm ml pipeline

2016-01-03 Thread Arunkumar Pillai
Thanks so eagerly waiting for next Spark release On Mon, Jan 4, 2016 at 7:36 AM, Yanbo Liang wrote: > AFAIK, Spark MLlib will improve and support most GLM functions in the next > release(Spark 2.0). > > 2016-01-03 23:02 GMT+08:00 : > >> keyStoneML could be

GLM I'm ml pipeline

2016-01-03 Thread Arunkumar Pillai
Is there any road map for glm in pipeline?

Re: GLM I'm ml pipeline

2016-01-03 Thread ndjido
keyStoneML could be an alternative. Ardo. > On 03 Jan 2016, at 15:50, Arunkumar Pillai wrote: > > Is there any road map for glm in pipeline?

No documentation for how to write custom Transformer in ml pipeline ?

2015-11-30 Thread Jeff Zhang
Although writing a custom UnaryTransformer is not difficult, writing a non-UnaryTransformer is a little tricky (you have to check the source code). And I don't find any documentation about how to write a custom Transformer in the ml pipeline, but writing a custom Transformer is a very basic requirement

pyspark ML pipeline with shared data

2015-11-17 Thread Dominik Dahlem
']) \ .alias(self.getOutputCol())) # setting up the ML pipeline rowNormaliser = Normaliser(inputCol='rating', outputCol='rowNorm') als = ALS(userCol='userID', itemCol='movieID', ratingCol='rowNorm') rowDeNormaliser = DeNormaliser(inputCol='prediction', outputCol='denormPrediction

ML Pipeline

2015-09-28 Thread Yasemin Kaya
Hi, I am using Spark 1.5 and the ML Pipeline. I create the model, then give the model unlabeled data to find the probabilities and predictions. When I want to see the results, it returns an error. //creating model final PipelineModel model = pipeline.fit(trainingData); JavaRDD rowRDD1 = unlabeledTest

Re: spark 1.5, ML Pipeline Decision Tree Dataframe Problem

2015-09-18 Thread Yasemin Kaya
"import sqlContext.implicits._" and then call > "rdd.toDf()" on your RDD to convert it into a dataframe. > > On Fri, Sep 18, 2015 at 7:32 AM, Yasemin Kaya <godo...@gmail.com> wrote: > >> Hi, >> >> I am using *spark 1.5, ML Pipeline Decision Tree >> <ht

spark 1.5, ML Pipeline Decision Tree Dataframe Problem

2015-09-18 Thread Yasemin Kaya
Hi, I am using *spark 1.5, ML Pipeline Decision Tree <http://spark.apache.org/docs/latest/ml-decision-tree.html#output-columns>* to get tree's probability. But I have to convert my data to Dataframe type. While creating model there is no problem but when I am using model on m

Re: Caching intermediate results in Spark ML pipeline?

2015-09-18 Thread Jingchu Liu
Tue, Sep 15, 2015 at 10:26 PM, Jingchu Liu <liujing...@gmail.com> > wrote: > >> Yeah I understand on the low-level we should do as you said. >> >> But since ML pipeline is a high-level API, it is pretty natural to expect >> the ability to recognize overlapping pa

Re: spark 1.5, ML Pipeline Decision Tree Dataframe Problem

2015-09-18 Thread Feynman Liang
then call "rdd.toDf()" on your RDD to convert it into a dataframe. On Fri, Sep 18, 2015 at 7:32 AM, Yasemin Kaya <godo...@gmail.com> wrote: > Hi, > > I am using *spark 1.5, ML Pipeline Decision Tree > <http://spark.apache.org/docs/latest/ml-decision-tree.html#output-co

Re: Caching intermediate results in Spark ML pipeline?

2015-09-15 Thread Feynman Liang
particular use case for caching intermediate results and if the current API doesn't support it we can create a JIRA for it. On Tue, Sep 15, 2015 at 10:26 PM, Jingchu Liu <liujing...@gmail.com> wrote: > Yeah I understand on the low-level we should do as you said. > > But since ML pipelin

Re: Caching intermediate results in Spark ML pipeline?

2015-09-15 Thread Jingchu Liu
Yeah I understand on the low-level we should do as you said. But since ML pipeline is a high-level API, it is pretty natural to expect the ability to recognize overlapping parameters between successive runs. (Actually, this happens A LOT when we have lots of hyper-params to search for) I can also
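One manual workaround, pending pipeline-level support: run the deterministic feature stages once, cache the result, and only re-fit the final estimator across the hyper-parameter grid. A sketch with assumed text-feature stages and a `train` DataFrame with a `text` column:

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

// Assumed feature stages; substitute whatever your real pipeline uses.
val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")

// Fit and apply the feature stages once, cache the transformed data, and
// re-fit only the estimator for each hyper-parameter candidate.
val features = new Pipeline().setStages(Array(tokenizer, hashingTF))
  .fit(train).transform(train).cache()

val models = Seq(0.01, 0.1, 1.0).map { reg =>
  new LogisticRegression().setRegParam(reg).fit(features)
}
```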

Re: Caching intermediate results in Spark ML pipeline?

2015-09-15 Thread Feynman Liang
>>>> Many pipeline stages implement save/load methods, which can be used if >>>> you instantiate and call the underlying pipeline stages' `transform` methods >>>> individually (instead of using the Pipeline.setStages API). See associated >

Caching intermediate results in Spark ML pipeline?

2015-09-14 Thread Jingchu Liu
Hi all, I have a question regarding the ability of ML pipeline to cache intermediate results. I've posted this question on stackoverflow <http://stackoverflow.com/questions/32561687/caching-intermediate-results-in-spark-ml-pipeline> but got no answer, hope someone here can help

Re: Caching intermediate results in Spark ML pipeline?

2015-09-14 Thread Feynman Liang
Pipeline persistence is on the 1.6 roadmap, JIRA here <https://issues.apache.org/jira/browse/SPARK-6725>. Feynman On Mon, Sep 14, 2015 at 9:20 PM, Jingchu Liu <liujing...@gmail.com> wrote: > Hi all, > > I have a question regarding the ability of ML pipeline to cache > int

Re: Caching intermediate results in Spark ML pipeline?

2015-09-14 Thread Feynman Liang
on the 1.6 roadmap, JIRA here >> <https://issues.apache.org/jira/browse/SPARK-6725>. >> >> Feynman >> >> On Mon, Sep 14, 2015 at 9:20 PM, Jingchu Liu <liujing...@gmail.com> >> wrote: >> >>> Hi all, >>> >>> I have a qu

Re: Caching intermediate results in Spark ML pipeline?

2015-09-14 Thread Jingchu Liu
<liujing...@gmail.com> wrote: > >> Hi all, >> >> I have a question regarding the ability of ML pipeline to cache >> intermediate results. I've posted this question on stackoverflow >> <http://stackoverflow.com/questions/32561687/caching-intermediate-results-i

Re: Random Forest and StringIndexer in pyspark ML Pipeline

2015-08-21 Thread Yanbo Liang
You can use IndexToString. 2015-08-11 6:56 GMT+08:00 pkphlam pkph...@gmail.com: Hi, If I understand the RandomForest model in the ML Pipeline implementation in the ml package correctly, I have to first run my outcome label variable through the StringIndexer, even if my labels are numeric

Random Forest and StringIndexer in pyspark ML Pipeline

2015-08-10 Thread pkphlam
Hi, If I understand the RandomForest model in the ML Pipeline implementation in the ml package correctly, I have to first run my outcome label variable through the StringIndexer, even if my labels are numeric. The StringIndexer then converts the labels into numeric indices based on frequency
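For reference, the usual round trip is StringIndexer before training and IndexToString after prediction; a sketch (`df` and the column names are assumptions, not from the thread):

```scala
import org.apache.spark.ml.feature.{IndexToString, StringIndexer}

// StringIndexer maps label values (numeric ones via their string form)
// to frequency-ordered indices, which the tree ensembles require.
val indexer = new StringIndexer()
  .setInputCol("label")
  .setOutputCol("indexedLabel")
val indexerModel = indexer.fit(df) // `df` is an assumed input DataFrame

// IndexToString reverses the mapping on the prediction column, using the
// labels captured by the fitted indexer.
val converter = new IndexToString()
  .setInputCol("prediction")
  .setOutputCol("predictedLabel")
  .setLabels(indexerModel.labels)
```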

Re: How to implement an Evaluator for a ML pipeline?

2015-05-20 Thread Stefan H.
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-implement-an-Evaluator-for-a-ML-pipeline-tp22830.html

Re: How to implement an Evaluator for a ML pipeline?

2015-05-19 Thread Xiangrui Meng
Cheers, Stefan -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-implement-an-Evaluator-for-a-ML-pipeline-tp22830.html

How to implement an Evaluator for a ML pipeline?

2015-05-09 Thread Stefan H.
with false assumptions. I'd be grateful if someone could point me to some documentation or examples, or has a few hints to share. Cheers, Stefan -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-implement-an-Evaluator-for-a-ML-pipeline-tp22830.html
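A custom Evaluator mostly needs `evaluate` plus `isLargerBetter` so CrossValidator can rank models correctly. A hedged sketch computing negative mean absolute error (the signature shown is the modern Dataset-based one, which differs slightly from the 1.x API in this thread):

```scala
import org.apache.spark.ml.evaluation.Evaluator
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.functions.{abs, avg, col}

// Negated MAE: negating makes "larger is better" hold, so CrossValidator
// picks the model with the smallest absolute error.
class MaeEvaluator(override val uid: String) extends Evaluator {
  def this() = this(Identifiable.randomUID("mae"))
  override def evaluate(ds: Dataset[_]): Double =
    -ds.select(avg(abs(col("prediction") - col("label")))).head.getDouble(0)
  override def isLargerBetter: Boolean = true
  override def copy(extra: ParamMap): MaeEvaluator = defaultCopy(extra)
}
```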

Re: [Ml][Dataframe] Ml pipeline dataframe repartitioning

2015-04-26 Thread Joseph Bradley
Hi Peter, As far as setting the parallelism, I would recommend setting it as early as possible. Ideally, that would mean specifying the number of partitions when loading the initial data (rather than repartitioning later on). In general, working with Vector columns should be better since the

[Ml][Dataframe] Ml pipeline dataframe repartitioning

2015-04-24 Thread Peter Rudenko
Hi, I have the following problem. I have a dataset with 30 columns (15 numeric, 15 categorical) and I am using ml transformers/estimators to transform each column (StringIndexer for categorical, MeanImputor for numeric). This creates 30 more columns in the dataframe. After that I'm using VectorAssembler to

Re: Spark ML Pipeline inaccessible types

2015-03-27 Thread Xiangrui Meng
petro.rude...@gmail.com To: zapletal-mar...@email.cz, Sean Owen so...@cloudera.com Date: 25. 3. 2015 13:28:38 Subject: Re: Spark ML Pipeline inaccessible types Hi Martin, here are 2 possibilities to overcome this: 1) Put your logic into the org.apache.spark package in your project - then everything

Re: Spark ML Pipeline inaccessible types

2015-03-27 Thread Joseph Bradley
...@email.cz, Sean Owen so...@cloudera.com Date: 25. 3. 2015 13:28:38 Subject: Re: Spark ML Pipeline inaccessible types Hi Martin, here are 2 possibilities to overcome this: 1) Put your logic into the org.apache.spark package in your project - then everything would be accessible. 2) Dirty

Re: Spark ML Pipeline inaccessible types

2015-03-25 Thread Peter Rudenko
to get parameter by name using /val m = this.getClass.getMethodName(paramName)./ This may be a bug, but it is only a side effect caused by the real problem I am facing. My issue is that VectorUDT is not accessible by user code and therefore it is not possible to use custom ML pipeline

Re: Spark ML Pipeline inaccessible types

2015-03-25 Thread zapletal-martin
, Martin -- Original message -- From: Peter Rudenko petro.rude...@gmail.com To: zapletal-mar...@email.cz, Sean Owen so...@cloudera.com Date: 25. 3. 2015 13:28:38 Subject: Re: Spark ML Pipeline inaccessible types Hi Martin, here are 2 possibilities to overcome this: 1) Put your

Re: Spark ML Pipeline inaccessible types

2015-03-25 Thread zapletal-martin
by the real problem I am facing. My issue is that VectorUDT is not accessible by user code and therefore it is not possible to use a custom ML pipeline with the existing Predictors (see the last two paragraphs in my first email). Best Regards, Martin -- Original message -- From

Spark ML Pipeline inaccessible types

2015-03-25 Thread zapletal-martin
Hi, I have started implementing a machine learning pipeline using Spark 1.3.0 and the new pipelining API and DataFrames. I got to a point where I have my training data set prepared using a sequence of Transformers, but I am struggling to actually train a model and use it for predictions.

Re: Spark ML Pipeline inaccessible types

2015-03-25 Thread Sean Owen
NoSuchMethodError in general means that your runtime and compile-time environments are different. I think you need to first make sure you don't have mismatching versions of Spark. On Wed, Mar 25, 2015 at 11:00 AM, zapletal-mar...@email.cz wrote: Hi, I have started implementing a machine

ML Pipeline question about caching

2015-03-17 Thread Cesar Flores
Hello all: I am using the ML Pipeline, which I consider very powerful. I have the following use case: - I have three transformers, which I will call A, B, C, that basically extract features from text files, with no parameters. - I have a final stage D, which is the logistic regression

Re: ML Pipeline question about caching

2015-03-17 Thread Peter Rudenko
of combinations (number of parameters for the transformer / number of parameters for the estimator / number of folds). Thanks, Peter Rudenko On 2015-03-18 00:26, Cesar Flores wrote: Hello all: I am using the ML Pipeline, which I consider very powerful. I have the next use case: * I have three transformers, which I

Re: Need some help to create user defined type for ML pipeline

2015-02-23 Thread Jaonary Rabarisoa
Hi Joseph, Thank you for your feedback. I've managed to define an image type by following the VectorUDT implementation. I have another question about the definition of a user defined transformer. The unary transformer is private to spark ml. Do you plan to provide a developer api for transformers? On

Re: Need some help to create user defined type for ML pipeline

2015-02-23 Thread Xiangrui Meng
Yes, we are going to expose the developer API. There was a long discussion in the PR: https://github.com/apache/spark/pull/3637. So we marked them package private and look for feedback on how to improve it. Please implement your classes under `spark.ml` for now and let us know your feedback.

Re: Spark ML pipeline

2015-02-11 Thread Reynold Xin
Yes. Next release (Spark 1.3) is coming out end of Feb / early Mar. On Wed, Feb 11, 2015 at 7:22 AM, Jianguo Li flyingfromch...@gmail.com wrote: Hi, I really like the pipeline in the spark.ml in Spark1.2 release. Will there be more machine learning algorithms implemented for the pipeline

Re: Need some help to create user defined type for ML pipeline

2015-01-24 Thread Joseph Bradley
Hi Jao, You're right that defining serialize and deserialize is the main task in implementing a UDT. They are basically translating between your native representation (ByteImage) and SQL DataTypes. The sqlType you defined looks correct, and you're correct to use a row of length 4. Other than

Need some help to create user defined type for ML pipeline

2015-01-19 Thread Jaonary Rabarisoa
Hi all, I'm trying to implement a pipeline for computer vision based on the latest ML package in spark. The first step of my pipeline is to decode images (jpeg for instance) stored in a parquet file. For this, I begin to create a UserDefinedType that represents a decoded image stored in an array of