[
https://issues.apache.org/jira/browse/SPARK-19498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16553942#comment-16553942
]
Paul Vanhaesebrouck commented on SPARK-19498:
---------------------------------------------
I have been working on an extension to the spark-ml package in Scala. Here is
my feedback.
* Since I tried to follow the same standards as spark-ml, I had to make all of
my custom estimators and transformers belong to the package
org.apache.spark.ml. This was needed to reuse the existing shared params traits
(they are private), and utilities like SchemaUtils are not available outside of
Spark either.
* I have created a few meta-algorithms, that is, stages that take other stages
as parameters (like CrossValidator). The biggest difficulty I am facing right
now comes from *ml.util.MetaAlgorithmReadWrite*. This object prevents me from
nesting my own meta-algorithm inside the existing CrossValidator and then
saving it. Right now the only alternative I have is to reimplement
cross-validation myself in order to avoid this.
* Some of my meta-algorithms would greatly benefit from having access to the
functions that generate the output columns. For example, at some point I had to
find a workaround in order to access the predict function of
ml.PredictionModel. This last point is related to SPARK-10413.
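
To make the first point concrete, here is a minimal sketch of the workaround,
assuming Spark 2.x visibility (shared params traits are private[ml], SchemaUtils
is private[spark]). The class name UpperCaser and its placement in
org.apache.spark.ml.feature are hypothetical and chosen only for illustration;
this is not code from my library.

```scala
// Hypothetical third-party code, declared inside Spark's own namespace only so
// that it can reach package-private helpers (assumes Spark 2.x visibility).
package org.apache.spark.ml.feature

import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.param.shared.{HasInputCol, HasOutputCol}
import org.apache.spark.ml.util.{Identifiable, SchemaUtils}
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.functions.{col, upper}
import org.apache.spark.sql.types.{StringType, StructType}

// UpperCaser is an illustrative name, not part of Spark.
class UpperCaser(override val uid: String)
  extends Transformer with HasInputCol with HasOutputCol {

  def this() = this(Identifiable.randomUID("upperCaser"))

  // The shared traits only declare the params, so setters are added here.
  def setInputCol(value: String): this.type = set(inputCol, value)
  def setOutputCol(value: String): this.type = set(outputCol, value)

  override def transformSchema(schema: StructType): StructType = {
    // Outside org.apache.spark these two calls would not compile in Spark 2.x.
    SchemaUtils.checkColumnType(schema, $(inputCol), StringType)
    SchemaUtils.appendColumn(schema, $(outputCol), StringType, nullable = true)
  }

  override def transform(dataset: Dataset[_]): DataFrame = {
    transformSchema(dataset.schema)
    dataset.withColumn($(outputCol), upper(col($(inputCol))))
  }

  override def copy(extra: ParamMap): UpperCaser = defaultCopy(extra)
}
```

Declaring the same class in a package outside org.apache.spark would force it
to redefine the inputCol/outputCol params and the schema checks by hand, which
is the duplication this workaround avoids.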
I have not contributed to Spark yet, but please don't hesitate to reach out if
you want me to participate in this.
> Discussion: Making MLlib APIs extensible for 3rd party libraries
> ----------------------------------------------------------------
>
> Key: SPARK-19498
> URL: https://issues.apache.org/jira/browse/SPARK-19498
> Project: Spark
> Issue Type: Brainstorming
> Components: ML
> Affects Versions: 2.2.0
> Reporter: Joseph K. Bradley
> Priority: Critical
>
> Per the recent discussion on the dev list, this JIRA is for discussing how we
> can make MLlib DataFrame-based APIs more extensible, especially for the
> purpose of writing 3rd-party libraries with APIs extended from the MLlib APIs
> (for custom Transformers, Estimators, etc.).
> * For people who have written such libraries, what issues have you run into?
> * What APIs are not public or extensible enough? Do they require changes
> before being made more public?
> * Are APIs for non-Scala languages such as Java and Python friendly or
> extensive enough?
> The easy answer is to make everything public, but that would be terrible of
> course in the long-term. Let's discuss what is needed and how we can present
> stable, sufficient, and easy-to-use APIs for 3rd-party developers.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)