Currently your only option is to write (or copy) your own implementations.

Logging is definitely intended for internal use only, and it's best to
use your own logging library - Typesafe scala-logging is a common option
that I've used.
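As a rough sketch of what that might look like (assuming the Typesafe
scala-logging dependency is on your classpath; the class name here is
purely illustrative):

```scala
// Sketch: using Typesafe scala-logging instead of Spark's internal
// Logging trait. Requires "com.typesafe.scala-logging" %% "scala-logging"
// as a dependency; MyCleaner is an illustrative name, not a Spark class.
import com.typesafe.scalalogging.LazyLogging

class MyCleaner extends LazyLogging {
  def clean(input: String): String = {
    // logger is provided lazily by the LazyLogging mixin
    logger.info(s"Cleaning input of length ${input.length}")
    input.trim
  }
}
```

Mixing in LazyLogging gives you a `logger` field backed by SLF4J, so it
plays nicely with whatever logging backend the Spark cluster is using.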

As for the VectorUDT, for now that is private. There are no plans to open
it up as yet. It should not be too difficult to have your own UDT
implementation. What type of extensions are you trying to do with the UDT?

Likewise the shared params are for now private. It is a bit annoying to
have to re-create them, but most of them are pretty simple so it's not a
huge overhead.
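For instance, re-creating something like HasInputCol in your own
namespace is only a few lines (a sketch, assuming Spark MLlib on the
classpath; the trait below mirrors the shape of Spark's private version
but is your own code):

```scala
// Sketch: a shared-param trait in your own namespace, modeled on the
// private[ml] HasInputCol pattern. Param/trait names are illustrative.
import org.apache.spark.ml.param.{Param, Params}

trait HasInputCol extends Params {
  // Param definition: owner, name, and doc string
  final val inputCol: Param[String] =
    new Param[String](this, "inputCol", "input column name")

  // Convenience getter, as in Spark's built-in shared params
  final def getInputCol: String = $(inputCol)
}
```

Your transformers can then mix this trait in and call `setDefault` /
`set(inputCol, ...)` exactly as Spark's own do.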

Perhaps you can add your thoughts & comments to
https://issues.apache.org/jira/browse/SPARK-19498 in terms of extending
Spark ML. Ultimately I support making it easier to extend, but we do
have to balance that against exposing new public APIs and classes that
impose backward-compatibility guarantees.

Perhaps now is a good time to think about opening up some of the common
shared params, for example.

Thanks
Nick


On Wed, 22 Feb 2017 at 22:51 Shouheng Yi <sho...@microsoft.com.invalid>
wrote:

Hi Spark developers,



Currently my team at Microsoft is extending Spark's machine learning
functionality to include new learners and transformers. We would like
users to use these within Spark pipelines so that they can mix and match
with existing Spark learners/transformers and have a native Spark
experience overall. We cannot accomplish this from a non-"org.apache"
namespace with the current implementation, and we don't want to release
code inside the Apache namespace, because it's confusing and there could
be naming-rights issues.



We need to extend several classes from Spark that happen to be marked
"private[spark]." For example, one of our classes extends VectorUDT[0],
which is declared as private[spark] class VectorUDT. This unfortunately
puts us in a strange scenario that forces us to work under the namespace
org.apache.spark.



To be specific, the private classes/traits we currently need in order to
create new Spark learners & transformers are HasInputCol, VectorUDT and
Logging. We will expand this list as we develop more.



Is there a way to avoid this namespace issue? What do other
people/companies do in this scenario? Thank you for your help!



[0]:
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/linalg/VectorUDT.scala



Best,

Shouheng
