Hi Spark developers,

My team at Microsoft is currently extending Spark's machine learning
functionality with new learners and transformers. We would like users
to be able to use these within Spark pipelines, so that they can mix and
match them with existing Spark learners/transformers and have an overall
native Spark experience. With the current implementation we cannot
accomplish this from a non-"org.apache" namespace, and we don't want to
release code inside the Apache namespace because it's confusing and there
could be naming-rights issues.

We need to extend several classes from Spark that happen to be marked
"private[spark]". For example, one of our classes extends VectorUDT [0],
which is declared as private[spark] class VectorUDT. This unfortunately
puts us in a strange scenario that forces us to work under the namespace
org.apache.spark.
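To illustrate the visibility rule in isolation, here is a minimal, self-contained sketch of how Scala's package-qualified private modifier behaves (this is not Spark code; the names VectorUDTLike, InsideExtension, and com.example are hypothetical, chosen only to demonstrate the constraint):

```scala
package org.apache.spark {
  // Visible only to code compiled somewhere under the org.apache.spark package,
  // mirroring how Spark declares private[spark] class VectorUDT.
  private[spark] class VectorUDTLike

  package ml {
    // Compiles: this class lives under org.apache.spark, so the
    // private[spark] class above is visible to it.
    class InsideExtension extends VectorUDTLike
  }
}

package com.example {
  // Would NOT compile if uncommented: VectorUDTLike is not visible
  // outside the org.apache.spark package.
  // class OutsideExtension extends org.apache.spark.VectorUDTLike
}
```

This is why the only way to subclass these types today is to place our own code under org.apache.spark.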

To be specific, the private classes/traits we currently need in order to
create new Spark learners and transformers are HasInputCol, VectorUDT, and
Logging. This list will grow as we develop more.

Is there a way to avoid this namespace issue? What do other people/companies do 
in this scenario? Thank you for your help!

[0]: 
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/linalg/VectorUDT.scala

Best,
Shouheng
