Hi Spark developers,

My team at Microsoft is currently extending Spark's machine learning functionality with new learners and transformers. We would like users to be able to use these inside Spark pipelines, mixing and matching them with the existing Spark learners/transformers for an overall native Spark experience. With the current implementation we cannot accomplish this from a non-"org.apache" namespace, and we don't want to release code inside the Apache namespace because it is confusing and could raise naming-rights issues.
We need to extend several Spark classes that happen to be marked private[spark]. For example, one of our classes extends VectorUDT[0], which is declared as private[spark] class VectorUDT. This unfortunately puts us in an awkward position: it forces us to work under the org.apache.spark namespace. Specifically, the private classes/traits we currently need in order to create new Spark learners and transformers are HasInputCol, VectorUDT and Logging, and we expect this list to grow as we develop more. Is there a way to avoid this namespace issue? What do other people/companies do in this scenario? Thank you for your help!

[0]: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/linalg/VectorUDT.scala

Best,
Shouheng
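P.S. For anyone unfamiliar with Scala's qualified-private modifier, here is a minimal, self-contained sketch of why a private[spark] declaration forces extenders into the org.apache.spark namespace. This is not the actual Spark source (the real VectorUDT extends UserDefinedType[Vector]), and the com.microsoft package name is just a hypothetical stand-in for our own namespace:

```scala
// Minimal reproduction of the private[spark] visibility problem.

package org.apache.spark.ml.linalg {
  // Spark declares VectorUDT roughly like this: the class is visible
  // only from code whose enclosing package is under org.apache.spark.
  private[spark] class VectorUDT
}

package com.microsoft.ml {
  // Does NOT compile: VectorUDT is inaccessible outside org.apache.spark.
  // class MyVectorUDT extends org.apache.spark.ml.linalg.VectorUDT
}

package org.apache.spark.ml.custom {
  // Compiles: this package sits under org.apache.spark, so the
  // private[spark] class is visible. This is the workaround that
  // forces third-party code into the Apache namespace.
  class MyVectorUDT extends org.apache.spark.ml.linalg.VectorUDT
}
```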