[
https://issues.apache.org/jira/browse/KUDU-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15866551#comment-15866551
]
Andy Stadtler commented on KUDU-1603:
-------------------------------------
We probably need a Python class that wraps the KuduContext call more cleanly,
so you don't have to do ugly stuff like this:
kc = sc._jvm.org.apache.kudu.spark.kudu.KuduContext("kudu.master:7051")
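A minimal sketch of what such a wrapper could look like, not an existing Kudu API. The class name KuduContextWrapper and its attributes are hypothetical. It mirrors the single-argument constructor call shown above; other kudu-spark versions may expect different constructor arguments.

```python
class KuduContextWrapper:
    """Hypothetical thin Python wrapper around the JVM-side KuduContext."""

    # Fully qualified JVM class name, copied from the py4j call in the comment.
    JVM_CLASS = "org.apache.kudu.spark.kudu.KuduContext"

    def __init__(self, sc, kudu_master):
        # sc is assumed to be a pyspark SparkContext (anything exposing _jvm).
        self._sc = sc
        self.kudu_master = kudu_master
        # Walk the py4j JVM view one package segment at a time,
        # which is equivalent to sc._jvm.org.apache.kudu.spark.kudu.KuduContext.
        target = sc._jvm
        for part in self.JVM_CLASS.split("."):
            target = getattr(target, part)
        # Assumption: the constructor takes only the master address, as above.
        self._jkc = target(kudu_master)

    def __getattr__(self, name):
        # Only reached when normal lookup fails. Private names are not
        # forwarded, which also avoids recursion if _jkc is not set yet.
        if name.startswith("_"):
            raise AttributeError(name)
        # Delegate everything else to the underlying JVM object.
        return getattr(self._jkc, name)


# Usage, given a live SparkContext `sc` with the kudu-spark jar on the classpath:
# kc = KuduContextWrapper(sc, "kudu.master:7051")
```

Delegating through __getattr__ keeps the sketch short, at the cost of exposing the raw py4j method signatures to the caller.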
We also probably need a helper to convert a Java ArrayList to a Scala Seq
for KuduRDD, since py4j converts a Python list to an ArrayList. I'm not a
Scala person, but something simple like this works:
import java.util.ArrayList
import scala.collection.JavaConverters._
def ArrayListToSeq(al : ArrayList[String]) = al.asScala.toSeq
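As a possible alternative to a new Scala helper, the conversion can be sketched from the Python side. This assumes PySpark's internal PythonUtils.toSeq bridge, which PySpark's own column helpers call through sc._jvm. It is an internal API and may change between Spark versions. The function name to_scala_seq is hypothetical.

```python
def to_scala_seq(sc, items):
    """Convert a Python iterable to a Scala Seq on the JVM side.

    Assumes sc is a pyspark SparkContext. py4j turns the Python list into
    a java.util.List, and PythonUtils.toSeq turns that into a Scala Seq.
    """
    return sc._jvm.PythonUtils.toSeq(list(items))
```

The returned object is a py4j reference to the Scala Seq, so it can be passed to JVM methods that expect a Seq[String].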
> Pyspark Integration
> -------------------
>
> Key: KUDU-1603
> URL: https://issues.apache.org/jira/browse/KUDU-1603
> Project: Kudu
> Issue Type: New Feature
> Components: integration, python, spark
> Reporter: Jordan Birdsell
> Labels: features
>
> Now that integration with the Spark Scala/Java API has occurred, work can
> begin on exposing this to Python and integrating with PySpark. This would
> likely be a more desirable Python interface to Kudu than the current Python
> client for use cases such as data science.