Hi, I've hit a wall trying to implement a couple of Scala methods in a Python version of our project.
My Python function looks like this:

    def Write_Graphml(data, graphml_path, sc):
        return sc.getOrCreate()._jvm.io.archivesunleashed.app.WriteGraphML(data, graphml_path).apply

where data is a DataFrame that has been collected: data.collect().

On the Scala side, it is basically:

    object WriteGraphML {
      def apply(data: Array[Row], graphmlPath: String): Boolean = {
        // ... massages an Array[Row] into GraphML ...
        true
      }
    }

When I try to use it in PySpark, I end up getting this error message:

    Py4JError: An error occurred while calling None.io.archivesunleashed.app.WriteGraphML. Trace:
    py4j.Py4JException: Constructor io.archivesunleashed.app.WriteGraphML([class java.util.ArrayList, class java.lang.String]) does not exist
        at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:179)
        at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:196)
        at py4j.Gateway.invoke(Gateway.java:237)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)

Based on my research, I'm fairly certain it is because of how Py4J hands the Python list (data) to the JVM, which then passes it to Scala: it ends up as an ArrayList instead of an Array[Row].

Do I need to tweak data before it is passed to Write_Graphml? Or am I doing something else wrong here?

...and I'm not 100% sure if this is a user or dev list question. Let me know if I should move this over to user.

Thanks in advance for any help!

cheers!

-nruest