[ https://issues.apache.org/jira/browse/SYSTEMML-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Glenn Weidner updated SYSTEMML-1370:
------------------------------------
    Fix Version/s:     (was: SystemML 1.0)
                   SystemML 0.14

> Py4JError: An error occurred while calling 
> z:org.apache.sysml.runtime.instructions.spark.utils.RDDConverterUtilsExt.convertPy4JArrayToMB.
> -----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SYSTEMML-1370
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1370
>             Project: SystemML
>          Issue Type: Bug
>          Components: APIs
>    Affects Versions: Not Applicable
>         Environment: pyspark with local Spark 2.1
>            Reporter: Berthold Reinwald
>             Fix For: SystemML 0.14
>
>
> Do we have undocumented limits for RDDConverterUtilsExt.convertPy4JArrayToMB?
> The simple script below works for 23100 rows, while 46900 rows fails. The
> following steps reproduce the issue easily and consistently.
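> For scale, a rough estimate of the payload handed to the converter for the
> two row counts (editor's arithmetic, not from the report). The failing case
> is only about twice the passing one, which suggests a size threshold in the
> Py4J byte transfer rather than anything data-dependent (an assumption; the
> trace alone does not confirm it):
> # Back-of-the-envelope size of the dense float64 payload sent through Py4J;
> # base64 encoding inflates it by roughly one third.
> cols = 784
> for nr in (23100, 46900):
>     raw = nr * cols * 8              # bytes of raw float64 data
>     b64 = raw * 4 // 3               # approximate base64-encoded size
>     print("%d rows: ~%.0f MB raw, ~%.0f MB base64" % (nr, raw / 1e6, b64 / 1e6))
> # 23100 rows: ~145 MB raw, ~193 MB base64
> # 46900 rows: ~294 MB raw, ~392 MB base64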
> START:
> $pyspark --master local --jars $SYSTEMML_HOME/SystemML.jar --driver-memory 8G 
> --executor-memory 2G
> PYTHON SCRIPT:
> from systemml import MLContext, dml
> import pandas as pd
> sc.version
> ml = MLContext(sc)
> print "Spark Version:", sc.version
> print "SystemML Version:", ml.version()
> print "SystemML Built-Time:", ml.buildTime()
> # !! number of rows 23100 works, while 46900 fails
> nr = 46900
> X_pd = pd.DataFrame(range(1, (nr*784)+1,1),dtype=float).values.reshape(nr,784)
> script ="""
>     write(X, $Xfile, format="csv")
> """
> prog = dml(script).input(X=X_pd).input(**{"$Xfile":"/tmp/X_pd.csv"})
> ml.execute(prog)
> OUTPUT:
> Spark Version: 2.1.0
> SystemML Version: 0.14.0-incubating-SNAPSHOT
> SystemML Built-Time: 2017-03-03 07:33:40 UTC
> ---------------------------------------------------------------------------
> Py4JError                                 Traceback (most recent call last)
> .......
> Py4JError: An error occurred while calling 
> z:org.apache.sysml.runtime.instructions.spark.utils.RDDConverterUtilsExt.convertPy4JArrayToMB.
>  Trace:
> java.lang.NegativeArraySizeException
>       at py4j.Base64.decode(Base64.java:321)
>       at py4j.Protocol.getBytes(Protocol.java:173)
>       at py4j.Protocol.getObject(Protocol.java:294)
>       at py4j.commands.AbstractCommand.getArguments(AbstractCommand.java:82)
>       at py4j.commands.CallCommand.execute(CallCommand.java:77)
>       at py4j.GatewayConnection.run(GatewayConnection.java:214)
>       at java.lang.Thread.run(Thread.java:745)
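> A minimal workaround sketch, assuming the Python MLContext also accepts a
> Spark DataFrame as input and that the Spark path avoids the single
> base64-encoded transfer over the Py4J socket; whether this actually dodges
> the failure above is an assumption, not something verified here:
> # sc and spark are provided by the pyspark shell started as above.
> from systemml import MLContext, dml
> import pandas as pd
> nr = 46900
> X_pd = pd.DataFrame(range(1, (nr * 784) + 1, 1), dtype=float).values.reshape(nr, 784)
> ml = MLContext(sc)
> # Hand the data over as a Spark DataFrame instead of a NumPy array, so it is
> # shipped through Spark rather than as one large byte array over Py4J.
> X_df = spark.createDataFrame(pd.DataFrame(X_pd))
> script = """
>     write(X, $Xfile, format="csv")
> """
> prog = dml(script).input(X=X_df).input(**{"$Xfile": "/tmp/X_pd.csv"})
> ml.execute(prog)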



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
