Chunxi Zhang created SPARK-6931:
-----------------------------------
Summary: python: struct.pack('!q', value) in write_long(value,
stream) in serializers.py require int(but doesn't raise exceptions in common
cases)
Key: SPARK-6931
URL: https://issues.apache.org/jira/browse/SPARK-6931
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 1.3.0
Reporter: Chunxi Zhang
Priority: Critical
when I map my own feature calculation module's function, sparks raises:
Traceback (most recent call last):
File "/usr/local/Cellar/apache-spark/1.3.0/libexec/python/pyspark/daemon.py",
line 162, in manager
code = worker(sock)
File "/usr/local/Cellar/apache-spark/1.3.0/libexec/python/pyspark/daemon.py",
line 60, in worker
worker_main(infile, outfile)
File "/usr/local/Cellar/apache-spark/1.3.0/libexec/python/pyspark/worker.py",
line 115, in main
report_times(outfile, boot_time, init_time, finish_time)
File "/usr/local/Cellar/apache-spark/1.3.0/libexec/python/pyspark/worker.py",
line 40, in report_times
write_long(1000 * boot, outfile)
File
"/usr/local/Cellar/apache-spark/1.3.0/libexec/python/pyspark/serializers.py",
line 518, in write_long
stream.write(struct.pack("!q", value))
DeprecationWarning: integer argument expected, got float
so I turn on the serializers.py, and tried to print the value out, which is a
float, came from 1000 * time.time()
when I removed my lib, or add a rdd.count() before mapping my lib, this bug
won’t appear.
so I edited the function to :
def write_long(value, stream):
stream.write(struct.pack("!q", int(value))) # added a int(value)
everything seem fine…
According to python’s doc for
struct(https://docs.python.org/2/library/struct.html)’s Note(3), the value
should be a int(for q), and if it’s a float, it’ll try use __index__(), else,
try __int__, but since __int__ is deprecated, it’ll raise DeprecationWarning.
And float doesn’t have __index__, but has __int__, so it should raise the
exception every time.
But, as you can see, in normal cases, it won’t raise the exception, and the
code works perfectly, and exec struct.pack('!q', 111.1) in console or a clean
file won't raise any exception…I can hardly tell how my lib might effect a
time.time()'s value passed to struct.pack()... it might a python's original bug
or what.
Anyway, this value should be a int, so add a int() to it.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]