This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new cb1aad6  [SPARK-26559][ML][PYSPARK] ML image can't work with numpy versions prior to 1.9
cb1aad6 is described below

commit cb1aad69b781bf9612b9b14f5338b338344365f4
Author: Liang-Chi Hsieh <vii...@gmail.com>
AuthorDate: Mon Jan 7 18:36:52 2019 +0800

    [SPARK-26559][ML][PYSPARK] ML image can't work with numpy versions prior to 1.9
    
    ## What changes were proposed in this pull request?
    
    Due to an [API change](https://github.com/numpy/numpy/pull/4257/files#diff-c39521d89f7e61d6c0c445d93b62f7dc) in NumPy 1.9, the PySpark ML image module doesn't work with NumPy versions prior to 1.9.
    
    When running the image tests with a NumPy version prior to 1.9, we see the following error:
    ```
    test_read_images (pyspark.ml.tests.test_image.ImageReaderTest) ... ERROR
    test_read_images_multiple_times (pyspark.ml.tests.test_image.ImageReaderTest2) ... ok
    
    ======================================================================
    ERROR: test_read_images (pyspark.ml.tests.test_image.ImageReaderTest)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/Users/viirya/docker_tmp/repos/spark-1/python/pyspark/ml/tests/test_image.py", line 36, in test_read_images
        self.assertEqual(ImageSchema.toImage(array, origin=first_row[0]), first_row)
      File "/Users/viirya/docker_tmp/repos/spark-1/python/pyspark/ml/image.py", line 193, in toImage
        data = bytearray(array.astype(dtype=np.uint8).ravel().tobytes())
    AttributeError: 'numpy.ndarray' object has no attribute 'tobytes'
    
    ----------------------------------------------------------------------
    Ran 2 tests in 29.040s
    
    FAILED (errors=1)
    ```
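
The `AttributeError` above comes from calling a method that the installed NumPy does not provide. As a minimal sketch of an alternative guard (not what this commit does), one could probe for the method instead of comparing version strings. The helper name `to_bytearray` is hypothetical, and the example uses only standard-library buffer types so that it runs without NumPy; whether the fallback branch is safe for a given NumPy array is an assumption that would need checking against the affected versions.

```python
from array import array


def to_bytearray(buf):
    """Return the raw bytes of `buf` as a bytearray.

    Hypothetical helper: prefers a `tobytes()` method when the object
    has one and otherwise relies on the buffer protocol.
    """
    tobytes = getattr(buf, "tobytes", None)
    if callable(tobytes):
        return bytearray(tobytes())
    # Objects without `tobytes` (the situation in the traceback above).
    return bytearray(buf)


# array.array provides tobytes(); bytes does not, so it takes the fallback.
pixels = array("B", [0, 127, 255])
with_method = to_bytearray(pixels)
without_method = to_bytearray(b"\x00\x7f\xff")
```

Feature detection avoids parsing version strings, at the cost of saying less explicitly which release introduced the method.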
    
    ## How was this patch tested?
    
    Manually tested with NumPy versions before and after 1.9.
    
    Closes #23484 from viirya/fix-pyspark-image.
    
    Authored-by: Liang-Chi Hsieh <vii...@gmail.com>
    Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
    
    (cherry picked from commit a927c764c1eee066efc1c2c713dfee411de79245)
    
    Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 python/pyspark/ml/image.py | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/ml/image.py b/python/pyspark/ml/image.py
index edb90a3..a1aacea 100644
--- a/python/pyspark/ml/image.py
+++ b/python/pyspark/ml/image.py
@@ -28,6 +28,7 @@ import sys
 import warnings
 
 import numpy as np
+from distutils.version import LooseVersion
 
 from pyspark import SparkContext
 from pyspark.sql.types import Row, _create_row, _parse_datatype_json_string
@@ -190,7 +191,11 @@ class _ImageSchema(object):
         # Running `bytearray(numpy.array([1]))` fails in specific Python versions
         # with a specific Numpy version, for example in Python 3.6.0 and NumPy 1.13.3.
         # Here, it avoids it by converting it to bytes.
-        data = bytearray(array.astype(dtype=np.uint8).ravel().tobytes())
+        if LooseVersion(np.__version__) >= LooseVersion('1.9'):
+            data = bytearray(array.astype(dtype=np.uint8).ravel().tobytes())
+        else:
+            # Numpy prior to 1.9 don't have `tobytes` method.
+            data = bytearray(array.astype(dtype=np.uint8).ravel())
 
         # Creating new Row with _create_row(), because Row(name = value, ... )
         # orders fields by name, which conflicts with expected schema order
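
The gate in the hunk above relies on comparing versions component by component rather than as plain strings, since `'1.10.4' >= '1.9'` is false under lexicographic comparison. The stdlib-only sketch below illustrates the difference; `version_tuple` and `has_tobytes` are hypothetical helpers that handle only simple dotted versions and are not a replacement for `LooseVersion`. Note that `distutils` was removed from the standard library in Python 3.12, so the import in the patch only applies to older Python versions.

```python
import re


def version_tuple(version):
    """Parse the leading numeric components of a dotted version string.

    Hypothetical helper for illustration; suffixes such as 'rc1' or
    '.dev0' are ignored rather than ordered.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def has_tobytes(numpy_version):
    # Mirrors the condition in the patch: `tobytes` is used from 1.9 on.
    return version_tuple(numpy_version) >= version_tuple("1.9")


string_compare = "1.10.4" >= "1.9"       # lexicographic comparison
numeric_compare = has_tobytes("1.10.4")  # component-wise comparison
old_release = has_tobytes("1.8.2")
```

Here `string_compare` is `False` while `numeric_compare` is `True`, which is why the patch wraps both sides in a version type before comparing.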


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
