Repository: spark
Updated Branches:
  refs/heads/branch-2.2 342cc2a4c -> 49968de52


Fixed pandoc dependency issue in python/setup.py

## Problem Description

When pyspark is listed as a dependency of another package, installing
that package can cause pyspark's own installation to fail. During the
install, pyspark's setup_requires requirements are installed, including
pypandoc. As a result, the ImportError handler at setup.py:152 never
fires, because the pypandoc module is in fact importable. However, the
pypandoc.convert() function fails if pandoc itself is not installed (in
our use cases it is not). This raises an OSError that is not handled,
and setup fails.
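
The distinction between the pypandoc module and the pandoc binary can be checked directly. The following is only a rough sketch: `pandoc_status` is a hypothetical helper, not part of setup.py, and it assumes pandoc would be found on PATH (pypandoc can also locate pandoc by other means, such as wheels that bundle it).

```python
import importlib.util
import shutil


def pandoc_status():
    """Return (module_found, binary_found) for pypandoc and pandoc.

    Hypothetical helper for illustration only.
    """
    # True when `import pypandoc` would succeed in this environment.
    module_found = importlib.util.find_spec("pypandoc") is not None
    # True when a `pandoc` executable is reachable through PATH.
    binary_found = shutil.which("pandoc") is not None
    return module_found, binary_found


if __name__ == "__main__":
    module_found, binary_found = pandoc_status()
    print("pypandoc importable:", module_found)
    print("pandoc on PATH:", binary_found)
```

The failing case described above corresponds to the first value being true and the second being false.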

The following is a sample failure:
```
$ which pandoc
$ pip freeze | grep pypandoc
pypandoc==1.4
$ pip install pyspark
Collecting pyspark
  Downloading pyspark-2.2.0.post0.tar.gz (188.3MB)
    100% |████████████████████████████████| 188.3MB 16.8MB/s
    Complete output from command python setup.py egg_info:
    Maybe try:

        sudo apt-get install pandoc
    See http://johnmacfarlane.net/pandoc/installing.html
    for installation options
    ---------------------------------------------------------------

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-mfnizcwa/pyspark/setup.py", line 151, in <module>
        long_description = pypandoc.convert('README.md', 'rst')
      File "/home/tbeck/.virtualenvs/cem/lib/python3.5/site-packages/pypandoc/__init__.py", line 69, in convert
        outputfile=outputfile, filters=filters)
      File "/home/tbeck/.virtualenvs/cem/lib/python3.5/site-packages/pypandoc/__init__.py", line 260, in _convert_input
        _ensure_pandoc_path()
      File "/home/tbeck/.virtualenvs/cem/lib/python3.5/site-packages/pypandoc/__init__.py", line 544, in _ensure_pandoc_path
        raise OSError("No pandoc was found: either install pandoc and add it\n"
    OSError: No pandoc was found: either install pandoc and add it
    to your PATH or or call pypandoc.download_pandoc(...) or
    install pypandoc wheels with included pandoc.

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-mfnizcwa/pyspark/
```

## What changes were proposed in this pull request?

This change adds an exception handler for the OSError raised by
pypandoc.convert(), so that pyspark can be installed client-side
without requiring pandoc to be installed.
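
The handling pattern can be sketched in isolation as follows. This is an illustrative sketch, not the code in setup.py: `load_long_description`, the `importer` argument, the fallback string, and the stub classes are all hypothetical stand-ins so that the example runs without pypandoc or pandoc present. Only the two `except` branches and their messages mirror the patch.

```python
import sys


def load_long_description(importer, fallback="(long description unavailable)"):
    """Return README.md converted to reST, or `fallback` if that is not possible.

    `importer` is a hypothetical zero-argument callable standing in for
    `import pypandoc`; it returns an object with a convert(source, to) method.
    """
    try:
        module = importer()  # raises ImportError if the module is missing
        return module.convert('README.md', 'rst')  # raises OSError without pandoc
    except ImportError:
        print("Could not import pypandoc - required to package PySpark",
              file=sys.stderr)
    except OSError:
        print("Could not convert - pandoc is not installed", file=sys.stderr)
    return fallback


class NoPandocStub:
    """Stub: the module imports fine, but the pandoc binary is absent."""

    @staticmethod
    def convert(source, to):
        raise OSError("No pandoc was found")


class WorkingStub:
    """Stub: both the module and the binary are available."""

    @staticmethod
    def convert(source, to):
        return "converted {} to {}".format(source, to)


def missing_module():
    """Stub: the module itself cannot be imported."""
    raise ImportError("No module named 'pypandoc'")


if __name__ == "__main__":
    print(load_long_description(lambda: NoPandocStub))
    print(load_long_description(missing_module))
    print(load_long_description(lambda: WorkingStub))
```

With only the ImportError branch, the `NoPandocStub` case would propagate the OSError and abort, which is the failure shown in the traceback above.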

## How was this patch tested?

I tested this by building a wheel package of pyspark with the change applied.
Then, in a clean virtual environment with pypandoc installed but pandoc not
available on the system, I installed pyspark from the wheel.

Here is the output:

```
$ pip freeze | grep pypandoc
pypandoc==1.4
$ which pandoc
$ pip install --no-cache-dir ../spark/python/dist/pyspark-2.3.0.dev0-py2.py3-none-any.whl
Processing /home/tbeck/work/spark/python/dist/pyspark-2.3.0.dev0-py2.py3-none-any.whl
Requirement already satisfied: py4j==0.10.6 in /home/tbeck/.virtualenvs/cem/lib/python3.5/site-packages (from pyspark==2.3.0.dev0)
Installing collected packages: pyspark
Successfully installed pyspark-2.3.0.dev0
```

Author: Tucker Beck <tucker.b...@rentrakmail.com>

Closes #18981 from dusktreader/dusktreader/fix-pandoc-dependency-issue-in-setup_py.

(cherry picked from commit aad2125475dcdeb4a0410392b6706511db17bac4)
Signed-off-by: hyukjinkwon <gurwls...@gmail.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/49968de5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/49968de5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/49968de5

Branch: refs/heads/branch-2.2
Commit: 49968de526e76a75abafb636cbd5ed84f9a496e9
Parents: 342cc2a
Author: Tucker Beck <tucker.b...@rentrakmail.com>
Authored: Thu Sep 7 09:38:00 2017 +0900
Committer: hyukjinkwon <gurwls...@gmail.com>
Committed: Thu Sep 7 09:38:21 2017 +0900

----------------------------------------------------------------------
 python/setup.py | 2 ++
 1 file changed, 2 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/49968de5/python/setup.py
----------------------------------------------------------------------
diff --git a/python/setup.py b/python/setup.py
index f500354..7e63461 100644
--- a/python/setup.py
+++ b/python/setup.py
@@ -151,6 +151,8 @@ try:
         long_description = pypandoc.convert('README.md', 'rst')
     except ImportError:
         print("Could not import pypandoc - required to package PySpark", file=sys.stderr)
+    except OSError:
+        print("Could not convert - pandoc is not installed", file=sys.stderr)
 
     setup(
         name='pyspark',

