[
https://issues.apache.org/jira/browse/THRIFT-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chandler May updated THRIFT-4042:
---------------------------------
Description:
We recently switched to thrift 0.10.0 with accelerated protocols and started
getting sporadic errors in tests that use the multiprocessing module of the
form:
{code}
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib64/python2.7/multiprocessing/pool.py:250: in map
    return self.map_async(func, iterable, chunksize).get()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <multiprocessing.pool.MapResult object at 0x2e06950>, timeout = None
    def get(self, timeout=None):
        self.wait(timeout)
        if not self._ready:
            raise TimeoutError
        if self._success:
            return self._value
        else:
>           raise self._value
E           ExtractionError: Can't extract file(s) to egg cache
E
E           The following error occurred while trying to extract file(s) to the Python egg
E           cache:
E
E             [Errno 17] File exists: '/home/concrete/.cache/Python-Eggs'
E
E           The Python egg cache directory is currently set to:
E
E             /home/concrete/.cache/Python-Eggs
E
E           Perhaps your account does not have write access to this directory? You can
E           change the cache directory by setting the PYTHON_EGG_CACHE environment
E           variable to point to an accessible directory.
/usr/lib64/python2.7/multiprocessing/pool.py:554: ExtractionError
{code}
This particular error arose from a test we wrote to isolate the issue. It is
of the form:
{code}
from multiprocessing import Pool
input_path = '/path/to/thrift_serialized_data'
num_trials = 100
num_procs = 2
num_tasks = 4
for i in xrange(num_trials):
    pool = Pool(num_procs)
    results = pool.map(_deserialize, [input_path] * num_tasks)
    for result in results:
        assert result is True
{code}
where {{_deserialize}} is a function that reads thrift serialized objects from
a file and returns {{True}} on success. I can provide a minimal working example
(MWE) if necessary, but it would take some time on my part.
I want to stress that this happens only with the new accelerated protocol in
thrift 0.10.0, and only under {{python setup.py test}} in our project when
thrift has not been installed via *pip* (but has been installed by
{{python setup.py install}} in our project, which depends on thrift). We are
using pytest, but I'm not sure whether that matters. At test time thrift gets
installed/unpacked as an egg in a local directory, and extracting it to the egg
cache hits what appears to be a locking error. I believe this is the same error as:
http://dev.list.galaxyproject.org/python-egg-cache-exists-error-td4656276.html
http://www.georgevreilly.com/blog/2015/01/28/PythonEggCache.html
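The {{[Errno 17] File exists}} in the traceback is consistent with a
check-then-create race on the cache directory when several worker processes
start extracting at once. The following is a minimal sketch of that pattern,
not the actual setuptools code: the helper names ({{racy_ensure_dir}},
{{tolerant_ensure_dir}}, {{simulate_lost_race}}) are made up for illustration,
and it is an assumption that the egg extraction path behaves this way.

```python
import errno
import os
import shutil
import tempfile


def racy_ensure_dir(path):
    # Check-then-create: another process may create `path` between the
    # isdir() check and makedirs(), which then raises OSError(EEXIST).
    if not os.path.isdir(path):
        os.makedirs(path)


def tolerant_ensure_dir(path):
    # Create first and treat "already exists as a directory" as success,
    # so concurrent callers cannot trip over each other.
    try:
        os.makedirs(path)
    except OSError as exc:
        if exc.errno != errno.EEXIST or not os.path.isdir(path):
            raise


def simulate_lost_race(path):
    # Deterministic stand-in for the race: the "other worker" creates the
    # directory just before our own makedirs() call runs.
    os.makedirs(path)        # the other worker wins
    try:
        os.makedirs(path)    # our worker, already past its isdir() check
    except OSError as exc:
        return exc.errno
    return None


if __name__ == '__main__':
    base = tempfile.mkdtemp()
    try:
        cache = os.path.join(base, 'Python-Eggs')
        print(simulate_lost_race(cache) == errno.EEXIST)
        tolerant_ensure_dir(cache)  # succeeds although the directory exists
    finally:
        shutil.rmtree(base)
```

The sketch sticks to {{errno.EEXIST}} handling rather than the {{exist_ok}}
argument so that it also runs on Python 2.7, which is what the traceback shows.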
I believe the documentation indicates this problem can be worked around by
setting {{zip_safe}} to {{False}} in {{setup.py}}:
http://setuptools.readthedocs.io/en/latest/setuptools.html
was:
We recently switched to thrift 0.10.0 with accelerated protocols and started
getting sporadic errors in tests that use the multiprocessing module of the
form:
{code}
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib64/python2.7/multiprocessing/pool.py:250: in map
    return self.map_async(func, iterable, chunksize).get()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <multiprocessing.pool.MapResult object at 0x2e06950>, timeout = None
    def get(self, timeout=None):
        self.wait(timeout)
        if not self._ready:
            raise TimeoutError
        if self._success:
            return self._value
        else:
>           raise self._value
E           ExtractionError: Can't extract file(s) to egg cache
E
E           The following error occurred while trying to extract file(s) to the Python egg
E           cache:
E
E             [Errno 17] File exists: '/home/concrete/.cache/Python-Eggs'
E
E           The Python egg cache directory is currently set to:
E
E             /home/concrete/.cache/Python-Eggs
E
E           Perhaps your account does not have write access to this directory? You can
E           change the cache directory by setting the PYTHON_EGG_CACHE environment
E           variable to point to an accessible directory.
/usr/lib64/python2.7/multiprocessing/pool.py:554: ExtractionError
{code}
This particular error arose from a test we wrote to isolate the issue. It is
of the form:
{code}
from multiprocessing import Pool
input_path = '/path/to/thrift_serialized_data'
num_trials = 100
num_procs = 2
num_tasks = 4
for i in xrange(num_trials):
    pool = Pool(num_procs)
    results = pool.map(_deserialize, [input_path] * num_tasks)
    for result in results:
        assert result is True
{code}
where {{_deserialize}} is a function that reads thrift serialized objects from
a file and returns {{True}} on success. I can provide MWE if necessary but it
would take some time on my part.
I want to stress that this only happens when using the new accelerated protocol
in thrift 0.10.0 and only happens in {{python setup.py test}} in our project
when thrift is *not* installed already on the system. We are using pytest but
I'm not sure whether that's important. At test time thrift gets
installed/unpacked as an egg in a local directory and gets a locking error. I
believe this is the same error as:
http://dev.list.galaxyproject.org/python-egg-cache-exists-error-td4656276.html
http://www.georgevreilly.com/blog/2015/01/28/PythonEggCache.html
I believe the documentation indicates this problem can be worked around by
setting {{zip_safe}} to {{False}} in {{setup.py}}:
http://setuptools.readthedocs.io/en/latest/setuptools.html
> ExtractionError when using accelerated thrift in a multiprocess test
> --------------------------------------------------------------------
>
> Key: THRIFT-4042
> URL: https://issues.apache.org/jira/browse/THRIFT-4042
> Project: Thrift
> Issue Type: Bug
> Components: Python - Library
> Affects Versions: 0.10.0
> Reporter: Chandler May
>