Please file a bug here: https://issues.apache.org/jira/browse/SPARK/
Could you also provide a way to reproduce this bug (including some datasets)?

On Thu, Jun 4, 2015 at 11:30 PM, Sam Stoelinga <sammiest...@gmail.com> wrote:
> I've changed the SIFT feature extraction to SURF feature extraction and it
> works.
>
> The following line was changed:
>
>     sift = cv2.xfeatures2d.SIFT_create()
>
> to
>
>     sift = cv2.xfeatures2d.SURF_create()
>
> Where should I file this as a bug? When not running on Spark it works
> fine, so I'd say it's a Spark bug.
>
> On Fri, Jun 5, 2015 at 2:17 PM, Sam Stoelinga <sammiest...@gmail.com> wrote:
>>
>> Yeah, I should have emphasized that. I'm running the same code on the
>> same VM: a VM with Spark in standalone mode, and I run the unit test
>> directly on that same VM. So OpenCV works correctly on that machine, but
>> when I move the exact same OpenCV code to Spark, it crashes.
>>
>> On Tue, Jun 2, 2015 at 5:06 AM, Davies Liu <dav...@databricks.com> wrote:
>>>
>>> Could you run the single-threaded version on the worker machine to make
>>> sure that OpenCV is installed and configured correctly?
>>>
>>> On Sat, May 30, 2015 at 6:29 AM, Sam Stoelinga <sammiest...@gmail.com> wrote:
>>> > I've verified that the issue lies within Spark running OpenCV code,
>>> > not within the sequence file BytesWritable formatting.
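A digest comparison is one way to back up that verification. The sketch below uses only `hashlib`; the stand-in byte strings and variable names are illustrative, not taken from the original gists. The idea is to hash the image as read straight from disk and hash the value the Spark mapper receives from `sequenceFile()`: if the two digests differ, the bytes were mangled before OpenCV ever saw them.

```python
import hashlib

def digest(raw):
    """Hex MD5 of a byte payload; accepts bytes, bytearray, or memoryview."""
    return hashlib.md5(bytes(raw)).hexdigest()

# On the driver: digest of the image file as read straight from disk.
# Inside the mapper: digest of the value delivered by sequenceFile().
on_disk = b"\xff\xd8\xff\xe0" + b"\x00" * 16   # stand-in for real JPEG bytes
delivered = bytearray(on_disk)                 # stand-in for the RDD value

# Matching digests mean the pipeline delivered the bytes unchanged.
assert digest(on_disk) == digest(delivered)
```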
>>> > This code reproduces the failure without using the sequence file as
>>> > input at all: it runs the same function with the same input on Spark,
>>> > and it still fails:
>>> >
>>> > def extract_sift_features_opencv(imgfile_imgbytes):
>>> >     imgfilename, discardsequencefile = imgfile_imgbytes
>>> >     imgbytes = bytearray(open("/tmp/img.jpg", "rb").read())
>>> >     nparr = np.fromstring(buffer(imgbytes), np.uint8)
>>> >     img = cv2.imdecode(nparr, 1)
>>> >     gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
>>> >     sift = cv2.xfeatures2d.SIFT_create()
>>> >     kp, descriptors = sift.detectAndCompute(gray, None)
>>> >     return (imgfilename, "test")
>>> >
>>> > And the corresponding tests.py:
>>> > https://gist.github.com/samos123/d383c26f6d47d34d32d6
>>> >
>>> > On Sat, May 30, 2015 at 8:04 PM, Sam Stoelinga <sammiest...@gmail.com> wrote:
>>> >>
>>> >> Thanks for the advice! The following line causes Spark to crash:
>>> >>
>>> >>     kp, descriptors = sift.detectAndCompute(gray, None)
>>> >>
>>> >> But I do need this line to be executed, and the code does not crash
>>> >> when running outside of Spark with the same parameters. You're saying
>>> >> the bytes from the sequence file may have been transformed somehow,
>>> >> no longer represent an image, and so cause OpenCV to crash the whole
>>> >> Python executor.
>>> >>
>>> >> On Fri, May 29, 2015 at 2:06 AM, Davies Liu <dav...@databricks.com> wrote:
>>> >>>
>>> >>> Could you try commenting out some lines in
>>> >>> `extract_sift_features_opencv` to find which line causes the crash?
>>> >>>
>>> >>> If the bytes coming from sequenceFile() are broken, it's easy to
>>> >>> crash a C library (OpenCV) from Python.
>>> >>>
>>> >>> On Thu, May 28, 2015 at 8:33 AM, Sam Stoelinga <sammiest...@gmail.com> wrote:
>>> >>> > Hi Sparkers,
>>> >>> >
>>> >>> > I am working on a PySpark application which uses the OpenCV
>>> >>> > library.
>>> >>> > It runs fine when I run the code locally, but when I try to run
>>> >>> > it on Spark on the same machine, it crashes the worker.
>>> >>> >
>>> >>> > The code can be found here:
>>> >>> > https://gist.github.com/samos123/885f9fe87c8fa5abf78f
>>> >>> >
>>> >>> > This is the error message taken from STDERR of the worker log:
>>> >>> > https://gist.github.com/samos123/3300191684aee7fc8013
>>> >>> >
>>> >>> > I would like pointers or tips on how to debug further. It would be
>>> >>> > nice to know the reason why the worker crashed.
>>> >>> >
>>> >>> > Thanks,
>>> >>> > Sam Stoelinga
>>> >>> >
>>> >>> > org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
>>> >>> >   at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:172)
>>> >>> >   at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:176)
>>> >>> >   at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:94)
>>> >>> >   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>>> >>> >   at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>>> >>> >   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>>> >>> >   at org.apache.spark.scheduler.Task.run(Task.scala:64)
>>> >>> >   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>>> >>> >   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >>> >   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >>> >   at java.lang.Thread.run(Thread.java:745)
>>> >>> > Caused by: java.io.EOFException
>>> >>> >   at java.io.DataInputStream.readInt(DataInputStream.java:392)
>>> >>> >   at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:108)
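For readers debugging similar crashes: the `EOFException` in the trace is the JVM side discovering that the Python worker process died mid-stream, which is exactly what a segfault inside a native library such as OpenCV looks like from Spark. A stdlib-only way to keep one bad record from taking down a whole process is to run the crash-prone call in a child process. This is a sketch, not part of the original thread; the function names are made up, and the explicit `fork` start method is a POSIX-only assumption:

```python
import multiprocessing as mp

# fork start method (POSIX-only): the child inherits the parent's state,
# so arbitrary local functions can be passed without pickling.
_ctx = mp.get_context("fork")

def _call(func, arg, conn):
    # Runs inside the child; a segfault here kills only the child.
    conn.send(func(arg))
    conn.close()

def run_isolated(func, arg, timeout=30):
    """Run func(arg) in a child process. If the child dies hard (e.g. a
    crash inside a native library) or hangs, return None instead of
    crashing or blocking the caller."""
    parent, child = _ctx.Pipe()
    p = _ctx.Process(target=_call, args=(func, arg, child))
    p.start()
    p.join(timeout)
    if p.is_alive():          # child hung: kill it
        p.terminate()
        p.join()
        return None
    if p.exitcode != 0:       # child crashed or was killed by a signal
        return None
    return parent.recv() if parent.poll(0) else None
```

Wrapping the SIFT call this way in the mapper would turn a worker-killing segfault into a skippable `None` result, at the cost of one process spawn per record, so it is better suited to pinning down which records crash than to production throughput.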
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org