Could you try the latest Beignet git master? We have made major performance improvements for some cases. I am not sure whether they apply to JohnTheRipper, but it is worth a try.
Thanks,
Zhigang Gong.

On Sat, Oct 25, 2014 at 2:46 PM, Oleksii Shevchuk wrote:
> Zhigang Gong writes:
>
>> This should be an application bug, according to the OpenCL 1.2 spec:
>>
>> CL_PROFILING_INFO_NOT_AVAILABLE if the CL_QUEUE_PROFILING_ENABLE
>> flag is not set for the command-queue, if the execution status of
>> the command identified by event is not CL_COMPLETE or if event is
>> a user event object.
>>
>> To make sure an event's state is CL_COMPLETE, you need to call
>> clWaitForEvents() rather than clFinish().
>>
>> According to the spec, clFinish():
>>
>> blocks until all previously queued OpenCL commands in command_queue
>> are issued to the associated device and have completed.
>>
>> It is not meant to update the state of all related events. It is also
>> too heavy, as it waits for the commands to complete. The event's
>> CL_COMPLETE state means the command has been flushed into the GPU's
>> command buffer and may not have completed yet. It is used for
>> synchronization on the GPU command queue side, whereas clFinish()
>> synchronizes with the host CPU.
>>
>> I would recommend calling clWaitForEvents() before you call
>> clGetEventProfilingInfo(). If you still run into problems after that
>> change, please let us know.
>
> Thanks, it works. Slow, but it works. Maybe this is a problem with
> their implementation.
>
> By the way, I call clWaitForEvents() for one event at a time, right
> before calling clGetEventProfilingInfo() on that event. Is that OK, or
> should I call it for the whole event list?
>
> Also, I use the i915 driver with the following arguments:
> i915.modeset=1 i915.i915_enable_rc6=1 i915.i915_enable_fbc=1
> i915.lvds_downclock=1
>
> They shouldn't influence the speed, should they?
>
> // Some bench output:
>
> magnumripper_JohnTheRipper > run/john -format=Raw-MD5-opencl -te
> Device 0: Intel(R) HD Graphics IvyBridge M GT2
> Local worksize (LWS) 16, global worksize (GWS) 1048576
> Benchmarking: Raw-MD5-opencl [MD5 OpenCL (inefficient, development use only)]... DONE
> Raw: 30107K c/s real, 84468K c/s virtual
>
> magnumripper_JohnTheRipper > run/john -format=Raw-MD5 -te
> Will run 4 OpenMP threads
> Benchmarking: Raw-MD5 [MD5 128/128 AVX 12x]... (4xOMP) DONE
> Raw: 40206K c/s real, 10497K c/s virtual
>
> magnumripper_JohnTheRipper > run/john -format=ecnfs -te
> Unknown ciphertext format name requested
> magnumripper_JohnTheRipper > run/john -format=encfs -te
> Will run 4 OpenMP threads
> Benchmarking: EncFS [PBKDF2-SHA1 AES/Blowfish 8x SSE2]... (4xOMP) DONE
> Raw: 62.1 c/s real, 16.4 c/s virtual
>
> magnumripper_JohnTheRipper > run/john -format=encfs-opencl -te
> Will run 4 OpenMP threads
> Device 0: Intel(R) HD Graphics IvyBridge M GT2
> Local worksize (LWS) 64, global worksize (GWS) 64
> Benchmarking: encfs-opencl, EncFS [PBKDF2-SHA1 OpenCL 4x AES/Blowfish]... (4xOMP) DONE
> Raw: 7.8 c/s real, 4266 c/s virtual
>
> magnumripper_JohnTheRipper > run/john -format=PBKDF2-HMAC-SHA1-opencl -te
> Device 0: Intel(R) HD Graphics IvyBridge M GT2
> Local worksize (LWS) 64, global worksize (GWS) 8192
> Benchmarking: PBKDF2-HMAC-SHA1-opencl [PBKDF2-SHA1 OpenCL 4x]... DONE
> Raw: 12459 c/s real, 3276K c/s virtual
>
> magnumripper_JohnTheRipper > run/john -format=PBKDF2-HMAC-SHA1 -te
> Will run 4 OpenMP threads
> Benchmarking: PBKDF2-HMAC-SHA1 [PBKDF2-SHA1 8x SSE2]... (4xOMP) DONE
> Raw: 16062 c/s real, 5957 c/s virtual
>
> Thanks.
>
> // wbr
> // alxchk
