Could you try the latest git master of Beignet? We have made some major
performance improvements for some cases.
I'm not sure whether that applies to JohnTheRipper, but it is worth a try.

Thanks,
Zhigang Gong.

On Sat, Oct 25, 2014 at 2:46 PM, Oleksii Shevchuk
<[email protected]> wrote:
> Zhigang Gong <[email protected]> writes:
>
>> This is most likely an application bug. According to the OpenCL 1.2
>> spec, clGetEventProfilingInfo() returns:
>>
>>   CL_PROFILING_INFO_NOT_AVAILABLE if the CL_QUEUE_PROFILING_ENABLE
>>   flag is not set for the command-queue, if the execution status of
>>   the command identified by event is not CL_COMPLETE or if event is
>>   a user event object.
>>
>> To make sure an event's state is CL_COMPLETE, you need to call
>> clWaitForEvents() rather than clFinish().
>>
>> According to the spec, clFinish():
>>   Blocks until all previously queued OpenCL commands in command_queue
>>   are issued to the associated device and have completed.
>>
>> It is not meant to update the state of every related event, and it is
>> too heavy, as it waits for the commands to complete. An event's
>> CL_COMPLETE state means the command has been flushed into the GPU's
>> command buffer and may not have completed yet. It is used for
>> synchronization on the GPU command-queue side, whereas clFinish() is
>> used to synchronize with the host CPU.
>>
>> I would recommend calling clWaitForEvents() before you call
>> clGetEventProfilingInfo(). If you still run into problems after that
>> change, please let us know.
>>
>
> Thanks, it works. Slow, but it works. Maybe this is a problem with
> their implementation.
>
> Btw, I call clWaitForEvents for one event in the list each time before
> calling clGetEventProfilingInfo on that event. Is that OK, or should I
> call it for the whole event list?
>
> Also, I use the i915 driver with the following args:
> i915.modeset=1 i915.i915_enable_rc6=1 i915.i915_enable_fbc=1
> i915.lvds_downclock=1
>
> They shouldn't influence the speed, should they?
>
> // Some bench output:
>
> magnumripper_JohnTheRipper > run/john -format=Raw-MD5-opencl -te
> Device 0: Intel(R) HD Graphics IvyBridge M GT2
> Local worksize (LWS) 16, global worksize (GWS) 1048576
> Benchmarking: Raw-MD5-opencl [MD5 OpenCL (inefficient, development use only)]... DONE
> Raw:    30107K c/s real, 84468K c/s virtual
>
> magnumripper_JohnTheRipper > run/john -format=Raw-MD5 -te
> Will run 4 OpenMP threads
> Benchmarking: Raw-MD5 [MD5 128/128 AVX 12x]... (4xOMP) DONE
> Raw:    40206K c/s real, 10497K c/s virtual
>
> magnumripper_JohnTheRipper > run/john -format=ecnfs -te
> Unknown ciphertext format name requested
> magnumripper_JohnTheRipper > run/john -format=encfs -te
> Will run 4 OpenMP threads
> Benchmarking: EncFS [PBKDF2-SHA1 AES/Blowfish 8x SSE2]... (4xOMP) DONE
> Raw:    62.1 c/s real, 16.4 c/s virtual
>
> magnumripper_JohnTheRipper > run/john -format=encfs-opencl -te
> Will run 4 OpenMP threads
> Device 0: Intel(R) HD Graphics IvyBridge M GT2
> Local worksize (LWS) 64, global worksize (GWS) 64
> Benchmarking: encfs-opencl, EncFS [PBKDF2-SHA1 OpenCL 4x AES/Blowfish]... (4xOMP) DONE
> Raw:    7.8 c/s real, 4266 c/s virtual
>
> magnumripper_JohnTheRipper > run/john -format=PBKDF2-HMAC-SHA1-opencl -te
> Device 0: Intel(R) HD Graphics IvyBridge M GT2
> Local worksize (LWS) 64, global worksize (GWS) 8192
> Benchmarking: PBKDF2-HMAC-SHA1-opencl [PBKDF2-SHA1 OpenCL 4x]... DONE
> Raw:    12459 c/s real, 3276K c/s virtual
>
> magnumripper_JohnTheRipper > run/john -format=PBKDF2-HMAC-SHA1 -te
> Will run 4 OpenMP threads
> Benchmarking: PBKDF2-HMAC-SHA1 [PBKDF2-SHA1 8x SSE2]... (4xOMP) DONE
> Raw:    16062 c/s real, 5957 c/s virtual
>
> Thanks.
>
> // wbr
> // alxchk
_______________________________________________
Beignet mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/beignet
