Hi Islam, I like the definition of 95% hard real time; it suits my needs. Thanks for this good paper.
On Monday, June 6, 2016 at 18:45:35 UTC+2, Islam Badreldin wrote:
>
> Hi John,
>
> I am currently pursuing a similar effort. I got a GPIO pin on the BeagleBone
> Black embedded board toggling in hard real time and verified the jitter with
> an oscilloscope. For that, I used a vanilla Linux 4.4.11 kernel with the
> PREEMPT_RT patch applied. I also released an initial version of a Julia
> package that wraps the clock_nanosleep() and clock_gettime() functions from
> the POSIX real-time extensions. Please see this other thread:
> https://groups.google.com/forum/#!topic/julia-users/0Vr2rCRwJY4
>
> I tested that package both on an Intel-based laptop and on the BeagleBone
> Black. I am giving some of the relevant details below.
>
> On Monday, June 6, 2016 at 5:41:29 AM UTC-4, John leger wrote:
>>
>> Since it seems you have a good overview of this domain, I will give more
>> details:
>> We are working in signal processing, and especially in image processing.
>> The goal here is just the adaptive optics: we only want to stabilize the
>> image, not produce the final image.
>> The consequence is that we will not store anything on the hard drive: we
>> read an image, process it and discard it. We stay in RAM all the time.
>> The processing is done with our own algorithms, so for now there is no
>> need for any external library (and I don't see any reason for that to
>> change).
>>
>> First I would like to apologize: just after posting my answer I went to
>> Wikipedia to look up the difference between soft and hard real time.
>> I should have done it before, so that you didn't have to spend more time
>> explaining.
>>
>> In the end I still don't know whether I need hard or soft real time: the
>> timing is set by the camera speed, and the processing has to be done
>> between the acquisition of two images.
>> We don't want to miss an image or delay the processing; I still need to
>> clarify the consequences of a delay or of a missed image.
>> For now, let's just say that we can miss some images, so we want soft
>> real time.
>>
>
> The real-time performance you are after could be 95% hard real-time. See
> e.g. here: https://www.osadl.org/fileadmin/dam/rtlws/12/Brown.pdf
>
>> I'm making a benchmark that should match the system in terms of
>> complexity; these are my first remarks:
>>
>> When you say that one allocation is unacceptable, it is shockingly true:
>> in my case I had 2 allocations caused by
>> A += 1, where A is an array,
>> and in 7 seconds I had 600k allocations.
>> Moral: in a closed loop you cannot accept any allocation, so you have to
>> write all the loops explicitly.
>>
>
> Yes, try to completely avoid memory allocations while developing your own
> algorithms in Julia. Pre-allocation and in-place operations are your
> friends! The example script available in the POSIXClock package is one way
> to do this
> (https://github.com/ibadr/POSIXClock.jl/blob/master/examples/rt_histogram.jl).
> The real-time section of the code is marked by a ccall to mlockall() in
> order to cause immediate failure upon memory allocations in the real-time
> section. You can also use the --track-allocation option to hunt down memory
> allocations while developing your algorithm. See e.g.
> http://docs.julialang.org/en/release-0.4/manual/profile/#man-track-allocation

I discovered --track-allocation not so long ago and it is a good tool. For now I think I will rely on tracking allocations manually. I am a little afraid of using mlockall(): in soft or hard real time, crashing (failing) is not a good option for me...
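To make sure I have the pre-allocation / in-place style right, this is roughly how I am writing the kernels now (a minimal sketch of my own, not taken from the rt_histogram.jl example; the buffer names and the frame size are made up):

    # Pre-allocate every buffer once, outside the processing loop.
    const N     = 512 * 512                  # made-up frame size
    const frame = zeros(Float32, N)          # input image, refilled on each acquisition
    const work  = zeros(Float32, N)          # scratch buffer, reused for every frame

    # In-place kernel: it writes into an existing buffer, so the loop allocates nothing.
    function scale!(out::Vector{Float32}, a::Vector{Float32}, gain::Float32)
        @simd for i in 1:length(out)
            @inbounds out[i] = gain * a[i]
        end
        return nothing
    end

    # By contrast, `a += 1` builds a brand-new array on every call; the explicit
    # loop below updates the existing buffer instead.
    function add_one!(a::Vector{Float32})
        @simd for i in 1:length(a)
            @inbounds a[i] += 1.0f0
        end
        return nothing
    end

    scale!(work, frame, 2.0f0)
    add_one!(work)

Running such a script with julia --track-allocation=user should then report 0 in the .mem file next to those loop bodies, which is the manual check I had in mind.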
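On the clock_nanosleep() side, the acquisition loop I have in mind looks roughly like the sketch below. This is my own rough, untested illustration that ccalls libc directly rather than going through the POSIXClock API; process_frame!() is a hypothetical placeholder for the per-image work, and I am assuming a 64-bit Linux system (both timespec fields are 64-bit) with a glibc recent enough to provide clock_nanosleep in libc:

    # Periodic loop paced by absolute deadlines (CLOCK_MONOTONIC + TIMER_ABSTIME),
    # so the time spent processing one frame does not drift into the next deadline.
    immutable Timespec
        sec::Clong        # time_t
        nsec::Clong       # long
    end

    const CLOCK_MONOTONIC = Cint(1)   # Linux values from <time.h>
    const TIMER_ABSTIME   = Cint(1)

    function run_periodic(period_ns::Int, niter::Int)
        tnow = Ref(Timespec(0, 0))
        ccall(:clock_gettime, Cint, (Cint, Ref{Timespec}), CLOCK_MONOTONIC, tnow)
        sec, nsec = tnow[].sec, tnow[].nsec
        deadline = Ref(Timespec(0, 0))
        for k in 1:niter
            # advance the absolute deadline by one camera period
            nsec += period_ns
            sec  += div(nsec, 1_000_000_000)
            nsec  = rem(nsec, 1_000_000_000)
            deadline[] = Timespec(sec, nsec)
            # sleep until the deadline, then do the per-frame work
            ccall(:clock_nanosleep, Cint, (Cint, Cint, Ref{Timespec}, Ptr{Void}),
                  CLOCK_MONOTONIC, TIMER_ABSTIME, deadline, C_NULL)
            # process_frame!(...)   # hypothetical: read, stabilize, discard the image
        end
        return nothing
    end

    run_periodic(10_000_000, 1000)   # e.g. a 10 ms period (100 Hz camera), 1000 frames

Pacing on absolute deadlines means the time spent processing one image does not shift the deadline of the next one.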
Since you are talking about --track-allocation, I have a question. This is what I get in the .mem output file:

        - function deflat(v::globalVar)
        0     @simd for i in 1:v.len_sub
        0         @inbounds v.sub_imagef[i] = v.flat[i]*v.image[i]
        -     end
        -
        0     @simd for i in 1:v.len_ref
        0         @inbounds v.ref_imagef[i] = v.flat[i]*v.image[i]
        -     end
        0     return
        - end
        -
        - # get min max
        - # apply norm_coef
        - # MORE TO DO HERE
        - function normalization(v::globalVar)
        0     min::Float32 = Float32(4095)
        0     max::Float32 = Float32(0)
        0     tmp::Float32 = Float32(0)
        0     norm_fact::Float32 = Float32(0)
        0     norm_coef::Float32 = Float32(0)
        -     # find min max
        0     @simd for i in 1:v.nb_mat
        0         # Doing something with no allocs
        0     end
        0 end
        0
  1226415 # SAD[70] 16x16 de Ref_Image sur Sub_Image[60]
        - function correlation_SAD(v::globalVar)
        0
        - end
        -

At the end of normalization I have no allocations, yet in front of the SAD comment, right before the empty correlation_SAD function, there are 1226415 allocations. It would be logical for these allocations to have happened inside normalization, but why are they reported here, between two functions?

>> I have two problems now:
>>
>> 1/ Many times, the first run, which includes compilation, was the fastest,
>> and then every other run was slower by a factor of 2.
>> 2/ If I relaunch the main function (which lives in a module) many times,
>> some runs are very different (slower) from the previous ones.
>>
>> About 1/, although I find it strange, I don't really care.
>> 2/ is far more problematic: once the code is compiled I want it to behave
>> the same no matter how many times it is launched.
>> I have some ideas why, but no certainty. What bothers me the most is that
>> all the runs in the benchmark become slower; it is not a temporary
>> slowdown, the whole current benchmark stays slower.
>> If I launch it again, it is back to the best performance.
>>
>> Thank you for the links, they are very interesting and I will keep them
>> in mind.
>>
>> Note: I disabled hyperthreading and overclocking, so it should not be the
>> CPU doing funky things.
>>
>
> Regarding these two issues, I encountered similar ones. Are you running on
> an Intel-based computer? I had to do many tweaks to get to acceptable
> real-time performance with Intel processors. Many factors could be at
> play. As you said, you have to make sure hyper-threading is disabled and
> not to overclock the processor. Also, monitor the kernel dmesg log for any
> errors or warnings regarding RT throttling or local_softirq_pending.
>
> Additionally, I had to use the following options on the Linux command line
> (pass them from the bootloader):
>
> intel_idle.max_cstate=0 processor.max_cstate=0 idle=poll
>
> together with removing the intel_powerclamp kernel module (sudo rmmod
> intel_powerclamp). Caution: be extremely careful with such a configuration,
> as it disables many power-saving features in the processor and can
> potentially overheat it. Keep an eye on the kernel dmesg log and try to
> monitor the CPU temperature.
>
> I also found it useful to isolate one CPU core using the isolcpus=1 kernel
> command-line option and then set the affinity of the real-time Julia
> process so that it runs on that isolated CPU (using the taskset command).
> This way, you can almost guarantee that the Linux kernel and all other
> user-space processes will not run on that isolated CPU, so it becomes
> wholly dedicated to running the real-time Julia process. I am planning to
> post more details to the POSIXClock package in the near future.
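About the isolated CPU: if I understand correctly, the affinity could also be set from inside the Julia script instead of launching it through taskset -c 1. A rough, untested sketch of my own (it assumes Linux on a little-endian x86 machine and that core 1 was reserved with isolcpus=1):

    # Pin this process to the isolated core, doing from Julia what
    # `taskset -c 1 julia script.jl` does from the shell.
    const CPU_SET_BYTES = 128            # sizeof(cpu_set_t) in glibc

    function pin_to_core(core::Int)
        mask = zeros(UInt8, CPU_SET_BYTES)
        mask[div(core, 8) + 1] |= UInt8(1) << rem(core, 8)   # set bit `core`
        ret = ccall(:sched_setaffinity, Cint, (Cint, Csize_t, Ptr{UInt8}),
                    0, sizeof(mask), mask)                   # pid 0 = this process
        ret == 0 || error("sched_setaffinity failed")
        return nothing
    end

    pin_to_core(1)

Launching through taskset does the same thing and is probably simpler; this is just to keep everything in one Julia script.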
I am indeed on an Intel processor, and thanks for all the tips. I will first try to isolate a CPU, and then disable the Intel power-saving options.

> Best,
> Islam

Again, thanks a lot for all the help.
