Hi Laurent,

I've been reworking this series to split things out and adapt for the
comments you've provided, but I have the following queries outstanding:

On 15/12/16 11:50, Kieran Bingham wrote:
> Hi Laurent,
> 
> On 14/12/16 16:30, Laurent Pinchart wrote:
>> Hi Kieran,
>>
>> Thank you for the patch.
>>
>> On Tuesday 13 Dec 2016 17:59:42 Kieran Bingham wrote:
>>> With multiple inputs through the BRU it is feasible for the streams to
>>> race each other at stream-on.
>>
>> Could you please explain the race condition in the commit message ? The 
>> issue 
>> is that multiple VIDIOC_STREAMON calls racing each other could have process 
>> N-1 skipping over the pipeline setup section and then start the pipeline, if 
>> videobuf2 has already enqueued buffers to the driver for process N but not 
>> called the .start_streaming() operation yet.
>>
>>> In the case of the video pipelines, this
>>> can present two serious issues.
>>>
>>>  1) A null-dereference if the pipe->dl is committed at the same time as
>>>     the vsp1_video_setup_pipeline() is processing
>>>
>>>  2) A hardware hang, where a display list is committed without having
>>>     called vsp1_video_setup_pipeline() first
>>>
>>> Along side these race conditions, the work done by
>>> vsp1_video_setup_pipeline() is undone by the re-initialisation during a
>>> suspend resume cycle, and an active pipeline does not attempt to
>>> reconfigure the correct routing and init parameters for the entities.
>>>
>>> To repair all of these issues, we can move the call to a conditional
>>> inside vsp1_video_pipeline_run() and ensure that this can only be called
>>> on the last stream which calls into vsp1_video_start_streaming()
>>>
>>> As a convenient side effect of this, by specifying that the
>>> configuration has been lost during suspend/resume actions - the
>>> vsp1_video_pipeline_run() call can re-initialise pipelines when
>>> necessary thus repairing resume actions for active m2m pipelines.
>>>
>>> Signed-off-by: Kieran Bingham <kieran.bingham+rene...@ideasonboard.com>
>>>
>>> ---
>>> v3:
>>>  - Move 'flag reset' to be inside the vsp1_reset_wpf() function call
>>>  - Tidy up the wpf->pipe reference for the configured flag
>>>
>>>  drivers/media/platform/vsp1/vsp1_drv.c   |  4 ++++
>>>  drivers/media/platform/vsp1/vsp1_pipe.c  |  1 +
>>>  drivers/media/platform/vsp1/vsp1_pipe.h  |  2 ++
>>>  drivers/media/platform/vsp1/vsp1_video.c | 20 +++++++++-----------
>>>  4 files changed, 16 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/drivers/media/platform/vsp1/vsp1_drv.c
>>> b/drivers/media/platform/vsp1/vsp1_drv.c index 57c713a4e1df..1dc3726c4e83
>>> 100644
>>> --- a/drivers/media/platform/vsp1/vsp1_drv.c
>>> +++ b/drivers/media/platform/vsp1/vsp1_drv.c
>>> @@ -413,6 +413,7 @@ static int vsp1_create_entities(struct vsp1_device
>>> *vsp1)
>>>
>>>  int vsp1_reset_wpf(struct vsp1_device *vsp1, unsigned int index)
>>>  {
>>> +   struct vsp1_rwpf *wpf = vsp1->wpf[index];
>>>     unsigned int timeout;
>>>     u32 status;
>>>
>>> @@ -429,6 +430,9 @@ int vsp1_reset_wpf(struct vsp1_device *vsp1, unsigned
>>> int index) usleep_range(1000, 2000);
>>>     }
>>>
>>> +   if (wpf->pipe)
>>> +           wpf->pipe->configured = false;
>>> +
>>>     if (!timeout) {
>>>             dev_err(vsp1->dev, "failed to reset wpf.%u\n", index);
>>>             return -ETIMEDOUT;
>>> diff --git a/drivers/media/platform/vsp1/vsp1_pipe.c
>>> b/drivers/media/platform/vsp1/vsp1_pipe.c index 756ca4ea7668..7ddf862ee403
>>> 100644
>>> --- a/drivers/media/platform/vsp1/vsp1_pipe.c
>>> +++ b/drivers/media/platform/vsp1/vsp1_pipe.c
>>> @@ -208,6 +208,7 @@ void vsp1_pipeline_init(struct vsp1_pipeline *pipe)
>>>
>>>     INIT_LIST_HEAD(&pipe->entities);
>>>     pipe->state = VSP1_PIPELINE_STOPPED;
>>> +   pipe->configured = false;
>>>  }
>>>
>>>  /* Must be called with the pipe irqlock held. */
>>> diff --git a/drivers/media/platform/vsp1/vsp1_pipe.h
>>> b/drivers/media/platform/vsp1/vsp1_pipe.h index ac4ad2655551..0743b9fcb655
>>> 100644
>>> --- a/drivers/media/platform/vsp1/vsp1_pipe.h
>>> +++ b/drivers/media/platform/vsp1/vsp1_pipe.h
>>> @@ -61,6 +61,7 @@ enum vsp1_pipeline_state {
>>>   * @pipe: the media pipeline
>>>   * @irqlock: protects the pipeline state
>>>   * @state: current state
>>> + * @configured: determines routing configuration active on cell.
>>
>> I'm not sure to understand that. How about "true if the pipeline has been 
>> set 
>> up" ? Or maybe "true if the pipeline has been set up for video streaming" as 
>> it only applies to pipelines handled through the V4L2 API ?
> 
> 
> Yes, Reading it now - I have no idea what context I was writing that in.
> I hope it was late and I was tired ... otherwise I have no excuse :D
> 
> 
> 
>>>   * @wq: wait queue to wait for state change completion
>>>   * @frame_end: frame end interrupt handler
>>>   * @lock: protects the pipeline use count and stream count
>>> @@ -86,6 +87,7 @@ struct vsp1_pipeline {
>>>
>>>     spinlock_t irqlock;
>>>     enum vsp1_pipeline_state state;
>>> +   bool configured;
>>>     wait_queue_head_t wq;
>>>
>>>     void (*frame_end)(struct vsp1_pipeline *pipe);
>>> diff --git a/drivers/media/platform/vsp1/vsp1_video.c
>>> b/drivers/media/platform/vsp1/vsp1_video.c index 44b687c0b8df..7ff9f4c19ff0
>>> 100644
>>> --- a/drivers/media/platform/vsp1/vsp1_video.c
>>> +++ b/drivers/media/platform/vsp1/vsp1_video.c
>>> @@ -388,6 +388,8 @@ static int vsp1_video_setup_pipeline(struct
>>> vsp1_pipeline *pipe) VSP1_ENTITY_PARAMS_INIT);
>>>     }
>>>
>>> +   pipe->configured = true;
>>> +
>>>     return 0;
>>>  }
>>>
>>> @@ -411,6 +413,9 @@ static void vsp1_video_pipeline_run(struct vsp1_pipeline
>>> *pipe) struct vsp1_device *vsp1 = pipe->output->entity.vsp1;
>>>     struct vsp1_entity *entity;
>>>
>>> +   if (!pipe->configured)
>>> +           vsp1_video_setup_pipeline(pipe);
>>> +
>>
>> I don't like this much. The vsp1_video_pipeline_run() is called with a 
>> spinlock held. We should avoid operations as time consuming as 
>> vsp1_video_setup_pipeline() here.


I'm wondering where you stand on this now that the measurements have
been taken.

Is 18uS, once per stream start (/resume) still against your desires here?

Reading the comments from, struct vsp1_pipeline

 * @irqlock: protects the pipeline state
 * @lock: protects the pipeline use count and stream count

Doesn't this infer that calling vsp1_video_setup_pipeline, is better run
with the @irqlock held rather than the @lock anyway?


> I'm going to argue my case here, as I thought this was a more elegant
> solution (and I really seem to dislike that poorly protected global
> pipe->dl);
> 
> However I will back down quickly/happily if needed. Especially knowing
> that we are trying to reduce the recalculations of the DL, and I can see
> where those changes will lead us anyway.
> 
> Regardless of the outcome, I felt it was useful (at least to me) to
> measure the call times on these functions to get an understanding, thus
> I have used ftrace to measure/monitor a full run of the vsp-test suite:
> 
>  trace-cmd record \
>         -p function_graph \
>         -l 'vsp1_video_setup_pipeline' \
>         -l 'vsp1_irq_handler' \
>         \
>         ./vsp-tests.sh
> 
> * I had to mark vsp1_video_setup_pipeline() as noinline to be able to
> measure it with ftrace
> 
> A) Call frequency
> 
> Whilst vsp1_video_pipeline_run() is called for every frame, it should
> only be on the first frame of each stream, (or the first frame following
> a resume operation) where the flag pipe->configured == false.
> 
>  - Measuring in the entirety of the vsp-test suite,  I count 121
>    calls here.
>    (trace-cmd report | grep vsp1_video_setup_pipeline | wc -l)
> 
> 
> B) Function duration
> 
> The log of each of the function durations is available at:
>    http://paste.ubuntu.com/23632986/
> 
> Some quick and dirty statistical analysis shows the following duration
> times for this function.
> 
> trace-cmd report \
>       | grep vsp1_video_setup_pipeline \
>       | sed 's/.*funcgraph_entry:\W*\([0-9]*\.[0-9]*\) us.*/\1/' \
>       | st
> 
> ## Numbers in uS ...
> N       min     max     sum     mean    stddev
> 121     4.68    17.76   887.885 7.33789 2.3195
> 
> 
> So we do hit nearly 18 uS in this function, which could be considered a
> lot for interrupt context, but when it is only once at the beginning of
> stream start?
>
> (Incidentally, running the same measurements on the whole of
> vsp1_irq_handler nets the following statistical results:
> 
> trace-cmd report \
>       | grep vsp1_irq_handler \
>       | sed 's/.*funcgraph_entry:\W*\([0-9]*\.[0-9]*\) us.*/\1/' \
>       | st
> 
> N       min     max     sum     mean    stddev
> 12360   1.32    165.481 125934  10.1889 15.0117
> 
> Of course the coverage on the IRQ handler is more broad, and I'm not
> sure how to only measure the IRQ handler calls that result in calling
> vsp1_video_setup_pipeline().
> 

Those interrupt context comparisons are irrelevant now that you've
cleared up the reference was due to the spinlock being held, rather than
the calls being in interrupt context :)


> Anyway, like I said - possibly a moot point anyway - but I wanted to
> actually measure the function times to see what we're up against.
> 
>>>     if (!pipe->dl)
>>>             pipe->dl = vsp1_dl_list_get(pipe->output->dlm);
>>>
>>> @@ -793,25 +798,18 @@ static int vsp1_video_start_streaming(struct vb2_queue
>>> *vq, unsigned int count) struct vsp1_video *video = vb2_get_drv_priv(vq);
>>>     struct vsp1_pipeline *pipe = video->rwpf->pipe;
>>>     unsigned long flags;
>>> -   int ret;
>>>
>>>     mutex_lock(&pipe->lock);
>>>     if (pipe->stream_count == pipe->num_inputs) {
>>> -           ret = vsp1_video_setup_pipeline(pipe);
>>> -           if (ret < 0) {
>>> -                   mutex_unlock(&pipe->lock);
>>> -                   return ret;
>>> -           }
>>> +           spin_lock_irqsave(&pipe->irqlock, flags);
>>> +           if (vsp1_pipeline_ready(pipe))
>>> +                   vsp1_video_pipeline_run(pipe);
>>> +           spin_unlock_irqrestore(&pipe->irqlock, flags);
>>>     }
>>>
>>>     pipe->stream_count++;
>>>     mutex_unlock(&pipe->lock);
>>>
>>> -   spin_lock_irqsave(&pipe->irqlock, flags);
>>> -   if (vsp1_pipeline_ready(pipe))
>>> -           vsp1_video_pipeline_run(pipe);
>>> -   spin_unlock_irqrestore(&pipe->irqlock, flags);
>>> -
>>
>> How about the following ?
>>
>>      bool start_pipeline = false;
>>
>>      mutex_lock(&pipe->lock);
>>      if (pipe->stream_count == pipe->num_inputs) {
>>              ret = vsp1_video_setup_pipeline(pipe);
>>              if (ret < 0) {
>>                      mutex_unlock(&pipe->lock);
>>                      return ret;
>>              }
>>
>>              start_pipeline = true;
>>      }
>>
>>      pipe->stream_count++;
>>      mutex_unlock(&pipe->lock);
>>
>>      /*
>>       * Don't attempt to start the pipeline if we haven't configured it
>>       * explicitly, as otherwise multiple streamon calls could race each
>>       * other and one of them try to start the pipeline unconfigured.
>>       */
>>      if (!start_pipeline)
>>              return 0;
>>
>>      spin_lock_irqsave(&pipe->irqlock, flags);
>>      if (vsp1_pipeline_ready(pipe))
>>              vsp1_video_pipeline_run(pipe);
>>      spin_unlock_irqrestore(&pipe->irqlock, flags);
> 
> Ok, I'm happy with that, it's effectively v1 of my patchset, before I
> discovered that the suspend resume issue was closely related.
> 
> 
>> This won't fix the suspend/resume issue, but I think splitting that to a 
>> separate patch would be a good idea anyway.
> 

Reverting back to the 'v1' style, and then moving the call to setup the
pipeline, into the pipeline_run() against a configured flag to fix for
suspend/resume cycles, netts the following code :

        mutex_lock(&pipe->lock);
        if (pipe->stream_count == pipe->num_inputs)
                start_pipeline = true;

        pipe->stream_count++;
        mutex_unlock(&pipe->lock);

        /*
         * vsp1_pipeline_ready() is not sufficient to establish that all streams
         * are prepared and the pipeline is configured, as multiple streams
         * can race through streamon with buffers already queued; Therefore we
         * don't even attempt to start the pipeline until the last stream has
         * called through here.
         */
        if (!start_pipeline)
                return 0;

        spin_lock_irqsave(&pipe->irqlock, flags);
        if (vsp1_pipeline_ready(pipe))
                vsp1_video_pipeline_run(pipe);
        spin_unlock_irqrestore(&pipe->irqlock, flags);



> I'll have to look again to consider how this could be done separately.
> 
> 
>> I'm also wondering whether we could have a streamon/suspend race. If the 
>> pipeline is setup and the DL allocated but not committed at suspend time I 
>> think we'll leak the DL at resume time.
> 
> If there is, then I believe I resolved this in :
>  "v4l: vsp1: Use local display lists and remove global pipe->dl"
> 
> but yes, if we end up not being able to use that patch then I'll take a
> look and see if there's anything more needs to be done here.
> 
>>>     return 0;
>>>  }
>>
> 

-- 
Regards

Kieran Bingham

Reply via email to