[RFC 0/9] nuclear pageflip

Rob Clark Fri, 14 Sep 2012 08:25:53 -0500

On Fri, Sep 14, 2012 at 7:50 AM, Ville Syrj?l?
<ville.syrjala at linux.intel.com> wrote:
> On Thu, Sep 13, 2012 at 11:35:59AM -0500, Rob Clark wrote:
>> On Thu, Sep 13, 2012 at 9:29 AM, Ville Syrj?l?
>> <ville.syrjala at linux.intel.com> wrote:
>> > On Thu, Sep 13, 2012 at 08:39:54AM -0500, Rob Clark wrote:
>> >> On Thu, Sep 13, 2012 at 3:40 AM, Ville Syrj?l?
>> >> <ville.syrjala at linux.intel.com> wrote:
[snip]
>> >> >
>> >> > I would say this is going to be the most common use case if you consider
>> >> > just the number of shipping devices. It's pretty much what every Android
>> >> > phone/tablet with a HDMI port has to do.
>> >>
>> >> bleh, surfaceflinger kinda sucks then..
>> >
>> > Why? This use case is not enforced by surfaceflinger, it's just the use
>> > case most devices would have.
>> >
>> > I don't think there's anything wrong with the way surfaceflinger is 
>> > designed
>> > with the prepare and commit phases. How else would you do it?
>>
>> well, maybe I misunderstood how surfaceflinger works, but it sounded
>> like it has one prepare/commit phase across outputs, vs what weston
>> compositor does where each output is rendered and flipped
>> independently at the rate of that particular output.  If the two
>> outputs just happen to be vsync aligned, you would end up flipping at
>> the same time, but if the are not locked you don't have any artificial
>> constraint in the rendering/flipping.
>
> OK so it's purely a pull based model, whereas surfaceflinger is more
> push based.
>
> I suppose it might be possible to make surfaceflinger support a pull
> model by driving the compositor loop through a combined signal from
> multiple outputs. But IIRC it did have some timing related code in
> there somewhere, so it might not be happy about it. It might also


As I understood, at least in older versions android versions,
rendering was based on a timer as there was no vblank event to
userspace on most SoC platforms (which sounds strange, but so far most
SoC's are using fbdev and/or crazy hacks rather than drm/kms)

not sure if the timer is still there.. but I hope it goes away, it is
really a horrible way to keep track of vsync

> affect the clients' rendering speed since the compositor would be
> pulling their buffers from queue at non-constant speed. I don't
> remember the details of the buffer management very well, so I can't be
> sure though. But I probably wouldn't bother trying this, since the
> straightforward approach is so simple, and the results are reasonably
> good.
>
> The pull model does seem more flexible. But it does require a bit of
> extra complexity in the compositor to avoid compositing the same scene
> multiple times needlessly when multiple cloned displays are involved.
> I suppose ideally you'd want to recompose for each display to minimize
> visible latency, but from power usage POV it may not be a good idea.

fwiw, weston is already being pretty clever about keeping track of
damage and minimizing the area of the screen that must be re-rendered.
 I'm not sure if SF does anything like this.

>> >> >> >From userspace API, I guess something like:
>> >> >>
>> >> >> struct drm_mode_crtc_atomic_page_flip {
>> >> >>       uint32_t flags;
>> >> >>       uint32_t count_crtcs;
>> >> >>       uint64_t crtc_ids_ptr;  /* array of uint32_t */
>> >> >>       uint64_t count_props_ptr; /* array of uint32_t, # of prop's per 
>> >> >> crtc */
>> >> >>       uint64_t props_ptr;  /* ptr to array of 
>> >> >> drm_mode_obj_set_property */
>> >> >>       uint64_t user_data;
>> >> >> };
>> >> >
>> >> > Starting to look much like my drm_mode_atomic struct :)
>> >> >
>> >> > Let's compare:
>> >> >
>> >> > struct drm_mode_atomic {
>> >> >         __u32 flags;
>> >> >         __u32 count_objs;
>> >> >         __u64 objs_ptr;
>> >> >         __u64 count_props_ptr;
>> >> >         __u64 props_ptr;
>> >> >         __u64 prop_values_ptr;
>> >> >         __u64 blob_values_ptr;
>> >> > };
>> >>
>> >> well, you do miss userdata, I think
>> >
>> > Sure, because I didn't add the event stuff yet.
>>
>> note that the test phase doesn't need vblank events, and also
>> shouldn't -EBUSY if there is still a pending flip[*],
>
> Right. Personally I'm not a fan of the EBUSY behaviour at all. Seems
> a bit pointless since user space can take care of it via the event
> mechanism. But I suppose you want it for omap so that you can avoid
> having to write software workarounds to overcome the GO bit
> limitations.

I the main issue is disconnecting an overlay from one crtc and
connecting to another.. I would expect that any hw which can connect
an ovl to more than one possible crtc would have the same limit (ie.
have to wait until scanout on previous crtc completes), so I think
EBUSY is a good way to indicate to userspace that the requested
configuration is not possible *now* but would be possible in the
future.

>> so I'd propose
>> that however we go about pageflip (one super-ioctl, or one per crtc),
>> we could use the atomic-modeset ioctl for the test step
>
> Yeah that seems reasonable. If we do that, then it doesn't matter that
> much whether we have commit per pipe or not, except for the synced
> displays case, which I'd still like to support if possible. At least
> someone explicitly wanted such a feature in Medfield. I assume they
> had a fairly compelling use case because they were able to convince
> the hardware designers of the need.

heheh, well, it wouldn't be the first time I've seen support for crazy
features in hw :-P

but anyways, I guess I'd be interested in hearing more opinions on
whether it is worth worrying about

>> [*] this is one thing not correct still in my current patches.. the
>> driver doesn't know the difference between test and real-flip at the
>> point where it checks if there is a pending flip.
>>
>> >> > One differences seem to be that you have a mix of SOA and AOS concepts
>> >> > in yours, whereas mine is pure SOA.
>> >>
>> >> well, I was trying to re-use drm_mode_obj_set_property because that
>> >> seemed to keep the code simple, and makes it more similar to existing
>> >> set-property stuff.
>> >
>> > I sort of modelled mine after the way things are done by the various
>> > getter ioctls. I don't like messing up my brain by mixing SOA
>> > and AOS in the same place.
>>
>> fair enough.. anyways, that was just a quick proposal in an email
>> thread, we can tweak how the data is layed out
>
> Sure. The layout isn't that important to me either. All the ioctl
> structures are already ugly so we can't make the situation much
> worse :)
>
>> >> OTOH, maybe that doesn't really matter because
>> >> userspace would normally be going through libdrm and not seeing the
>> >> ioctl structs directly, so we could make the ioctl structs as ugly as
>> >> we want.
>> >>
>> >> I'm a bit on the fence about how the pageflip ioctl should look, in
>> >> particular about pageflip on multiple CRTCs.  I'd like to know the
>> >> involved CRTC(s) upfront, as that seems to make error checking easier
>> >> on the driver (in case of already pending flip).  Although I'll think
>> >> a bit about alternatives.
>> >
>> > I don't see much point in iterating that information directly in begin().
>> > You anyway get the same information during the set() operations. I
>> > suppose you could save some effort by avoiding some state allocation in
>> > begin(). But really, this seems like optimizing for the an uncommon error
>> > case. You should not encounter it in practice unless your user space is
>> > doing something silly.
>>
>> well, it was more about trying to make the error checking easy to get
>> right in the drivers.  But maybe it is easy enough to keep track of
>> the involved crtc(s) and do the check in the atomic_test() step.
>>
>> >> Also, if you pageflip on multiple CRTC's, should the be multiple
>> >> vblank events, and multiple userdata's?
>> >
>> > That's a bit of an open question. I was considering several options:
>>
>> the thing I like about one ioctl per crtc is that it avoids this whole
>> question..
>>
>> And, I think as long as you have to update multiple different scanout
>> address registers, there is always going to be a race in multi-crtc
>> flipping.  Having a single ioctl does make the race smaller.  I'm not
>> sure how important that point is.
>
> Which race?

ie. if you set REG_CRTC1_ADDR just immediately before vblank and
REG_CRTC2_ADDR just after

>> > 1) One event after the full operation is complete
>> >  Not a good idea when you want to throttle based on a fast refreshing
>> >  display but the system has also slow refreshing displays.
>> >
>> > 2) One event per pipe
>> >  Seems reasonable. We could use a bitmask so that the user can ask for
>> >  the event to be delivered only for specific pipes. I'm not sure whether
>> >  multiple user_datas would be better, or just the one with user space
>> >  taking care of refcounting it in case it's a pointer to some object.
>> >
>> > 3) One event per scanout engine
>> >  Not sure this makes sense since the idea is anyway to do things
>> >  atomically within the pipe.
>>
>> well, if you aren't vsync locked, it isn't going to be atomic
>> anyways.. and not vsync locked seems like the common case to me.
>
> One reason why I considered the one event per scanout engine is that
> some overlay implementations can have their own "vblank/flip done"
> interrupt independent from the whole pipe. Such an interrupt would fire
> as soon as the scan has passed over the overlay's visible region, even if
> that happens to be well inside the crtc's active video region. It could
> allow the rendering for the next frame to begin slightly earlier when
> there's a shortage of buffers.

hmm, ok, so even more than one event per CRTC, but one per
CRTC+plane.. well, we can use this on omap, although I'm not quite
sure the tradeoff (slighly less latency to release the buffer, vs more
IRQ's and waking up the CPU more..).

I guess it isn't too critical now, we can always add a new flag later.

>> >> Really the atomic-pageflip ioctl is pretty much a small part of the
>> >> entire patchset, so I'm not really too much against changing it.  Most
>> >> of the point of the patchset was to try to enhance properties to do
>> >> what we need, and come up w/ a sane way to decouple state from the
>> >> various kms objects to easily handle atomic commit or rollback.
>> >
>> > I agree 100% with this goal. Once the state would be split cleanly, we
>> > could do stuff like keep the user and fbcon states totally separate.
>> > Restoring the fbcon state would then trivial. Currently the fbcon
>> > state as such doesn't really exist. IIRC you had some patches to reset
>> > various bits of state at certain points just to work around this issue.
>>
>> one minor point:  I was thinking the same thing but Daniel pointed out
>> that we still have to deal w/ the potential case that the user
>> unplugged one display and plugged a different display between leaving
>> fbcon and restoring fbcon.  So we always have to deal with the
>> potential for full modeset.  But maybe that is ok, we perhaps could
>> just let the atomic_test() step fail in this case and then trigger
>> full modeset?  I'm not quite sure, I haven't thought through this
>> fully, but it does seem like there is potential to simplify things.
>> At any rate, it isn't so important for the initial patchset, but it is
>> worth thinking about when we get to the atomic-modeset part and
>> figuring out how helpers, etc, fit in to this.
>
> Yeah, the displays changing does pose a problem. I think ideally you'd
> take the saved state, but instead of commiting it as is, you'd adjust
> it to accomodate the new display layout.
>
>> >> > Also your struct has a bunch of redundant data since
>> >> > drm_mode_obj_set_property keeps repeating the object ID needlessly, and
>> >> > crtc_ids_ptr is also just repeating information already present in
>> >> > drm_mode_obj_set_property. drm_mode_obj_set_property further forces you
>> >> > to pass in the object type, which is pretty much useless.
>> >>
>> >> well, not completely, the object-id/type could also be a plane
>> >> attached to a crtc..
>> >
>> > Sure, but even that gets repeated needlessly for every prop.
>>
>> well, for libdrm API, what made sense to me is just an array of
>> property type+id+value..  libdrm *could* support this by sorting the
>> props and flattening things out.  It would make the amount of data
>> that is copy_from_user()'d a bit more compact.  But I am not sure that
>> this is really worth worrying about.  Especially if you are just
>> passing the values that have changed since last flip.  I'd lean
>> towards keeping it simple rather than optimizing out a few bytes of
>> copy_from_user().
>
> Did you check the libdrm API I have? It frees the client from having to
> collect any property arrays because that could turn into a realloc()
> fest quite quickly. And the implementation itself is hidden inside libdrm
> so if the linked list based implementation turns out insufficient it's
> possible to change the data structrures.

actually I didn't even realize you had a libdrm branch until now..
I'll have a look

>> >> > Your version can't pass in blobs, which is going to be a problem
>> >> > when you want to push in gamma tables/ramps and whatnot. Also I've been
>> >> > considering that for atomic modesetting I'd pass display modes as blobs
>> >> > too, since the display mode object IDs are currently hidden from user
>> >> > space, and also the IDs are a bit volatile since the modes can be
>> >> > shuffled around during EDID probing. I suppose one could try to fix
>> >> > these display mode ID issues, and when you need to push in gamma tables
>> >> > or other heavy weight data, you invent new types of drm objects to
>> >> > describe them, and you pass them by ID as well.
>> >>
>> >> hmm, ok, I was wondering why you were supporting blobs, when existing
>> >> setprop APIs where not.  Well, maybe it wouldn't be a bad idea to
>> >> start by making existing setprop stuff support blobs.
>> >
>> > I wouldn't care about the existing APIs. Why would anyone use them when
>> > the atomic API provides the same capability and more?
>>
>> well, I guess there is no strong need to.. but it just seemed strange
>> to require using the new API to set a single property.  Anyways, it
>> could be that it is not worth worrying about.
>
> It keeps the client code much simpler.
>
>> Fwiw the way drm_ioctl() works, it should be possible to add new
>> fields to an ioctl struct at the end, as long as zero has a sensible /
>> backwards-compatible meaning.  Ie. old userspace w/ smaller struct on
>> new kernel, the driver will just see the new fields memset to zero.
>> New userspace on old kernel, drm_property_change_is_valid() would
>> fail.
>
> Really? Isn't the struct size encoded into the ioctl ID itself?

yes, but the dispatch code in drm_ioctl() only considers the ioctl #
bitfield in figuring out which handler to dispatch the ioctl to.  It
uses the size field in the copy_{to,from}_user().  Thanks to the size
encoded in the ioctl # we can deal w/ new fields added at the end.

(well, slight complication that older kernels didn't zero out the
excess ioctl size if userspace passes a smaller struct.  But that
should only be a problem if we made ioctl structs *smaller*, so it
should be ok)

BR,
-R

> --
> Ville Syrj?l?
> Intel OTC
> _______________________________________________
> dri-devel mailing list
> dri-devel at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel

[RFC 0/9] nuclear pageflip

Reply via email to