Re: [dpdk-dev] [RFC v2] doc compression API for DPDK

Verma, Shally Tue, 13 Feb 2018 23:41:53 -0800

Hi Ahmed,

>-----Original Message-----
>From: Ahmed Mansour [mailto:ahmed.mans...@nxp.com]
>Sent: 02 February 2018 01:53
>To: Trahe, Fiona <fiona.tr...@intel.com>; Verma, Shally 
><shally.ve...@cavium.com>; dev@dpdk.org
>Cc: Athreya, Narayana Prasad <narayanaprasad.athr...@cavium.com>; Gupta, 
>Ashish <ashish.gu...@cavium.com>; Sahu, Sunila
><sunila.s...@cavium.com>; De Lara Guarch, Pablo 
><pablo.de.lara.gua...@intel.com>; Challa, Mahipal
><mahipal.cha...@cavium.com>; Jain, Deepak K <deepak.k.j...@intel.com>; Hemant 
>Agrawal <hemant.agra...@nxp.com>; Roy
>Pledge <roy.ple...@nxp.com>; Youri Querry <youri.querr...@nxp.com>
>Subject: Re: [RFC v2] doc compression API for DPDK
>
>On 1/31/2018 2:03 PM, Trahe, Fiona wrote:
>> Hi Ahmed, Shally,
>>
>> ///snip///
>>>>>>>> D.1.1 Stateless and OUT_OF_SPACE
>>>>>>>> ------------------------------------------------
>>>>>>>> OUT_OF_SPACE is a condition when output buffer runs out of space
>>>>> and
>>>>>>> where PMD still has more data to produce. If PMD run into such
>>>>> condition,
>>>>>>> then it's an error condition in stateless processing.
>>>>>>>> In such case, PMD resets itself and return with status
>>>>>>> RTE_COMP_OP_STATUS_OUT_OF_SPACE with produced=consumed=0
>>>>> i.e.
>>>>>>> no input read, no output written.
>>>>>>>> Application can resubmit an full input with larger output buffer size.
>>>>>>> [Ahmed] Can we add an option to allow the user to read the data that
>>>>> was
>>>>>>> produced while still reporting OUT_OF_SPACE? this is mainly useful for
>>>>>>> decompression applications doing search.
>>>>>> [Shally] It is there but applicable for stateful operation type (please 
>>>>>> refer to
>>>>> handling out_of_space under
>>>>>> "Stateful Section").
>>>>>> By definition, "stateless" here means that application (such as IPCOMP)
>>>>> knows maximum output size
>>>>>> guaranteedly and ensure that uncompressed data size cannot grow more
>>>>> than provided output buffer.
>>>>>> Such apps can submit an op with type = STATELESS and provide full input,
>>>>> then PMD assume it has
>>>>>> sufficient input and output and thus doesn't need to maintain any 
>>>>>> contexts
>>>>> after op is processed.
>>>>>> If application doesn't know about max output size, then it should 
>>>>>> process it
>>>>> as stateful op i.e. setup op
>>>>>> with type = STATEFUL and attach a stream so that PMD can maintain
>>>>> relevant context to handle such
>>>>>> condition.
>>>>> [Fiona] There may be an alternative that's useful for Ahmed, while still
>>>>> respecting the stateless concept.
>>>>> In Stateless case where a PMD reports OUT_OF_SPACE in decompression
>>>>> case
>>>>> it could also return consumed=0, produced = x, where x>0. X indicates the
>>>>> amount of valid data which has
>>>>>  been written to the output buffer. It is not complete, but if an 
>>>>> application
>>>>> wants to search it it may be sufficient.
>>>>> If the application still wants the data it must resubmit the whole input 
>>>>> with a
>>>>> bigger output buffer, and
>>>>>  decompression will be repeated from the start, it
>>>>>  cannot expect to continue on as the PMD has not maintained state, history
>>>>> or data.
>>>>> I don't think there would be any need to indicate this in capabilities, 
>>>>> PMDs
>>>>> which cannot provide this
>>>>> functionality would always return produced=consumed=0, while PMDs which
>>>>> can could set produced > 0.
>>>>> If this works for you both, we could consider a similar case for 
>>>>> compression.
>>>>>
>>>> [Shally] Sounds Fine to me. Though then in that case, consume should also 
>>>> be updated to actual
>>> consumed by PMD.
>>>> Setting consumed = 0 with produced > 0 doesn't correlate.
>>> [Ahmed]I like Fiona's suggestion, but I also do not like the implication
>>> of returning consumed = 0. At the same time returning consumed = y
>>> implies to the user that it can proceed from the middle. I prefer the
>>> consumed = 0 implementation, but I think a different return is needed to
>>> distinguish it from OUT_OF_SPACE that the use can recover from. Perhaps
>>> OUT_OF_SPACE_RECOVERABLE and OUT_OF_SPACE_TERMINATED. This also allows
>>> future PMD implementations to provide recover-ability even in STATELESS
>>> mode if they so wish. In this model STATELESS or STATEFUL would be a
>>> hint for the PMD implementation to make optimizations for each case, but
>>> it does not force the PMD implementation to limit functionality if it
>>> can provide recover-ability.
>> [Fiona] So you're suggesting the following:
>> OUT_OF_SPACE - returned only on stateful operation. Not an error. Op.produced
>>     can be used and next op in stream should continue on from op.consumed+1.
>> OUT_OF_SPACE_TERMINATED - returned only on stateless operation.
>>     Error condition, no recovery possible.
>>     consumed=produced=0. Application must resubmit all input data with
>>     a bigger output buffer.
>> OUT_OF_SPACE_RECOVERABLE - returned only on stateless operation, some 
>> recovery possible.
>>      - consumed = 0, produced > 0. Application must resubmit all input data 
>> with
>>         a bigger output buffer. However in decompression case, data up to 
>> produced
>>         in dst buffer may be inspected/searched. Never happens in compression
>>         case as output data would be meaningless.
>>      - consumed > 0, produced > 0. PMD has stored relevant state and history 
>> and so
>>         can convert to stateful, using op.produced and continuing from 
>> consumed+1.
>> I don't expect our PMDs to use this last case, but maybe this works for 
>> others?
>> I'm not convinced it's not just adding complexity. It sounds like a version 
>> of stateful
>> without a stream, and maybe less efficient?
>> If so should it respect the FLUSH flag? Which would have been FULL or FINAL 
>> in the op.
>> Or treat it as FLUSH_NONE or SYNC? I don't know why an application would not
>> simply have submitted a STATEFUL request if this is the behaviour it wants?
>[Ahmed] I was actually suggesting the removal of OUT_OF_SPACE entirely
>and replacing it with
>OUT_OF_SPACE_TERMINATED - returned only on stateless operation.
>        Error condition, no recovery possible.
>        - consumed=0 produced=amount of data produced. Application must
>resubmit all input data with
>          a bigger output buffer to process all of the op
>OUT_OF_SPACE_RECOVERABLE -  Normally returned on stateful operation. Not
>an error. Op.produced
>    can be used and next op in stream should continue on from op.consumed+1.
>        -  consumed > 0, produced > 0. PMD has stored relevant state and
>history and so
>            can continue using op.produced and continuing from consumed+1.
>
>We would not return OUT_OF_SPACE_RECOVERABLE in stateless mode in our
>implementation either.
>
>Regardless of speculative future PMDs. The more important aspect of this
>for today is that the return status clearly determines
>the meaning of "consumed". If it is RECOVERABLE then consumed is
>meaningful. if it is TERMINATED then consumed in meaningless.
>This way we take away the ambiguity of having OUT_OF_SPACE mean two
>different user work flows.
>
>A speculative future PMD may be designed to return RECOVERABLE for
>stateless ops that are attached to streams.
>A future PMD may look to see if an op has a stream is attached and write
>out the state there and go into recoverable mode.
>in essence this leaves the choice up to the implementation and allows
>the PMD to take advantage of stateless optimizations
>so long as a "RECOVERABLE" scenario is rarely hit. The PMD will dump
>context as soon as it fully processes an op. It will only
>write context out in cases where the op chokes.
>This futuristic PMD should ignore the FLUSH since this STATELESS mode as
>indicated by the user and optimize


[Shally] IMO, it looks okay to have two separate return code TERMINATED and 
RECOVERABLE with definition as you mentioned and seem doable.
So then it mean all following conditions:
a. stateless with flush = full/final, no stream pointer provided , PMD can 
return TERMINATED i.e. user has to start all over again, it's a failure (as in 
current definition)
b. stateless with flush = full/final, stream pointer provided, here it's up to 
PMD to return either TERMINATED or RECOVERABLE depending upon its ability (note 
if Recoverable, then PMD will maintain states in stream pointer)
c. stateful with flush = full / NO_SYNC, stream pointer always there, PMD will 
TERMINATED/RECOVERABLE depending on STATEFUL_COMPRESSION/DECOMPRESSION feature 
flag enabled or not

and one more exception case is:
d. stateless with flush = full, no stream pointer provided, PMD can return 
RECOVERABLE i.e. PMD internally maintained that state somehow and consumed & 
produced > 0, so user can start consumed+1 but there's restriction on user not 
to alter or change op until it is fully processed?!
 
API currently takes care of case a and c, and case b can be supported if 
specification accept another proposal which mention optional usage of stream 
with stateless. Until then API takes no difference to case b and c i.e. we can 
have op such as,
- type= stateful with flush = full/final, stream pointer provided, PMD can 
return TERMINATED/RECOVERABLE according to its ability

Case d , is something exceptional, if there's requirement in PMDs to support 
it, then believe it will be doable with concept of different return code.

>>>>>>>> D.2 Compression API Stateful operation
>>>>>>>> ----------------------------------------------------------
>>>>>>>>  A Stateful operation in DPDK compression means application invokes
>>>>>>> enqueue burst() multiple times to process related chunk of data either
>>>>>>> because
>>>>>>>> - Application broke data into several ops, and/or
>>>>>>>> - PMD ran into out_of_space situation during input processing
>>>>>>>>
>>>>>>>> In case of either one or all of the above conditions, PMD is required 
>>>>>>>> to
>>>>>>> maintain state of op across enque_burst() calls and
>>>>>>>> ops are setup with op_type RTE_COMP_OP_STATEFUL, and begin with
>>>>>>> flush value = RTE_COMP_NO/SYNC_FLUSH and end at flush value
>>>>>>> RTE_COMP_FULL/FINAL_FLUSH.
>>>>>>>> D.2.1 Stateful operation state maintenance
>>>>>>>> ---------------------------------------------------------------
>>>>>>>> It is always an ideal expectation from application that it should parse
>>>>>>> through all related chunk of source data making its mbuf-chain and
>>>>> enqueue
>>>>>>> it for stateless processing.
>>>>>>>> However, if it need to break it into several enqueue_burst() calls, 
>>>>>>>> then
>>>>> an
>>>>>>> expected call flow would be something like:
>>>>>>>> enqueue_burst( |op.no_flush |)
>>>>>>> [Ahmed] The work is now in flight to the PMD.The user will call dequeue
>>>>>>> burst in a loop until all ops are received. Is this correct?
>>>>>>>
>>>>>>>> deque_burst(op) // should dequeue before we enqueue next
>>>>>> [Shally] Yes. Ideally every submitted op need to be dequeued. However
>>>>> this illustration is specifically in
>>>>>> context of stateful op processing to reflect if a stream is broken into
>>>>> chunks, then each chunk should be
>>>>>> submitted as one op at-a-time with type = STATEFUL and need to be
>>>>> dequeued first before next chunk is
>>>>>> enqueued.
>>>>>>
>>>>>>>> enqueue_burst( |op.no_flush |)
>>>>>>>> deque_burst(op) // should dequeue before we enqueue next
>>>>>>>> enqueue_burst( |op.full_flush |)
>>>>>>> [Ahmed] Why now allow multiple work items in flight? I understand that
>>>>>>> occasionaly there will be OUT_OF_SPACE exception. Can we just
>>>>> distinguish
>>>>>>> the response in exception cases?
>>>>>> [Shally] Multiples ops are allowed in flight, however condition is each 
>>>>>> op in
>>>>> such case is independent of
>>>>>> each other i.e. belong to different streams altogether.
>>>>>> Earlier (as part of RFC v1 doc) we did consider the proposal to process 
>>>>>> all
>>>>> related chunks of data in single
>>>>>> burst by passing them as ops array but later found that as not-so-useful 
>>>>>> for
>>>>> PMD handling for various
>>>>>> reasons. You may please refer to RFC v1 doc review comments for same.
>>>>> [Fiona] Agree with Shally. In summary, as only one op can be processed at 
>>>>> a
>>>>> time, since each needs the
>>>>> state of the previous, to allow more than 1 op to be in-flight at a time 
>>>>> would
>>>>> force PMDs to implement internal queueing and exception handling for
>>>>> OUT_OF_SPACE conditions you mention.
>>> [Ahmed] But we are putting the ops on qps which would make them
>>> sequential. Handling OUT_OF_SPACE conditions would be a little bit more
>>> complex but doable.
>> [Fiona] In my opinion this is not doable, could be very inefficient.
>> There may be many streams.
>> The PMD would have to have an internal queue per stream so
>> it could adjust the next src offset and length in the OUT_OF_SPACE case.
>> And this may ripple back though all subsequent ops in the stream as each
>> source len is increased and its dst buffer is not big enough.
>[Ahmed] Regarding multi op OUT_OF_SPACE handling.
>The caller would still need to adjust
>the src length/output buffer as you say. The PMD cannot handle
>OUT_OF_SPACE internally.
>After OUT_OF_SPACE occurs, the PMD should reject all ops in this stream
>until it gets explicit
>confirmation from the caller to continue working on this stream. Any ops
>received by
>the PMD should be returned to the caller with status STREAM_PAUSED since
>the caller did not
>explicitly acknowledge that it has solved the OUT_OF_SPACE issue.
>These semantics can be enabled by adding a new function to the API
>perhaps stream_resume().
>This allows the caller to indicate that it acknowledges that it has seen
>the issue and this op
>should be used to resolve the issue. Implementations that do not support
>this mode of use
>can push back immediately after one op is in flight. Implementations
>that support this use
>mode can allow many ops from the same session
>
[Shally] Is it still in context of having single burst where all op belongs to 
one stream? If yes, I would still say it would add an overhead to PMDs 
especially if it is expected to work closer to HW (which I think is the case 
with DPDK PMD).
Though your approach is doable but why this all cannot be in a layer above PMD? 
i.e. a layer above PMD can either pass one-op at a time with burst size = 1 OR 
can make chained mbuf of input and output and pass than as one op.
Is it just to ease applications of chained mbuf burden or do you see any 
performance /use-case impacting aspect also?

if it is in context where each op belong to different stream in a burst, then 
why do we need stream_pause and resume? It is a expectations from app to pass 
more output buffer with consumed + 1 from next call onwards as it has already
seen OUT_OF_SPACE.

>Regarding the ordering of ops
>We do force serialization of ops belonging to a stream in STATEFUL
>operation. Related ops do
>not go out of order and are given to available PMDs one at a time.
>
>>> The question is this mode of use useful for real
>>> life applications or would we be just adding complexity? The technical
>>> advantage of this is that processing of Stateful ops is interdependent
>>> and PMDs can take advantage of caching and other optimizations to make
>>> processing related ops much faster than switching on every op. PMDs have
>>> maintain state of more than 32 KB for DEFLATE for every stream.
>>>>> If the application has all the data, it can put it into chained mbufs in 
>>>>> a single
>>>>> op rather than
>>>>> multiple ops, which avoids pushing all that complexity down to the PMDs.
>>> [Ahmed] I think that your suggested scheme of putting all related mbufs
>>> into one op may be the best solution without the extra complexity of
>>> handling OUT_OF_SPACE cases, while still allowing the enqueuer extra
>>> time If we have a way of marking mbufs as ready for consumption. The
>>> enqueuer may not have all the data at hand but can enqueue the op with a
>>> couple of empty mbus marked as not ready for consumption. The enqueuer
>>> will then update the rest of the mbufs to ready for consumption once the
>>> data is added. This introduces a race condition. A second flag for each
>>> mbuf can be updated by the PMD to indicate that it processed it or not.
>>> This way in cases where the PMD beat the application to the op, the
>>> application will just update the op to point to the first unprocessed
>>> mbuf and resend it to the PMD.
>> [Fiona] This doesn't sound safe. You want to add data to a stream after 
>> you've
>> enqueued the op. You would have to write to op.src.length at a time when the 
>> PMD
>> might be reading it. Sounds like a lock would be necessary.
>> Once the op has been enqueued, my understanding is its ownership is handed
>> over to the PMD and the application should not touch it until it has been 
>> dequeued.
>> I don't think it's a good idea to change this model.
>> Can't the application just collect a stream of data in chained mbufs until 
>> it has
>> enough to send an op, then construct the op and while waiting for that op to
>> complete, accumulate the next batch of chained mbufs? Only construct the 
>> next op
>> after the previous one is complete, based on the result of the previous one.
>>
>[Ahmed] Fair enough. I agree with you. I imagined it in a different way
>in which each mbuf would have its own length.
>The advantage to gain is in applications where there is one PMD user,
>the down time between ops can be significant and setting up a single
>producer consumer pair significantly reduces the CPU cycles and PMD down
>time.
>
>////snip////

Re: [dpdk-dev] [RFC v2] doc compression API for DPDK

Reply via email to