[GitHub] [tvm] manupa-arm commented on issue #9022: [Bug] BuiltinLower does not use alloca for storage on kDLCPU target devices

GitBox Fri, 17 Sep 2021 06:07:37 -0700


manupa-arm commented on issue #9022:
URL: https://github.com/apache/tvm/issues/9022#issuecomment-921782373

@tqchen ,

> To say it in another way, we cannot say that "global" definitely mean no
stack allocation.

The current issue is in the device "CPU" && 'global' certainly means its
definitely stack allocation if its less that heuristic size and not the other
way around.

> If the code needs to impose additional constraint that the memory must be
accessible from a separate device(e.g. NPU), it certainly would require a more
specialized constraint that is better spelled out explicitly.

> As we can see that this is another kind of flexibility we want to enable
here -- flexibility of picking possible backend allocation implementations
without over constraining the code generator to a backend specific behavior
that is platform dependent (like the case of pinned memory

Yes this is something we want eventually and we will be working towards
achieving with USMP work.

Until we have that, the natural assumption should be in absense of a
'constraint' that the memories are more accessible rather than being less
accessible (e.g. stack). Its unfortunate that the current design prefers the
latter especially in a absense of a constraint.

@mbs-octoml ,

### Short term solution :
I think you are right, we might want to unblock this using a
target-dependent kMaxStackAllocaSize.

May I ask why was the default chosen to be this ?

https://github.com/apache/tvm/blob/1fd8f610953adc39cbd18d82f4a9e92a11575dfc/include/tvm/runtime/device_api.h#L60-L61

Its interesting because the stack size go up beyond that size as it is just
looking on a single allocate at a time. i.e. you could have multiple allocates
that are less than < 1024. So the stack usage is not even bounded by the
current approach.

Therefore, to both unlock us with Ethos-U and also somewhat solve the
problem that current micro builds using stack for tensors < 1024 instead of the
workspace buffer provided, maybe we should just make kMaxStackAllocaSize=0 (a
binary decision rather than a value range).

@Mousius @leandron @areusch , this means there is going to be another
argument for a simple micro deployment to be added to the already long list of
arguments. Something like "--use-external-workspace" ?

@tqchen , I still feel it would have been super helpful that
kMaxStackAllocaSize is by default zero but with the option of going higher
based on a user argument. e.g. --max-stack-alloca-size=1024. It is not very
convincing that we are leaving out stack allocation of tensors with the
prospect of being optimized by mem2reg without doing any marking (i.e. special
storage scope) in a somewhat global form.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [tvm] manupa-arm commented on issue #9022: [Bug] BuiltinLower does not use alloca for storage on kDLCPU target devices

Reply via email to