lildmh marked an inline comment as done.
lildmh added inline comments.
================
Comment at: lib/CodeGen/CGOpenMPRuntime.cpp:8739
+/// // For each component specified by this mapper:
+/// if (currentComponent.hasMapper())
+/// (*currentComponent.Mapper())(rt_mapper_handle, arg_base, arg_begin,
----------------
ABataev wrote:
> lildmh wrote:
> > ABataev wrote:
> > > lildmh wrote:
> > > > ABataev wrote:
> > > > > lildmh wrote:
> > > > > > ABataev wrote:
> > > > > > > lildmh wrote:
> > > > > > > > ABataev wrote:
> > > > > > > > > lildmh wrote:
> > > > > > > > > > ABataev wrote:
> > > > > > > > > > > lildmh wrote:
> > > > > > > > > > > > ABataev wrote:
> > > > > > > > > > > > > lildmh wrote:
> > > > > > > > > > > > > > ABataev wrote:
> > > > > > > > > > > > > > > lildmh wrote:
> > > > > > > > > > > > > > > > ABataev wrote:
> > > > > > > > > > > > > > > > > lildmh wrote:
> > > > > > > > > > > > > > > > > > ABataev wrote:
> > > > > > > > > > > > > > > > > > > lildmh wrote:
> > > > > > > > > > > > > > > > > > > > ABataev wrote:
> > > > > > > > > > > > > > > > > > > > > lildmh wrote:
> > > > > > > > > > > > > > > > > > > > > > ABataev wrote:
> > > > > > > > > > > > > > > > > > > > > > > lildmh wrote:
> > > > > > > > > > > > > > > > > > > > > > > > ABataev wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > Currently `currentComponent`
> > > > > > > > > > > > > > > > > > > > > > > > > is generated by the compiler.
> > > > > > > > > > > > > > > > > > > > > > > > > But can we instead pass this
> > > > > > > > > > > > > > > > > > > > > > > > > data as an extra parameter to
> > > > > > > > > > > > > > > > > > > > > > > > > this `omp_mapper` function.
> > > > > > > > > > > > > > > > > > > > > > > > Emm, I think this scheme will
> > > > > > > > > > > > > > > > > > > > > > > > be very difficult and
> > > > > > > > > > > > > > > > > > > > > > > > inefficient. If we pass
> > > > > > > > > > > > > > > > > > > > > > > > components as an argument of
> > > > > > > > > > > > > > > > > > > > > > > > `omp_mapper` function, it means
> > > > > > > > > > > > > > > > > > > > > > > > that the runtime needs to
> > > > > > > > > > > > > > > > > > > > > > > > generate all components related
> > > > > > > > > > > > > > > > > > > > > > > > to a map clause. I don't think
> > > > > > > > > > > > > > > > > > > > > > > > the runtime is able to do that
> > > > > > > > > > > > > > > > > > > > > > > > efficiently. On the other hand,
> > > > > > > > > > > > > > > > > > > > > > > > in the current scheme, these
> > > > > > > > > > > > > > > > > > > > > > > > components are naturally
> > > > > > > > > > > > > > > > > > > > > > > > generated by the compiler, and
> > > > > > > > > > > > > > > > > > > > > > > > the runtime only needs to know
> > > > > > > > > > > > > > > > > > > > > > > > the base pointer, pointer,
> > > > > > > > > > > > > > > > > > > > > > > > type, size. etc.
> > > > > > > > > > > > > > > > > > > > > > > With the current scheme, we may
> > > > > > > > > > > > > > > > > > > > > > > end with the code blowout. We
> > > > > > > > > > > > > > > > > > > > > > > need to generate very similar
> > > > > > > > > > > > > > > > > > > > > > > code for different types and
> > > > > > > > > > > > > > > > > > > > > > > variables. The worst thing here
> > > > > > > > > > > > > > > > > > > > > > > is that we will be unable to
> > > > > > > > > > > > > > > > > > > > > > > optimize this huge amount of code
> > > > > > > > > > > > > > > > > > > > > > > because the codegen relies on the
> > > > > > > > > > > > > > > > > > > > > > > runtime functions and the code
> > > > > > > > > > > > > > > > > > > > > > > cannot be inlined. That's why I
> > > > > > > > > > > > > > > > > > > > > > > would like to move as much as
> > > > > > > > > > > > > > > > > > > > > > > possible code to the runtime
> > > > > > > > > > > > > > > > > > > > > > > rather than to emit it in the
> > > > > > > > > > > > > > > > > > > > > > > compiler.
> > > > > > > > > > > > > > > > > > > > > > I understand your concerns. I think
> > > > > > > > > > > > > > > > > > > > > > this is the best we can do right
> > > > > > > > > > > > > > > > > > > > > > now.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > The most worrisome case will be
> > > > > > > > > > > > > > > > > > > > > > when we have nested mappers within
> > > > > > > > > > > > > > > > > > > > > > each other. In this case, a mapper
> > > > > > > > > > > > > > > > > > > > > > function will call another mapper
> > > > > > > > > > > > > > > > > > > > > > function. We can inline the inner
> > > > > > > > > > > > > > > > > > > > > > mapper functions in this scenario,
> > > > > > > > > > > > > > > > > > > > > > so that these mapper function can
> > > > > > > > > > > > > > > > > > > > > > be properly optimized. As a result,
> > > > > > > > > > > > > > > > > > > > > > I think the performance should be
> > > > > > > > > > > > > > > > > > > > > > fine.
> > > > > > > > > > > > > > > > > > > > > Instead, we can use indirect function
> > > > > > > > > > > > > > > > > > > > > calls passed in the array to the
> > > > > > > > > > > > > > > > > > > > > runtime. Do you think it is going to
> > > > > > > > > > > > > > > > > > > > > be slower? In your current scheme, we
> > > > > > > > > > > > > > > > > > > > > generate many runtime calls instead.
> > > > > > > > > > > > > > > > > > > > > Could you try to estimate the number
> > > > > > > > > > > > > > > > > > > > > of calls in cases if we'll call the
> > > > > > > > > > > > > > > > > > > > > mappers through the indirect function
> > > > > > > > > > > > > > > > > > > > > calls and in your cuurent scheme,
> > > > > > > > > > > > > > > > > > > > > where we need to call the runtime
> > > > > > > > > > > > > > > > > > > > > functions many times in each
> > > > > > > > > > > > > > > > > > > > > particular mapper?
> > > > > > > > > > > > > > > > > > > > Hi Alexey,
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Sorry I don't understand your idea.
> > > > > > > > > > > > > > > > > > > > What indirect function calls do you
> > > > > > > > > > > > > > > > > > > > propose to be passed to the runtime?
> > > > > > > > > > > > > > > > > > > > What are these functions supposed to do?
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > The number of function calls will be
> > > > > > > > > > > > > > > > > > > > exactly equal to the number of
> > > > > > > > > > > > > > > > > > > > components mapped, no matter whether
> > > > > > > > > > > > > > > > > > > > there are nested mappers or not. The
> > > > > > > > > > > > > > > > > > > > number of components depend on the
> > > > > > > > > > > > > > > > > > > > program. E.g., if we map a large array
> > > > > > > > > > > > > > > > > > > > section, then there will be many more
> > > > > > > > > > > > > > > > > > > > function calls.
> > > > > > > > > > > > > > > > > > > I mean the pointers to the mapper
> > > > > > > > > > > > > > > > > > > function, generated by the compiler. In
> > > > > > > > > > > > > > > > > > > your comment, it is `c.Mapper()`
> > > > > > > > > > > > > > > > > > If we pass nested mapper functions to the
> > > > > > > > > > > > > > > > > > runtime, I think it will slow down
> > > > > > > > > > > > > > > > > > execution because of the extra level of
> > > > > > > > > > > > > > > > > > indirect function calls. E.g., the runtime
> > > > > > > > > > > > > > > > > > will call `omp_mapper1`, which calls the
> > > > > > > > > > > > > > > > > > runtime back, which calls `omp_mapper2`,
> > > > > > > > > > > > > > > > > > .... This can result in a deep call stack.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I think the current implementation will be
> > > > > > > > > > > > > > > > > > more efficient, which doesn't pass nested
> > > > > > > > > > > > > > > > > > mappers to the runtime. One call to the
> > > > > > > > > > > > > > > > > > outer most mapper function will have all
> > > > > > > > > > > > > > > > > > data mapping done. The call stack will be 2
> > > > > > > > > > > > > > > > > > level deep (the first level is the mapper
> > > > > > > > > > > > > > > > > > function, and the second level is
> > > > > > > > > > > > > > > > > > `__tgt_push_mapper_component`) in this case
> > > > > > > > > > > > > > > > > > from the runtime. There are also more
> > > > > > > > > > > > > > > > > > compiler optimization space when we inline
> > > > > > > > > > > > > > > > > > all nested mapper functions.
> > > > > > > > > > > > > > > > > Yes, if we leave it as is. But if instead of
> > > > > > > > > > > > > > > > > the bunch unique functions we'll have the
> > > > > > > > > > > > > > > > > common one, that accept list if indirect
> > > > > > > > > > > > > > > > > pointers to functions additionally, and move
> > > > > > > > > > > > > > > > > it to the runtime library, we won't need
> > > > > > > > > > > > > > > > > those 2 functions we have currently. We'll
> > > > > > > > > > > > > > > > > have full access to the mapping data vector
> > > > > > > > > > > > > > > > > in the runtime library and won't need to use
> > > > > > > > > > > > > > > > > those 2 accessors we have currently. Instead,
> > > > > > > > > > > > > > > > > we'll need just one runtime functions, which
> > > > > > > > > > > > > > > > > implements the whole mapping logic. We still
> > > > > > > > > > > > > > > > > need to call it recursively, but I assume the
> > > > > > > > > > > > > > > > > number of calls will remain the same as in
> > > > > > > > > > > > > > > > > the current scheme. Did you understand the
> > > > > > > > > > > > > > > > > idea? If yes, it would good if you coild try
> > > > > > > > > > > > > > > > > to estimate the number of function calls in
> > > > > > > > > > > > > > > > > current scheme and in this new scheme to
> > > > > > > > > > > > > > > > > estimate possible pros and cons.
> > > > > > > > > > > > > > > > Hi Alexey,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Could you give an example for this scheme? 1) I
> > > > > > > > > > > > > > > > don't understand how the mapper function can
> > > > > > > > > > > > > > > > have full access to the mapping data vector
> > > > > > > > > > > > > > > > without providing these 2 accessors. 2) I don't
> > > > > > > > > > > > > > > > think it is possible to have a common function
> > > > > > > > > > > > > > > > instead of bunch of unique functions for each
> > > > > > > > > > > > > > > > mapper declared.
> > > > > > > > > > > > > > > Hi Lingda, something like this.
> > > > > > > > > > > > > > > ```
> > > > > > > > > > > > > > > void __tgt_mapper(void *base, void *begin, size_t
> > > > > > > > > > > > > > > size, int64_t type, auto components[]) {
> > > > > > > > > > > > > > > // Allocate space for an array section first.
> > > > > > > > > > > > > > > if (size > 1 && !maptype.IsDelete)
> > > > > > > > > > > > > > > <push>(base, begin, size*sizeof(Ty),
> > > > > > > > > > > > > > > clearToFrom(type));
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > // Map members.
> > > > > > > > > > > > > > > for (unsigned i = 0; i < size; i++) {
> > > > > > > > > > > > > > > // For each component specified by this
> > > > > > > > > > > > > > > mapper:
> > > > > > > > > > > > > > > for (auto c : components) {
> > > > > > > > > > > > > > > if (c.hasMapper())
> > > > > > > > > > > > > > > (*c.Mapper())(c.arg_base, c.arg_begin,
> > > > > > > > > > > > > > > c.arg_size, c.arg_type);
> > > > > > > > > > > > > > > else
> > > > > > > > > > > > > > > <push>(c.arg_base, c.arg_begin,
> > > > > > > > > > > > > > > c.arg_size, c.arg_type);
> > > > > > > > > > > > > > > }
> > > > > > > > > > > > > > > }
> > > > > > > > > > > > > > > // Delete the array section.
> > > > > > > > > > > > > > > if (size > 1 && maptype.IsDelete)
> > > > > > > > > > > > > > > <push>(base, begin, size*sizeof(Ty),
> > > > > > > > > > > > > > > clearToFrom(type));
> > > > > > > > > > > > > > > }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > void <type>.mapper(void *base, void *begin,
> > > > > > > > > > > > > > > size_t size, int64_t type) {
> > > > > > > > > > > > > > > auto sub_components[] = {...};
> > > > > > > > > > > > > > > __tgt_mapper(base, begin, size, type,
> > > > > > > > > > > > > > > sub_components);
> > > > > > > > > > > > > > > }
> > > > > > > > > > > > > > > ```
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Alexey,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I don't think this scheme is more efficient than
> > > > > > > > > > > > > > the current scheme. My reasons are:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 1) Most code here is essentially to generate
> > > > > > > > > > > > > > `components`, i.e., we need to generate
> > > > > > > > > > > > > > `c.arg_base, c.arg_begin, c.arg_size, c.arg_type`
> > > > > > > > > > > > > > for each `c` in `components`, so there will still
> > > > > > > > > > > > > > be a lot of code in `<type>.mapper`. It will not
> > > > > > > > > > > > > > reduce the mapper function code, i.e., we will
> > > > > > > > > > > > > > still have a bunch of unique mapper functions.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 2) This scheme will prevent a lot of compiler
> > > > > > > > > > > > > > optimization from happening. In reality, a lot of
> > > > > > > > > > > > > > computation should be redundant. E.g., for two
> > > > > > > > > > > > > > components `c1` and `c2`, `c1`'s base may be the
> > > > > > > > > > > > > > same as `c2`'s begin, so the compiler will be able
> > > > > > > > > > > > > > to eliminate these reduction computation,
> > > > > > > > > > > > > > especially when we inline all nested mapper
> > > > > > > > > > > > > > functions together. If we move these computation
> > > > > > > > > > > > > > into the runtime, the compiler will not be able to
> > > > > > > > > > > > > > do such optimization.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 3) In terms of the number of `push` function calls,
> > > > > > > > > > > > > > this scheme has the exact same number of calls as
> > > > > > > > > > > > > > the current scheme, so I don't think this scheme
> > > > > > > > > > > > > > can bring performance benefits. The scheme should
> > > > > > > > > > > > > > perform worse than the current scheme, because it
> > > > > > > > > > > > > > reduces the opportunities of compiler optimization
> > > > > > > > > > > > > > as mentioned above.
> > > > > > > > > > > > > Hi Lingda, I'm trying to simplify the code generated
> > > > > > > > > > > > > by clang and avoid some unnecessary code
> > > > > > > > > > > > > duplications. If the complexity of this scheme is the
> > > > > > > > > > > > > same as proposed by you, I would prefer to use this
> > > > > > > > > > > > > scheme unless there are some other opinions.
> > > > > > > > > > > > > 1. It is not a problem. This code is unique and is
> > > > > > > > > > > > > not duplicated in the different mappers.
> > > > > > > > > > > > > 2. Inlining is no solution here. We still generate to
> > > > > > > > > > > > > much code, which is almost the same in many cases and
> > > > > > > > > > > > > it will lead to very ineffective codegen because we
> > > > > > > > > > > > > still end up with a lot of almost the same code. This
> > > > > > > > > > > > > also might lead to poor performance.
> > > > > > > > > > > > > 3. Yes, the number of pushes is always the same, in
> > > > > > > > > > > > > all possible schemes. It would be good to compare
> > > > > > > > > > > > > somehow the performance of both schemes, at least
> > > > > > > > > > > > > preliminary.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Also, this solution reduces the number of required
> > > > > > > > > > > > > runtime functions, instead of 2 we need just 1 and,
> > > > > > > > > > > > > thus, we need to make fewer runtime functions calls.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I think it would better to propose this scheme as an
> > > > > > > > > > > > > alternate design and discuss it in the OpenMP
> > > > > > > > > > > > > telecon. What do you think? Or we can try to discuss
> > > > > > > > > > > > > it in the offline mode via the e-mail with other
> > > > > > > > > > > > > members.
> > > > > > > > > > > > > I'm not trying to convince you to implement this
> > > > > > > > > > > > > scheme right now, but it would be good to discuss it.
> > > > > > > > > > > > > Maybe it will lead to some better ideas from others?
> > > > > > > > > > > > Hi Alexey,
> > > > > > > > > > > >
> > > > > > > > > > > > I still prefer the current scheme, because:
> > > > > > > > > > > > 1) I don't like recursive mapper calls, which goes back
> > > > > > > > > > > > to my original scheme a little bit. I really think
> > > > > > > > > > > > inlining can make a big difference when we have nested
> > > > > > > > > > > > mappers. These compiler optimizations are the keys to
> > > > > > > > > > > > have better performance for mappers.
> > > > > > > > > > > > 2) I don't think the codegen here is inefficient. Yes
> > > > > > > > > > > > there is duplicated code across different mapper
> > > > > > > > > > > > functions, but why that will lead to poor performance?
> > > > > > > > > > > > 3) Although we have 2 runtime functions now, the
> > > > > > > > > > > > `__tgt_mapper_num_components` is called only once per
> > > > > > > > > > > > mapper. It should have very negligible performance
> > > > > > > > > > > > impact.
> > > > > > > > > > > >
> > > > > > > > > > > > But if you have a different option, we can discuss it
> > > > > > > > > > > > next time in the meeting. I do have a time constraint
> > > > > > > > > > > > to work on the mapper implementation. I'll no longer
> > > > > > > > > > > > work in this project starting this September, and I
> > > > > > > > > > > > have about 30% of my time working on it until then.
> > > > > > > > > > > Lingda,
> > > > > > > > > > > 1. We have recursive (actually, not recursive, because
> > > > > > > > > > > you cannot use types recursively) mappers calls anyway,
> > > > > > > > > > > it is nature of struсtures/classes.
> > > > > > > > > > > 2. We have a lot of similar code. And I'm not sure that
> > > > > > > > > > > it can be optimized out.
> > > > > > > > > > > 3. Yes, but it means that we have n extra runtime calls,
> > > > > > > > > > > where n is the number of branches in the structure/class
> > > > > > > > > > > tree.
> > > > > > > > > > >
> > > > > > > > > > > I see :(. I understand your concern. In this case, we
> > > > > > > > > > > could try to discuss it offline, in the mailing list, to
> > > > > > > > > > > make it a little bit faster. We just need to hear other
> > > > > > > > > > > opinions on this matter, maybe there are some other pros
> > > > > > > > > > > and cons for these schemes.
> > > > > > > > > > Hi Alexey,
> > > > > > > > > >
> > > > > > > > > > Sure, let's discuss this in the mailing list. I'll
> > > > > > > > > > summarize it and send it to the mailing list later.
> > > > > > > > > >
> > > > > > > > > > > We have recursive (actually, not recursive, because you
> > > > > > > > > > > cannot use types recursively) mappers calls anyway, it is
> > > > > > > > > > > nature of struсtures/classes.
> > > > > > > > > > We won't have recursive calls with inlining.
> > > > > > > > > >
> > > > > > > > > > > We have a lot of similar code. And I'm not sure that it
> > > > > > > > > > > can be optimized out.
> > > > > > > > > > I think it's even harder to optimized these code out when
> > > > > > > > > > we move them into the runtime.
> > > > > > > > > >
> > > > > > > > > > > Yes, but it means that we have n extra runtime calls,
> > > > > > > > > > > where n is the number of branches in the structure/class
> > > > > > > > > > > tree.
> > > > > > > > > > I don't quite understand. It's still equal to the number of
> > > > > > > > > > mappers in any case.
> > > > > > > > > > Sure, let's discuss this in the mailing list. I'll
> > > > > > > > > > summarize it and send it to the mailing list later.
> > > > > > > > >
> > > > > > > > > Good, thanks!
> > > > > > > > >
> > > > > > > > > > We won't have recursive calls with inlining.
> > > > > > > > >
> > > > > > > > > We won't have recursive calls anyway (recursive types are not
> > > > > > > > > allowed). Plus, I'm not sure that inlining is the best option
> > > > > > > > > here. We have a lot of code for each mapper and I'm not sure
> > > > > > > > > that the optimizer will be able to squash it effectively.
> > > > > > > > >
> > > > > > > > > > I think it's even harder to optimized these code out when
> > > > > > > > > > we move them into the runtime.
> > > > > > > > >
> > > > > > > > > Definitely not, unless we use LTO or inlined runtime.
> > > > > > > > >
> > > > > > > > > We won't have recursive calls anyway (recursive types are not
> > > > > > > > > allowed). Plus, I'm not sure that inlining is the best option
> > > > > > > > > here. We have a lot of code for each mapper and I'm not sure
> > > > > > > > > that the optimizer will be able to squash it effectively.
> > > > > > > > Sorry I should not say recursive calls. Here it needs to
> > > > > > > > "recursively" call other mapper functions in case of nested
> > > > > > > > mappers, but we don't need it in case of inlining.
> > > > > > > >
> > > > > > > > > Definitely not, unless we use LTO or inlined runtime.
> > > > > > > > But you are proposing to move many code to the runtime here,
> > > > > > > > right? That doesn't make sense to me.
> > > > > > > >
> > > > > > > > But you are proposing to move much code to the runtime here,
> > > > > > > > right? That doesn't make sense to me.
> > > > > > >
> > > > > > > I'm just not sure that there going be significant problems with
> > > > > > > the performance because of that. And it significantly simplifies
> > > > > > > codegen in the compiler and moves the common part into a single
> > > > > > > function.
> > > > > > >
> > > > > > > Plus, if in future we'll need to modify this functionality for
> > > > > > > some reason, 2 different versions of the compiler will produce
> > > > > > > incompatible code. With my scheme, you still can use old runtime
> > > > > > > and have the same functionality as the old compiler and the new
> > > > > > > one.
> > > > > > Hi Alexey,
> > > > > >
> > > > > > I think more carefully about your scheme, and I don't think we can
> > > > > > solve the 2 problems below with this scheme:
> > > > > >
> > > > > > 1. In the example you gave before, the compiler needs to generate
> > > > > > all map types and pass them to `__tgt_mapper` through
> > > > > > `sub_components`. But in this case, the compiler won't be able to
> > > > > > generate the correct `MEMBER_OF` field in map type. As a result,
> > > > > > the runtime has to fix it using the mechanism we already have here:
> > > > > > `__tgt_mapper_num_components`. This not only increases complexity,
> > > > > > but also, it means the runtime needs further manipulation of the
> > > > > > map type, which creates locality issues. While in the current
> > > > > > scheme, the map type is generated by compiler once, so the data
> > > > > > locality will be very good in this case.
> > > > > >
> > > > > > 2. `sub_components` includes all components that should be mapped.
> > > > > > If we are mapping an array, this means we need to map many
> > > > > > components, which will need to allocate memory for `sub_components`
> > > > > > in the heap. This creates further memory management burden and is
> > > > > > not an efficient way to use memory.
> > > > > >
> > > > > > Based on these reasons, I think the current scheme is still more
> > > > > > preferable.
> > > > > Hi Lingda,
> > > > > 1. Actually, I thought that the runtime function `__tgt_mapper` will
> > > > > do this, not the compiler.
> > > > > 2. Why do we need to allocate it on the heap? We can allocate it on
> > > > > the stack.
> > > > 1. In your scheme, both compiler and `__tgt_mapper` need to do this:
> > > > the compiler will generate other parts in the type, e.g., `TO` `FROM`
> > > > bits and basic `MEMBER_OF` bits. Then `__tgt_mapper` needs to modify
> > > > the `MEMBER_OF` bits later. Since there are a lot of other memory
> > > > accesses between the compiler and `__tgt_mapper` operations to the same
> > > > map type, it's very likely the map type will not stay in the cache,
> > > > which causes locality problem.
> > > >
> > > > 2. Assume we are mapping an array with 1000000 elements, and each
> > > > elements have 5 components. For each component, we need `base,
> > > > begin_ptr, size, type, mapper`, which are 40 bytes. Together, we will
> > > > need 1000000 * 5 * 40 = 200MB of space for this array, which stack
> > > > cannnot handle.
> > > 1. I don't think it is a big problem, this part of the code is executed
> > > on the CPU and I don't think it will lead to significant overhead.
> > > 2. When we map an array, we do not map it element-by-element, so we don't
> > > need 10000 records. Moreover, we try to merge contiguous parts into
> > > single one, reducing the total number of elements.
> > 1. I think it is a problem. Beside, doing so is not elegant: why having a
> > single thing (setting the map type) done in 2 places while we can do it in
> > one place?
> >
> > 2. We need to map them element by element because it is not always possible
> > to merge contiguous parts together (there may be no contiguous parts based
> > on the mapper). And merging parts together will be a complex procedure: I
> > don't think it can be done in the runtime because many code is moved into
> > the runtime now. In contrast, the compiler will have better opportunities
> > to merge things.
> >
> > Besides, I don't think there is a valid reason that the current scheme is
> > not good. You mentioned it's complex codegen. But it only has less 200 loc
> > here, I don't see why it is complex.
> 2. I rather doubt that we will need to map a record with 100000 fields
> element-by-element.
>
> I think it would be better to share it with others and listen to their
> opinions. It is better to spend some extra time to provide good design.
> You can include your doubts in the description of the new scheme, of course.
2. The implementation should work for any case anyway. Besides, I think mapping
a large array should be actually a common case. The users don't want to map
1000000 elements by themselves, so they will want to use mapper to let the
system do it automatically.
Sure, I can release the discussion to the mailing list. I don't see a reason to
use the new scheme now.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D59474/new/
https://reviews.llvm.org/D59474
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits