Thanks for your feedback!
Here is a quick summary about the type parameter of HoodieRecordPayload.
1. The type parameter is used to refer "self type" which is generally good
for
access subclass methods/fields without unchecked casting.
2. However, the concrete payload is accessed by `combineAndGetUpdateValue`
or `getInsertValue` which doesn't get benefit from the type parameter. BTW,
the
return type of these two methods suffers from unchecked casting also, which
is
filed as HUDI-834[1].
3. So, the only usage of the type parameter(self type) is `preCombine`
method.
Though, at the real use point we pass raw typed HoodieRecordPayload as
the parameter[2]. I'm no sure the behavior when casting failed and want to
ask for advice. So far, we cast at runtime and a cast exception will be
thrown
if failed.
Given the cases above, I tend to change the definition
of HoodieRecordPayload
as
public interface HoodieRecordPayload { }
if we can well-define how to do in `preCombine` when type mismatch. The
alternative,
i.e., a full paramterized self type type parameter will affect quite a wide
range of code
where we have to cascade change other raw usages such as
HoodieTable/HoodieRecord/HoodieWriteHandle and so on; while the self type
parameter
doesn't bring a lot benefit.
Best,
tison.
[1] https://issues.apache.org/jira/browse/HUDI-834
[2] org/apache/hudi/common/table/log/HoodieMergedLogRecordScanner.java:114
Vinoth Chandar <[email protected]> 于2020年4月23日周四 上午12:04写道:
> +1 raising a JIRA and summarizing some findings would be great.
>
> On Wed, Apr 22, 2020 at 8:38 AM leesf <[email protected]> wrote:
>
> > hi tison,
> >
> > It is always better to make the codebase more clear, so it would be great
> > if you would do an investigation.
> >
> > tison <[email protected]> 于2020年4月22日周三 下午2:42写道:
> >
> > > Hi Vinoth,
> > >
> > > >much of the code actually does not depend on the templatized type at
> all
> > >
> > > Agree.
> > >
> > > Let's say I'm ok with untyped HoodieRecordPayload since all payload is
> > > effectively untyped(Object) which we deal with type and cast at
> runtime.
> > >
> > > Though, it is noisy we implement it in a half-generic form. Meanwhile,
> > > wildcard doesn't work for every case since <? capture
> > HoodieRecordPayload>
> > > is NOT compatible with <? capture HoodieRecordPayload> and any
> > > exact T extends HoodieRecordPayload.
> > >
> > > We don't actually use the type parameter heavily, so it is an
> alternative
> > > that we define HoodieRecordPayload just
> > >
> > > public class HoodieRecordPayload { }
> > >
> > > If the community think it is a worth effort, I'm glad to do more
> > > investigation
> > > and evaluate its impact, also find a practical way.
> > >
> > > Best,
> > > tison.
> > >
> > >
> > > Vinoth Chandar <[email protected]> 于2020年4月22日周三 下午2:22写道:
> > >
> > > > Hi Tison,
> > > >
> > > > Thanks for raising this.. In most places doing a HoodieTable<?>
> > wildcard
> > > > should be totally acceptable, since much of the code actually does
> not
> > > > depend on the templatized type at all..
> > > >
> > > > Def, worth taking another look holistically and see if we can address
> > > > this..
> > > >
> > > > My 2c.
> > > > Vinoth
> > > >
> > > >
> > > > On Mon, Apr 20, 2020 at 11:32 PM tison <[email protected]> wrote:
> > > >
> > > > > Hi developers,
> > > > >
> > > > > Recently when I went through Hudi codebase, I found some annoying
> > > warning
> > > > > 'raw use of
> > > > > parameterized class' and some other generic type related.
> > > > >
> > > > > It seems the root of this wide usage of raw type is
> > HoodieRecordPayload
> > > > > where
> > > > >
> > > > > HoodieRecordPayload<T extends HoodieRecordPayload>
> > > > >
> > > > > should have been
> > > > >
> > > > > HoodieRecordPayload<T extends HoodieRecordPayload<T>>
> > > > >
> > > > > I'm not sure what community think of this wide usage of raw type
> and
> > > > given
> > > > > it affect most of our
> > > > > code I'd like to start this thread for advice. Personally explicit
> > type
> > > > > signature benefits from strong
> > > > > type system but it might cause some signature breaking changes as
> > well
> > > as
> > > > > require a huge
> > > > > engineering effort.
> > > > >
> > > > > Shall Hudi live with these raw types? Or we will have a plan
> migrate
> > to
> > > > > explicit generic type?
> > > > >
> > > > > Best,
> > > > > tison.
> > > > >
> > > >
> > >
> >
>