prashantwason commented on pull request #2701:
URL: https://github.com/apache/hudi/pull/2701#issuecomment-873204691
Some high-level comments:
1. There are now three versions in a HUDI dataset: Table Version,
   TimelineLayout Version, and Instant Format Version.
   1. It may be good to consolidate them (at least the TimelineLayout Version
      and the Instant Format Version, as they have a 1:1 relationship).
   2. Updating the TimelineLayout Version each time a new Instant Format
      Version is needed would work.
2. We need to standardize the naming: the filter functions use
   finish-timestamp (filterByFinishTs), while other parts of the code use
   transition-timestamp (getStateTransitionTime, stateTransitionTime).
   1. As implemented, there is a single transition: start to finish. It
      may be best to standardize on everything being a transition, start ->
      inflight -> finish (for simplicity). This also makes it easier to include
      other states in the middle.
   2. Finish (or completion) is then the final state transition.
3. How should one decide which filtering to use? By default, instants are
   filtered by startTs, but in some places we filter by finishTs. I feel there
   should be a default that everyone can use without having to understand the
   internals of the timeline.
   1. E.g. HoodieMetaClient.getActiveTimeline() -- this should return the
      default which is used everywhere (sorted by EndTs?).
   2. HoodieMetaClient.getCompletionTimeline() or
      HoodieMetaClient.getActiveTimeline(boolean sortbyEndTs) -- for specific
      uses where the caller needs explicit ordering.
4. Overall, InstantTime seems to just store two long timestamps.
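
A rough sketch of the idea in point 3: one default timeline accessor for general use, plus an explicit overload for callers that care about ordering. Everything here is hypothetical and for illustration only: `TimelineSketch`, `SketchInstant`, and the static methods are stand-ins, not the actual Hudi classes or signatures. Defaulting to finish-timestamp order is an assumption taken from the "(sorted by EndTs?)" question above, which is still open.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in types; NOT the real Hudi API.
public class TimelineSketch {

    /** An instant that records both its start and its finish timestamp. */
    static final class SketchInstant {
        final String id;
        final long startTs;
        final long finishTs;

        SketchInstant(String id, long startTs, long finishTs) {
            this.id = id;
            this.startTs = startTs;
            this.finishTs = finishTs;
        }
    }

    // Assumption: the default ordering is by finish timestamp.
    static final boolean DEFAULT_SORT_BY_END_TS = true;

    /** Default view: callers do not need to know which timestamp is used. */
    static List<SketchInstant> getActiveTimeline(List<SketchInstant> instants) {
        return getActiveTimeline(instants, DEFAULT_SORT_BY_END_TS);
    }

    /** Explicit view: the caller chooses start-time or finish-time ordering. */
    static List<SketchInstant> getActiveTimeline(List<SketchInstant> instants,
                                                 boolean sortByEndTs) {
        Comparator<SketchInstant> order = sortByEndTs
            ? Comparator.comparingLong((SketchInstant i) -> i.finishTs)
            : Comparator.comparingLong((SketchInstant i) -> i.startTs);
        List<SketchInstant> sorted = new ArrayList<>(instants); // leave input untouched
        sorted.sort(order);
        return sorted;
    }

    public static void main(String[] args) {
        // A starts first but finishes last; B starts later but finishes first.
        List<SketchInstant> instants = Arrays.asList(
            new SketchInstant("A", 100L, 300L),
            new SketchInstant("B", 200L, 250L));

        for (SketchInstant i : getActiveTimeline(instants, false)) {
            System.out.println("by start:  " + i.id);
        }
        for (SketchInstant i : getActiveTimeline(instants)) {
            System.out.println("by finish: " + i.id);
        }
    }
}
```

With overlapping instants like A and B, the two orderings disagree, which is why a single documented default matters: most callers would use the no-argument form, and only ordering-sensitive code would pass the flag.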