prashantwason commented on pull request #2701:
URL: https://github.com/apache/hudi/pull/2701#issuecomment-873204691


   Some high-level comments:
   
   1. There are now three versions in a HUDI dataset: Table Version,
TimelineLayout Version and Instant Format version.
      1. It may be good to consolidate them (at least the TimelineLayout
version and the InstantFormat version, as they have a 1:1 relationship).
      2. That could work by bumping the TimelineLayout Version each time a new
Instant Format version is needed.
   
   2. We need to standardize the naming: the filter functions use
finish-timestamp (filterByFinishTs) while other parts of the code use
transition-timestamp (getStateTransitionTime, stateTransitionTime).
       1. As implemented, there is a single transition: start to finish. For
simplicity, it may be best to standardize on modeling everything as
transitions, Start -> inflight -> finish. This also makes it easier to add
other states in the middle.
       2. Finish (or completion) is then just the final state transition.
   
   3. How should one decide which filtering to use? By default, instants are
filtered by startTs, but in some places we filter by finishTs. I feel there
should be a default that everyone can use without having to understand the
internals of the timeline.
       1. E.g. HoodieMetaClient.getActiveTimeline(): this should return the
default that is used everywhere (sorted by EndTs?).
       2. HoodieMetaClient.getCompletionTimeline() or
HoodieMetaClient.getActiveTimeline(boolean sortbyEndTs): for specific uses
where the caller needs explicit ordering.
   
   4. Overall, InstantTime seems to store two long timestamps.
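
To make the consolidation in point 1 concrete, here is a minimal sketch. It assumes the layout version is the only version persisted and the instant format version is derived from it; the table version is left out. LayoutVersion, its constants and the version numbers are hypothetical and are not Hudi's actual TimelineLayoutVersion class.

```java
import java.util.Arrays;

// Hypothetical sketch, not Hudi's TimelineLayoutVersion: only the layout
// version is persisted, and the instant format version is derived from it
// because the two always change together (1:1).
enum LayoutVersion {
    // Illustrative numbers. A new instant format is introduced by adding a
    // new constant here, i.e. by bumping the layout version.
    V1(1, 0),
    V2(2, 1);

    private final int layoutVersion;
    private final int instantFormatVersion;

    LayoutVersion(int layoutVersion, int instantFormatVersion) {
        this.layoutVersion = layoutVersion;
        this.instantFormatVersion = instantFormatVersion;
    }

    int instantFormatVersion() {
        return instantFormatVersion;
    }

    // Resolve the persisted layout version; the instant format follows from it.
    static LayoutVersion of(int layoutVersion) {
        return Arrays.stream(values())
                .filter(v -> v.layoutVersion == layoutVersion)
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException(
                        "Unknown timeline layout version: " + layoutVersion));
    }

    // Writers always use the newest constant.
    static LayoutVersion latest() {
        LayoutVersion[] all = values();
        return all[all.length - 1];
    }
}
```

With this shape, readers check a single version number, and adding an instant format cannot drift out of sync with the layout version.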
   
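For points 2 and 4, here is a minimal sketch of an instant that records one timestamp per state transition, so finish is just the transition into the last state. InstantState, InstantSketch and their methods are hypothetical names chosen for illustration. REQUESTED plays the role of the Start state.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical state names; REQUESTED stands in for the "Start" state.
enum InstantState { REQUESTED, INFLIGHT, COMPLETED }

// Hypothetical sketch: an instant keeps the time of every state transition,
// so the start and finish timestamps are just two entries of the same map.
final class InstantSketch {
    // EnumMap keeps keys in declaration (ordinal) order.
    private final Map<InstantState, Long> transitionTimes = new EnumMap<>(InstantState.class);

    InstantSketch(long startTs) {
        transitionTimes.put(InstantState.REQUESTED, startTs);
    }

    // Advance by exactly one state; timestamps must not go backwards.
    InstantSketch transitionTo(InstantState next, long ts) {
        InstantState current = currentState();
        if (next.ordinal() != current.ordinal() + 1) {
            throw new IllegalStateException(current + " -> " + next + " is not a valid transition");
        }
        if (ts < transitionTimes.get(current)) {
            throw new IllegalArgumentException("Transition time " + ts + " precedes " + current);
        }
        transitionTimes.put(next, ts);
        return this;
    }

    // The latest state reached is the last key in ordinal order.
    InstantState currentState() {
        InstantState latest = InstantState.REQUESTED;
        for (InstantState state : transitionTimes.keySet()) {
            latest = state;
        }
        return latest;
    }

    Optional<Long> transitionTime(InstantState state) {
        return Optional.ofNullable(transitionTimes.get(state));
    }

    long startTs() {
        return transitionTimes.get(InstantState.REQUESTED);
    }

    // Finish (completion) is simply the final transition; empty until completed.
    Optional<Long> finishTs() {
        return transitionTime(InstantState.COMPLETED);
    }
}
```

In the current start -> finish model, only the first and last entries exist, which matches the two long timestamps noted in point 4. Adding an intermediate state becomes a new enum constant rather than a new field and a new accessor name.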

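For point 3, here is a minimal sketch of a single default ordering plus an explicit override. CompletedInstant and TimelineSketch are hypothetical stand-ins for the timeline returned by HoodieMetaClient. The sketch assumes the timeline holds only completed instants, so every instant has an endTs. Defaulting to endTs follows the tentative "(sorted by EndTs?)" suggestion above rather than a settled decision.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical completed instant; only the two timestamps matter here.
record CompletedInstant(String id, long startTs, long endTs) {}

// Hypothetical stand-in for the timeline returned by HoodieMetaClient.
final class TimelineSketch {
    private final List<CompletedInstant> instants;

    TimelineSketch(List<CompletedInstant> instants) {
        this.instants = List.copyOf(instants);
    }

    // The single default ordering that callers use without knowing timeline internals.
    List<CompletedInstant> getActiveTimeline() {
        return getActiveTimeline(true);
    }

    // Explicit ordering for the callers that really need to choose.
    List<CompletedInstant> getActiveTimeline(boolean sortByEndTs) {
        Comparator<CompletedInstant> order = sortByEndTs
                ? Comparator.comparingLong(CompletedInstant::endTs)
                : Comparator.comparingLong(CompletedInstant::startTs);
        return instants.stream().sorted(order).collect(Collectors.toList());
    }
}
```

Consider two instants: a starts at 1 and finishes at 10, and b starts at 2 and finishes at 5. Ordered by startTs they come back as [a, b], but ordered by endTs they come back as [b, a]. This is exactly the case where callers currently have to know which filter to pick. A separate getCompletionTimeline() method would give the same result as the boolean overload.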

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.