yihua commented on PR #13414:
URL: https://github.com/apache/hudi/pull/13414#issuecomment-2977807399

   > @yihua asked for PR owner @Davis-Zhang-Onehouse to work on the following 
AI:
   > 
   > > Index definition looks better. sth to store as part of the table 
metadata, instead of implicitly inferring it. We should store something in the 
.index.def.
   > 
   > # Proposed change
   > Introducing a new attribute "partitionStrategy". Possible values:
   > 
   > * HASH_ON_SECONDARY_KEY: exclusive to SI. It means we can determine SI 
file group by computing the hash value only using the secondary data column 
value.
   > * HASH_ON_SECONDARY_KEY_RECORD_KEY: exclusive to SI. It means the old 
behavior.
   > * For other indexes we will also come up with proper enum values for their 
existing strategies.
   > 
   > <img alt="image" width="1695" 
src="https://private-user-images.githubusercontent.com/169106455/455664353-c9f69b5c-1ecf-4899-a5c0-2cbe6290712c.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NTAxMDE1NTYsIm5iZiI6MTc1MDEwMTI1NiwicGF0aCI6Ii8xNjkxMDY0NTUvNDU1NjY0MzUzLWM5ZjY5YjVjLTFlY2YtNDg5OS1hNWMwLTJjYmU2MjkwNzEyYy5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwNjE2JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDYxNlQxOTE0MTZaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1mOTBhNDM1ZmNmYjIwNDFkZjQ0MWYxYjE1YTQ4YmYwNzQ4MmIxZThkM2Y3ZGFmOGU1NzkzMzYwNTgzOTk5MGI0JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.89fuFfkvPs9K_AL7c5MGrJ-2nCggjjL_lu4T-mVo0Oc";>
   > Also from the schema evolution perspective about this json. If we didn't 
find such field, depends on the index type, we infer the default strategy. For 
SI, in case we could not find the value, it means using 
`HASH_ON_SECONDARY_KEY_RECORD_KEY`.
   > 
   > # Implementation details
   > ## Read path
   > Whenever we do lookup using indexes, we will read the index def file and 
add partitionStrategy value to the context. If it is 
HASH_ON_SECONDARY_KEY_RECORD_KEY, it will route to prefix lookup code. If it is 
HASH_ON_SECONDARY_KEY, it is route to the new code
   > 
   > Also for all the other indexes, we need to do the same as they share the 
same code path. It is an index look up interface change so it impacts code 
shared by all indexes.
   > 
   > ## Write path
   > If it is creating a new interface, we always write partitionStrategy 
attribute in the strategy. For SI it will be of value 
HASH_ON_SECONDARY_KEY_RECORD_KEY. For others, we will come up name mapping to 
their existing behavior.
   > 
   > If it is updating existing indexes due to change of data, follow the "Read 
path" logic.
   > 
   > ## Misc
   > revert the table version related code change.
   > 
   > @yihua please comment on the proposed plan when you get a chance. thanks
   
   Looks good to me overall.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to