2416210017 commented on PR #4947:
URL: https://github.com/apache/seatunnel/pull/4947#issuecomment-1706296031

   Although the implementation supports string columns as partition keys,
this design is not very reasonable. It applies the MD5 hash function to each
value in the table_name column, takes the hash modulo 10, and then takes the
absolute value; partition 1, for example, selects only the rows whose result
equals 1.
   
   For example, with the number of partitions set to 10, the SQL actually
executed against the business database is:
   
   partition 1:
   SELECT * FROM (
        select * from metastore_bdc.collect_dct_table_info
   ) tt where ABS(MD5(table_name) % 10) = 1;
   
   partition 2:
   SELECT * FROM (
        select * from metastore_bdc.collect_dct_table_info
   ) tt where ABS(MD5(table_name) % 10) = 2;
   ...
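   To make the pattern explicit, here is a minimal Python sketch that
regenerates the per-partition queries above. build_hash_mod_queries is a
hypothetical helper, not SeaTunnel's actual code. It numbers partitions by
remainder 0..N-1, which is an assumption: the example above only shows
partitions 1 and 2.

```python
def build_hash_mod_queries(base_query: str, column: str, num_partitions: int) -> list[str]:
    """Hypothetical helper: rebuild the hash-modulo partition queries.

    Every query wraps the same unfiltered base query, and the only filter is a
    function of the column, so each query must hash every row of the table.
    """
    return [
        f"SELECT * FROM ({base_query}) tt "
        f"WHERE ABS(MD5({column}) % {num_partitions}) = {remainder}"
        for remainder in range(num_partitions)  # assumed: one query per remainder
    ]
```

   Every generated query embeds the same unfiltered subquery and filters only
on a function of table_name, so the database has to read and hash every row
once per partition.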
   
   
![a9a1d7da54004dbcb5a088fd91879b9](https://github.com/apache/seatunnel/assets/52597892/26d8c0bb-0322-4f72-91b8-2019464b4571)
   
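   As shown in the figure, each of these queries scans the entire table in
the business database. Because the predicate wraps table_name in a function,
the query cannot use an ordinary index on that column, so partitioning brings
no performance improvement: with 10 partitions, the whole table is read 10
times.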
   Suggestion: refer to Sqoop's approach to splitting on string columns,
which maps each string to a number based on its Unicode characters and then
splits that numeric range.
   
   Reference link: https://blog.csdn.net/fyhailin/article/details/79069475
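   A minimal Python sketch of that idea follows, loosely modeled on Sqoop's
TextSplitter; it is not SeaTunnel code, and every function name in it is
hypothetical. Each character is treated as one digit of a fraction in [0, 1),
the interval between MIN(table_name) and MAX(table_name) is cut into equal
steps, and the split points are turned back into strings, so each partition
becomes a plain range predicate that an index on table_name can serve.
Assumptions: all characters are below U+D800 (Sqoop itself uses base 65536;
base 0xD800 here keeps generated boundaries free of surrogate code points), and
the database compares strings in code-point order, e.g. a binary collation.
With a case-insensitive collation the boundaries may not match the database's
own ordering.

```python
from fractions import Fraction

# Hypothetical helpers, loosely modeled on Sqoop's TextSplitter.
RADIX = 0xD800  # assumption: every character is below U+D800 (no surrogates)
MAX_CHARS = 8   # only the first few characters influence the split points


def string_to_fraction(s: str) -> Fraction:
    """Map s to 0.c1c2c3... in base RADIX, where ci = ord(s[i])."""
    result = Fraction(0)
    place = Fraction(1, RADIX)
    for ch in s[:MAX_CHARS]:
        if ord(ch) >= RADIX:
            raise ValueError(f"character {ch!r} is outside the supported range")
        result += ord(ch) * place
        place /= RADIX
    return result


def fraction_to_string(x: Fraction) -> str:
    """Inverse of string_to_fraction, truncated to MAX_CHARS characters."""
    chars = []
    for _ in range(MAX_CHARS):
        x *= RADIX
        code_point = int(x)  # x >= 0, so int() is the floor
        if code_point == 0:
            break
        x -= code_point
        chars.append(chr(code_point))
    return "".join(chars)


def split_points(min_val: str, max_val: str, num_splits: int) -> list[str]:
    """Return num_splits + 1 boundaries from min_val to max_val (min_val <= max_val)."""
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    # Split only the part after the shared prefix, then put the prefix back.
    prefix_len = 0
    while (prefix_len < min(len(min_val), len(max_val))
           and min_val[prefix_len] == max_val[prefix_len]):
        prefix_len += 1
    prefix = min_val[:prefix_len]
    lo = string_to_fraction(min_val[prefix_len:])
    hi = string_to_fraction(max_val[prefix_len:])
    step = (hi - lo) / num_splits
    inner = [prefix + fraction_to_string(lo + step * i) for i in range(1, num_splits)]
    return [min_val, *inner, max_val]


def build_range_queries(table: str, column: str, points: list[str]) -> list[tuple[str, tuple]]:
    """One (sql, params) pair per split, plus one query for NULL keys."""
    queries = []
    last = len(points) - 2
    for i in range(len(points) - 1):
        upper = "<=" if i == last else "<"  # the final range includes max_val
        sql = f"SELECT * FROM {table} WHERE {column} >= ? AND {column} {upper} ?"
        queries.append((sql, (points[i], points[i + 1])))
    # Range predicates never match NULL, so NULL keys need their own query.
    queries.append((f"SELECT * FROM {table} WHERE {column} IS NULL", ()))
    return queries
```

   MIN(table_name) and MAX(table_name) can be fetched with one extra query
before splitting. The bounds are passed as bound parameters rather than spliced
into the SQL text, since generated boundary strings may contain quotes or
control characters.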


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
