korbit-ai[bot] commented on code in PR #32857:
URL: https://github.com/apache/superset/pull/32857#discussion_r2013692516


##########
superset/db_engine_specs/drill.py:
##########
@@ -144,3 +145,14 @@ def fetch_data(
             if str(ex) == "generator raised StopIteration":
                 return []
             raise
+
+    @staticmethod
+    def _mutate_label(label: str) -> str:
+        """
+        Suffix with the first six characters from the md5 of the label to avoid
+        collisions with original column names
+
+        :param label: Expected expression label
+        :return: Conditionally mutated label
+        """
+        return f"{label}_{md5_sha_from_str(label)[:6]}"

Review Comment:
   ### Insufficient Hash Length for Collision Prevention <sub>![category 
Functionality](https://img.shields.io/badge/Functionality-0284c7)</sub>
   
   <details>
     <summary>Tell me more</summary>
   
   ###### What is the issue?
   The _mutate_label method could potentially generate collisions if two 
different labels produce MD5 hashes that share the same first 6 characters.
   
   ###### Why this matters
   While rare, MD5 hash collisions in the first 6 characters could lead to the 
same functionality issue the code is trying to solve - duplicate column names 
causing query errors.
   
   ###### Suggested change ∙ *Feature Preview*
   Increase the number of characters used from the hash to reduce collision 
probability. A recommended solution would be to use at least 8 characters:
   ```python
   return f"{label}_{md5_sha_from_str(label)[:8]}"
   ```
   
   
   ###### Provide feedback to improve future suggestions
   [![Nice 
Catch](https://img.shields.io/badge/👍%20Nice%20Catch-71BC78)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/07964f89-9946-448c-95e6-dd5de52d1d8c/upvote)
 
[![Incorrect](https://img.shields.io/badge/👎%20Incorrect-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/07964f89-9946-448c-95e6-dd5de52d1d8c?what_not_true=true)
  [![Not in 
Scope](https://img.shields.io/badge/👎%20Out%20of%20PR%20scope-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/07964f89-9946-448c-95e6-dd5de52d1d8c?what_out_of_scope=true)
 [![Not in coding 
standard](https://img.shields.io/badge/👎%20Not%20in%20our%20standards-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/07964f89-9946-448c-95e6-dd5de52d1d8c?what_not_in_standard=true)
 
[![Other](https://img.shields.io/badge/👎%20Other-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/07964f89-9946-448c-95e6-dd5de52d1d8c)
   </details>
   
   <sub>
   
   💬 Looking for more details? Reply to this comment to chat with Korbit.
   </sub>
   
   <!--- korbi internal id:d50c8dfe-a8bb-40cb-ac29-5c9c81129991 -->
   
   [](d50c8dfe-a8bb-40cb-ac29-5c9c81129991)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to