Re: [I] Improve substrait NameTracker so it doesn't require uuids [datafusion]

via GitHub Tue, 25 Nov 2025 03:08:22 -0800


LiaCastaneda commented on issue #17508:
URL: https://github.com/apache/datafusion/issues/17508#issuecomment-3575072353


   Sorry I think I misunderstood the issue, I graphed this to better picture 
this, please let me know if I'm understanding this wrong
   
   `SELECT B.C, CAST(B.C as TEXT) FROM table;
   ` yields to the following structure 
   
   ```
   
   ╔═══════════════════╦══════════════╦══════════════╦═══════════════╗
   ║   Expression      ║ Qualifier    ║ qualified_   ║ schema_name() ║
   ║                   ║ .0           ║ name().1     ║               ║
   ╠═══════════════════╬══════════════╬══════════════╬═══════════════╣
   ║ Column(B.C)       ║ Some("B")    ║ "C"          ║ "B.C"         ║
   ║                   ║ ↑            ║ ↑            ║ ↑             ║
   ║                   ║ Has table!   ║ Field only   ║ Combined      ║
   ╠═══════════════════╬══════════════╬══════════════╬═══════════════╣
   ║ CAST(B.C as Utf8) ║ None         ║ "B.C"        ║ "B.C"         ║
   ║                   ║ ↑            ║ ↑            ║ ↑             ║
   ║                   ║ No table!    ║ Full string  ║ Same string   ║
   ╚═══════════════════╩══════════════╩══════════════╩═══════════════╝
   ```
   
   The null literal situation from the issue description yields to the 
following structure (without the uuid workaround):
   
   ```
   
╔═══════════════════════════╦═══════════════╦═══════════════╦═══════════════════╗
   ║   Expression              ║ Qualifier     ║ qualified_    ║ schema_name()  
   ║
   ║                           ║ .0            ║ name().1      ║ 
(name_for_alias)  ║
   
╠═══════════════════════════╬═══════════════╬═══════════════╬═══════════════════╣
   ║ lit(NULL)                 ║ None          ║ "UTF8(NULL)"  ║ "UTF8(NULL)"   
   ║
   ║ (new literal in project)  ║               ║               ║                
   ║
   
╠═══════════════════════════╬═══════════════╬═══════════════╬═══════════════════╣
   ║ Column(left.UTF8(NULL))   ║ Some("left")  ║ "UTF8(NULL)"  ║ 
"left.UTF8(NULL)" ║
   ║ (from input after join)   ║               ║               ║                
   ║
   
╚═══════════════════════════╩═══════════════╩═══════════════╩═══════════════════╝
   ```
   
   The solutions proposed in the issue description -> using qualified_name().1  
instead of name_for_alias() to detect when to rename would directly fix the 
null literal case wihtout needing to use uuids. But for the cast situation the 
name tracker would never trigger a rename since it sees a different 
qualified_name().1, and because the schema_name are the same they fail later in 
`validate_unique_names`
   
   iiuc we want to solve both situations while avoiding the uuid workaround.
   
   > do we want to make it possible for two columns in the output schema to 
have the same exact qualified name? 
   
   Unless I'm missing something, I don't see the harm in allowing it if 
Postgres already does it. 
   
   > making CAST(B.C as Utf8) have qualified name ("B", "C") so that the name 
tracker can detect column B.C has an already seen name
   
   Do you know why right now the name tracker is not triggering a rename if 
both schema names are the same? (which iiuc is what `name_for_alias` returns) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [I] Improve substrait NameTracker so it doesn't require uuids [datafusion]

Reply via email to