joocer commented on issue #35389:
URL: https://github.com/apache/arrow/issues/35389#issuecomment-1533020561

   No worries - here's a contrived snippet to demonstrate:
   
   ~~~python
   import pyarrow
   
   movie_vampires = pyarrow.Table.from_pydict(
       {
           "Movie": ["Twilight", "Interview with the Vampire", "Dracula", "Blade", "Underworld"],
           "Vampire": ["Edward Cullen", "Lestat de Lioncourt", "Count Dracula", "Blade", "Selene"],
       }
   )
   
   actors = pyarrow.Table.from_pydict(
       {
           "Character": ["Edward Cullen", "Lestat de Lioncourt", "Count Dracula", "Blade", "Selene"],
           "Actor": ["Robert Pattinson", "Tom Cruise", "Gary Oldman", "Wesley Snipes", "Kate Beckinsale"],
       }
   )
   
   movie_actors = movie_vampires.join(
       actors,
       keys=["Vampire"],
       right_keys=["Character"],
       join_type="inner",
       coalesce_keys=False,
   )
   
   print(movie_actors)
   ~~~
   
   In pyarrow 11, this is the result (note the four columns):
   
   ~~~
   Movie: string
   Vampire: string
   Character: string
   Actor: string
   ----
   Movie: [["Twilight","Interview with the Vampire","Dracula","Blade","Underworld"]]
   Vampire: [["Edward Cullen","Lestat de Lioncourt","Count Dracula","Blade","Selene"]]
   Character: [["Edward Cullen","Lestat de Lioncourt","Count Dracula","Blade","Selene"]]
   Actor: [["Robert Pattinson","Tom Cruise","Gary Oldman","Wesley Snipes","Kate Beckinsale"]]
   ~~~
   
   In pyarrow 12, this is the result (note only three columns; `Character` is missing):
   
   ~~~
   Movie: string
   Vampire: string
   Actor: string
   ----
   Movie: [["Twilight","Interview with the Vampire","Dracula","Blade","Underworld"]]
   Vampire: [["Edward Cullen","Lestat de Lioncourt","Count Dracula","Blade","Selene"]]
   Actor: [["Robert Pattinson","Tom Cruise","Gary Oldman","Wesley Snipes","Kate Beckinsale"]]
   ~~~
   
   This output is from a box with Python 3.10.7 on Debian Buster (x86, 64-bit),
   but it appears to happen on all the Python versions and OSes I've tried.
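
   For triage, a quick way to spot which column went missing is to compare the
   column names against the expected ones. This is only a sketch:
   `missing_columns` is a hypothetical helper (not part of pyarrow), and the
   name lists are copied by hand from the two outputs above.

   ~~~python
   def missing_columns(expected, actual):
       """Return the expected column names that are absent from actual, in order."""
       present = set(actual)
       return [name for name in expected if name not in present]


   # Column names copied from the outputs above (assumed, not queried live).
   expected = ["Movie", "Vampire", "Character", "Actor"]
   pyarrow_11 = ["Movie", "Vampire", "Character", "Actor"]
   pyarrow_12 = ["Movie", "Vampire", "Actor"]

   print(missing_columns(expected, pyarrow_11))  # []
   print(missing_columns(expected, pyarrow_12))  # ['Character']
   ~~~

   With a real table, `movie_actors.column_names` could be passed as `actual`.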

