joocer commented on issue #35389:
URL: https://github.com/apache/arrow/issues/35389#issuecomment-1533020561
No worries - here's a contrived snippet to demonstrate:
~~~python
import pyarrow

movie_vampires = pyarrow.Table.from_pydict(
    {
        "Movie": ["Twilight", "Interview with the Vampire", "Dracula",
                  "Blade", "Underworld"],
        "Vampire": ["Edward Cullen", "Lestat de Lioncourt", "Count Dracula",
                    "Blade", "Selene"],
    }
)
actors = pyarrow.Table.from_pydict(
    {
        "Character": ["Edward Cullen", "Lestat de Lioncourt", "Count Dracula",
                      "Blade", "Selene"],
        "Actor": ["Robert Pattinson", "Tom Cruise", "Gary Oldman",
                  "Wesley Snipes", "Kate Beckinsale"],
    }
)
movie_actors = movie_vampires.join(
    actors,
    keys=["Vampire"],
    right_keys=["Character"],
    join_type="inner",
    coalesce_keys=False,
)
print(movie_actors)
~~~
In pyarrow 11, this is the result (note the four columns):
~~~
Movie: string
Vampire: string
Character: string
Actor: string
----
Movie: [["Twilight","Interview with the Vampire","Dracula","Blade","Underworld"]]
Vampire: [["Edward Cullen","Lestat de Lioncourt","Count Dracula","Blade","Selene"]]
Character: [["Edward Cullen","Lestat de Lioncourt","Count Dracula","Blade","Selene"]]
Actor: [["Robert Pattinson","Tom Cruise","Gary Oldman","Wesley Snipes","Kate Beckinsale"]]
~~~
In pyarrow 12, this is the result (note only three columns; `Character` is missing even though `coalesce_keys=False`):
~~~
Movie: string
Vampire: string
Actor: string
----
Movie: [["Twilight","Interview with the Vampire","Dracula","Blade","Underworld"]]
Vampire: [["Edward Cullen","Lestat de Lioncourt","Count Dracula","Blade","Selene"]]
Actor: [["Robert Pattinson","Tom Cruise","Gary Oldman","Wesley Snipes","Kate Beckinsale"]]
~~~
This output is from a box with Python 3.10.7 on Debian Buster (x86, 64-bit), but
it appears to happen on all the Python versions and OSes I've tried.
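
In case it helps anyone hitting the same thing, here is a rough sketch of a stopgap that copes with either result. `ensure_right_key` is a hypothetical helper of my own, not part of pyarrow; it only relies on `Table.column_names`, `Table.column` and `Table.append_column`. It assumes an inner join, where the right key values equal the left key values on every output row, so it is not suitable for outer joins, and the restored column is appended at the end rather than in its pyarrow 11 position.

~~~python
def ensure_right_key(joined, left_key, right_key):
    """Return `joined` with a `right_key` column, copying `left_key` if needed.

    Hypothetical helper, not a pyarrow API. Assumes `joined` came from an
    INNER join on left_key == right_key, so both key columns would hold
    identical values on every row.
    """
    if right_key in joined.column_names:
        # The join already kept both key columns; nothing to do.
        return joined
    # The right key was dropped: rebuild it from the left key, which is
    # equivalent for an inner join. The new column goes last.
    return joined.append_column(right_key, joined.column(left_key))
~~~

With the snippet above, this would be called as `movie_actors = ensure_right_key(movie_actors, "Vampire", "Character")` right after the join.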