nealrichardson opened a new pull request #11155:
URL: https://github.com/apache/arrow/pull/11155
This is based on #11150. The R test currently sometimes segfaults inside the
join wrapper:
```
*** caught segfault ***
address 0x17, cause 'invalid permissions'
Traceback:
1: .Call(`_arrow_ExecNode_Join`, input, type, right_data, left_keys,
right_keys, left_output, right_output)
```
Other times it just hangs, and sometimes it fails with a cryptic `vector` error
message that seems to be coming from the R bindings. I have not yet successfully
evaluated a join, though I have sometimes managed to create a join node (I suspect
the hang happens when I run the ExecPlan containing the join).
Other issues I observed, in addition to the behavior above:
* Dictionary columns aren't allowed, even in the left data, and you can't
first `Project` them away: the input is still rejected.
* Duplicate column names aren't allowed at all, even though there is a
provision for deduplicating them with a prefix.
* The duplicate-name restriction also applies to the join keys, which may well
have the same names on both sides and usually should not be duplicated in the
result.
* The join node does not appear to expose an `output_schema()` the way other
nodes do; it's empty. This caused a Project after the Join to fail (I hacked
around that in R just to see whether I could get it to run), and it may be
related to the other problems.
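To make the naming expectations in the last few points concrete, here is a minimal Python sketch of how a join's output column names could be computed. It is not Arrow's implementation: `join_output_names` and its `right_prefix` parameter are hypothetical, and it assumes that a right-side key with the same name as its matching left key is dropped (for an inner join it holds the same values) and that any other name collision is resolved by prefixing the right-side column.

```python
def join_output_names(left_names, right_names, left_keys, right_keys,
                      right_prefix="right_"):
    """Hypothetical sketch: output column names for an equi-join.

    Assumptions (not Arrow's actual API):
    - a right key whose name equals its paired left key is dropped;
    - any other right column whose name collides with an existing
      output column gets `right_prefix` prepended.
    """
    if len(left_keys) != len(right_keys):
        raise ValueError("left_keys and right_keys must have the same length")
    # Right keys paired with an identically named left key are redundant.
    redundant = {r for l, r in zip(left_keys, right_keys) if l == r}
    out = list(left_names)
    taken = set(left_names)
    for name in right_names:
        if name in redundant:
            continue  # keep only the left copy of a shared key column
        new_name = right_prefix + name if name in taken else name
        if new_name in taken:
            # The prefix did not resolve the collision; fail loudly.
            raise ValueError(f"cannot deduplicate column name {name!r}")
        out.append(new_name)
        taken.add(new_name)
    return out
```

For example, joining `["id", "x"]` to `["id", "x", "y"]` on `id` yields `["id", "x", "right_x", "y"]`: the shared key appears once and only the non-key collision is prefixed.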