N Gautam Animesh created ARROW-17802:
----------------------------------------
Summary: Merging multi file datasets on particular columns that
are present in all the datasets.
Key: ARROW-17802
URL: https://issues.apache.org/jira/browse/ARROW-17802
Project: Apache Arrow
Issue Type: Improvement
Reporter: N Gautam Animesh
While working with multi-file datasets, I ran into a case where I wanted
to merge specific columns from all the files in a dataset and work on only those columns.
I was not able to do this. Is there a workaround for merging multi-file
datasets on specific columns that are present in every file?
Please take a look and let me know if anything along these lines exists.
{code:java}
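library(arrow)
library(dplyr)

system.time({
  # Open the multi-file dataset (lazy; nothing is read into memory yet)
  ds <- open_dataset('C:/Test/Files/test', format = "arrow")
  # Merging logic to select only the specified column(s) belongs here, before collect()
  df <- ds %>% collect()
  # write_dataset(df, 'C:/Test/Files/test', format = "arrow")
})
{code}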
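To make the request concrete, here is a minimal sketch of what the merging step could look like. It is an illustration under assumptions, not a confirmed answer to this issue: the temporary directory, the file names, and the columns `id`, `value`, `extra_a` and `extra_b` are all hypothetical sample data. It assumes the files differ in some columns, so the dataset is opened with `unify_schemas = TRUE`, and the columns common to every file are found by intersecting the per-file schemas.

```r
library(arrow)
library(dplyr)

# Hypothetical sample data: two Arrow files that share `id` and `value`,
# each with one extra column that the other file lacks.
dir <- tempfile("dataset_")
dir.create(dir)
write_feather(
  data.frame(id = 1:3, value = c(0.1, 0.2, 0.3), extra_a = letters[1:3]),
  file.path(dir, "part-1.arrow")
)
write_feather(
  data.frame(id = 4:6, value = c(0.4, 0.5, 0.6), extra_b = c(TRUE, FALSE, TRUE)),
  file.path(dir, "part-2.arrow")
)

files <- list.files(dir, full.names = TRUE)

# Column names present in every file: intersect the per-file schemas.
common_cols <- Reduce(
  intersect,
  lapply(files, function(f) open_dataset(f, format = "arrow")$schema$names)
)

# Open all files as one dataset; unify_schemas = TRUE scans every file
# so that columns missing from some files are still part of the schema.
ds <- open_dataset(dir, format = "arrow", unify_schemas = TRUE)

# Select only the shared columns before collect(), so the other
# columns are never materialised in R.
merged <- ds %>%
  select(all_of(common_cols)) %>%
  collect()
```

If the result needs to be persisted, `write_dataset(merged, <output dir>, format = "arrow")` can be used. Writing to a directory other than the one being read is the safer choice, since writing back into the source directory risks mixing new files with the originals.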
--
This message was sent by Atlassian Jira
(v8.20.10#820010)