Dominic Dennenmoser created ARROW-8748:
------------------------------------------

             Summary: Implementing methods for combining arrow tables using 
dplyr::bind_rows and dplyr::bind_cols
                 Key: ARROW-8748
                 URL: https://issues.apache.org/jira/browse/ARROW-8748
             Project: Apache Arrow
          Issue Type: New Feature
          Components: R
            Reporter: Dominic Dennenmoser


First of all, many thanks for your hard work! I was quite excited when you guys 
implemented some basic functions of the {{dplyr}} package. Is there a way to 
combine two or more arrow tables into one by rows or columns? At the moment my 
workaround looks like this:
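{code:r}
library(dplyr)  # provides the %>% pipe used below

# Workaround: pull each arrow table into R memory as a data frame,
# bind them by rows, then write the combined result back to disk.
dplyr::bind_rows(
  "a" = arrow.table.1 %>% dplyr::collect(),
  "b" = arrow.table.2 %>% dplyr::collect(),
  "c" = arrow.table.3 %>% dplyr::collect(),
  "d" = arrow.table.4 %>% dplyr::collect(),
  .id = "ID"
) %>%
  arrow::write_ipc_stream(sink = "file_name_combined_tables.arrow")
{code}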
But this is not really a practical solution, because it pulls the data back 
into the R environment as data frames/tibbles, which might exhaust the 
available RAM. Perhaps you have a better workaround on hand. It would be great 
if you guys could implement the {{bind_rows}} and {{bind_cols}} methods 
provided by {{dplyr}} for arrow tables, so that something like the following 
would work without calling {{collect()}} first:
{code:r}
dplyr::bind_rows(
  "a" = arrow.table.1,
  "b" = arrow.table.2,
  "c" = arrow.table.3,
  "d" = arrow.table.4,
  .id = "ID"
) %>%
  arrow::write_ipc_stream(sink = "file_name_combined_tables.arrow")
{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
