[jira] [Updated] (ARROW-5718) [R] Add as_record_batch()

Neal Richardson (JIRA) Mon, 24 Jun 2019 19:40:09 -0700


     [ 
https://issues.apache.org/jira/browse/ARROW-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Neal Richardson updated ARROW-5718:
-----------------------------------
    Description: 
ARROW-3814 / 
[https://github.com/apache/arrow/pull/3565/files#diff-95ad459e0128bfecf0d72ebd6d6ee8aaR94]
 changed the API of {{record_batch()}} and {{arrow::table()}} so that you can 
no longer pass a data.frame to the function, at least not without [massaging it 
yourself|https://github.com/apache/arrow/pull/3565/files#diff-09c05d1a6ff41bed094fbccfa76395a6R27].
 That broke sparklyr integration tests with an opaque {{cannot infer type from 
data}} error, and it's unfortunate that there's no longer a direct way to go 
from a data.frame to a record batch, which sounds like a common need.

To follow best practices (cf. the 
[tibble|https://tibble.tidyverse.org/] package), we should (1) add 
an {{as_record_batch()}} function, whose data.frame method is probably just 
{{as_record_batch.data.frame <- function(x) record_batch(!!!x)}}; and (2) if a 
user supplies a single, unnamed data.frame as the argument to 
{{record_batch()}}, raise an error that says to use {{as_record_batch()}}. We 
may later decide that {{record_batch()}} should call {{as_record_batch()}} 
automatically, but in case that is too magical and prevents some legitimate 
use case, let's hold off for now. It's easier to add magic than to remove it.
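
A minimal sketch of points (1) and (2), not the arrow implementation. It runs in base R with no packages: {{record_batch_sketch()}} and {{as_record_batch_sketch()}} are hypothetical stand-ins for the real functions, and {{do.call()}} stands in for the {{!!!}} splice.

```r
# Hypothetical stand-in for arrow's record_batch(): it takes columns as
# named arguments and only mimics the calling convention.
record_batch_sketch <- function(...) {
  cols <- list(...)
  # (2) A single, unnamed data.frame is almost certainly a mistake, so point
  # the user at the converter instead of failing with an opaque error.
  if (length(cols) == 1L && is.null(names(cols)) && is.data.frame(cols[[1L]])) {
    stop("To convert a data.frame, use as_record_batch_sketch()", call. = FALSE)
  }
  structure(cols, class = "record_batch_sketch")
}

# (1) An S3 generic plus a data.frame method.
as_record_batch_sketch <- function(x, ...) UseMethod("as_record_batch_sketch")

as_record_batch_sketch.data.frame <- function(x, ...) {
  # do.call() passes each column as a named argument, which is the base R
  # counterpart of record_batch(!!!x).
  do.call(record_batch_sketch, as.list(x))
}
```

With these definitions, {{as_record_batch_sketch(data.frame(a = 1:3))}} returns a one-column object, while {{record_batch_sketch(data.frame(a = 1:3))}} stops with a message naming the converter.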

Once this function exists, sparklyr tests can try to use {{as_record_batch()}} 
and, if that function doesn't exist, fall back to {{record_batch()}} (its 
absence means an older released version of arrow is installed, in which 
{{record_batch(df)}} should still work).
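
The fallback could look roughly like the sketch below. {{pick_converter()}} is a hypothetical helper, not part of sparklyr or arrow; it takes the namespace as an environment so that the lookup logic can be exercised without arrow installed.

```r
# Hypothetical helper: return as_record_batch() when the given namespace
# defines it, otherwise fall back to record_batch().
pick_converter <- function(ns_env) {
  if (exists("as_record_batch", envir = ns_env, mode = "function",
             inherits = FALSE)) {
    get("as_record_batch", envir = ns_env, mode = "function")
  } else {
    # Older releases: record_batch(df) is expected to accept a data.frame.
    get("record_batch", envir = ns_env, mode = "function")
  }
}

# Assumed usage in a test, requires the arrow package to be installed:
# to_batch <- pick_converter(asNamespace("arrow"))
# batch <- to_batch(df)
```

Checking for the function rather than comparing version numbers keeps the test independent of whichever release ends up shipping {{as_record_batch()}}.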

cc [~javierluraschi]

  was:
ARROW-3814 / 
[https://github.com/apache/arrow/pull/3565/files#diff-95ad459e0128bfecf0d72ebd6d6ee8aaR94]
 changed the API of `record_batch()` and `arrow::table()` such that you could 
no longer pass in a data.frame to the function, not without [massaging it 
yourself|https://github.com/apache/arrow/pull/3565/files#diff-09c05d1a6ff41bed094fbccfa76395a6R27].
 That broke sparklyr integration tests with an opaque `cannot infer type from 
data` error, and it's unfortunate that there's no longer a direct way to go 
from a data.frame to a record batch, which sounds like a common need.

After some discussion, we resolved that a solution would be to (1) add an 
{{as_record_batch}} function, which the data.frame method is probably just 
{{as_record_batch.data.frame <- function(x) record_batch(!!!x)}}; and (2) if a 
user supplies a single, unnamed data.frame as the argument to 
{{record_batch()}}, raise an error that says to use {{as_record_batch()}}. We 
may later decide that we should automatically call as_record_batch(), but in 
case that is too magical and prevents some legitimate use case, let's hold off 
for now. It's easier to add magic than remove it.

Once this function exists, sparklyr tests can try to use {{as_record_batch}}, 
and if that function doesn't exist, fall back to {{record_batch}} (because that 
means it has an older released version of arrow that doesn't have 
as_record_batch, so record_batch(df) should work).

cc [~javierluraschi]


> [R] Add as_record_batch()
> -------------------------
>
>                 Key: ARROW-5718
>                 URL: https://issues.apache.org/jira/browse/ARROW-5718
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Neal Richardson
>            Priority: Minor
>             Fix For: 0.14.0
>
>
> ARROW-3814 / 
> [https://github.com/apache/arrow/pull/3565/files#diff-95ad459e0128bfecf0d72ebd6d6ee8aaR94]
>  changed the API of {{record_batch()}} and {{arrow::table()}} so that you can 
> no longer pass a data.frame to the function, at least not without [massaging it 
> yourself|https://github.com/apache/arrow/pull/3565/files#diff-09c05d1a6ff41bed094fbccfa76395a6R27].
>  That broke sparklyr integration tests with an opaque {{cannot infer type from 
> data}} error, and it's unfortunate that there's no longer a direct way to go 
> from a data.frame to a record batch, which sounds like a common need.
>
> To follow best practices (cf. the 
> [tibble|https://tibble.tidyverse.org/] package), we should (1) 
> add an {{as_record_batch()}} function, whose data.frame method is probably 
> just {{as_record_batch.data.frame <- function(x) record_batch(!!!x)}}; and 
> (2) if a user supplies a single, unnamed data.frame as the argument to 
> {{record_batch()}}, raise an error that says to use {{as_record_batch()}}. We 
> may later decide that {{record_batch()}} should call {{as_record_batch()}} 
> automatically, but in case that is too magical and prevents some legitimate 
> use case, let's hold off for now. It's easier to add magic than to remove it.
>
> Once this function exists, sparklyr tests can try to use {{as_record_batch()}} 
> and, if that function doesn't exist, fall back to {{record_batch()}} (its 
> absence means an older released version of arrow is installed, in which 
> {{record_batch(df)}} should still work).
>
> cc [~javierluraschi]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
