[
https://issues.apache.org/jira/browse/ARROW-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Neal Richardson updated ARROW-5718:
-----------------------------------
Description:
ARROW-3814 /
[https://github.com/apache/arrow/pull/3565/files#diff-95ad459e0128bfecf0d72ebd6d6ee8aaR94]
changed the API of `record_batch()` and `arrow::table()` so that you can
no longer pass a data.frame to either function without [massaging it
yourself|https://github.com/apache/arrow/pull/3565/files#diff-09c05d1a6ff41bed094fbccfa76395a6R27].
That broke sparklyr integration tests with an opaque `cannot infer type from
data` error, and it's unfortunate that there's no longer a direct way to go
from a data.frame to a record batch, which sounds like a common need.
In order to follow best practices (cf. the
[tibble|https://tibble.tidyverse.org/] package, for example), we should (1) add
an {{as_record_batch}} function, whose data.frame method is probably just
{{as_record_batch.data.frame <- function(x) record_batch(!!!x)}}; and (2) if a
user supplies a single, unnamed data.frame as the argument to
{{record_batch()}}, raise an error that says to use {{as_record_batch()}}. We
may later decide that we should automatically call as_record_batch(), but in
case that is too magical and prevents some legitimate use case, let's hold off
for now. It's easier to add magic than remove it.
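To make points (1) and (2) concrete, here is a minimal sketch of how the generic and the guard might fit together. It is not the arrow implementation: {{record_batch()}} below is a hypothetical stand-in (returning a plain list with an invented class {{stub_record_batch}}) so the example runs without arrow installed, the error wording is an assumption, and {{do.call()}} is used as a base-R substitute for the {{!!!x}} splice.

```r
# Hypothetical stand-in for arrow's record_batch(); it only collects its
# arguments as columns so that the dispatch pattern can be run without arrow.
record_batch <- function(...) {
  cols <- list(...)
  nms <- names(cols)
  # Proposed guard (2): a single, unnamed data.frame argument is rejected
  # with a message that points the user at as_record_batch().
  single_unnamed_df <- length(cols) == 1L &&
    (is.null(nms) || !nzchar(nms[[1L]])) &&
    is.data.frame(cols[[1L]])
  if (single_unnamed_df) {
    stop("Cannot pass a data.frame to record_batch(); ",
         "use as_record_batch() instead.", call. = FALSE)
  }
  structure(list(columns = cols), class = "stub_record_batch")
}

# Proposed generic (1), dispatching on the class of x.
as_record_batch <- function(x, ...) UseMethod("as_record_batch")

# data.frame method: pass each column as a named argument, which is the
# base-R equivalent of record_batch(!!!x).
as_record_batch.data.frame <- function(x, ...) {
  do.call(record_batch, as.list(x))
}
```

With these definitions, {{as_record_batch(data.frame(x = 1:3))}} goes through the method, while {{record_batch(data.frame(x = 1:3))}} stops with the pointer to {{as_record_batch()}}.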
Once this function exists, sparklyr tests can try to use {{as_record_batch}},
and if that function doesn't exist, fall back to {{record_batch}} (its absence
means an older released version of arrow is installed, one that predates
{{as_record_batch}}, so {{record_batch(df)}} should still work).
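The fallback could be sketched roughly as follows. {{pick_converter()}} is a hypothetical helper name, not something that exists in sparklyr or arrow; it takes the namespace (or any environment) to look in, so it can be tried without arrow installed.

```r
# Hypothetical helper: choose the data.frame -> record batch converter from
# a package namespace, preferring as_record_batch() when it is defined there.
pick_converter <- function(ns) {
  if (exists("as_record_batch", envir = ns, inherits = FALSE)) {
    get("as_record_batch", envir = ns, inherits = FALSE)
  } else {
    # Older arrow releases: record_batch(df) is expected to accept a data.frame.
    get("record_batch", envir = ns, inherits = FALSE)
  }
}

# With arrow installed, a test might use it like this (left commented out
# so the sketch has no dependency on arrow):
# to_batch <- pick_converter(asNamespace("arrow"))
# batch <- to_batch(df)
```

Checking for the function rather than comparing version numbers keeps the test independent of which release first ships {{as_record_batch}}.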
cc [~javierluraschi]
was:
ARROW-3814 /
[https://github.com/apache/arrow/pull/3565/files#diff-95ad459e0128bfecf0d72ebd6d6ee8aaR94]
changed the API of `record_batch()` and `arrow::table()` so that you can
no longer pass a data.frame to either function without [massaging it
yourself|https://github.com/apache/arrow/pull/3565/files#diff-09c05d1a6ff41bed094fbccfa76395a6R27].
That broke sparklyr integration tests with an opaque `cannot infer type from
data` error, and it's unfortunate that there's no longer a direct way to go
from a data.frame to a record batch, which sounds like a common need.
After some discussion, we resolved that a solution would be to (1) add an
{{as_record_batch}} function, whose data.frame method is probably just
{{as_record_batch.data.frame <- function(x) record_batch(!!!x)}}; and (2) if a
user supplies a single, unnamed data.frame as the argument to
{{record_batch()}}, raise an error that says to use {{as_record_batch()}}. We
may later decide that we should automatically call as_record_batch(), but in
case that is too magical and prevents some legitimate use case, let's hold off
for now. It's easier to add magic than remove it.
Once this function exists, sparklyr tests can try to use {{as_record_batch}},
and if that function doesn't exist, fall back to {{record_batch}} (its absence
means an older released version of arrow is installed, one that predates
{{as_record_batch}}, so {{record_batch(df)}} should still work).
cc [~javierluraschi]
> [R] Add as_record_batch()
> -------------------------
>
> Key: ARROW-5718
> URL: https://issues.apache.org/jira/browse/ARROW-5718
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Reporter: Neal Richardson
> Priority: Minor
> Fix For: 0.14.0