[ 
https://issues.apache.org/jira/browse/ARROW-7662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17022378#comment-17022378
 ] 

Neal Richardson commented on ARROW-7662:
----------------------------------------

Yes, there is a difference. Think of a StructType column as a data frame nested 
in the table: each row of the table holds one row of that data frame. 

{code}
> df <- data.frame(col1=1:3, col2=c("a", "b", "c"))
> a <- Array$create(df)
> a$type
StructType
struct<col1: int32, col2: dictionary<values=string, indices=int8, ordered=0>>
> length(a)
[1] 3
{code}

Each column in the data frame has a type, so each field of the StructType is a 
known type. 

I think the tricky part is that a data frame column is an R "list" underneath, 
because that's what a data frame is. In that case the list elements are the 
fields (columns) of the struct, and the rows are the elements within each of 
those columns. For a list column, the list elements are the rows themselves. So 
when we create {{list_of()}}, we need to know what element type to provide there.

Just brainstorming, but I think you want something like (forgive the probably 
invalid code)

{code}
case VECSXP:
  if (Rf_inherits(x, "data.frame")) {
    (... the existing code)
  } else { // perhaps you want to check that the list is not named?
    R_xlen_t n = XLENGTH(x);
    if (n > 0) {
      std::shared_ptr<arrow::DataType> element_type = InferType(VECTOR_ELT(x, 0));
      R_xlen_t i = 1; // declared outside the loop so it can be checked afterwards
      for (; i < n; i++) {
        // compare the types themselves, not the shared_ptr addresses
        if (!element_type->Equals(InferType(VECTOR_ELT(x, i)))) {
          break;
        }
      }
      if (i == n) { // we made it through the loop and all are the same
        return std::make_shared<arrow::ListType>(element_type);
      }
    }
  }
{code}
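
To make the "all elements must share one type" check concrete without pulling in the R or Arrow headers, here is a self-contained toy version. Everything in it ({{ToyType}}, {{Primitive}}, {{ListOf}}, {{InferListType}}) is a hypothetical stand-in I made up for illustration, not the Arrow API, and it assumes the per-element types have already been inferred.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for arrow::DataType: a name plus an optional
// element type, which is set only for list types.
struct ToyType {
  std::string name;
  std::shared_ptr<ToyType> value_type;  // non-null only for list<...>

  // Structural equality: same name and, recursively, same element type.
  bool Equals(const ToyType& other) const {
    if (name != other.name) return false;
    if (!value_type || !other.value_type) {
      return !value_type && !other.value_type;
    }
    return value_type->Equals(*other.value_type);
  }
};

// Hypothetical factory for a non-nested type such as "float64".
std::shared_ptr<ToyType> Primitive(const std::string& name) {
  return std::make_shared<ToyType>(ToyType{name, nullptr});
}

// Hypothetical factory for list<element_type>.
std::shared_ptr<ToyType> ListOf(std::shared_ptr<ToyType> element_type) {
  assert(element_type != nullptr);
  return std::make_shared<ToyType>(ToyType{"list", std::move(element_type)});
}

// Given the already-inferred type of each list element, return list<T>
// when every element has type T. Returns nullptr when the list is empty
// or the elements disagree, i.e. "cannot infer type from data".
std::shared_ptr<ToyType> InferListType(
    const std::vector<std::shared_ptr<ToyType>>& element_types) {
  if (element_types.empty()) return nullptr;
  const std::shared_ptr<ToyType>& first = element_types.front();
  for (std::size_t i = 1; i < element_types.size(); ++i) {
    if (!first->Equals(*element_types[i])) return nullptr;
  }
  return ListOf(first);
}
```

Under these assumptions, three float64 elements infer as a list of float64, while a float64 element followed by a string element yields no inferred type. The comparison goes through {{Equals}} rather than comparing the pointers, since two separately created type objects can describe the same type.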

That said, I'm not sure how great the ListType support is currently in the R 
package. We might be able to infer the type but still have more left to 
implement. 

{code}
> lcol <- list(1, 2, 3)
> Array$create(lcol)$type
Error in Array__from_vector(x, type) : cannot infer type from data
## That was expected, given that we haven't fixed that switch statement yet
## but what if we specify the type?
> Array$create(lcol, type = list_of(float64()))
Error in Array__from_vector(x, type) : 
  NotImplemented: type not implemented
{code}


> [R] Support auto-inferring list column type
> -------------------------------------------
>
>                 Key: ARROW-7662
>                 URL: https://issues.apache.org/jira/browse/ARROW-7662
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Michael Chirico
>            Priority: Major
>
> {code:r}
> DF = data.frame(a = 1:10)
> DF$b = as.list(DF$a)
> arrow::write_parquet(DF, 'test.parquet')
> # Error in Table__from_dots(dots, schema) : cannot infer type from data
> {code}
> This appears to be supported naturally already in Python:
> {code:python}
> import pandas as pd
> pd.DataFrame({'a': [1, 2, 3], 'b': [[1, 2], [3, 4], [5, 6]]}).to_parquet('test.parquet')
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
