braindevices opened a new issue, #38809:
URL: https://github.com/apache/arrow/issues/38809

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   ### Describe the enhancement requested
   
   In R there is `concat_tables(..., unify_schemas = TRUE)`
   (https://arrow.apache.org/docs/r/reference/concat_tables.html).
   In Python, `pa.concat_tables` does not allow concatenating tables with different schemas.
   
   
   > unify_schemas
   >
   >    If TRUE, the schemas of the tables will be first unified with fields of 
the same name being merged, then each table will be promoted to the unified 
schema before being concatenated. Otherwise, all tables should have the same 
schema.
   
   I tried `promote_options="permissive"`, which does not seem to do the trick.
   
   ```
   import pyarrow as pa

   t1 = pa.Table.from_pydict({'a': [1., 2], 'b': [{'k1': 0, 'k2': 1}, {'k3': 'ok'}]})
   t2 = pa.Table.from_pydict({'a': [1., None], 'b': [{'k1': None}, {'k3': None}]})

   # Raises ArrowTypeError: the struct column 'b' has a different type in each table
   pa.concat_tables([t1, t2], unify_schemas=True, promote_options="permissive")
   ```
   
   This fails with `ArrowTypeError: struct fields don't match or are in the wrong order: Input fields: struct<k1: null, k3: null> output fields: struct<k1: int64, k2: int64, k3: string>`
   
   I expect this to just work.
   
   ```
       cdef cppclass CConcatenateTablesOptions" arrow::ConcatenateTablesOptions":
           c_bool unify_schemas
           CField.CMergeOptions field_merge_options
   ```
   
   The bool field `unify_schemas` is already declared there, but there is no way to set its value from the actual function:
   
https://github.com/apache/arrow/blob/f98a13250d10dba248a2bb85989d6b80265e82d8/python/pyarrow/table.pxi#L5169
   
   In that code, only `field_merge_options` is set, which is an option of unify_schema().
   I think you should either initialize `unify_schemas` to true, or actually handle a `unify_schemas` keyword. Otherwise, the field merge options have no effect at all.
   
   
   
   ### Component(s)
   
   Parquet, Python

