Joris Van den Bossche created ARROW-11553:
---------------------------------------------

             Summary: [Python] Make Table.cast(schema) more flexible regarding 
order of fields / missing fields?
                 Key: ARROW-11553
                 URL: https://issues.apache.org/jira/browse/ARROW-11553
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
            Reporter: Joris Van den Bossche


Currently, {{Table.cast}} requires the target schema to have exactly the same
field names as the table, in the same order (it simply does a
{{self.schema.names != target_schema.names: raise ...}} check). Example:

{code:python}
In [5]: table = pa.table({'a': [1, 2, 3], 'b': [.1, .2, .3]})

In [7]: table
Out[7]: 
pyarrow.Table
a: int64
b: double

In [9]: schema = pa.schema([('a', pa.int32()), ('b', pa.float32())])

In [10]: table.cast(schema)
Out[10]: 
pyarrow.Table
a: int32
b: float

In [11]: schema2 = pa.schema([('b', pa.float32()), ('a', pa.int32())])

In [12]: table.cast(schema2)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-0c712db0c16a> in <module>
----> 1 table.cast(schema2)

~/scipy/repos/arrow/python/pyarrow/table.pxi in pyarrow.lib.Table.cast()

ValueError: Target schema's field names are not matching the table's field 
names: ['a', 'b'], ['b', 'a']
{code}

Do we want to make this more flexible? Allow a different order? (And then
follow the order of the passed schema or of the original table?) Allow missing
fields? (And then use the fields of the schema to "subset" as well?)




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
