[ https://issues.apache.org/jira/browse/ARROW-6486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ARF updated ARROW-6486:
-----------------------
    Description: 
Currently, many classes in {{pyarrow}} behave unexpectedly for Python users: 
they are neither subclassable nor monkey-patchable.

{{>>> import pyarrow as pa}}
{{>>> class MyTable(pa.Table):}}
{{...     pass}}
{{...}}
{{>>> table = MyTable.from_arrays([], [])}}
{{>>> type(table)}}
{{<class 'pyarrow.lib.Table'>}}

The factory method did not return an instance of our subclass...
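
This is what one would expect if {{from_arrays}} is a static factory that always constructs the base class and ignores the class it was called on. The pure-Python sketch below only mimics that shape; {{BaseTable}} and its {{from_arrays}} are hypothetical stand-ins, not pyarrow's actual (Cython) implementation.

```python
class BaseTable:
    """Hypothetical stand-in for a Cython-defined class such as pa.Table."""

    def __init__(self, columns, names):
        self.columns = list(columns)
        self.names = list(names)

    @staticmethod
    def from_arrays(arrays, names):
        # A static factory receives no reference to the class it was
        # invoked on, so it can only build the hard-coded base class.
        return BaseTable(arrays, names)


class MyTable(BaseTable):
    pass


table = MyTable.from_arrays([], [])
print(type(table).__name__)  # BaseTable, not MyTable
```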

Never mind, let's monkey-patch {{Table}}:

{{>>> pa.TableOriginal = pa.Table}}
{{>>> pa.Table = MyTable}}
{{>>> table = pa.Table.from_arrays([], [])}}
{{>>> type(table)}}
{{<class 'pyarrow.lib.Table'>}}

OK, that did not work either.
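
Here {{pa.Table.from_arrays}} simply resolves to the inherited {{MyTable.from_arrays}}, so it fails for the same reason as before. More generally, rebinding a module attribute only changes what later lookups of that name see; code that already holds a reference to the original class object keeps constructing it. Compiled Cython code presumably refers to its extension type directly rather than looking it up by name at call time. The sketch below imitates that with a closure; the module {{fakelib}} and its {{from_arrays}} function are hypothetical.

```python
import types

# Hypothetical module standing in for pyarrow.lib.
fakelib = types.ModuleType("fakelib")


class Table:
    pass


def _make_factory(table_cls):
    # The factory captures the class object itself, not the name "Table".
    def from_arrays(arrays, names):
        return table_cls()
    return from_arrays


fakelib.Table = Table
fakelib.from_arrays = _make_factory(Table)


class MyTable(fakelib.Table):
    pass


# Monkey-patch the module attribute, as in the example above.
fakelib.Table = MyTable
table = fakelib.from_arrays([], [])
print(type(table).__name__)  # Table: the captured reference is unaffected
```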

Let's be sneaky:

{{>>> table.__class__ = MyTable}}
{{Traceback (most recent call last):}}
{{  File "<stdin>", line 1, in <module>}}
{{TypeError: __class__ assignment only supported for heap types or ModuleType subclasses}}
{{>>>}}
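
The error message indicates that {{pa.Table}} is not a heap type. CPython only allows reassigning {{__class__}} between compatible heap types (classes created at runtime, e.g. by a {{class}} statement) or between {{ModuleType}} subclasses. The sketch below uses the built-in {{object}} as a stand-in for a static type; the class names are hypothetical, and the exact error wording differs between Python versions, so it is not reproduced here.

```python
class Plain:
    pass


class MyPlain(Plain):
    pass


# Both classes are heap types with compatible layouts, so this succeeds.
obj = Plain()
obj.__class__ = MyPlain
print(type(obj).__name__)  # MyPlain


class Replacement:
    pass


# `object` is a static (non-heap) built-in type, so CPython refuses the swap.
raw = object()
swap_failed = False
try:
    raw.__class__ = Replacement
except TypeError:
    swap_failed = True
print(swap_failed)  # True
```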

 

There is currently no way to modify or extend the behaviour of a {{Table}} 
instance: users can only use what {{pyarrow}} provides out of the box. This 
is likely to be a source of frustration for many Python users.

The attached PR remedies this for the {{Table}} class:

{{>>> import pyarrow as pa}}
{{>>> class MyTable(pa.Table):}}
{{...     pass}}
{{...}}
{{>>> table = MyTable.from_arrays([], [])}}
{{>>> type(table)}}
{{<class '__main__.MyTable'>}}
{{>>>}}
{{>>> pa.TableOriginal = pa.Table}}
{{>>> pa.Table = MyTable}}
{{>>> table = pa.Table.from_arrays([], [])}}
{{>>> type(table)}}
{{<class '__main__.MyTable'>}}
{{>>>}}
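
A minimal pure-Python sketch of a pattern that yields the behaviour shown above: the factory is a {{classmethod}} that instantiates {{cls}} instead of a hard-coded class, so a subclass (or a subclass bound to the original public name) gets its own type back. This illustrates the general technique only and is an assumption, not a copy of the attached PR; {{BaseTable}} and {{num_columns}} are hypothetical.

```python
class BaseTable:
    """Hypothetical stand-in for pa.Table."""

    def __init__(self, columns, names):
        self.columns = list(columns)
        self.names = list(names)

    @classmethod
    def from_arrays(cls, arrays, names):
        # `cls` is the class the method was called on, so subclasses
        # receive instances of themselves.
        return cls(arrays, names)


class MyTable(BaseTable):
    def num_columns(self):
        # Extra behaviour provided only by the subclass.
        return len(self.columns)


table = MyTable.from_arrays([[1, 2], [3, 4]], ["a", "b"])
print(type(table).__name__)  # MyTable
print(table.num_columns())   # 2

# Rebinding a public name, analogous to `pa.Table = MyTable` above.
Table = MyTable
patched = Table.from_arrays([], [])
print(type(patched).__name__)  # MyTable
```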

 

Ideally, these modifications would be extended to the other Cython-defined 
classes of {{pyarrow}}, but since {{Table}} is likely the interface that most 
users begin their interaction with, I thought it would be a good place to start.

Keeping the changes limited to a single class should also keep merge conflicts 
manageable.

> [Python] Allow subclassing & monkey-patching of Table
> -----------------------------------------------------
>
>                 Key: ARROW-6486
>                 URL: https://issues.apache.org/jira/browse/ARROW-6486
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: ARF
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
