Sorry for the delayed response. I would suggest opening a PR (or pointing to a branch with those changes). That will make it easier to discuss specific implementation options and give advice than trying to explain and understand them in words.

On Wed, 6 Nov 2019 at 20:29, Justin Polchlopek <jpolchlo...@azavea.com> wrote:

> Hi. I'm looking into this issue and I have some questions as someone new
> to the project. The comment from Joris earlier in the thread suggests that
> the solution here is to create an Array subclass for each extension type
> that wants to use one. This will give a nice symmetry w.r.t. the Java
> interface, but in the Python case, this seems to suggest having to travel
> some fairly byzantine code paths (rather quickly, we end up in C++ code,
> where I lose the thread of what's happening -- specifically as regards
> `pyarrow_wrap_array`, as suggested in ARROW-6176).
>

The goal here is that for the end user, it is possible to do this without
involving C++ code, and I *think* implementing it should be possible from
cython. How did you end up in C++?

> I came up with a quick-and-dirty method wherein the ExtensionType subclass
> simply provides a method to translate from the storage type to the output
> type, and ExtensionArray has a __getitem__ implementation that passes the
> element from storage through the translation function. This doesn't feel
> outside of the realm of what is often acceptable in the python world, but
> it isn't nearly as typeful as Arrow seems to be leaning. Plus, this feels
> very far from what was intended in the issue, and I believe that I'm not
> understanding the underlying design principles.
>
> Can I get a bit of advice on this?
>
> Thanks.
> -J
>
> On Tue, Oct 29, 2019 at 12:26 PM Justin Polchlopek <jpolchlo...@azavea.com>
> wrote:
>
> > That sounds about right. We're doing some work here that might require
> > this feature sooner than later, and if we decide to go the route that
> > needs this improved support, I'd be happy to make this PR. Thanks for
> > showing that issue. I'll be sure to tag any contribution with that
> > ticket number.
> >
> > On Tue, Oct 29, 2019 at 9:01 AM Joris Van den Bossche
> > <jorisvandenboss...@gmail.com> wrote:
> >
> >> On Mon, 28 Oct 2019 at 22:41, Wes McKinney <wesmck...@gmail.com> wrote:
> >>
> >>> Adding dev@
> >>>
> >>> I don't believe we have APIs yet for plugging in user-defined Array
> >>> subtypes. I assume you've read
> >>>
> >>> http://arrow.apache.org/docs/python/extending_types.html#defining-extension-types-user-defined-types
> >>>
> >>> There may be some JIRA issues already about this (defining subclasses
> >>> of pa.Array with custom behavior) -- since Joris has been working on
> >>> this I'm interested in more comments
> >>>
> >>
> >> Yes, there is https://issues.apache.org/jira/browse/ARROW-6176 for
> >> exactly this issue.
> >> What I proposed there is to allow one to subclass pyarrow.ExtensionArray
> >> and to attach this to an attribute on the custom ExtensionType (eg
> >> __arrow_ext_array_class__ in line with the other __arrow_ext_..
> >> methods). That should allow to achieve similar functionality as what is
> >> available in Java I think.
> >>
> >> If that seems a good way to do this, I think we certainly welcome a PR
> >> for that (I can also look into it otherwise before 1.0).
> >>
> >> Joris
> >>
> >>>
> >>> On Mon, Oct 28, 2019 at 3:56 PM Justin Polchlopek
> >>> <jpolchlo...@azavea.com> wrote:
> >>> >
> >>> > Hi!
> >>> >
> >>> > I've been working through understanding extension types in Arrow.
> >>> > It's a great feature, and I've had no problems getting things working
> >>> > in Java/Scala; however, Python has been a bit of a different story.
> >>> > Not that I am unable to create and register extension types in
> >>> > Python, but rather that I can't seem to recreate the functionality
> >>> > provided by the Java API's ExtensionTypeVector class.
> >>> >
> >>> > In Java, ExtensionType::getNewVector() provides a clear pathway from
> >>> > the registered type to output a vector in something other than the
> >>> > underlying vector type, and I am at a loss for how to get this same
> >>> > functionality in Python. Am I missing something?
> >>> >
> >>> > Thanks for any hints.
> >>> > -Justin
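
For reference, the two designs discussed in this thread can be sketched in plain Python. This is only an illustration under stated assumptions: ExtensionType and ExtensionArray below are toy stand-ins, not the pyarrow classes; from_storage and wrap_array are hypothetical names; and __arrow_ext_array_class__ is the hook name floated in the thread, not a confirmed pyarrow API. A UUID type stored as 16-byte values is used purely as an example.

```python
import uuid


class ExtensionType:
    """Toy stand-in for an extension type (not pyarrow.ExtensionType)."""

    def from_storage(self, value):
        # Design 1 hook (hypothetical name): translate one storage value.
        # The default is the identity, i.e. expose the storage value as-is.
        return value

    def __arrow_ext_array_class__(self):
        # Design 2 hook, named as proposed in the thread (assumption):
        # return the array class used to wrap this type's storage.
        return ExtensionArray

    def wrap_array(self, storage):
        # Hypothetical helper: build whichever array class the type names.
        return self.__arrow_ext_array_class__()(self, storage)


class ExtensionArray:
    """Toy stand-in for an extension array: a type plus raw storage."""

    def __init__(self, ext_type, storage):
        self.type = ext_type
        self.storage = list(storage)

    def __len__(self):
        return len(self.storage)

    def __getitem__(self, index):
        # Design 1: every element passes through the type's translation.
        return self.type.from_storage(self.storage[index])


# Design 1: only the type is customised; the generic array is reused.
class UuidTypeV1(ExtensionType):
    def from_storage(self, value):
        return uuid.UUID(bytes=value)


# Design 2: a dedicated array subclass, attached to the type via the hook.
class UuidArray(ExtensionArray):
    def __getitem__(self, index):
        return uuid.UUID(bytes=self.storage[index])

    def to_strings(self):
        # Extra behaviour that only an array subclass can carry.
        return [str(self[i]) for i in range(len(self))]


class UuidTypeV2(ExtensionType):
    def __arrow_ext_array_class__(self):
        return UuidArray


raw = [uuid.UUID(int=1).bytes, uuid.UUID(int=2).bytes]
v1 = UuidTypeV1().wrap_array(raw)
v2 = UuidTypeV2().wrap_array(raw)
```

In this sketch, design 1 can only change what indexing returns, while design 2 lets the array subclass add its own methods (to_strings here), which is the part of the proposal intended to mirror what is available in Java.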