Sorry for the delay in responding. I would suggest that you open a PR (or
point to a branch with those changes). That will make it easier to discuss
specific implementation options and give advice, rather than trying to
explain and understand them in words.

On Wed, 6 Nov 2019 at 20:29, Justin Polchlopek <jpolchlo...@azavea.com>
wrote:

> Hi.  I'm looking into this issue and I have some questions as someone new
> to the project.  The comment from Joris earlier in the thread suggests that
> the solution here is to create an Array subclass for each extension type
> that wants to use one.  This will give a nice symmetry w.r.t. the Java
> interface, but in the Python case, this seems to suggest having to travel
> some fairly byzantine code paths (rather quickly, we end up in C++ code,
> where I lose the thread of what's happening—specifically as regards
> `pyarrow_wrap_array`, as suggested in ARROW-6176).
>

The goal here is that the end user can do this without involving C++ code,
and I *think* it should be possible to implement it in Cython. How did you
end up in C++?
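
To make the user-facing side concrete, here is a rough sketch of the shape
proposed in ARROW-6176. It assumes plain-Python stand-ins rather than pyarrow
itself: the `ExtensionType` and `ExtensionArray` classes, the `wrap_array`
helper, and the `PointType`/`PointArray` example are all hypothetical. Only
the `__arrow_ext_array_class__` name comes from the proposal, and a real
implementation may well differ.

```python
# Plain-Python stand-ins for illustration; none of these are pyarrow classes.


class ExtensionArray:
    """Stand-in for a generic extension array wrapping storage values."""

    def __init__(self, ext_type, storage):
        self.type = ext_type
        self.storage = list(storage)

    def __len__(self):
        return len(self.storage)

    def __getitem__(self, i):
        return self.storage[i]


class ExtensionType:
    """Stand-in for an extension type that names its own array class."""

    # Hypothetical hook (name taken from the ARROW-6176 proposal):
    # the array class used when storage is wrapped for this type.
    __arrow_ext_array_class__ = ExtensionArray

    def wrap_array(self, storage):
        # Look the class up on the type instead of hard-coding ExtensionArray.
        return self.__arrow_ext_array_class__(self, storage)


class PointArray(ExtensionArray):
    """User-defined array subclass with custom behavior."""

    def to_tuples(self):
        return [tuple(v) for v in self.storage]


class PointType(ExtensionType):
    """User-defined type that attaches PointArray as its array class."""

    __arrow_ext_array_class__ = PointArray


points = PointType().wrap_array([[1, 2], [3, 4]])
```

With this shape, the user only writes the two small subclasses at the bottom,
and `points` comes back as a `PointArray` without the user touching any
wrapping code.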


> I came up with a quick-and-dirty method wherein the ExtensionType subclass
> simply provides a method to translate from the storage type to the output
> type, and ExtensionArray has a __getitem__ implementation that passes the
> element from storage through the translation function.  This doesn't feel
> outside of the realm of what is often acceptable in the Python world, but
> it isn't nearly as typeful as Arrow seems to be leaning.  Plus, this feels
> very far from what was intended in the issue, and I believe that I'm not
> understanding the underlying design principles.
>
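
For comparison, the quick-and-dirty approach described in the paragraph above
might look roughly like the following. This is only a sketch under
assumptions: the classes are plain-Python stand-ins, not pyarrow classes, and
the `translate` method name and the `UuidType` example are made up for
illustration.

```python
import uuid

# Plain-Python stand-ins for illustration; none of these are pyarrow classes.


class ExtensionType:
    """Stand-in extension type with a per-element translation hook."""

    def translate(self, value):
        # Hypothetical hook: by default, return the storage value unchanged.
        return value


class ExtensionArray:
    """Stand-in array whose __getitem__ routes elements through the type."""

    def __init__(self, ext_type, storage):
        self.type = ext_type
        self.storage = list(storage)

    def __len__(self):
        return len(self.storage)

    def __getitem__(self, i):
        value = self.storage[i]
        # Nulls pass through untouched; everything else is translated.
        return None if value is None else self.type.translate(value)


class UuidType(ExtensionType):
    """Example type: storage is 16 raw bytes, output is uuid.UUID."""

    def translate(self, value):
        return uuid.UUID(bytes=value)


raw = uuid.UUID("12345678-1234-5678-1234-567812345678").bytes
arr = ExtensionArray(UuidType(), [raw, None])
```

Here `arr[0]` yields a `uuid.UUID` and `arr[1]` yields `None`. The trade-off
is the one raised above: there is a single generic array class, so the
conversion is not visible in the array's own type.
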
> Can I get a bit of advice on this?
>
> Thanks.
> -J
>
> On Tue, Oct 29, 2019 at 12:26 PM Justin Polchlopek <jpolchlo...@azavea.com>
> wrote:
>
> > That sounds about right.  We're doing some work here that might require
> > this feature sooner rather than later, and if we decide to go the route
> > that needs this improved support, I'd be happy to make this PR.  Thanks
> > for showing that issue.  I'll be sure to tag any contribution with that
> > ticket number.
> >
> > On Tue, Oct 29, 2019 at 9:01 AM Joris Van den Bossche <
> > jorisvandenboss...@gmail.com> wrote:
> >
> >>
> >> On Mon, 28 Oct 2019 at 22:41, Wes McKinney <wesmck...@gmail.com> wrote:
> >>
> >>> Adding dev@
> >>>
> >>> I don't believe we have APIs yet for plugging in user-defined Array
> >>> subtypes. I assume you've read
> >>>
> >>> http://arrow.apache.org/docs/python/extending_types.html#defining-extension-types-user-defined-types
> >>>
> >>> There may be some JIRA issues already about this (defining subclasses
> >>> of pa.Array with custom behavior) -- since Joris has been working on
> >>> this I'm interested in more comments
> >>>
> >>
> >> Yes, there is https://issues.apache.org/jira/browse/ARROW-6176 for
> >> exactly this issue.
> >> What I proposed there is to allow one to subclass pyarrow.ExtensionArray
> >> and to attach this subclass to an attribute on the custom ExtensionType
> >> (e.g. __arrow_ext_array_class__, in line with the other __arrow_ext_..
> >> methods). That should make it possible to achieve functionality similar
> >> to what is available in Java, I think.
> >>
> >> If that seems a good way to do this, I think we would certainly welcome a
> >> PR for that (I can also look into it otherwise before 1.0).
> >>
> >> Joris
> >>
> >>
> >>>
> >>> On Mon, Oct 28, 2019 at 3:56 PM Justin Polchlopek
> >>> <jpolchlo...@azavea.com> wrote:
> >>> >
> >>> > Hi!
> >>> >
> >>> > I've been working through understanding extension types in Arrow.
> >>> > It's a great feature, and I've had no problems getting things working
> >>> > in Java/Scala; however, Python has been a bit of a different story.
> >>> > Not that I am unable to create and register extension types in Python,
> >>> > but rather that I can't seem to recreate the functionality provided by
> >>> > the Java API's ExtensionTypeVector class.
> >>> >
> >>> > In Java, ExtensionType::getNewVector() provides a clear pathway from
> >>> > the registered type to output a vector in something other than the
> >>> > underlying vector type, and I am at a loss for how to get this same
> >>> > functionality in Python.  Am I missing something?
> >>> >
> >>> > Thanks for any hints.
> >>> > -Justin
> >>>
> >>
>
