On Tue, May 18, 2021 at 8:34 PM Weston Pace <weston.p...@gmail.com> wrote:

>
> The pyarrow functions themselves would be quite difficult to
> transpile.  They are cython links to C++ shared library exports and
> thus pretty sensitive to in-memory representation of the non-arrow
> data types in addition to the arrow data types.
>

Right, that is the status quo. Let's go back to when pyarrow was
first implemented. Why not implement it entirely in Python (given that Wes
and many folks find it much easier to get stuff done in Python)? My guess
at the reason: pure Python is not performant and doesn't provide the same
level of control over memory allocation and layout.

So implementing it in C++ and wrapping it in Cython was standard practice
to get better performance. Doing so requires you to learn custom Cython
syntax, as opposed to standard Python 3 with ctypes. Debugging any problems
with the bindings requires you to deal with non-idiomatic C/C++ code that
uses Python's C API.
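
For illustration, here is a minimal sketch of what "standard python3 with
ctypes" looks like. It is not pyarrow code: it assumes a POSIX system, binds
libc's strlen purely as an example, and c_strlen is a name made up for this
sketch.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system where the C library can be located by name.
# If find_library returns None, CDLL(None) falls back to the symbols already
# loaded into the current process, which include libc on Linux and macOS.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Return the length of a NUL-terminated byte string using C's strlen."""
    return libc.strlen(data)
```

The binding is ordinary Python with no separate build step, but the
signature has to be declared by hand and every call still crosses the
Python/C boundary.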

The transpiler approach is:

* Write all of it in python. Get it functional and do so quickly.
* Transpile into idiomatic C++/Rust/Julia/$favorite_language
* Compile, debug and run the entire thing in one language domain, without
having to cross abstraction boundaries.

I'm also investigating a hybrid approach, for cases where some Python code
can't be transpiled (possibly because it's using dynamic features of
Python). There is a way to transpile with the "--extension" flag, which
causes Python code to be transpiled to Rust as a PyO3 extension. This is
very similar to what Cython does, with two differences:

* Use vanilla python instead of pyx
* Use rust instead of C


>
> However, it sounds like you're stating it would be possible to migrate
> the script but transpile pyarrow calls to appropriate calls against
> the Java implementation of Arrow, which would probably be easier.
>
>
That would be an API-level translation, like translating:

```python
a = [1, 2, 3]
a.append(4)
```

to

```julia
a = [1, 2, 3]
push!(a, 4)
```

This hasn't been the focus of the project so far. I'm just getting started
on API translation (although this example works out of the box).
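
As a rough sketch of the idea (not how the transpiler itself is
implemented), an API-level rewrite of the example above can be done with
Python's standard ast module. The METHOD_TO_JULIA table and translate_line
function are hypothetical names for this illustration, and ast.unparse
needs Python 3.9 or later.

```python
import ast

# Hypothetical mapping from Python method names to Julia function names.
METHOD_TO_JULIA = {"append": "push!"}


def translate_line(source: str) -> str:
    """Translate one Python statement to Julia for a tiny subset of syntax."""
    node = ast.parse(source).body[0]

    # A method call used as a statement, e.g. a.append(4).
    if (
        isinstance(node, ast.Expr)
        and isinstance(node.value, ast.Call)
        and isinstance(node.value.func, ast.Attribute)
        and node.value.func.attr in METHOD_TO_JULIA
    ):
        call = node.value
        receiver = ast.unparse(call.func.value)
        args = [ast.unparse(arg) for arg in call.args]
        julia_name = METHOD_TO_JULIA[call.func.attr]
        # Julia takes the receiver as the first positional argument.
        return f"{julia_name}({', '.join([receiver] + args)})"

    # Anything else is emitted unchanged; this is only valid for statements
    # that happen to be spelled the same way in both languages.
    return ast.unparse(node)
```

Calling translate_line on the two Python lines above yields the two Julia
lines. A real mapping would also need type information, since append only
maps to push! when the receiver is known to be a list.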



>
> # Data sharing
>
> > But I'm not sure I understand the point about shared computing
> > libraries or how you propose to make the situation better.
>
> I believe the shared data approach would be to zero copy marshal the
> data from kotlin to python, call the pandas code, then zero copy
> marshal the result back to kotlin.
>
>
Yes, this is valuable and doesn't need transpilers.

 -Arun



> --
>
> It seems to me that both approaches would be possible and each have
> pros & cons.  Did I capture the understanding correctly?
>
> On Tue, May 18, 2021 at 3:29 PM Arun Sharma <a...@sharma-home.net> wrote:
> >
> > On Tue, May 18, 2021 at 5:37 PM Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > > You just sent this same e-mail 24 hours ago. I think the problems we
> > > are solving are different. We are addressing language siloing at the
> > > data level and the shared-computing-libraries level. I am not sure
> > > that code transpilers help us very much.
> > >
> >
> > Oops - sorry for the dup. I checked the archives and didn't see it there.
> > But it was on the second page that I somehow missed.
> >
> > Yes - data silos are a different problem not addressed by code
> > transpilers. But I'm not sure I understand the point about shared
> > computing libraries or how you propose to make the situation better.
> >
> > Say we're talking arrow + datafusion (which is written in Rust).  It
> > sounded like your goal is to ensure that users of different language
> > ecosystems get the same performance and feature set as rust. Let me
> > know if I misunderstood.
> >
> > Mapping code is one problem: a ^ b in python is transpiled to a.xor(b) in
> > Kotlin for example. But mapping APIs is a different problem.
> > json.loads(input) could transpile to a different library API in the
> > target language. I was thinking you'd be more interested in the latter.
> > There is a plugin system I'm designing which could benefit from knowing
> > about real world use cases.
> >
> >  -Arun
>
