So, checking my understanding, let's imagine a hypothetical scenario.

* There is a data scientist who is well versed in pandas
* There is a project team working in Kotlin
* The project team wants to use the data scientist's code in their project.

# Transpilation

The transpilation approach would be to transpile the Python code to Kotlin. The pyarrow functions themselves would be quite difficult to transpile. They are Cython bindings to C++ shared library exports and thus pretty sensitive to the in-memory representation of the non-Arrow data types in addition to the Arrow data types. However, it sounds like you're stating it would be possible to migrate the script but transpile pyarrow calls to appropriate calls against the Java implementation of Arrow, which would probably be easier.

# Data sharing

> But I'm not sure I understand the point about shared computing libraries or
> how you propose to make the situation better.

I believe the shared data approach would be to zero-copy marshal the data from Kotlin to Python, call the pandas code, then zero-copy marshal the result back to Kotlin.

--

It seems to me that both approaches would be possible and each has pros & cons. Did I capture the understanding correctly?

On Tue, May 18, 2021 at 3:29 PM Arun Sharma <a...@sharma-home.net> wrote:

> On Tue, May 18, 2021 at 5:37 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > You just sent this same e-mail 24 hours ago. I think the problems we
> > are solving are different. We are addressing language siloing at the
> > data level and the shared-computing-libraries level. I am not sure
> > that code transpilers help us very much.
> >
>
> Oops - sorry for the dup. I checked the archives and didn't see it there.
> But it was on the second page that I somehow missed.
>
> Yes - data silos are a different problem not addressed by code transpilers.
> But I'm not sure I understand the point about shared computing libraries or
> how you propose to make the situation better.
>
> Say we're talking arrow + datafusion (which is written in Rust). It
> sounded like your goal is to ensure that users of different language
> ecosystems get the same performance and feature set as rust. Let me know if
> I misunderstood.
>
> Mapping code is one problem: a ^ b in python is transpiled to a.xor(b) in
> Kotlin for example. But mapping APIs is a different problem.
> json.loads(input) could transpile to a different library API in the target
> language. I was thinking you'd be more interested in the latter. There is a
> plugin system I'm designing which could benefit from knowing about real
> world use cases.
>
> -Arun
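
To make sure I follow the "mapping code" case quoted above (a ^ b becoming a.xor(b)), here is a toy sketch using only Python's standard ast module. It is not how any real transpiler works, and the names KotlinExpr and to_kotlin are made up for illustration. It handles only names, integer literals, + and ^, and rejects everything else. The API-mapping case (json.loads and friends) is not covered at all.

```python
import ast


class KotlinExpr(ast.NodeVisitor):
    """Toy translator: renders a tiny subset of Python expressions as Kotlin text.

    Hypothetical helper for illustration only; supports names, int literals,
    '+' and '^'. Anything else raises NotImplementedError.
    """

    def visit_Expression(self, node):
        return self.visit(node.body)

    def visit_Name(self, node):
        return node.id

    def visit_Constant(self, node):
        # Only plain ints are mapped; bools, strings, floats are out of scope.
        if type(node.value) is int:
            return str(node.value)
        raise NotImplementedError(f"constant of type {type(node.value).__name__}")

    def visit_BinOp(self, node):
        left = self.visit(node.left)
        right = self.visit(node.right)
        if isinstance(node.op, ast.BitXor):
            # Kotlin spells integer xor as a function call, not as '^'.
            return f"{left}.xor({right})"
        if isinstance(node.op, ast.Add):
            # Parenthesize so the result stays valid as a call receiver.
            return f"({left} + {right})"
        raise NotImplementedError(type(node.op).__name__)

    def generic_visit(self, node):
        # Fail loudly on any syntax this sketch does not know about.
        raise NotImplementedError(type(node).__name__)


def to_kotlin(py_expr: str) -> str:
    """Translate a single Python expression string into Kotlin source text."""
    return KotlinExpr().visit(ast.parse(py_expr, mode="eval"))


if __name__ == "__main__":
    print(to_kotlin("a ^ b"))
    print(to_kotlin("a ^ (b + 1)"))
```

Even this small case shows why pyarrow calls are the hard part: operators map mechanically, but a call into a Cython extension has no syntax-level equivalent and would need a hand-written mapping to the Java Arrow API.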