Warming this thread back up :) I've thought about this a bit and wonder: would a shared type system for all the ML projects in the ASF be a good place to start?
Many data science projects I see use several of our tools, mixed with other open source and proprietary software. This introduces all sorts of inefficiencies and breakages, many of which stem from the different type systems in use (e.g., "wait, -1 means missing value here? I thought we used 0??"). If we could agree on a type system here in the ASF, it could provide a north star for the community at large. Maybe we could even collaborate with Apache Avro to make sure the whole type system has a defined serialized form.

WDYT? How many of these observations match your own experience? Is a shared type system feasible and interesting? If so, we can start a cross-project thread on it.

Thanks,
Markus
