Hi Andy,

Usually I'd say that development outside the ASF should be faster, because you can publish a new release after every commit. In the ASF you need to hold a VOTE and wait for 3 binding +1s and 72 hours. A user of datafusion-substrait could use a git dependency to get the latest version even without a published crate.

But I see that datafusion-substrait currently depends on datafusion 13.0, and I guess this is the main reason for moving it to arrow-datafusion. Another solution would be for datafusion-substrait to depend on arrow-datafusion master via a git dependency.
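
For illustration, the two git-dependency options could look roughly like this in Cargo.toml. This is only a sketch: the package names and the branch are assumptions based on the repository names, not copied from either project's manifest.

```toml
# Hypothetical downstream project: track datafusion-substrait from git
# instead of waiting for a published crate.
[dependencies]
datafusion-substrait = { git = "https://github.com/datafusion-contrib/datafusion-substrait" }
```

```toml
# Hypothetical entry in datafusion-substrait's own manifest: follow
# arrow-datafusion master rather than the released 13.0 crate.
[dependencies]
datafusion = { git = "https://github.com/apache/arrow-datafusion", branch = "master" }
```
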
+1 to move it as a subproject to arrow-datafusion now! This will avoid collecting [more] (I)CLAs later, and I see that there are plans to replace datafusion-proto with it at some point.

Martin

On Wed, Dec 7, 2022 at 2:50 AM Andy Grove <andygrov...@gmail.com> wrote:

> Hi,
>
> The DataFusion community has built an integration between DataFusion and
> Substrait under the datafusion-contrib GitHub organization [1].
>
> The project is now receiving regular contributions from NVIDIA (who are
> using it internally for a research project), and now GreptimeDB have
> expressed an interest in contributing as well [2].
>
> I think that we should consider moving development of this crate into
> DataFusion and wanted to see what others think of this idea.
>
> One reason for moving it into the official project is so that we can access
> this functionality from the DataFusion Python bindings without adding a
> dependency on a project that is outside ASF governance.
>
> I have created a GitHub issue [3] where we can discuss this more, or we can
> discuss it here on the mailing list.
>
> I look forward to hearing some opinions on this.
>
> Thanks,
>
> Andy.
>
> [1] https://github.com/datafusion-contrib/datafusion-substrait
> [2] https://github.com/datafusion-contrib/datafusion-substrait/pull/34
> [3] https://github.com/apache/arrow-datafusion/issues/4536