It seems we all agree on the idea of adding a Java API. Maybe it is just me, but for my use case (OpenNLP) it wouldn't make sense at all to use a Java API that requires a Scala dependency, because at that point I would be better off just using the Scala API and ensuring that the things I build are compatible with Java.
So if I don't want to add Scala as a dependency, I am better off building something on top of a generated JNI layer. As far as I can tell from my tests with the scala-package, you can get quite far with MXNet using NDArray and the Symbol API.

Maybe we could work on this from two sides, as described by Pracheer. If we have a well-defined Java API, you could look at the work I have done by then and see how it can be plugged in or what can be learnt from it.

Jörn

On Wed, Aug 16, 2017 at 9:05 PM, Nan Zhu <[email protected]> wrote:

> +1 for Sandeep's suggestion
>
> On Wed, Aug 16, 2017 at 11:21 AM, YiZhi Liu <[email protected]> wrote:
>
>> Agree with Sandeep, while I guess the performance won't change. But
>> yes, benchmark talks.
>>
>> Moreover, in Scala package we use macros to generate operators
>> automatically, which will require more efforts if we switch to pure
>> Java.
>>
>> 2017-08-17 2:12 GMT+08:00 sandeep krishnamurthy <[email protected]>:
>> > The fastest way to get Java binding is through building Java native
>> > wrappers on Scala package.
>> > Disadvantages would be:
>> > * *Bloated library size:* May not be suitable for users planning to use
>> > Java APIs in Android or such smaller systems.
>> > * *Performance:* Performance may not be as good as building directly
>> > over JNI and implementing ground up. For example, taking NDArray
>> > dimensions as Java ArrayList then converting it to Scala Seq to adapt
>> > for Scala NDArray API and more such adapters.
>> >
>> > However, building ground up from JNI would be a huge effort without
>> > actually getting feedback from users early.
>> >
>> > *My Plan:*
>> > 1. Build Java interface on top of Scala package.
>> > 2. Get early feedback from users. It may turn out Java is not a great
>> > candidate for DL training jobs.
>> > 3. Solidify the interface (APIs) for Java users.
>> > 4. Do performance benchmarks to see Scala Native / Java interface.
>> > This gives us comparable numbers on performance in Java.
>> > 5. Over a period of time replace underlying Scala usage with JNI base and
>> > native Java implementation. Provided feedback from users is positive.
>> >
>> > Comments/Suggestion?
>> >
>> > Regards,
>> > Sandeep
>> >
>> >
>> > On Wed, Aug 16, 2017 at 10:56 AM, YiZhi Liu <[email protected]> wrote:
>> >
>> >> What Nan and I worried about is the re-implementation of something like
>> >> https://github.com/apache/incubator-mxnet/blob/master/scala-package/core/src/main/scala/ml/dmlc/mxnet/Model.scala#L246,
>> >> and the executorManager, NDArray, KVStore ... it uses.
>> >>
>> >> the C API stays at the very low level. If this is the purpose, we can
>> >> simply move ml.dmlc.mxnet.LibInfo to 'java' folder and compile without
>> >> scala, no need to introduce JavaCPP. But I don't think this is what
>> >> users want.
>> >>
>> >> 2017-08-17 1:41 GMT+08:00 Joern Kottmann <[email protected]>:
>> >> > There will be a new scala version one day, and the story we had with
>> >> > going from 2.10 to 2.11 might just repeat. In the end if you make a
>> >> > dependency using scala you just end up making it for the currently
>> >> > popular scala versions. And that might be ok for projects with
>> >> > developers who are familiar with these issues, but it is not ok for
>> >> > java projects, where people might not expect it or know about these
>> >> > problems. It just makes it harder to use.
>> >> >
>> >> > To me it looks like the C API is very stable and used by all/most
>> >> > other APIs. If we have a Java API - accessing the C API via JavaCPP -
>> >> > then we should end up with a pretty stable solution and a lot of the
>> >> > code that is duplicated with the Scala API is the generated code.
>> >> >
>> >> > I think we should explore this possible way of implementing it with a
>> >> > proof-of-concept.
>> >> >
>> >> > And if we have a well made Java API it might be something which maybe
>> >> > wouldn't need a lot of additions to be pleasurable to use from scala.
>> >> >
>> >> > Jörn
>> >> >
>> >> > On Wed, Aug 16, 2017 at 6:45 PM, Nan Zhu <[email protected]> wrote:
>> >> >> I don't think there will be problems under "11", did the user see
>> >> >> concrete errors?
>> >> >>
>> >> >> Best,
>> >> >>
>> >> >> Nan
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Wed, Aug 16, 2017 at 9:30 AM, YiZhi Liu <[email protected]> wrote:
>> >> >>
>> >> >>> Hi Nan,
>> >> >>>
>> >> >>> Users have 2.11, but with a different minor version, will it cause
>> >> >>> conflicts?
>> >> >>>
>> >> >>> 2017-08-17 0:19 GMT+08:00 Nan Zhu <[email protected]>:
>> >> >>> > Hi, Yizhi,
>> >> >>> >
>> >> >>> > You mean users have 2.10 env while we assemble 2.11 in it?
>> >> >>> >
>> >> >>> > Best,
>> >> >>> >
>> >> >>> > Nan
>> >> >>> >
>> >> >>> > On Wed, Aug 16, 2017 at 9:08 AM, YiZhi Liu <[email protected]> wrote:
>> >> >>> >
>> >> >>> >> Hi Joern,
>> >> >>> >>
>> >> >>> >> The point is that, the front is not a simple wrapper of c_api.h, as
>> >> >>> >> you mentioned, which can be easily achieved by JavaCPP.
>> >> >>> >>
>> >> >>> >> I have noticed the potential conflicts between the assembled scala
>> >> >>> >> library and the one in users' environment. Can we remove the scala
>> >> >>> >> library from the assembly jar? @Nan It wouldn't be a problem since
>> >> >>> >> the scala libraries with same major version are compatible.
>> >> >>> >>
>> >> >>> >> 2017-08-16 23:49 GMT+08:00 Joern Kottmann <[email protected]>:
>> >> >>> >> > Hello,
>> >> >>> >> >
>> >> >>> >> > I personally had quite some issues with Scala dependencies in
>> >> >>> >> > different versions and Spark, where one version is not compatible
>> >> >>> >> > with the other version. Then you need to debug the dependency tree
>> >> >>> >> > to find the places where the versions don't match.
>> >> >>> >> > Every project which would like to use MXNet then has to depend on
>> >> >>> >> > Scala and might also get conflicts if other dependencies depend on
>> >> >>> >> > different Scala versions.
>> >> >>> >> > Probably something which will cause issues for some of your users.
>> >> >>> >> > Users who want to use Java might not be familiar with Scala
>> >> >>> >> > dependency problems and have a hard time resolving them by getting
>> >> >>> >> > strange error messages.
>> >> >>> >> >
>> >> >>> >> > The JNI layer could be generated with JavaCPP, then we would not
>> >> >>> >> > need to write/maintain the C and the jvm side for that ourselves.
>> >> >>> >> > A good example of JavaCPP and Scala usage is Apache Mahout [1].
>> >> >>> >> >
>> >> >>> >> > Even if we don't use JavaCPP, the JNI layer should be easy to get
>> >> >>> >> > into a state where both can share it, the current Scala JNI layers
>> >> >>> >> > LibInfo classes could be converted to Java classes and would in most
>> >> >>> >> > cases require only minor changes in the Scala code.
>> >> >>> >> >
>> >> >>> >> > Jörn
>> >> >>> >> >
>> >> >>> >> > [1] https://github.com/apache/mahout/tree/master/viennacl/src/main
>> >> >>> >> >
>> >> >>> >> > On Wed, Aug 16, 2017 at 5:30 PM, Nan Zhu <[email protected]> wrote:
>> >> >>> >> >> I agree with Yizhi
>> >> >>> >> >>
>> >> >>> >> >> My major concern is the duplicate implementations, which are
>> >> >>> >> >> usually one of the major sources of bugs, especially with two
>> >> >>> >> >> languages which are naturally interactive (OK, Calling Scala from
>> >> >>> >> >> Java might need some more efforts). It is just like we provide
>> >> >>> >> >> C++ & C APIs of MXNet in two separated packages.
>> >> >>> >> >>
>> >> >>> >> >> About dependency problem, when you say "As far as I see this has
>> >> >>> >> >> the great disadvantage that the Java API would force Scala as a
>> >> >>> >> >> dependency onto the java users.", would you please give a concrete
>> >> >>> >> >> example causing critical issues?
>> >> >>> >> >>
>> >> >>> >> >> Best,
>> >> >>> >> >>
>> >> >>> >> >> Nan
>> >> >>> >> >>
>> >> >>> >> >>
>> >> >>> >> >>
>> >> >>> >> >> On Wed, Aug 16, 2017 at 8:19 AM, YiZhi Liu <[email protected]> wrote:
>> >> >>> >> >>
>> >> >>> >> >>> Hi,
>> >> >>> >> >>>
>> >> >>> >> >>> If we build the Java API from the very beginning, i.e. the JNI
>> >> >>> >> >>> part, we have to rewrite the codes for training, predict,
>> >> >>> >> >>> inferShape, etc. It would be too heavy to maintain a totally new
>> >> >>> >> >>> front language.
>> >> >>> >> >>>
>> >> >>> >> >>> As far as I see, I don't think Scala library dependency would be
>> >> >>> >> >>> a big problem in most cases, unless we are going to use it in
>> >> >>> >> >>> embedded devices. Could you illustrate some use-cases where you
>> >> >>> >> >>> cannot involve Scala dependencies?
>> >> >>> >> >>>
>> >> >>> >> >>> 2017-08-16 22:13 GMT+08:00 Joern Kottmann <[email protected]>:
>> >> >>> >> >>> > Hello,
>> >> >>> >> >>> >
>> >> >>> >> >>> > the approach which is taken by Spark is described here [1].
>> >> >>> >> >>> >
>> >> >>> >> >>> > As far as I see this has the great disadvantage that the Java
>> >> >>> >> >>> > API would force Scala as a dependency onto the java users.
>> >> >>> >> >>> > For a library it is always a great advantage if it doesn't have
>> >> >>> >> >>> > many dependencies, or zero dependencies.
>> >> >>> >> >>> > In our case it could be quite realistic to have a thin wrapper
>> >> >>> >> >>> > around the C API without needing any other dependencies (or only
>> >> >>> >> >>> > dependencies which can't be avoided).
>> >> >>> >> >>> >
>> >> >>> >> >>> > The JNI layer could easily be shared between the Java and Scala
>> >> >>> >> >>> > API. As far as I understand is the JNI layer in the Scala API
>> >> >>> >> >>> > anyway private and a change to it wouldn't require that the
>> >> >>> >> >>> > public part of the Scala API is changed.
>> >> >>> >> >>> >
>> >> >>> >> >>> > What do you think?
>> >> >>> >> >>> >
>> >> >>> >> >>> > Jörn
>> >> >>> >> >>> >
>> >> >>> >> >>> > [1] https://cwiki.apache.org/confluence/display/SPARK/Java+API+Internals
>> >> >>> >> >>> >
>> >> >>> >> >>> > On Wed, Aug 16, 2017 at 3:39 PM, YiZhi Liu <[email protected]> wrote:
>> >> >>> >> >>> >> Hi Joern,
>> >> >>> >> >>> >>
>> >> >>> >> >>> >> I suggest to build Java API as a wrapper of Scala API, re-use
>> >> >>> >> >>> >> most of the procedures. Referring to the Java API in Apache
>> >> >>> >> >>> >> Spark.
>> >> >>> >> >>> >>
>> >> >>> >> >>> >> 2017-08-16 18:21 GMT+08:00 Joern Kottmann <[email protected]>:
>> >> >>> >> >>> >>> Hello all,
>> >> >>> >> >>> >>>
>> >> >>> >> >>> >>> I would like to propose the addition of a Java API to MXNet.
>> >> >>> >> >>> >>>
>> >> >>> >> >>> >>> There has been some previous work done for the Scala API, and
>> >> >>> >> >>> >>> it makes sense to at least share the JNI layer between the two.
>> >> >>> >> >>> >>>
>> >> >>> >> >>> >>> The Java API probably should be aligned with the Python API
>> >> >>> >> >>> >>> (and others which exist already) with a few changes to give it
>> >> >>> >> >>> >>> a native Java feel.
>> >> >>> >> >>> >>>
>> >> >>> >> >>> >>> As far as I understand there are multiple people interested to
>> >> >>> >> >>> >>> work on this and it would be good to maybe come up with a
>> >> >>> >> >>> >>> written proposal on how things should be.
>> >> >>> >> >>> >>>
>> >> >>> >> >>> >>> My motivation is to get a Java API which can be used by Apache
>> >> >>> >> >>> >>> OpenNLP to solve various NLP tasks using Deep Learning based
>> >> >>> >> >>> >>> approaches and I am also interested to work on MXNet.
>> >> >>> >> >>> >>>
>> >> >>> >> >>> >>> Jörn
>> >> >>> >> >>> >>
>> >> >>> >> >>> >> --
>> >> >>> >> >>> >> Yizhi Liu
>> >> >>> >> >>> >> DMLC member
>> >> >>> >> >>> >> Technical Manager
>> >> >>> >> >>> >> Qihoo 360 Inc, Shanghai, China
>> >
>> > --
>> > Sandeep Krishnamurthy

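For readers following the thread, the "thin wrapper around the C API" layering that Jörn argues for can be sketched roughly as below. This is only an illustration under stated assumptions, not MXNet's actual API: `NativeLib`, `FakeNativeLib`, and `JArray` are hypothetical names, and the in-memory `FakeNativeLib` stands in for a generated JNI layer so the sketch runs without any native library. The point it shows is that the Java-facing type can use plain `int[]` and `float[]` with no Scala collections or adapters in between.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class ThinWrapperSketch {

    /** Hypothetical stand-in for a generated JNI layer that works on opaque handles. */
    public interface NativeLib {
        long createArray(int[] shape);
        void copyFrom(long handle, float[] data);
        float[] copyTo(long handle);
        void free(long handle);
    }

    /** In-memory fake so the sketch runs without a native library (assumption, not real JNI). */
    public static final class FakeNativeLib implements NativeLib {
        private final Map<Long, float[]> buffers = new HashMap<>();
        private long nextHandle = 1;

        @Override
        public long createArray(int[] shape) {
            int size = 1;
            for (int dim : shape) {
                size *= dim;
            }
            long handle = nextHandle++;
            buffers.put(handle, new float[size]);
            return handle;
        }

        @Override
        public void copyFrom(long handle, float[] data) {
            float[] buffer = lookup(handle);
            if (data.length != buffer.length) {
                throw new IllegalArgumentException("size mismatch");
            }
            System.arraycopy(data, 0, buffer, 0, data.length);
        }

        @Override
        public float[] copyTo(long handle) {
            return lookup(handle).clone();
        }

        @Override
        public void free(long handle) {
            buffers.remove(handle);
        }

        public int liveHandles() {
            return buffers.size();
        }

        private float[] lookup(long handle) {
            float[] buffer = buffers.get(handle);
            if (buffer == null) {
                throw new IllegalStateException("invalid handle " + handle);
            }
            return buffer;
        }
    }

    /** Java-facing array type: owns one native handle and releases it on close(). */
    public static final class JArray implements AutoCloseable {
        private final NativeLib lib;
        private final long handle;
        private final int[] shape;

        public JArray(NativeLib lib, int... shape) {
            this.lib = lib;
            this.shape = shape.clone();
            this.handle = lib.createArray(this.shape);
        }

        public void set(float[] data) {
            lib.copyFrom(handle, data);
        }

        public float[] toArray() {
            return lib.copyTo(handle);
        }

        public int[] shape() {
            return shape.clone();
        }

        @Override
        public void close() {
            lib.free(handle);
        }
    }

    public static void main(String[] args) {
        FakeNativeLib lib = new FakeNativeLib();
        // try-with-resources frees the handle even if the body throws.
        try (JArray array = new JArray(lib, 2, 3)) {
            array.set(new float[] {1f, 2f, 3f, 4f, 5f, 6f});
            System.out.println(Arrays.toString(array.toArray()));
        }
        System.out.println("live handles: " + lib.liveHandles());
    }
}
```

Because the wrapper depends only on the `NativeLib` interface, the same handle-based layer could in principle be shared by a Scala front end, which is the sharing discussed in the thread. The higher-level training and prediction code that YiZhi and Nan are concerned about duplicating is not addressed by this sketch.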