The reflection approach actually works, but you need to get the loader with `val loader = Thread.currentThread.getContextClassLoader`, which is set by the Spark executor. Our team has verified this and uses it as a workaround.
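For anyone who wants to try the lookup outside Spark, here is a minimal sketch of the same pattern. It is only an illustration, not our production code: the object `ContextLoaderSketch` and its helpers `contextLoader`, `instantiate` and `call` are hypothetical names, and `java.lang.StringBuilder` stands in for the class that would normally come from the added jar. The fallback to the system class loader is also my assumption, for threads where no context loader has been set.

```scala
object ContextLoaderSketch {
  // Prefer the thread's context class loader (the one the Spark executor sets);
  // fall back to the system loader if none is set (assumption for this sketch).
  def contextLoader: ClassLoader =
    Option(Thread.currentThread.getContextClassLoader)
      .getOrElse(ClassLoader.getSystemClassLoader)

  // Load a class by name and build an instance through its (String) constructor.
  def instantiate(className: String, arg: String): AnyRef = {
    val cls = contextLoader.loadClass(className)
    cls.getConstructor(classOf[String]).newInstance(arg).asInstanceOf[AnyRef]
  }

  // Invoke a public no-argument method by name on the reflected instance.
  def call(target: AnyRef, methodName: String): AnyRef =
    target.getClass.getMethod(methodName).invoke(target)

  def main(args: Array[String]): Unit = {
    // StringBuilder is a stand-in; in a job this would be the class from the added jar.
    val sb = instantiate("java.lang.StringBuilder", "a,b,c")
    println(call(sb, "toString"))
  }
}
```

In a job, the class name would be the one shipped in the added jar, and the lookup would happen inside the closure so that it runs on the executor thread.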

Sincerely,

DB Tsai
-------------------------------------------------------
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai


On Sun, May 18, 2014 at 9:46 AM, Xiangrui Meng <men...@gmail.com> wrote:

> Btw, I tried
>
> rdd.map { i =>
>   System.getProperty("java.class.path")
> }.collect()
>
> but didn't see the jars added via "--jars" on the executor classpath.
>
> -Xiangrui
>
> On Sat, May 17, 2014 at 11:26 PM, Xiangrui Meng <men...@gmail.com> wrote:
> > I can reproduce the error with Spark 1.0-RC and YARN (CDH-5). The
> > reflection approach mentioned by DB didn't work either. I checked the
> > distributed cache on a worker node and found the jar there. It is also
> > in the Environment tab of the WebUI. The workaround is making an
> > assembly jar.
> >
> > DB, could you create a JIRA and describe what you have found so far? Thanks!
> >
> > Best,
> > Xiangrui
> >
> > On Sat, May 17, 2014 at 1:29 AM, Mridul Muralidharan <mri...@gmail.com> wrote:
> >> Can you try moving your mapPartitions to another class/object which is
> >> referenced only after sc.addJar?
> >>
> >> I suspect the CNFEx occurs while loading the class containing
> >> mapPartitions, before addJar is executed.
> >>
> >> In general though, dynamic loading of classes means you use reflection to
> >> instantiate them, since the expectation is that you don't know which
> >> implementation provides the interface ... If you statically know it
> >> a priori, you bundle it in your classpath.
> >>
> >> Regards
> >> Mridul
> >> On 17-May-2014 7:28 am, "DB Tsai" <dbt...@stanford.edu> wrote:
> >>
> >>> Finally found a way out of the ClassLoader maze! It took me some time to
> >>> understand how it works; I think it is worth documenting in a separate
> >>> thread.
> >>>
> >>> We're trying to add an external utility.jar, which contains CSVRecordParser,
> >>> and we added the jar to the executors through the sc.addJar API.
> >>>
> >>> If the instance of CSVRecordParser is created without reflection, it
> >>> raises *ClassNotFoundException*.
> >>>
> >>> data.mapPartitions(lines => {
> >>>     val csvParser = new CSVRecordParser(delimiter.charAt(0))
> >>>     lines.foreach(line => {
> >>>       val lineElems = csvParser.parseLine(line)
> >>>     })
> >>>     ...
> >>>     ...
> >>> )
> >>>
> >>>
> >>> If the instance of CSVRecordParser is created through reflection, it works.
> >>>
> >>> data.mapPartitions(lines => {
> >>>     val loader = Thread.currentThread.getContextClassLoader
> >>>     val CSVRecordParser =
> >>>         loader.loadClass("com.alpine.hadoop.ext.CSVRecordParser")
> >>>
> >>>     val csvParser = CSVRecordParser.getConstructor(Character.TYPE)
> >>>         .newInstance(delimiter.charAt(0).asInstanceOf[Character])
> >>>
> >>>     val parseLine = CSVRecordParser
> >>>         .getDeclaredMethod("parseLine", classOf[String])
> >>>
> >>>     lines.foreach(line => {
> >>>       val lineElems = parseLine.invoke(csvParser, line).asInstanceOf[Array[String]]
> >>>     })
> >>>     ...
> >>>     ...
> >>> )
> >>>
> >>>
> >>> This is identical to this question:
> >>>
> >>> http://stackoverflow.com/questions/7452411/thread-currentthread-setcontextclassloader-without-using-reflection
> >>>
> >>> It's not intuitive for users to load external classes through reflection,
> >>> but the couple of available alternatives, including 1) messing around with
> >>> the systemClassLoader by calling systemClassLoader.addURI through reflection
> >>> or 2) forking another JVM to add jars to the classpath before the bootstrap
> >>> loader, are very tricky.
> >>>
> >>> Any thoughts on fixing it properly?
> >>>
> >>> @Xiangrui,
> >>> netlib-java jniloader is loaded from netlib-java through reflection, so
> >>> this problem will not be seen.
> >>>
> >>> Sincerely,
> >>>
> >>> DB Tsai
> >>> -------------------------------------------------------
> >>> My Blog: https://www.dbtsai.com
> >>> LinkedIn: https://www.linkedin.com/in/dbtsai
> >>>