Finally find a way out of the ClassLoader maze! It took me some times to
understand how it works; I think it worths to document it in a separated
thread.

We're trying to add external utility.jar which contains CSVRecordParser,
and we added the jar to executors through sc.addJar APIs.

If the instance of CSVRecordParser is created without reflection, it
raises *ClassNotFound
Exception*.

data.mapPartitions(lines => {
    val csvParser = new CSVRecordParser((delimiter.charAt(0))
    lines.foreach(line => {
      val lineElems = csvParser.parseLine(line)
    })
    ...
    ...
 )


If the instance of CSVRecordParser is created through reflection, it works.

data.mapPartitions(lines => {
    val loader = Thread.currentThread.getContextClassLoader
    val CSVRecordParser =
        loader.loadClass("com.alpine.hadoop.ext.CSVRecordParser")

    val csvParser = CSVRecordParser.getConstructor(Character.TYPE)
        .newInstance(delimiter.charAt(0).asInstanceOf[Character])

    val parseLine = CSVRecordParser
        .getDeclaredMethod("parseLine", classOf[String])

    lines.foreach(line => {
       val lineElems = parseLine.invoke(csvParser,
line).asInstanceOf[Array[String]]
    })
    ...
    ...
 )


This is identical to this question,
http://stackoverflow.com/questions/7452411/thread-currentthread-setcontextclassloader-without-using-reflection

It's not intuitive for users to load external classes through reflection,
but couple available solutions including 1) messing around
systemClassLoader by calling systemClassLoader.addURI through reflection or
2) forking another JVM to add jars into classpath before bootstrap loader
are very tricky.

Any thought on fixing it properly?

@Xiangrui,
netlib-java jniloader is loaded from netlib-java through reflection, so
this problem will not be seen.

Sincerely,

DB Tsai
-------------------------------------------------------
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai

Reply via email to