Hi Patrick,

If spark-submit works correctly, users only need to specify runtime
jars via `--jars` instead of calling `sc.addJar`. Is that correct? I
checked SparkSubmit and yarn.Client but didn't find any code that handles
`args.jars` in YARN mode, so I don't know where in the code the jars
in the distributed cache are added to the runtime classpath on executors.
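
One way to look for them at runtime might be to walk the context class
loader chain instead of reading `java.class.path`, which only reflects the
classpath the JVM was started with. The sketch below is not Spark code:
`ClassLoaderUrls` and `urlsOf` are names I made up, and it assumes that the
loaders holding dynamically added jars are `URLClassLoader` instances, which
I have not confirmed for the executor.

```scala
import java.net.URLClassLoader
import scala.annotation.tailrec

// Hypothetical helper, not part of Spark.
object ClassLoaderUrls {
  /**
   * Walks from `loader` up through its parents and collects the URLs of
   * every URLClassLoader found on the way. Loaders of other types
   * (e.g. the application loader on Java 9+) contribute nothing.
   */
  def urlsOf(loader: ClassLoader): Seq[String] = {
    @tailrec
    def loop(cl: ClassLoader, acc: Vector[String]): Vector[String] = cl match {
      case null               => acc // reached the bootstrap loader
      case u: URLClassLoader  => loop(u.getParent, acc ++ u.getURLs.map(_.toString))
      case other              => loop(other.getParent, acc)
    }
    loop(loader, Vector.empty)
  }
}
```

Calling `ClassLoaderUrls.urlsOf(Thread.currentThread.getContextClassLoader)`
inside a task and collecting the result should show whether the `--jars`
entries are visible to the loader the task runs with.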

Best,
Xiangrui

On Sun, May 18, 2014 at 11:58 AM, Patrick Wendell <pwend...@gmail.com> wrote:
> @db - it's possible that you aren't including the jar in the classpath
> of your driver program (I think this is what mridul was suggesting).
> It would be helpful to see the stack trace of the CNFE.
>
> - Patrick
>
> On Sun, May 18, 2014 at 11:54 AM, Patrick Wendell <pwend...@gmail.com> wrote:
>> @xiangrui - we don't expect these to be present on the system
>> classpath, because they get dynamically added by Spark (e.g. your
>> application can call sc.addJar well after the JVMs have started).
>>
>> @db - I'm pretty surprised to see that behavior. It's definitely not
>> intended that users need reflection to instantiate their classes -
>> something odd is going on in your case. If you could create an
>> isolated example and post it to the JIRA, that would be great.
>>
>> On Sun, May 18, 2014 at 9:58 AM, Xiangrui Meng <men...@gmail.com> wrote:
>>> I created a JIRA: https://issues.apache.org/jira/browse/SPARK-1870
>>>
>>> DB, could you add more info to that JIRA? Thanks!
>>>
>>> -Xiangrui
>>>
>>> On Sun, May 18, 2014 at 9:46 AM, Xiangrui Meng <men...@gmail.com> wrote:
>>>> Btw, I tried
>>>>
>>>> rdd.map { i =>
>>>>   System.getProperty("java.class.path")
>>>> }.collect()
>>>>
>>>> but didn't see the jars added via "--jars" on the executor classpath.
>>>>
>>>> -Xiangrui
>>>>
>>>> On Sat, May 17, 2014 at 11:26 PM, Xiangrui Meng <men...@gmail.com> wrote:
>>>>> I can reproduce the error with Spark 1.0-RC and YARN (CDH-5). The
>>>>> reflection approach mentioned by DB didn't work either. I checked the
>>>>> distributed cache on a worker node and found the jar there. It is also
>>>>> listed in the Environment tab of the WebUI. The workaround is to build
>>>>> an assembly jar.
>>>>>
>>>>> DB, could you create a JIRA and describe what you have found so far?
>>>>> Thanks!
>>>>>
>>>>> Best,
>>>>> Xiangrui
>>>>>
>>>>> On Sat, May 17, 2014 at 1:29 AM, Mridul Muralidharan <mri...@gmail.com> wrote:
>>>>>> Can you try moving your mapPartitions to another class/object which is
>>>>>> referenced only after sc.addJar?
>>>>>>
>>>>>> I suspect the CNFE is thrown while loading the class that contains
>>>>>> mapPartitions, before sc.addJar is executed.
>>>>>>
>>>>>> In general, though, dynamic loading of classes means you use reflection
>>>>>> to instantiate them, since the expectation is that you don't know which
>>>>>> implementation provides the interface ... If you know it statically
>>>>>> a priori, you bundle it in your classpath.
>>>>>>
>>>>>> Regards
>>>>>> Mridul
>>>>>> On 17-May-2014 7:28 am, "DB Tsai" <dbt...@stanford.edu> wrote:
>>>>>>
>>>>>>> I finally found a way out of the ClassLoader maze! It took me some time
>>>>>>> to understand how it works; I think it is worth documenting in a
>>>>>>> separate thread.
>>>>>>>
>>>>>>> We're trying to add an external utility.jar which contains
>>>>>>> CSVRecordParser, and we added the jar to the executors through the
>>>>>>> sc.addJar API.
>>>>>>>
>>>>>>> If the instance of CSVRecordParser is created without reflection, it
>>>>>>> raises *ClassNotFoundException*.
>>>>>>>
>>>>>>> data.mapPartitions(lines => {
>>>>>>>     // Direct instantiation: this is the line that fails
>>>>>>>     val csvParser = new CSVRecordParser(delimiter.charAt(0))
>>>>>>>     lines.foreach(line => {
>>>>>>>       val lineElems = csvParser.parseLine(line)
>>>>>>>     })
>>>>>>>     ...
>>>>>>>     ...
>>>>>>> })
>>>>>>>
>>>>>>>
>>>>>>> If the instance of CSVRecordParser is created through reflection, it 
>>>>>>> works.
>>>>>>>
>>>>>>> data.mapPartitions(lines => {
>>>>>>>     // Load the class by name through the context class loader
>>>>>>>     val loader = Thread.currentThread.getContextClassLoader
>>>>>>>     val CSVRecordParser =
>>>>>>>         loader.loadClass("com.alpine.hadoop.ext.CSVRecordParser")
>>>>>>>
>>>>>>>     // Instantiate it via its (char) constructor
>>>>>>>     val csvParser = CSVRecordParser.getConstructor(Character.TYPE)
>>>>>>>         .newInstance(delimiter.charAt(0).asInstanceOf[Character])
>>>>>>>
>>>>>>>     val parseLine = CSVRecordParser
>>>>>>>         .getDeclaredMethod("parseLine", classOf[String])
>>>>>>>
>>>>>>>     lines.foreach(line => {
>>>>>>>        val lineElems = parseLine.invoke(csvParser, line)
>>>>>>>            .asInstanceOf[Array[String]]
>>>>>>>     })
>>>>>>>     ...
>>>>>>>     ...
>>>>>>> })
>>>>>>>
>>>>>>>
>>>>>>> This is the same problem as in this question:
>>>>>>>
>>>>>>> http://stackoverflow.com/questions/7452411/thread-currentthread-setcontextclassloader-without-using-reflection
>>>>>>>
>>>>>>> It's not intuitive for users to load external classes through
>>>>>>> reflection, but the couple of available alternatives, 1) messing
>>>>>>> around with the system class loader by calling
>>>>>>> systemClassLoader.addURL through reflection, or 2) forking another
>>>>>>> JVM to add the jars to the classpath before the bootstrap loader
>>>>>>> runs, are very tricky.
>>>>>>>
>>>>>>> Any thoughts on fixing it properly?
>>>>>>>
>>>>>>> @Xiangrui,
>>>>>>> netlib-java jniloader is loaded from netlib-java through reflection, so
>>>>>>> this problem will not be seen.
>>>>>>>
>>>>>>> Sincerely,
>>>>>>>
>>>>>>> DB Tsai
>>>>>>> -------------------------------------------------------
>>>>>>> My Blog: https://www.dbtsai.com
>>>>>>> LinkedIn: https://www.linkedin.com/in/dbtsai
>>>>>>>
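
A possible middle ground between the two snippets in DB's message, in the
spirit of Mridul's remark about interfaces, is to use reflection only to
instantiate the class and then cast the instance to a type that is already
on the application classpath, so that later method calls are ordinary typed
calls. This is only a sketch under assumptions: `ReflectiveLoad` and
`newInstanceAs` are hypothetical names, the constructor is assumed to take a
single `String`, and it only helps when the interface itself does not live
in the dynamically added jar.

```scala
// Hypothetical helper, not part of Spark.
object ReflectiveLoad {
  /**
   * Loads `className` through `loader`, calls its single-String constructor
   * reflectively, and casts the result to the statically known type `iface`.
   */
  def newInstanceAs[T](className: String, loader: ClassLoader,
                       ctorArg: String, iface: Class[T]): T = {
    val cls = Class.forName(className, true, loader)
    val instance = cls.getConstructor(classOf[String]).newInstance(ctorArg)
    // Class.cast throws ClassCastException if the loaded class
    // does not implement or extend `iface`.
    iface.cast(instance.asInstanceOf[AnyRef])
  }
}
```

For example, with a JDK class standing in for one from an added jar,
`ReflectiveLoad.newInstanceAs("java.lang.StringBuilder", Thread.currentThread.getContextClassLoader, "a,b", classOf[CharSequence])`
returns a `CharSequence` whose methods can be called without further
reflection.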
