Steve Loughran commented on SPARK-21697:

# I don't see anything which can be done in HDFS here; it's in the libraries 
below it.
# Recent Hadoop releases use SLF4J, which *may* not have this problem. But as 
log4j looks for log4j.properties, and as dependent libraries may still use 
commons-logging, there's no guarantee of that.

What to do?
# Classloader games: bring up the log infra, then add the HDFS JARs to the CP. 
May require knowledge of what to force in before anything else, e.g. using the 
new CP, do a stat of every JAR path, then inject them into the CP. Risky, as 
nobody really understands classpaths.
# Force D/L the remote artifact to local temp FS before execution, as YARN does 
itself. Do it for HDFS, WASB, S3x, ..., all filesystems known by Hadoop FS. 
(Side issue: is there a way to enumerate this? Probably not, except for merging 
the list of service-discovered entries and those with an {{fs.SCHEMA.impl}} 
configuration entry.)
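
Option 2 can be sketched with nothing but the JDK. This is a minimal illustration, not what Spark or YARN actually do: the class {{LocalizeJars}} and its methods are hypothetical names, and {{URI.toURL().openStream()}} only works for schemes the JVM already has a handler for (file, http, jar). For hdfs://, wasb:// or s3a:// the copy step would presumably go through Hadoop's {{FileSystem.copyToLocalFile()}} instead. The point is that the classloader only ever sees local file: URLs, so nothing on the classpath needs a filesystem client to be up.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: copy remote JARs to a local temp dir before they
// are ever placed on a classpath.
class LocalizeJars {

    /** Copy one remote artifact into targetDir and return the local path. */
    static Path localize(URI remote, Path targetDir) throws IOException {
        // Assumes a hierarchical URI whose path ends in a file name.
        String name = Paths.get(remote.getPath()).getFileName().toString();
        Path local = targetDir.resolve(name);
        // JDK-only copy; for Hadoop filesystems this is where the
        // FileSystem API would be used instead (assumption, see lead-in).
        try (InputStream in = remote.toURL().openStream()) {
            Files.copy(in, local, StandardCopyOption.REPLACE_EXISTING);
        }
        return local;
    }

    /** Download every JAR first, then build a classloader over local copies only. */
    static URLClassLoader classLoaderFor(List<URI> remoteJars, ClassLoader parent)
            throws IOException {
        Path tempDir = Files.createTempDirectory("localized-jars");
        List<URL> urls = new ArrayList<>();
        for (URI jar : remoteJars) {
            // One sub-directory per JAR so identical file names cannot collide.
            Path jarDir = Files.createTempDirectory(tempDir, "jar");
            urls.add(localize(jar, jarDir).toUri().toURL());
        }
        return new URLClassLoader(urls.toArray(new URL[0]), parent);
    }
}
```

All downloads finish before the {{URLClassLoader}} is constructed, so resource lookups such as the scan for {{commons-logging.properties}} only touch local files.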

I think that #2 is potentially the simplest and so most viable. It's not quite 
as elegant as saying "this is a supported URL you can directly use in the CP", 
but it's the one that is going to avoid these problems.

> NPE & ExceptionInInitializerError trying to load UDF from HDFS
> --------------------------------------------------------------
>                 Key: SPARK-21697
>                 URL: https://issues.apache.org/jira/browse/SPARK-21697
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.1
>         Environment: Spark Client mode, Hadoop 2.6.0
>            Reporter: Steve Loughran
>            Priority: Minor
> Reported on [the 
> PR|https://github.com/apache/spark/pull/17342#issuecomment-321438157] for 
> SPARK-12868: trying to load a UDF from HDFS is triggering an 
> {{ExceptionInInitializerError}}, caused by an NPE which should only happen if 
> the commons-logging {{LOG}} log is null.
> Hypothesis: the commons logging scan for {{commons-logging.properties}} is 
> happening in the classpath with the HDFS JAR; this is triggering a D/L of the 
> JAR, which needs to force in commons-logging, and, as that's not inited yet, 
> NPEs
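
The failure mode in the hypothesis can be reproduced in miniature. This is only a model of the suspected re-entrancy, not the actual commons-logging or HDFS code: {{Logging}}, {{Downloader}} and {{initFailure()}} are made-up names. A static {{LOG}} field is read by the same thread while its own class is still being initialised, so it is still null, and the resulting NPE surfaces as an {{ExceptionInInitializerError}}.

```java
// Hypothetical model of the suspected re-entrant initialisation.
class ReentrantInitDemo {

    static class Logging {
        // Not a compile-time constant: stays null until createLog() returns.
        static final Object LOG = createLog();

        static Object createLog() {
            // Stand-in for the properties scan that ends up triggering a
            // download, which in turn wants to log.
            Downloader.fetch();
            return new Object();
        }
    }

    static class Downloader {
        static void fetch() {
            // Same thread, Logging is mid-initialisation, so the JVM lets
            // the read through and LOG is still null: NPE.
            Logging.LOG.hashCode();
        }
    }

    private static Throwable failure;

    /**
     * Trigger initialisation of Logging and return the error it raises.
     * The first failure is remembered, because later accesses to the
     * class raise NoClassDefFoundError instead.
     */
    static synchronized Throwable initFailure() {
        if (failure == null) {
            try {
                Object unused = Logging.LOG; // forces class initialisation
            } catch (ExceptionInInitializerError e) {
                failure = e;
            }
        }
        return failure;
    }

    public static void main(String[] args) {
        Throwable t = initFailure();
        System.out.println(t);
        System.out.println("caused by: " + t.getCause());
    }
}
```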
