Subhankar created SPARK-23338:
---------------------------------

             Summary: Spark unable to run on HDP deployed Azure Blob File System
                 Key: SPARK-23338
                 URL: https://issues.apache.org/jira/browse/SPARK-23338
             Project: Spark
          Issue Type: Bug
          Components: Spark Core, Spark Shell
    Affects Versions: 2.2.0
         Environment: HDP 2.6.0.3

Spark2 2.2.0

HDFS 2.7.3

CentOS 7.1
            Reporter: Subhankar


Hello,

It is impossible to run Spark on the BLOB storage file system deployed on HDP.
I am unable to run Spark as it is giving errors related to HiveSessionState, 
HiveExternalCatalog and various Azure File storage exceptions.
I request you to kindly help in case you have a suggestion to address this. Or 
is it that my exercise is futile and Spark is not configured to run on BLOB 
storage after all.

Thanks in advance.

 

Detailed Description:

 
h5. *We are unable to access spark/spark2 when we change the file system 
storage form HDFS to WASB. We are using HDP 2.6 platform and running Hadoop 
2.7.3. All other services are working fine.*

I have set the following configurations:

*HDFS*:

core-site-

fs.defaultFS = 
wasb:[//CONTAINER@STORAGE_ACCOUNT_NAME.blob.core.windows.net|mailto://CONTAINER@STORAGE_ACCOUNT_NAME.blob.core.windows.net]

fs.AbstractFileSystem.wasb.impl = org.apache.hadoop.fs.azure.Wasb

fs.AbstractFileSystem.wasbs.impl = org.apache.hadoop.fs.azure.Wasbs

fs.azure.selfthrottling.read.factor = 1.0

fs.azure.selfthrottling.write.factor = 1.0

[fs.azure.account.key.STORAGE_ACCOUNT_NAME.blob.core.windows.net|http://fs.azure.account.key.storage_account_name.blob.core.windows.net/]
 = KEY

[spark.hadoop.fs.azure.account.key.STORAGE_ACCOUNT_NAME.blob.core.windows.net|http://spark.hadoop.fs.azure.account.key.storage_account_name.blob.core.windows.net/]
 = KEY


*SPARK2:*

spark.eventLog.dir = 
wasb:[//CONTAINER@STORAGE_ACCOUNT_NAME.blob.core.windows.net|mailto://CONTAINER@STORAGE_ACCOUNT_NAME.blob.core.windows.net]/spark2-history/

spark.history.fs.logDirectory = 
wasb:[//CONTAINER@STORAGE_ACCOUNT_NAME.blob.core.windows.net|mailto://CONTAINER@STORAGE_ACCOUNT_NAME.blob.core.windows.net]/spark2-history/

In spite of trying multiple times and irrespective of alternative 
configurations, the *spark-shell* command is yielding the below results:

$ spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).
java.lang.IllegalArgumentException: Error while instantiating 
'org.apache.spark.sql.hive.HiveSessionState':
at 
org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:983)
at 
org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110)
at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109)
at 
org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
at 
org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:878)
at org.apache.spark.repl.Main$.createSparkSession(Main.scala:96)
... 47 elided
Caused by: java.lang.reflect.InvocationTargetException: 
java.lang.IllegalArgumentException: Error while instantiating 
'org.apache.spark.sql.hive.HiveExternalCatalog':
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:980)
... 58 more
Caused by: java.lang.IllegalArgumentException: Error while instantiating 
'org.apache.spark.sql.hive.HiveExternalCatalog':
at 
org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:176)
at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:86)
at 
org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
at 
org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
at scala.Option.getOrElse(Option.scala:121)
at 
org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:101)
at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:100)
at org.apache.spark.sql.internal.SessionState.<init>(SessionState.scala:157)
at org.apache.spark.sql.hive.HiveSessionState.<init>(HiveSessionState.scala:32)
... 63 more
Caused by: java.lang.reflect.InvocationTargetException: 
java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: 
org.apache.hadoop.fs.azure.AzureException: java.util.NoSuchElementException: An 
error occurred while enumerating the result, check the original exception for 
details.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:173)
... 71 more
Caused by: java.lang.reflect.InvocationTargetException: 
java.lang.RuntimeException: org.apache.hadoop.fs.azure.AzureException: 
java.util.NoSuchElementException: An error occurred while enumerating the 
result, check the original exception for details.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
at 
org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:358)
at 
org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:262)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.<init>(HiveExternalCatalog.scala:65)
... 76 more
Caused by: java.lang.RuntimeException: 
org.apache.hadoop.fs.azure.AzureException: java.util.NoSuchElementException: An 
error occurred while enumerating the result, check the original exception for 
details.
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:188)
... 84 more
Caused by: org.apache.hadoop.fs.azure.AzureException: 
java.util.NoSuchElementException: An error occurred while enumerating the 
result, check the original exception for details.
at 
org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.retrieveMetadata(AzureNativeFileSystemStore.java:2027)
at 
org.apache.hadoop.fs.azure.NativeAzureFileSystem.getFileStatus(NativeAzureFileSystem.java:2081)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1447)
at 
org.apache.hadoop.fs.azure.NativeAzureFileSystem.conditionalRedoFolderRename(NativeAzureFileSystem.java:2137)
at 
org.apache.hadoop.fs.azure.NativeAzureFileSystem.getFileStatus(NativeAzureFileSystem.java:2104)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1447)
at 
org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:596)
at 
org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
... 85 more
Caused by: java.util.NoSuchElementException: An error occurred while 
enumerating the result, check the original exception for details.
at 
com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:113)
at 
org.apache.hadoop.fs.azure.StorageInterfaceImpl$WrappingIterator.hasNext(StorageInterfaceImpl.java:130)
at 
org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.retrieveMetadata(AzureNativeFileSystemStore.java:2006)
... 93 more
Caused by: com.microsoft.azure.storage.StorageException: The server encountered 
an unknown failure: OK
at 
com.microsoft.azure.storage.StorageException.translateException(StorageException.java:101)
at 
com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:199)
at 
com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:109)
... 95 more
Caused by: java.lang.ClassCastException: 
org.apache.xerces.parsers.XIncludeAwareParserConfiguration cannot be cast to 
org.apache.xerces.xni.parser.XMLParserConfiguration
at org.apache.xerces.parsers.SAXParser.<init>(Unknown Source)
at org.apache.xerces.parsers.SAXParser.<init>(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.<init>(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl.<init>(Unknown Source)
at org.apache.xerces.jaxp.SAXParserFactoryImpl.newSAXParser(Unknown Source)
at com.microsoft.azure.storage.core.Utility.getSAXParser(Utility.java:668)
at 
com.microsoft.azure.storage.blob.BlobListHandler.getBlobList(BlobListHandler.java:72)
at 
com.microsoft.azure.storage.blob.CloudBlobContainer$6.postProcessResponse(CloudBlobContainer.java:1284)
at 
com.microsoft.azure.storage.blob.CloudBlobContainer$6.postProcessResponse(CloudBlobContainer.java:1248)
at 
com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:146)
... 96 more
<console>:14: error: not found: value spark
import spark.implicits._
^
<console>:14: error: not found: value spark
import spark.sql

 

 

 

It would be immensely helpful if anyone could assist in resolving the above. It 
may happen that we have missed out on configuring an important aspect of HDFS 
or Spark, as a result of which it is unable to locate certain JARS and is 
getting incompatible with the BLOB storage.

Kindly assist !

PS: I have made sure the required jars of azure-storage and Hadoop-azure are 
made available in the spark and the Hadoop lib folders. I have even tried to 
specify the same explicitly when starting spark-shell but to no effect.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to