Subhankar created SPARK-23338:
---------------------------------
Summary: Spark unable to run on HDP deployed Azure Blob File System
Key: SPARK-23338
URL: https://issues.apache.org/jira/browse/SPARK-23338
Project: Spark
Issue Type: Bug
Components: Spark Core, Spark Shell
Affects Versions: 2.2.0
Environment: HDP 2.6.0.3
Spark2 2.2.0
HDFS 2.7.3
CentOS 7.1
Reporter: Subhankar
Hello,
I am unable to run Spark on the Azure Blob storage file system (WASB) deployed
on HDP. Startup fails with errors related to HiveSessionState,
HiveExternalCatalog and various Azure storage exceptions.
Please help if you have a suggestion to address this. Or is my exercise futile
because Spark cannot run on Blob storage at all?
Thanks in advance.
Detailed Description:
h5. *We are unable to access spark/spark2 when we change the file system
storage from HDFS to WASB. We are using the HDP 2.6 platform and running Hadoop
2.7.3. All other services are working fine.*
I have set the following configurations:
*HDFS*:
core-site-
fs.defaultFS = wasb://CONTAINER@STORAGE_ACCOUNT_NAME.blob.core.windows.net
fs.AbstractFileSystem.wasb.impl = org.apache.hadoop.fs.azure.Wasb
fs.AbstractFileSystem.wasbs.impl = org.apache.hadoop.fs.azure.Wasbs
fs.azure.selfthrottling.read.factor = 1.0
fs.azure.selfthrottling.write.factor = 1.0
fs.azure.account.key.STORAGE_ACCOUNT_NAME.blob.core.windows.net = KEY
spark.hadoop.fs.azure.account.key.STORAGE_ACCOUNT_NAME.blob.core.windows.net = KEY
*SPARK2:*
spark.eventLog.dir = wasb://CONTAINER@STORAGE_ACCOUNT_NAME.blob.core.windows.net/spark2-history/
spark.history.fs.logDirectory = wasb://CONTAINER@STORAGE_ACCOUNT_NAME.blob.core.windows.net/spark2-history/
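As a sanity check on the settings above, the account key property name has to match the storage account host in the wasb URI exactly. The following is only a minimal sketch of that relationship, not part of Hadoop or Spark; the function name account_key_property is made up for illustration.

```python
from urllib.parse import urlparse


def account_key_property(wasb_uri):
    """Derive the core-site property name expected to hold the account key.

    Illustrative helper only: it mirrors the naming pattern used in the
    configuration above (fs.azure.account.key.<account host>).
    """
    parsed = urlparse(wasb_uri)
    if parsed.scheme not in ("wasb", "wasbs"):
        raise ValueError("not a wasb/wasbs URI: " + wasb_uri)
    # The authority part looks like CONTAINER@ACCOUNT.blob.core.windows.net.
    # netloc is used instead of hostname so the original case is preserved.
    container, sep, account_host = parsed.netloc.partition("@")
    if not sep or not container or not account_host:
        raise ValueError("expected CONTAINER@ACCOUNT host in: " + wasb_uri)
    return "fs.azure.account.key." + account_host


if __name__ == "__main__":
    uri = "wasb://CONTAINER@STORAGE_ACCOUNT_NAME.blob.core.windows.net/spark2-history/"
    print(account_key_property(uri))
```

If the printed name differs from the key configured in core-site (for example in the account name), the file system would presumably not find the credentials for that container.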
Despite multiple attempts with alternative configurations, the *spark-shell*
command yields the following output:
$ spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:983)
at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110)
at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:878)
at org.apache.spark.repl.Main$.createSparkSession(Main.scala:96)
... 47 elided
Caused by: java.lang.reflect.InvocationTargetException: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:980)
... 58 more
Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:176)
at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:86)
at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:101)
at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:100)
at org.apache.spark.sql.internal.SessionState.<init>(SessionState.scala:157)
at org.apache.spark.sql.hive.HiveSessionState.<init>(HiveSessionState.scala:32)
... 63 more
Caused by: java.lang.reflect.InvocationTargetException: java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: org.apache.hadoop.fs.azure.AzureException: java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:173)
... 71 more
Caused by: java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: org.apache.hadoop.fs.azure.AzureException: java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:358)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:262)
at org.apache.spark.sql.hive.HiveExternalCatalog.<init>(HiveExternalCatalog.scala:65)
... 76 more
Caused by: java.lang.RuntimeException: org.apache.hadoop.fs.azure.AzureException: java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details.
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:188)
... 84 more
Caused by: org.apache.hadoop.fs.azure.AzureException: java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details.
at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.retrieveMetadata(AzureNativeFileSystemStore.java:2027)
at org.apache.hadoop.fs.azure.NativeAzureFileSystem.getFileStatus(NativeAzureFileSystem.java:2081)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1447)
at org.apache.hadoop.fs.azure.NativeAzureFileSystem.conditionalRedoFolderRename(NativeAzureFileSystem.java:2137)
at org.apache.hadoop.fs.azure.NativeAzureFileSystem.getFileStatus(NativeAzureFileSystem.java:2104)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1447)
at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:596)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
... 85 more
Caused by: java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details.
at com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:113)
at org.apache.hadoop.fs.azure.StorageInterfaceImpl$WrappingIterator.hasNext(StorageInterfaceImpl.java:130)
at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.retrieveMetadata(AzureNativeFileSystemStore.java:2006)
... 93 more
Caused by: com.microsoft.azure.storage.StorageException: The server encountered an unknown failure: OK
at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:101)
at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:199)
at com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:109)
... 95 more
Caused by: java.lang.ClassCastException: org.apache.xerces.parsers.XIncludeAwareParserConfiguration cannot be cast to org.apache.xerces.xni.parser.XMLParserConfiguration
at org.apache.xerces.parsers.SAXParser.<init>(Unknown Source)
at org.apache.xerces.parsers.SAXParser.<init>(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.<init>(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl.<init>(Unknown Source)
at org.apache.xerces.jaxp.SAXParserFactoryImpl.newSAXParser(Unknown Source)
at com.microsoft.azure.storage.core.Utility.getSAXParser(Utility.java:668)
at com.microsoft.azure.storage.blob.BlobListHandler.getBlobList(BlobListHandler.java:72)
at com.microsoft.azure.storage.blob.CloudBlobContainer$6.postProcessResponse(CloudBlobContainer.java:1284)
at com.microsoft.azure.storage.blob.CloudBlobContainer$6.postProcessResponse(CloudBlobContainer.java:1248)
at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:146)
... 96 more
<console>:14: error: not found: value spark
import spark.implicits._
^
<console>:14: error: not found: value spark
import spark.sql
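Because the trace nests many "Caused by" sections, it may help to reduce it to the chain of exception classes and look at the innermost one. Below is a rough sketch that does this for a saved copy of the output; the function names are made up for illustration and the parsing only assumes that each cause starts a line with "Caused by:".

```python
import re

# Matches the exception class name right after "Caused by:".
CAUSE_RE = re.compile(r"^Caused by: ([\w.$]+)")


def cause_chain(trace):
    """Return the exception class of every 'Caused by:' line, outermost first."""
    chain = []
    for line in trace.splitlines():
        match = CAUSE_RE.match(line.strip())
        if match:
            chain.append(match.group(1))
    return chain


def root_cause(trace):
    """Return the innermost cause, or None if the trace has no 'Caused by:' line."""
    chain = cause_chain(trace)
    return chain[-1] if chain else None


if __name__ == "__main__":
    import sys

    # Usage: python cause_chain.py < spark-shell-output.txt
    text = sys.stdin.read()
    for depth, name in enumerate(cause_chain(text)):
        print("  " * depth + name)
```

For the output above, the last entry in the chain is java.lang.ClassCastException, raised while the Azure storage client creates a Xerces SAX parser.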
It would be immensely helpful if anyone could assist in resolving the above. We
may have missed an important aspect of the HDFS or Spark configuration, as a
result of which Spark is unable to locate certain JARs or is incompatible with
Blob storage.
Kindly assist!
PS: I have made sure the required azure-storage and hadoop-azure JARs are
available in the Spark and Hadoop lib folders. I have also tried specifying
them explicitly when starting spark-shell, but to no effect.
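One thing that could still be checked is whether more than one JAR in those folders ships the Xerces classes named in the final ClassCastException, since a class failing to cast to its own interface often points to two copies loaded by different class loaders. This is only a guess, not a confirmed diagnosis. A minimal sketch for listing the JARs that contain a given class follows; the function name and the example directories are assumptions and need to be adjusted to the actual installation.

```python
import zipfile
from pathlib import Path


def jars_containing(class_name, search_dirs):
    """Return paths of all .jar files under search_dirs that contain class_name."""
    # A class a.b.C is stored inside a JAR as the entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for directory in search_dirs:
        for jar in sorted(Path(directory).rglob("*.jar")):
            try:
                with zipfile.ZipFile(jar) as archive:
                    if entry in archive.namelist():
                        hits.append(jar)
            except (zipfile.BadZipFile, OSError):
                # Skip unreadable or corrupt archives instead of aborting the scan.
                continue
    return hits


if __name__ == "__main__":
    # Hypothetical locations; replace with the real Spark and Hadoop lib folders.
    dirs = ["/path/to/spark2/jars", "/path/to/hadoop/lib"]
    for path in jars_containing("org.apache.xerces.parsers.SAXParser", dirs):
        print(path)
```

If the scan prints more than one JAR, the duplicate copies would be the first candidates to look at.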