Hi Jacek,

The issue might not be very widespread; I couldn't reproduce it. Could you check 
whether I'm doing anything incorrectly in the queries below?

scala> spark.range(10).write.saveAsTable("t1")

scala> spark.sql("describe formatted t1").show(100, false)
+----------------------------+-----------------------------------------------------------------------------------+-------+
|col_name                    |data_type                                                                          |comment|
+----------------------------+-----------------------------------------------------------------------------------+-------+
|id                          |bigint                                                                             |null   |
|                            |                                                                                   |       |
|# Detailed Table Information|                                                                                   |       |
|Database                    |default                                                                            |       |
|Table                       |t1                                                                                 |       |
|Owner                       |jyotima                                                                            |       |
|Created Time                |Sun Sep 30 23:40:46 PDT 2018                                                       |       |
|Last Access                 |Wed Dec 31 16:00:00 PST 1969                                                       |       |
|Created By                  |Spark 2.3.2                                                                        |       |
|Type                        |MANAGED                                                                            |       |
|Provider                    |parquet                                                                            |       |
|Table Properties            |[transient_lastDdlTime=1538376046]                                                 |       |
|Statistics                  |3008 bytes                                                                         |       |
|Location                    |file:/home/jyotima/repo/tmp/spark2.3.2/spark-2.3.2-bin-hadoop2.7/spark-warehouse/t1|       |
|Serde Library               |org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe                        |       |
|InputFormat                 |org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat                      |       |
|OutputFormat                |org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat                     |       |
|Storage Properties          |[serialization.format=1]                                                           |       |
+----------------------------+-----------------------------------------------------------------------------------+-------+

scala> spark.version
res4: String = 2.3.2
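
For comparison, it might also help to print the warehouse location each version 
actually resolved. A minimal check from the same spark-shell session (a sketch; 
`spark` is the session the shell provides):

```scala
// Print the warehouse directory the session resolved at startup.
// In a plain local spark-shell this is expected to be
// file:/<current-working-directory>/spark-warehouse.
println(spark.conf.get("spark.sql.warehouse.dir"))
```

Running this on both 2.3.1 and 2.3.2 should show whether the default resolution 
differs between the two downloads.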

Thanks,
Jyoti
From: Jacek Laskowski <ja...@japila.pl>
Sent: Sunday, September 30, 2018 11:28 PM
To: Sean Owen <sro...@gmail.com>
Cc: dev <dev@spark.apache.org>
Subject: Re: saveAsTable in 2.3.2 throws IOException while 2.3.1 works fine?

Hi Sean,

Thanks again for helping me stay sane and confirming that the issue is not 
imaginary :)

I'd expect it to be spark-warehouse in the directory where spark-shell is 
executed (which is what has always been used for the metastore).
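
In the meantime, a possible workaround (a sketch under assumptions, not a 
confirmed fix; the path and names here are examples) is to pin the warehouse 
location explicitly when building the session, taking the default resolution 
out of the picture:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical workaround: set spark.sql.warehouse.dir explicitly to the
// directory spark-shell used to pick by default (relative to the cwd).
val session = SparkSession.builder()
  .appName("warehouse-dir-workaround")
  .master("local[*]")
  .config("spark.sql.warehouse.dir", "file:./spark-warehouse")
  .getOrCreate()
```

The same setting can also be passed when launching spark-shell via 
`--conf spark.sql.warehouse.dir=...`.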

I'm reviewing all the changes between 2.3.1 and 2.3.2 to find anything relevant. 
I'm surprised nobody's reported it before; that worries me (or it simply means 
that all the enterprise deployments use YARN with Hive?)

Pozdrawiam,
Jacek Laskowski
----
https://about.me/JacekLaskowski
Mastering Spark SQL https://bit.ly/mastering-spark-sql
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
Follow me at https://twitter.com/jaceklaskowski


On Sun, Sep 30, 2018 at 10:25 PM Sean Owen <sro...@gmail.com> wrote:
Hm, changes in the behavior of the default warehouse dir sound
familiar, but anything I could find was resolved well before 2.3.1
even. I don't know of a change here. What location are you expecting?
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315420&version=12343289
On Sun, Sep 30, 2018 at 1:38 PM Jacek Laskowski <ja...@japila.pl> wrote:
>
> Hi Sean,
>
> I thought so too, but the path "file:/user/hive/warehouse/" should not have 
> been used in the first place, should it? I'm running it in spark-shell 2.3.2. 
> Why would there be any change between 2.3.1 and 2.3.2, both freshly 
> downloaded, such that one works fine while the other does not? I had to 
> downgrade to 2.3.1 because of this (and do want to figure out why 2.3.2 
> behaves differently).
>
> The part of the stack trace is below.
>
> ➜  spark-2.3.2-bin-hadoop2.7 ./bin/spark-shell
> 2018-09-30 17:43:49 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
> Spark context Web UI available at http://192.168.0.186:4040
> Spark context available as 'sc' (master = local[*], app id = local-1538322235135).
> Spark session available as 'spark'.
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.3.2
>       /_/
>
> Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_171)
> Type in expressions to have them evaluated.
> Type :help for more information.
>
> scala> spark.version
> res0: String = 2.3.2
>
> scala> spark.range(1).write.saveAsTable("demo")
> 2018-09-30 17:44:27 WARN  ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
> 2018-09-30 17:44:28 ERROR FileOutputCommitter:314 - Mkdirs failed to create file:/user/hive/warehouse/demo/_temporary/0
> 2018-09-30 17:44:28 ERROR Utils:91 - Aborting task
> java.io.IOException: Mkdirs failed to create file:/user/hive/warehouse/demo/_temporary/0/_temporary/attempt_20180930174428_0000_m_000007_0 (exists=false, cwd=file:/Users/jacek/dev/apps/spark-2.3.2-bin-hadoop2.7)
> at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:455)
> at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:892)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:789)
> at org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:241)
> at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:342)
> at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:302)
> at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetOutputWriter.scala:37)
> at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:151)
> at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.newOutputWriter(FileFormatWriter.scala:367)
> at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:378)
> at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:269)
> at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:267)
> at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1415)
> at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:272)
> at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:197)
> at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:196)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> at org.apache.spark.scheduler.Task.run(Task.scala:109)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
>
>
> Pozdrawiam,
> Jacek Laskowski
>
>
> On Sat, Sep 29, 2018 at 9:50 PM Sean Owen <sro...@gmail.com> wrote:
>>
>> Looks like a permission issue? Are you sure that isn't the difference, first?
>>
>> On Sat, Sep 29, 2018, 1:54 PM Jacek Laskowski <ja...@japila.pl> wrote:
>>>
>>> Hi,
>>>
>>> The following query fails in 2.3.2:
>>>
>>> scala> spark.range(10).write.saveAsTable("t1")
>>> ...
>>> 2018-09-29 20:48:06 ERROR FileOutputCommitter:314 - Mkdirs failed to create file:/user/hive/warehouse/bucketed/_temporary/0
>>> 2018-09-29 20:48:07 ERROR Utils:91 - Aborting task
>>> java.io.IOException: Mkdirs failed to create file:/user/hive/warehouse/bucketed/_temporary/0/_temporary/attempt_20180929204807_0000_m_000003_0 (exists=false, cwd=file:/Users/jacek/dev/apps/spark-2.3.2-bin-hadoop2.7)
>>> at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:455)
>>> at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440)
>>> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911)
>>>
>>> While it works fine in 2.3.1.
>>>
>>> Could anybody explain the change in behaviour in 2.3.2? The commit / the 
>>> JIRA issue would be even nicer. Thanks.
>>>
>>> Pozdrawiam,
>>> Jacek Laskowski
