[GitHub] [iceberg] lcaaaat commented on issue #954: Default warehouse location of a table should be a subdirectory in database location

GitBox Thu, 23 Jul 2020 19:50:25 -0700


lcaaaat commented on issue #954:
URL: https://github.com/apache/iceberg/issues/954#issuecomment-663324572



   @pvary  Thanks for your reply!
   
   I don't want to write data into **/user/hive/warehouse**, so I created the database with an explicit location:
   
   ```scala
   spark.sql("create database iceberg_test location '/user/data_transform/iceberg_test'")
   ```
   
   As a result, the database is created at that location:
   
   ```scala
   spark.sql("desc database iceberg_test").show(false)
   +-------------------------+--------------------------------------------+
   |database_description_item|database_description_value                  |
   +-------------------------+--------------------------------------------+
   |Database Name            |iceberg_test                                |
   |Description              |                                            |
   |Location                 |hdfs://dev4/user/data_transform/iceberg_test|
   +-------------------------+--------------------------------------------+
   ```
   
   Then I create a table in Hive format and describe it:
   
   ```scala
   spark.sql("create table iceberg_test.hive_table(name string)")
   
   spark.sql("describe FORMATTED iceberg_test.hive_table").show(false)
   
   +----------------------------+----------------------------------------------------------+-------+
   |col_name                    |data_type                                                 |comment|
   +----------------------------+----------------------------------------------------------+-------+
   |name                        |string                                                    |null   |
   |                            |                                                          |       |
   |# Detailed Table Information|                                                          |       |
   |Database                    |iceberg_test                                              |       |
   |Table                       |hive_table                                                |       |
   |Owner                       |data_transform/[email protected]                          |       |
   |Created Time                |Fri Jul 24 10:24:24 CST 2020                              |       |
   |Last Access                 |Thu Jan 01 08:00:00 CST 1970                              |       |
   |Created By                  |Spark 2.4.5                                               |       |
   |Type                        |MANAGED                                                   |       |
   |Provider                    |hive                                                      |       |
   |Table Properties            |[transient_lastDdlTime=1595557464]                        |       |
   |Location                    |hdfs://dev4/user/data_transform/iceberg_test/hive_table   |       |
   |Serde Library               |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe        |       |
   |InputFormat                 |org.apache.hadoop.mapred.TextInputFormat                  |       |
   |OutputFormat                |org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat|       |
   |Storage Properties          |[serialization.format=1]                                  |       |
   |Partition Provider          |Catalog                                                   |       |
   +----------------------------+----------------------------------------------------------+-------+
   ```
   
   The location of the table is **hdfs://dev4/user/data_transform/iceberg_test/hive_table**, a subdirectory of the **iceberg_test** database location.
   
   However, if I create a table in Iceberg format and print its location:
   
   ```scala
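   import org.apache.hadoop.hive.conf.HiveConf
   import org.apache.iceberg.Schema
   import org.apache.iceberg.catalog.TableIdentifier
   import org.apache.iceberg.hive.HiveCatalogs
   import org.apache.iceberg.types.Types.{NestedField, StringType}

   val hiveCatalog = HiveCatalogs.loadCatalog(new HiveConf())
   val tableIdentifier = TableIdentifier.parse("iceberg_test.iceberg_table")
   val schema = new Schema(NestedField.optional(2, "name", StringType.get()))
   // No explicit location is passed, so the catalog picks its default
   val table = hiveCatalog.createTable(tableIdentifier, schema)
   println(table.location())
   // prints: /user/warehouse/iceberg_test.db/iceberg_table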
   ```
   
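   So the default table location differs between the two formats: the Hive table is placed under the database location, while the Iceberg table is placed under the warehouse directory in an **iceberg_test.db** folder.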
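
   To make the two layouts easier to compare, here is a small self-contained sketch. The object `DefaultLocations` and both helper names are hypothetical and only mimic the paths printed above; they are not the Spark, Hive, or Iceberg implementations.

   ```scala
   // Illustrative only: reproduces the two path layouts observed in this comment.
   object DefaultLocations {
     // Layout seen for the Hive-format table: <database location>/<table>
     def underDatabaseLocation(dbLocation: String, table: String): String =
       s"${dbLocation.stripSuffix("/")}/$table"

     // Layout seen for the Iceberg table: <warehouse>/<database>.db/<table>
     def underWarehouse(warehouse: String, db: String, table: String): String =
       s"${warehouse.stripSuffix("/")}/${db}.db/$table"
   }

   println(DefaultLocations.underDatabaseLocation(
     "hdfs://dev4/user/data_transform/iceberg_test", "hive_table"))
   println(DefaultLocations.underWarehouse(
     "/user/warehouse", "iceberg_test", "iceberg_table"))
   ```

   The first layout is what I would expect for the Iceberg table too, since the database was created with its own location.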


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

