pvary commented on pull request #1407:
URL: https://github.com/apache/iceberg/pull/1407#issuecomment-694239063
We had a chat with @massdosage and found the following.
The CREATE TABLE command currently expects that the **table is already there**; it
does not create the actual Iceberg table metadata. That will come in an upcoming
patch where we implement the HiveMetaHook.
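To make the current expectation concrete, here is a hedged sketch of the Hive side of the registration (the table name is made up for illustration; the storage handler class matches the one in this module, but the exact DDL shape may differ in the final patch):

```sql
-- The Iceberg table metadata must already exist (created outside Hive,
-- e.g. via the Iceberg Java API); this DDL only registers the table in
-- Hive -- at this point it does NOT create the Iceberg metadata itself.
CREATE EXTERNAL TABLE iceberg_test_table
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler';
```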
Also, for writes the **execution engine must be set to MR**. Tez/LLAP need some
more work on their side to call the appropriate callbacks on the
OutputCommitter in time.
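For reference, a write test under these constraints could look like the following (the table name is illustrative; `hive.execution.engine` is the standard Hive property for choosing the engine):

```sql
-- Force MapReduce: Tez/LLAP do not yet invoke the OutputCommitter
-- callbacks at the right time, so Iceberg commits would be skipped.
SET hive.execution.engine=mr;

-- Hypothetical write against an Iceberg-backed table.
INSERT INTO iceberg_test_table VALUES ('a', 1), ('b', 2);
```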
@massdosage ran another test with the correct settings on an arbitrary
table (we are not sure about its schema/create script, etc.), and it failed with the
following exception:
```
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"tmp_values_col1":"f4pyx-ngNrU4YcHcdZco2CBPBa7!Kz7Zg4UNt.nmB1ximyTK","tmp_values_col2":"7604075334510954017"}
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:169)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:455)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:344)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"tmp_values_col1":"f4pyx-ngNrU4YcHcdZco2CBPBa7!Kz7Zg4UNt.nmB1ximyTK","tmp_values_col2":"7604075334510954017"}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:565)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
    ... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:574)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:674)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:148)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
    ... 9 more
```
It is very strange that we are creating bucket files in the FileSinkOperator when
we are supposed to use our very own SerDe.
I will share my development branch, which includes the table creation patch,
so it will be easier to create arbitrary test tables. In the meantime
@massdosage will try to dig up a little history about the table which caused us
headaches 😄
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]