pvary commented on pull request #1407:
URL: https://github.com/apache/iceberg/pull/1407#issuecomment-694239063
We had a chat with @massdosage and found the following.
The CREATE TABLE command currently expects that the **table is already there**; it
does not create the actual Iceberg table metadata. That will come in an upcoming
patch where we implement the HiveMetaHook.
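To make the current expectation concrete, here is a hedged sketch of the Hive side of the registration (the table name is made up for illustration; the storage handler class matches the one in this module, but the exact DDL shape may differ in the final patch):

```sql
-- The Iceberg table metadata must already exist (created outside Hive,
-- e.g. via the Iceberg Java API); this DDL only registers the table in
-- Hive -- at this point it does NOT create the Iceberg metadata itself.
CREATE EXTERNAL TABLE iceberg_test_table
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler';
```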
Also, for writes the **execution engine must be set to MR**. Tez/LLAP need some
more work on their side to call the appropriate callbacks on the
OutputCommitter in time.
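For reference, a write test under these constraints could look like the following (the table name is illustrative; `hive.execution.engine` is the standard Hive property for choosing the engine):

```sql
-- Force MapReduce: Tez/LLAP do not yet invoke the OutputCommitter
-- callbacks at the right time, so Iceberg commits would be skipped.
SET hive.execution.engine=mr;

-- Hypothetical write against an Iceberg-backed table.
INSERT INTO iceberg_test_table VALUES ('a', 1), ('b', 2);
```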
@massdosage ran another test with the correct settings on an arbitrary
table (we are not sure about its schema/create script, etc.), and it failed with the
following exception:
```
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"tmp_values_col1":"f4pyx-ngNrU4YcHcdZco2CBPBa7!Kz7Zg4UNt.nmB1ximyTK","tmp_values_col2":"7604075334510954017"}
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:169)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:455)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:344)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"tmp_values_col1":"f4pyx-ngNrU4YcHcdZco2CBPBa7!Kz7Zg4UNt.nmB1ximyTK","tmp_values_col2":"7604075334510954017"}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:565)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
    ... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:574)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:674)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:148)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
    ... 9 more
```
It is very strange that we are creating bucket files in the FileSinkOperator when
we are supposed to use our very own SerDe.
I will share my development branch, which includes the table creation patch,
so it will be easier to create arbitrary test tables. In the meantime
@massdosage will try to dig up a little history about the table which caused us
headaches 😄
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]