[
https://issues.apache.org/jira/browse/HUDI-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17436809#comment-17436809
]
Rowen edited comment on HUDI-2649 at 11/3/21, 2:43 AM:
-------------------------------------------------------
Version information:
Flink 1.13.1 + Scala 2.11 + CDH 6.2.0, Hadoop 3.0.0 + Hive 2.1.1 + Hudi 0.10
Flink SQL DDL for the Hudi table that is synced to Hive:
Flink SQL> CREATE TABLE luo_sync_hive03(
> id bigint ,
> name string,
> birthday TIMESTAMP(3),
> ts TIMESTAMP(3),
> `partition` VARCHAR(20),
> primary key(id) not enforced -- a uuid primary key must be specified
> )
> PARTITIONED BY (`partition`)
> with(
> 'connector'='hudi',
> 'path' = 'hdfs://nameservice/tmp/luo/hudi/luo_sync_hive03'
> , 'hoodie.datasource.write.recordkey.field' = 'id' -- record key
> , 'write.precombine.field' = 'ts' -- precombine field used to merge records automatically
> , 'write.tasks' = '1'
> , 'compaction.tasks' = '1'
> , 'write.rate.limit' = '2000' -- rate limit
> , 'table.type' = 'MERGE_ON_READ' -- default is COPY_ON_WRITE; MERGE_ON_READ is optional
> , 'compaction.async.enable' = 'true' -- whether to enable async compaction
> , 'compaction.trigger.strategy' = 'num_commits' -- compact by number of commits
> , 'compaction.delta_commits' = '5' -- default is 5
> , 'changelog.enable' = 'true' -- support consuming all changes
> , 'read.streaming.enable' = 'true' -- enable streaming read
> , 'read.streaming.check-interval' = '4' -- check interval; default is 60s
> , 'hive_sync.enable' = 'true' -- enable Hive sync
> , 'hive_sync.metastore.uris' = 'thrift://hadoop01:9083' -- metastore address
> , 'hive_sync.jdbc_url' = 'jdbc:hive2://hadoop03:10000' -- HiveServer address
> , 'hive_sync.table' = 'luo_sync_hive03' -- Hive table name
> , 'hive_sync.db' = 'luo' -- Hive database
> , 'hive_sync.username' = '' -- required, HMS username
> , 'hive_sync.password' = '' -- required, HMS password
> );
Querying the Hudi MOR table data from the Hive shell:
Of these:
select * from luo_sync_hive03_ro;
select * from luo_sync_hive03_rt;
select count(1) from luo_sync_hive03_ro;
The three queries above all run normally.
The count on the _rt table fails as follows:
hive> select count(1) from luo_sync_hive03_rt;
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
21/11/01 20:04:10 INFO exec.Task: Hadoop job information for Stage-1: number of
mappers: 1; number of reducers: 1
21/11/01 20:04:10 WARN mapreduce.Counters: Group
org.apache.hadoop.mapred.Task$Counter is deprecated. Use
org.apache.hadoop.mapreduce.TaskCounter instead
2021-11-01 20:04:10,933 Stage-1 map = 0%, reduce = 0%
21/11/01 20:04:10 INFO exec.Task: 2021-11-01 20:04:10,933 Stage-1 map = 0%,
reduce = 0%
2021-11-01 20:04:28,231 Stage-1 map = 100%, reduce = 100%
21/11/01 20:04:28 INFO exec.Task: 2021-11-01 20:04:28,231 Stage-1 map = 100%,
reduce = 100%
Ended Job = job_1626256835287_37721 with errors
21/11/01 20:04:30 ERROR exec.Task: Ended Job = job_1626256835287_37721 with
errors
Error during job, obtaining debugging information...
21/11/01 20:04:30 ERROR exec.Task: Error during job, obtaining debugging
information...
Examining task ID: task_1626256835287_37721_m_000000 (and more) from job
job_1626256835287_37721
21/11/01 20:04:30 ERROR exec.Task: Examining task ID:
task_1626256835287_37721_m_000000 (and more) from job job_1626256835287_37721
21/11/01 20:04:30 WARN shims.HadoopShimsSecure: Can't fetch tasklog:
TaskLogServlet is not supported in MR2 mode.
21/11/01 20:04:30 WARN shims.HadoopShimsSecure: Can't fetch tasklog:
TaskLogServlet is not supported in MR2 mode.
21/11/01 20:04:30 WARN shims.HadoopShimsSecure: Can't fetch tasklog:
TaskLogServlet is not supported in MR2 mode.
21/11/01 20:04:30 WARN shims.HadoopShimsSecure: Can't fetch tasklog:
TaskLogServlet is not supported in MR2 mode.
Task with the most failures(4):
-----
Task ID:
task_1626256835287_37721_m_000000
URL:
http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1626256835287_37721&tipid=task_1626256835287_37721_m_000000
-----
Diagnostic Messages for this Task:
Error:
org.apache.parquet.avro.AvroSchemaConverter.convert(Lorg/apache/parquet/schema/MessageType;)Lorg/apache/hudi/org/apache/avro/Schema;
21/11/01 20:04:30 ERROR exec.Task:
Task with the most failures(4):
-----
Task ID:
task_1626256835287_37721_m_000000
URL:
http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1626256835287_37721&tipid=task_1626256835287_37721_m_000000
-----
Diagnostic Messages for this Task:
Error:
org.apache.parquet.avro.AvroSchemaConverter.convert(Lorg/apache/parquet/schema/MessageType;)Lorg/apache/hudi/org/apache/avro/Schema;
21/11/01 20:04:30 INFO impl.YarnClientImpl: Killed application
application_1626256835287_37721
FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.mr.MapRedTask
21/11/01 20:04:30 ERROR ql.Driver: FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
21/11/01 20:04:30 INFO ql.Driver: MapReduce Jobs Launched:
21/11/01 20:04:30 WARN mapreduce.Counters: Group FileSystemCounters is
deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
Stage-Stage-1: Map: 1 Reduce: 1 HDFS Read: 0 HDFS Write: 0 HDFS EC Read: 0 FAIL
21/11/01 20:04:30 INFO ql.Driver: Stage-Stage-1: Map: 1 Reduce: 1 HDFS Read: 0
HDFS Write: 0 HDFS EC Read: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
21/11/01 20:04:30 INFO ql.Driver: Total MapReduce CPU Time Spent: 0 msec
21/11/01 20:04:30 INFO ql.Driver: Completed executing
command(queryId=root_20211101200404_046c4801-2600-4375-abf4-1a3fb6483965); Time
taken: 25.609 seconds
21/11/01 20:04:30 INFO conf.HiveConf: Using the default value passed in for log
id: cf1b04fb-fec6-42a7-b26a-8eaedee7c04f
21/11/01 20:04:30 INFO session.SessionState: Resetting thread name to main
was (Author: rowen):
!image-2021-11-01-20-49-12-169.png!!image-2021-11-01-20-49-31-530.png!!image-2021-11-01-20-51-37-412.png!
> Kick off all the Hive query issues for 0.10.0
> ---------------------------------------------
>
> Key: HUDI-2649
> URL: https://issues.apache.org/jira/browse/HUDI-2649
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Flink Integration
> Reporter: Danny Chen
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)