[
https://issues.apache.org/jira/browse/HIVE-22734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Qing Miao updated HIVE-22734:
-----------------------------
Description:
Hi, I'm new here, but I have used Hive for some years.
I created a table with a single varchar(6) column, stored as ORC,
and inserted a multi-byte string into it, as shown below:
hive> insert into mq1 values ('一二三四五六七') ;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the
future versions. Consider using a different execution engine (i.e. spark, tez)
or using Hive 1.X releases.
Query ID = mq5445_20200116144748_cb87f769-9d3f-4b3b-b384-92c22b8ef06a
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2020-01-16 14:47:52,024 Stage-1 map = 100%, reduce = 0%
Ended Job = job_local484725283_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory
hdfs://wsl:9000/user/hive/warehouse/mq1/.hive-staging_hive_2020-01-16_14-47-48_936_2091348056955954494-1/-ext-10000
Loading data to table default.mq1
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 524 HDFS Write: 315 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 5.467 seconds
hive> select * from mq1 ;
OK
一二
一二
Time taken: 0.301 seconds, Fetched: 2 row(s)
hive> show create table mq1 ;
OK
CREATE TABLE `mq1`(
`col1` varchar(6))
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'hdfs://wsl:9000/user/hive/warehouse/mq1'
TBLPROPERTIES (
'transient_lastDdlTime'='1579157273')
Time taken: 0.281 seconds, Fetched: 12 row(s)
It seems that ORC cannot store six multi-byte characters in a varchar(6)
column the way MySQL does. For Chinese text in UTF-8, only 2 characters are
stored, at 3 bytes each (6 bytes in total), which suggests the length limit
is being applied to bytes rather than characters.
Other Hive formats, for example text and Parquet, work correctly in this
situation.
My Hive versions are 2.3.6 and 2.2.0 on Hadoop 2.7.0; ORC does not work
correctly with either.
The ORC project seems to have fixed something related in version 1.6.2, so I
replaced the ORC jar in the Hive lib directory with orc-core-1.6.2.jar.
That does not work correctly either.
hive> insert into mq2 values ('一二三四五六七') ;
hive> insert into mq2 values ('一二三四五六七') ;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the
future versions. Consider using a different execution engine (i.e. spark, tez)
or using Hive 1.X releases.
Query ID = mq5445_20200116152037_0799cb92-b6d4-4e25-9544-b0213768217a
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
('一二三四五六七') ;
Job running in-process (local Hadoop)
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
2020-01-16 15:20:40,127 Stage-1 map = 0%, reduce = 0%
2020-01-16 15:20:41,137 Stage-1 map = 100%, reduce = 0%
Ended Job = job_local2085128098_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory
hdfs://wsl:9000/user/hive/warehouse/mq2/.hive-staging_hive_2020-01-16_15-20-37_380_7016274963079907260-1/-ext-10000
Loading data to table default.mq2
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 1165 HDFS Write: 701 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 4.627 seconds
hive> select * from mq2 ;
NoViableAltException(352@[])
    at org.apache.hadoop.hive.ql.parse.HiveParser.atomSelectStatement(HiveParser.java:36710)
    at org.apache.hadoop.hive.ql.parse.HiveParser.selectStatement(HiveParser.java:36987)
    at org.apache.hadoop.hive.ql.parse.HiveParser.atomSelectStatement(HiveParser.java:36920)
    at org.apache.hadoop.hive.ql.parse.HiveParser.selectStatement(HiveParser.java:36987)
    at org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:36633)
    at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:35822)
    at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:35710)
    at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:2284)
    at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1333)
    at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:208)
    at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:77)
    at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:70)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:468)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:244)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:158)
FAILED: ParseException line 1:1 cannot recognize input near ''一二三四五六七'' ')' '<EOF>' in statement
hive> select * from mq2 ;
OK
一二三四五六
Time taken: 0.536 seconds, Fetched: 1 row(s)
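The two results above (一二 from mq1, 一二三四五六 from mq2) match what one would get by cutting the 7-character input at 6 UTF-8 bytes versus 6 characters. The following is a minimal Python sketch of that difference only. It is not Hive or ORC code, the function names are hypothetical, and it does not claim that this is how ORC implements varchar.

```python
def truncate_by_bytes(text: str, limit: int) -> str:
    """Keep at most `limit` UTF-8 bytes without leaving a partial character."""
    encoded = text.encode("utf-8")[:limit]
    # errors="ignore" drops a trailing incomplete byte sequence if the cut split a character.
    return encoded.decode("utf-8", errors="ignore")


def truncate_by_chars(text: str, limit: int) -> str:
    """Keep at most `limit` Unicode code points."""
    return text[:limit]


value = "一二三四五六七"  # 7 characters, 3 bytes each in UTF-8

print(len(value), len(value.encode("utf-8")))  # 7 21
print(truncate_by_bytes(value, 6))             # 一二
print(truncate_by_chars(value, 6))             # 一二三四五六
```

If the limit of varchar(6) is counted in bytes, only two 3-byte Chinese characters fit; if it is counted in characters, six fit.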
> ORC: multi-byte characters in varchar columns are truncated
> -----------------------------------------------------------
>
> Key: HIVE-22734
> URL: https://issues.apache.org/jira/browse/HIVE-22734
> Project: Hive
> Issue Type: Improvement
> Components: Database/Schema
> Affects Versions: 2.3.6
> Environment: Ubuntu and CentOS 7
>
> Reporter: Qing Miao
> Priority: Major
> Labels: hive, orc, utf-8
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)