[ https://issues.apache.org/jira/browse/HAWQ-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15887674#comment-15887674 ]
Lili Ma commented on HAWQ-1366:
-------------------------------
With the modified code, HAWQ now throws an error:
{code}
postgres=# select * from tt;
ERROR: HAWQ does not support dictionary page type resolver for Parquet format
in column 'title' (cdbparquetcolumn.c:152) (seg0 localhost:40000 pid=90708)
{code}
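The check behind this kind of error can be pictured with a small sketch. This is not the code in cdbparquetcolumn.c; it is a standalone Python model of the idea. The page type numbers follow the Parquet Thrift definition, while `SUPPORTED_PAGE_TYPES`, `UnsupportedPageError` and `check_page_type` are hypothetical names introduced only for illustration, and the assumption that only plain v1 data pages are readable is mine.

```python
from enum import IntEnum


class PageType(IntEnum):
    """Page types as numbered in the Parquet Thrift definition."""
    DATA_PAGE = 0
    INDEX_PAGE = 1
    DICTIONARY_PAGE = 2
    DATA_PAGE_V2 = 3


# Assumption for this sketch: the reader only understands plain v1 data pages.
SUPPORTED_PAGE_TYPES = frozenset({PageType.DATA_PAGE})


class UnsupportedPageError(Exception):
    """Hypothetical error type standing in for the ERROR that HAWQ reports."""


def check_page_type(page_type, column_name):
    """Raise instead of silently skipping a page the reader cannot decode."""
    page_type = PageType(page_type)
    if page_type not in SUPPORTED_PAGE_TYPES:
        raise UnsupportedPageError(
            f"unsupported Parquet page type {page_type.name} "
            f"in column '{column_name}'"
        )
    return page_type
```

Calling `check_page_type(PageType.DICTIONARY_PAGE, "title")` raises `UnsupportedPageError`, which mirrors the behaviour above: the query fails loudly rather than returning an empty column.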
> HAWQ should throw error if finding dictionary encoding type for Parquet
> -----------------------------------------------------------------------
>
> Key: HAWQ-1366
> URL: https://issues.apache.org/jira/browse/HAWQ-1366
> Project: Apache HAWQ
> Issue Type: Bug
> Components: Storage
> Reporter: Lili Ma
> Assignee: Ed Espino
> Fix For: 2.2.0.0-incubating
>
>
> HAWQ is based on Parquet format version 1.0, which does not support
> dictionary pages, but hawq register may register Parquet format version 2.0
> data into HAWQ. HAWQ should therefore throw an error when it finds an
> unsupported page type for a column.
> Steps to reproduce:
> 1. In Hive, create a table and insert 5 records:
> {code}
> hive> create table tt (i int,
> > fname varchar(100),
> > title varchar(100),
> > salary double
> > )
> > STORED AS PARQUET;
> OK
> Time taken: 0.029 seconds
> hive> insert into tt values (5, 'OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW',
> 'Sales', 80282.54),
> > (7, 'UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE', 'Engineer', 10206.65),
> > (4, 'PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ', 'Director', 63691.23),
> > (9, 'CTDCDYRURBZMBLNWHQNOQCYFFVULOP', 'Engineer', 63867.44),
> > (10, 'WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK', 'Sales', 97720.08);
> WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the
> future versions. Consider using a different execution engine (i.e. spark,
> tez) or using Hive 1.X releases.
> Query ID = malili_20170228173956_f370414c-ddc8-4e6d-99e9-7c1fa1f678d1
> Total jobs = 3
> Launching Job 1 out of 3
> Number of reduce tasks is set to 0 since there's no reduce operator
> Job running in-process (local Hadoop)
> 2017-02-28 17:39:58,713 Stage-1 map = 100%, reduce = 0%
> Ended Job = job_local2046305831_0004
> Stage-4 is selected by condition resolver.
> Stage-3 is filtered out by condition resolver.
> Stage-5 is filtered out by condition resolver.
> Moving data to directory
> hdfs://127.0.0.1:8020/user/hive/warehouse/tt/.hive-staging_hive_2017-02-28_17-39-56_806_3518057455919651199-1/-ext-10000
> Loading data to table default.tt
> MapReduce Jobs Launched:
> Stage-Stage-1: HDFS Read: 3945 HDFS Write: 4226 SUCCESS
> Total MapReduce CPU Time Spent: 0 msec
> OK
> Time taken: 1.975 seconds
> hive> select * from tt;
> OK
> 5 OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW Sales 80282.54
> 7 UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE Engineer 10206.65
> 4 PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ Director 63691.23
> 9 CTDCDYRURBZMBLNWHQNOQCYFFVULOP Engineer 63867.44
> 10 WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK Sales 97720.08
> Time taken: 0.056 seconds, Fetched: 5 row(s)
> {code}
> 2. Create the table in HAWQ:
> {code}
> CREATE TABLE public.tt
> (i int,
> fname varchar(100),
> title varchar(100),
> salary float8)
> WITH (appendonly=true,orientation=parquet);
> {code}
> 3. Run hawq register:
> {code}
> malilis-MacBook-Pro:Hawq_register malili$ hawq register -d postgres -f
> hdfs://localhost:8020/user/hive/warehouse/tt tt
> 20170228:17:40:25:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-try
> to connect database localhost:5432 postgres
> 20170228:17:40:33:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-New
> file(s) to be registered:
> ['hdfs://localhost:8020/user/hive/warehouse/tt/000000_0']
> hdfscmd: "hadoop fs -mv hdfs://localhost:8020/user/hive/warehouse/tt/000000_0
> hdfs://localhost:8020/hawq_default/16385/16387/49281/1"
> 20170228:17:40:41:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-Hawq
> Register Succeed.
> {code}
> 4. Select from HAWQ. The title column comes back empty and no error is raised:
> {code}
> postgres=# select * from tt;
> i | fname | title | salary
> ----+--------------------------------+-------+----------
> 5 | OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW | | 80282.54
> 7 | UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE | | 10206.65
> 4 | PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ | | 63691.23
> 9 | CTDCDYRURBZMBLNWHQNOQCYFFVULOP | | 63867.44
> 10 | WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK | | 97720.08
> {code}