[
https://issues.apache.org/jira/browse/HIVE-9445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288021#comment-14288021
]
Sergey Shelukhin commented on HIVE-9445:
----------------------------------------
I didn't see either of these issues on my testing though, DATE_FORMAT should
return NULL for invalid values (must be Mysql version/some mode difference),
which should cause it to retain value. What version of Postgres was this? Can
you run function manually on these values? It's hard to understand how this
conversion would happen.
Meanwhile, revert makes sense.
> Revert HIVE-5700 - schemaTool failure when date partition has non-date value
> ----------------------------------------------------------------------------
>
> Key: HIVE-9445
> URL: https://issues.apache.org/jira/browse/HIVE-9445
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Affects Versions: 0.13.0, 0.14.0, 0.13.1, 0.15.0, 0.14.1
> Reporter: Brock Noland
> Assignee: Brock Noland
> Priority: Blocker
> Attachments: HIVE-9445.1.patch
>
>
> HIVE-5700 has the following issues:
> * HIVE-8730 - fails mysql upgrades
> * Does not upgrade all metadata, e.g. {{PARTITIONS.PART_NAME}} See comments
> in HIVE-5700.
> * Completely corrupts postgres, see below.
> With a postgres metastore on 0.12, I executed the following:
> {noformat}
> CREATE TABLE HIVE5700_DATE_PARTED (line string) PARTITIONED BY (ddate date);
> CREATE TABLE HIVE5700_STRING_PARTED (line string) PARTITIONED BY (ddate
> string);
> ALTER TABLE HIVE5700_DATE_PARTED ADD PARTITION (ddate='NOT_DATE');
> ALTER TABLE HIVE5700_DATE_PARTED ADD PARTITION (ddate='20150121');
> ALTER TABLE HIVE5700_DATE_PARTED ADD PARTITION (ddate='20150122');
> ALTER TABLE HIVE5700_DATE_PARTED ADD PARTITION (ddate='2015-01-23');
> ALTER TABLE HIVE5700_STRING_PARTED ADD PARTITION (ddate='NOT_DATE');
> ALTER TABLE HIVE5700_STRING_PARTED ADD PARTITION (ddate='20150121');
> ALTER TABLE HIVE5700_STRING_PARTED ADD PARTITION (ddate='20150122');
> ALTER TABLE HIVE5700_STRING_PARTED ADD PARTITION (ddate='2015-01-23');
> LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE
> HIVE5700_DATE_PARTED PARTITION (ddate='NOT_DATE');
> LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE
> HIVE5700_DATE_PARTED PARTITION (ddate='20150121');
> LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE
> HIVE5700_DATE_PARTED PARTITION (ddate='20150122');
> LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE
> HIVE5700_DATE_PARTED PARTITION (ddate='2015-01-23');
> LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE
> HIVE5700_STRING_PARTED PARTITION (ddate='NOT_DATE');
> LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE
> HIVE5700_STRING_PARTED PARTITION (ddate='20150121');
> LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE
> HIVE5700_STRING_PARTED PARTITION (ddate='20150122');
> LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE
> HIVE5700_STRING_PARTED PARTITION (ddate='2015-01-23');
> hive> show partitions HIVE5700_DATE_PARTED;
> OK
> ddate=20150121
> ddate=20150122
> ddate=2015-01-23
> ddate=NOT_DATE
> Time taken: 0.052 seconds, Fetched: 4 row(s)
> hive> show partitions HIVE5700_STRING_PARTED;
> OK
> ddate=20150121
> ddate=20150122
> ddate=2015-01-23
> ddate=NOT_DATE
> Time taken: 0.051 seconds, Fetched: 4 row(s)
> {noformat}
> I then took a dump of the database named {{postgres-pre-upgrade.sql}} and the
> data in the dump looks good:
> {noformat}
> [root@hive5700-1-1 ~]# egrep -A9 '^COPY "PARTITIONS"|^COPY
> "PARTITION_KEY_VALS"' postgres-pre-upgrade.sql
> COPY "PARTITIONS" ("PART_ID", "CREATE_TIME", "LAST_ACCESS_TIME", "PART_NAME",
> "SD_ID", "TBL_ID") FROM stdin;
> 3 1421943647 0 ddate=NOT_DATE 6 2
> 4 1421943647 0 ddate=20150121 7 2
> 5 1421943648 0 ddate=20150122 8 2
> 6 1421943664 0 ddate=NOT_DATE 9 3
> 7 1421943664 0 ddate=20150121 10 3
> 8 1421943665 0 ddate=20150122 11 3
> 9 1421943694 0 ddate=2015-01-23 12 2
> 10 1421943695 0 ddate=2015-01-23 13 3
> \.
> --
> COPY "PARTITION_KEY_VALS" ("PART_ID", "PART_KEY_VAL", "INTEGER_IDX") FROM
> stdin;
> 3 NOT_DATE 0
> 4 20150121 0
> 5 20150122 0
> 6 NOT_DATE 0
> 7 20150121 0
> 8 20150122 0
> 9 2015-01-23 0
> 10 2015-01-23 0
> \.
> {noformat}
> I then upgraded to 0.13 and subsequently upgraded the MS with the following
> command: {{schematool -dbType postgres -upgradeSchema -verbose}}
> The file {{postgres-post-upgrade.sql}} is the post-upgrade db dump. As you
> can see the data is completely corrupt.
> {noformat}
> [root@hive5700-1-1 ~]# egrep -A9 '^COPY "PARTITIONS"|^COPY
> "PARTITION_KEY_VALS"' postgres-post-upgrade.sql
> COPY "PARTITIONS" ("PART_ID", "CREATE_TIME", "LAST_ACCESS_TIME", "PART_NAME",
> "SD_ID", "TBL_ID") FROM stdin;
> 3 1421943647 0 ddate=NOT_DATE 6 2
> 4 1421943647 0 ddate=20150121 7 2
> 5 1421943648 0 ddate=20150122 8 2
> 6 1421943664 0 ddate=NOT_DATE 9 3
> 7 1421943664 0 ddate=20150121 10 3
> 8 1421943665 0 ddate=20150122 11 3
> 9 1421943694 0 ddate=2015-01-23 12 2
> 10 1421943695 0 ddate=2015-01-23 13 3
> \.
> --
> COPY "PARTITION_KEY_VALS" ("PART_ID", "PART_KEY_VAL", "INTEGER_IDX") FROM
> stdin;
> 3 2015-01-23 0
> 4 2015-01-23 0
> 5 2015-01-23 0
> 6 2015-01-23 0
> 7 2015-01-23 0
> 8 2015-01-23 0
> 9 2015-01-23 0
> 10 2015-01-23 0
> \.
> {noformat}
> Same corruption from the HIVE CLI:
> {noformat}
> hive> show partitions HIVE5700_DATE_PARTED;
> OK
> ddate=20150121
> ddate=20150122
> ddate=2015-01-23
> ddate=NOT_DATE
> Time taken: 0.844 seconds, Fetched: 4 row(s)
> hive> show partitions HIVE5700_STRING_PARTED;
> OK
> ddate=20150121
> ddate=20150122
> ddate=2015-01-23
> ddate=NOT_DATE
> Time taken: 0.114 seconds, Fetched: 4 row(s)
> hive> select * from HIVE5700_STRING_PARTED;
> OK
> some data! 2015-01-23
> some data! 2015-01-23
> some data! 2015-01-23
> some data! 2015-01-23
> Time taken: 0.536 seconds, Fetched: 4 row(s)
> hive> select * from HIVE5700_DATE_PARTED;
> OK
> some data! 2015-01-23
> some data! 2015-01-23
> some data! 2015-01-23
> some data! 2015-01-23
> Time taken: 0.145 seconds, Fetched: 4 row(s)
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)