[
https://issues.apache.org/jira/browse/HIVE-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654533#action_12654533
]
Joydeep Sen Sarma commented on HIVE-126:
----------------------------------------
yes - the code was put in there as a safeguard. the history here is that we
migrated our current hive warehouse from an older version of the software and
were worried about not capturing all the older partitions in the new metastore.
we kind of knew that the code was a hack - but it was a purely defensive measure.
couple of comments:
- we should move all metadata logic (including hacks if any :-)) - to the
metastore server side. otherwise we are creating a different view for Java vs.
Thrift Clients.
- yes - +1 on a fsck type command to replace this hack. i would actually like
to run such a command on our current tables before removing this hack.
the core issue is whether we can make this change without having a fsck like
utility in some form (even a custom java program). That would also preserve
some of the current code for handling this case.
-----
for a command line interface - one might want to check the entire database or
just a table or even just one partition. other metadata checks will also be
added over time (for example - do the file types on disk agree with metadata
records, bucketing information etc). So, here's a strawman proposal for a new
command:
alter table <DB>[.TABLE [PARTITION-SPEC]] check [TYPE-LIST]
where TYPE by default is 'all' (check for all kinds of errors), but can be
restricted to a specific type. For example - in this case - we can have a type
called 'partitions' (and then over time we can add other types like 'fileformat'
etc.). for v1 - we can just drop the type-list altogether.
the check command can produce a list of things that need to be done to fix the
format (like adding any directories not in the metastore - but in hdfs - to the
metastore). actually performing such steps would require a user confirmation
(y/n).
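to make the 'partitions' check concrete, here is a minimal Java sketch of the
comparison step only. it assumes the partition names recorded in the metastore
and the partition directories found under the table directory in hdfs have
already been collected as plain strings. the class, method and field names
(PartitionCheck, check, Result) and the ds=... values are made up for
illustration - this is not existing Hive code.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

public class PartitionCheck {

    /** Outcome of comparing metastore partitions with directories on disk. */
    public static final class Result {
        /** Directories present on disk but unknown to the metastore. */
        public final Set<String> missingFromMetastore;
        /** Partitions recorded in the metastore with no directory on disk. */
        public final Set<String> missingFromDisk;

        Result(Set<String> missingFromMetastore, Set<String> missingFromDisk) {
            this.missingFromMetastore = missingFromMetastore;
            this.missingFromDisk = missingFromDisk;
        }
    }

    /**
     * Computes the two set differences. Both inputs are assumed to use the
     * same naming scheme (hypothetically "key=value" directory names).
     */
    public static Result check(Collection<String> metastorePartitions,
                               Collection<String> diskDirectories) {
        // Sorted sets keep the report stable from run to run.
        Set<String> onDiskOnly = new TreeSet<>(diskDirectories);
        onDiskOnly.removeAll(metastorePartitions);

        Set<String> inMetastoreOnly = new TreeSet<>(metastorePartitions);
        inMetastoreOnly.removeAll(diskDirectories);

        return new Result(onDiskOnly, inMetastoreOnly);
    }

    public static void main(String[] args) {
        // Example values only; a real check would read these from the
        // metastore and from a directory listing.
        Result r = check(
                Arrays.asList("ds=2008-01-01", "ds=2008-01-02"),
                Arrays.asList("ds=2008-01-02", "ds=2008-01-03"));

        // Only report here; applying a fix would wait for a y/n confirmation.
        for (String p : r.missingFromMetastore) {
            System.out.println("on disk, not in metastore: " + p);
        }
        for (String p : r.missingFromDisk) {
            System.out.println("in metastore, not on disk: " + p);
        }
    }
}
```

in this sketch the first set is what the check command would offer to add to
the metastore after confirmation; the second set is only reported, since what
to do with it is not settled above.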
---
Java interfaces. We have been pretty cavalier with Java interfaces. right now
most of the Hive public methods (other than the SerDe stuff) are not accessed by
any codebase outside Hive. So i would say just remove them for now - as we go
through the code module by module - we can identify those modules that we
actually want to expose publicly.
> Don't fetch information on Partitions from HDFS instead of MetaStore
> --------------------------------------------------------------------
>
> Key: HIVE-126
> URL: https://issues.apache.org/jira/browse/HIVE-126
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Metastore
> Affects Versions: 0.19.0
> Reporter: Johan Oskarsson
> Assignee: Johan Oskarsson
> Fix For: 0.19.0
>
> Attachments: HIVE-126.patch
>
>
> When investigating HIVE-91 an issue came up where the information on what
> partitions a table contains is loaded by listing the directories in the table
> directory on HDFS. This is then used to overrule what is in the MetaStore if
> any difference is found.
> * Would it not be preferable if MetaStore is the one authority on what the
> table contains?
> * It will also be a major hassle (or impossible?) to retrieve this
> information from HDFS with external tables that have non standard partition
> names (HIVE-91), such as: table/2008/01/08/portugal where "2008/01/08" is one
> partition value and "portugal" is another.