[ 
https://issues.apache.org/jira/browse/HIVE-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654533#action_12654533
 ] 

Joydeep Sen Sarma commented on HIVE-126:
----------------------------------------

yes - the code was put in there as a safeguard. the history here is that we 
migrated our current hive warehouse from an older version of the software and 
were worried about not capturing all the older partitions in the new metastore. 
we kind of knew that the code was a hack - but it was a purely defensive measure.

couple of comments:
- we should move all metadata logic (including hacks if any :-)) - to the 
metastore server side. otherwise we are creating a different view for Java vs. 
Thrift Clients.
- yes - +1 on a fsck type command to replace this hack. i would actually like 
to run such a command on our current tables before removing this hack.

the core issue is whether we can make this change without having a fsck like 
utility in some form (even a custom java program). Such a utility could also 
reuse some of the current code for handling this case.

-----

for a command line interface - one might want to check the entire database or 
just a table or even just one partition. other metadata checks will also be 
added over time (for example - do the file types on disk agree with metadata 
records, bucketing information etc). So, here's a strawman proposal for a new 
command:

alter table <DB>[.TABLE [PARTITION-SPEC]] check [TYPE-LIST]

where TYPE by default is 'all' (check for all kinds of errors), but can be 
set to a specific type. For example - in this case - we can have a type 
called 'partitions' (and then over time we can add other types like 'fileformat' 
etc.). for v1 - we can just drop the type-list altogether.

the check command can produce a list of things that need to be done to fix the 
format (like adding any directories not in the metastore - but in hdfs - to the 
metastore). actually performing such steps would require a user confirmation 
(y/n).
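
a rough sketch of what the partition check could compute - purely illustrative. 
the class and method names (PartitionCheck, missingFromMetastore, etc.) are 
made up and are not existing Hive code, partitions are plain strings rather 
than real metastore objects, and reporting partitions that are in the metastore 
but not on disk is an extra assumption beyond the case described above:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: none of these names exist in Hive.
class PartitionCheck {

    /** Partition directories found on disk that the metastore does not know about. */
    static Set<String> missingFromMetastore(Set<String> onDisk, Set<String> inMetastore) {
        Set<String> result = new TreeSet<>(onDisk);
        result.removeAll(inMetastore);
        return result;
    }

    /** Partitions recorded in the metastore that have no directory on disk (assumed extra check). */
    static Set<String> missingFromDisk(Set<String> onDisk, Set<String> inMetastore) {
        Set<String> result = new TreeSet<>(inMetastore);
        result.removeAll(onDisk);
        return result;
    }

    /** Builds the list of things to fix; nothing is applied here. */
    static List<String> proposedFixes(Set<String> onDisk, Set<String> inMetastore) {
        List<String> fixes = new ArrayList<>();
        for (String p : missingFromMetastore(onDisk, inMetastore)) {
            fixes.add("add to metastore: " + p);
        }
        for (String p : missingFromDisk(onDisk, inMetastore)) {
            fixes.add("in metastore but not on disk: " + p);
        }
        return fixes;
    }

    /** Only an explicit "y" counts as confirmation; anything else means no. */
    static boolean confirmed(String answer) {
        return answer != null && answer.trim().equalsIgnoreCase("y");
    }

    public static void main(String[] args) {
        // Made-up partition names standing in for an HDFS listing and a metastore query.
        Set<String> onDisk = new TreeSet<>(Arrays.asList("ds=2008-01-08", "ds=2008-01-09"));
        Set<String> inMetastore = new TreeSet<>(Arrays.asList("ds=2008-01-08", "ds=2008-01-10"));
        for (String fix : proposedFixes(onDisk, inMetastore)) {
            System.out.println(fix);
        }
    }
}
```

the check itself only reports; a caller would apply a fix only after 
confirmed(...) returns true for the user's answer.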

---
Java interfaces. We have been pretty cavalier with Java interfaces. right now 
most of the Hive public methods (other than the SerDe stuff) are not accessed by 
any codebase outside Hive. So i would say just remove them for now - as we go 
through the code module by module - we can identify those modules that we 
actually want to expose publicly. 





> Don't fetch information on Partitions from HDFS instead of MetaStore
> --------------------------------------------------------------------
>
>                 Key: HIVE-126
>                 URL: https://issues.apache.org/jira/browse/HIVE-126
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Metastore
>    Affects Versions: 0.19.0
>            Reporter: Johan Oskarsson
>            Assignee: Johan Oskarsson
>             Fix For: 0.19.0
>
>         Attachments: HIVE-126.patch
>
>
> When investigating HIVE-91 an issue came up where the information on what 
> partitions a table contains is loaded by listing the directories in the table 
> directory on HDFS. This is then used to overrule what is in the MetaStore if 
> any difference is found. 
> * Would it not be preferable if MetaStore is the one authority on what the 
> table contains?
> * It will also be a major hassle (or impossible?) to retrieve this 
> information from HDFS with external tables that have non standard partition 
> names (HIVE-91), such as: table/2008/01/08/portugal where "2008/01/08" is one 
> partition value and "portugal" is another.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
