Re: Thoughts on new metastore APIs

Thejas Nair Thu, 06 Mar 2014 10:13:52 -0800

Thanks for discussing this, Brock. I agree that is important to
consider while writing a new metastore api call.
But I think this (single input/output struct)  should be a guideline,
I am not sure if this should be used in every case.

What you are saying shows that there is a tradeoff between ending up
with more functions vs ending up with more input/output
structs/classes. I am not sure if having more input/output structs is
any better.

Take the case of create_table/create_table_with_environment_context
that you mentioned. Even though create_table had a single input
argument Table, instead of adding  EnvironmentContext contents to
Table struct, the authors decided to create a new function with
additional  EnvironmentContext argument. This makes sense because the
Table struct is used by other functions as well such as get_table, and
EnvironmentContext fields don't make sense for those cases as it is
not persisted as part of table.

Which means that the only way to prevent creation of
create_table_with_environment_context method would have been to have a
CreateTableArg struct as input argument instead of Table as the
argument. ie, creating a different struct for the single input/output
every function is only way you can be sure that you don't need more
functions.

This approach of reducing the number of functions also means that you
would start encoding different types of actions within the single
input argument.

Consider the case of get_partition vs get_partition_by_name. It would
need a single struct with an enum that tells if it to lookup based on
the partition key-values or the name, and based on the enum it would
use different fields in the struct. I feel having different functions
is more readable for this case.

For example, the api in HIVE-5931 would need to change from
====================================
list<RolePrincipalGrant> get_principals_in_role(1:string role_name)
====================================
to

====================================
struct GetPrincipalInRoleOutput{
 1:list<RolePrincipalGrant> rolePrincList;
}

struct GetPrincipalInRoleInput{
1:string role_name;
}
GetPrincipalInRoleOutput get_principals_in_role(1:
GetPrincipalInRoleInput input);
====================================

I am not sure if the insurance costs in terms of readability is low
here. I think we should consider the risk in each case of function
proliferation and pay the costs accordingly.
Let me know if I have misunderstood what you are proposing here.

Thanks,
Thejas

On Wed, Mar 5, 2014 at 11:39 AM, Brock Noland <br...@cloudera.com> wrote:
> Hi,
>
>
> There is a ton of great work going into the 0.13 release.
> Specifically, we are adding a ton of APIs to the metastore:
>
> https://github.com/apache/hive/blame/trunk/metastore/if/hive_metastore.thrift
>
> Few of these new API's follow the best practice of a single request
> and response struct. Some follow this partially by having a single
> response object but take no arguments while others return void and
> take a single request object.  Still others, mostly related to
> authorization, do not even partially follow this pattern.
>
> The single request/response struct model is extremely important as
> changing the number of arguments is a backwards incompatible change.
> Therefore the only way to change an api is to add *new* methods calls.
> This is why we have so many crazy APIs in the hive metastore such as
> create_table/create_table_with_environment_context and 12 (yes,
> twelve) ways to get partitions.
>
> I would like to suggest that we require all new APIs to follow the
> single request/response struct model. That is any new API that would
> be committed *after* today.
>
> I have heard the following arguments against this approach which I
> believe to be invalid:
>
> *This API will never change (or never return a value or never take
> another value)*
> We all have been writing code enough that we don't know, there are
> unknown unknowns. By following the single request/response struct
> model for *all* APIs we can future proof ourselves. Why wouldn't we
> want to buy insurance now when it's cheap?
>
> *The performance impact of wrapping an object is too much*
> These calls are being made over the network which is orders of
> magnitude slower than creating a small, simple, and lightweight object
> to wrap method arguments and response values.
>
> Cheers,
> Brock

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: Thoughts on new metastore APIs

Reply via email to