[
https://issues.apache.org/jira/browse/HCATALOG-341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Travis Crawford updated HCATALOG-341:
-------------------------------------
Attachment: HCATALOG-341.patch
After looking into this a bit more, I realized my initial understanding was
incomplete. In addition to db/table/filter, {{InputJobInfo}} can also carry
properties, and those need to be preserved.
Unlike before, this patch mainly clarifies the interface and improves its
documentation rather than changing it, which should save others time.
One interface change is marking {{getInputJobInfo}} private. I did this because
loading from the job configuration is already documented, I can't think of a
reason it needs to be public, and the existing tests do not need it to be
public. I can change it back if someone thinks it should be public.
I went back and forth on creating a new {{InputJobInfo}} in {{setLocation}}
instead of just passing the given one through. In practice the copy is very
cheap, it clarifies exactly how {{InputJobInfo}} is used, and it prevents
"extra stuff" the user set from sneaking along. I can change this back to a
simple pass-through, but I think the clarity about what's going on is worth it.
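To illustrate the copy described above, here is a minimal sketch. It is not the
code from the patch: {{JobInfoSketch}} and {{copyOf}} are hypothetical stand-ins
for {{InputJobInfo}} and for whatever {{setLocation}} actually does, and only
db/table/filter/properties are modeled.

```java
import java.util.Properties;

// Hypothetical stand-in for InputJobInfo (assumption: not the real HCatalog class).
final class JobInfoSketch {
    private final String databaseName;
    private final String tableName;
    private final String filter;
    private final Properties properties;

    JobInfoSketch(String databaseName, String tableName, String filter,
                  Properties properties) {
        this.databaseName = databaseName;
        this.tableName = tableName;
        this.filter = filter;
        // Copy the entries so later changes by the caller do not leak in.
        this.properties = new Properties();
        if (properties != null) {
            this.properties.putAll(properties);
        }
    }

    String getDatabaseName() { return databaseName; }
    String getTableName() { return tableName; }
    String getFilter() { return filter; }
    Properties getProperties() { return properties; }

    // Builds a fresh instance carrying only db/table/filter/properties,
    // so nothing else the user attached to the original comes along.
    static JobInfoSketch copyOf(JobInfoSketch given) {
        return new JobInfoSketch(given.databaseName, given.tableName,
                                 given.filter, given.properties);
    }
}
```

The copy costs three reference assignments plus one small map copy, which is
the "really cheap" trade-off mentioned above.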
> InitializeInput improvements
> ----------------------------
>
> Key: HCATALOG-341
> URL: https://issues.apache.org/jira/browse/HCATALOG-341
> Project: HCatalog
> Issue Type: Improvement
> Reporter: Travis Crawford
> Assignee: Travis Crawford
> Attachments: HCATALOG-341.patch
>
>
> This came up in HCATALOG-328.
> {{InitializeInput}} is the HCatalog class that queries the HiveMetaStore and
> stores the query result. It could be improved in the following ways:
> * The class has entirely static methods, so a private arg-less constructor
> should be added to prevent people from accidentally creating instances.
> * Instead of querying the HiveMetaStore each time info is requested, the
> results should be cached after the first query using a key of db+table+filter.
> * {{setInput}} and {{getSerializedHcatKeyJobInfo}} require an existing
> {{InputJobInfo}} argument, even though the point of calling those methods is
> to populate an {{InputJobInfo}} with info from the metastore. While this
> reduces the number of arguments (instead of needing database name, table name,
> partition filter), it confuses the user because it's not clear that only
> db/table/filter should be set when passed as an argument.
> * {{getSerializedHcatKeyJobInfo}} should be renamed {{getInputJobInfo}} and
> return an unserialized {{InputJobInfo}}. This avoids unnecessary
> serialization/deserialization in the front-end when it's not necessary to read
> from the job configuration.
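The first two suggestions in the description (a private no-arg constructor and
a cache keyed by db+table+filter) might look roughly like the sketch below. It
is illustrative only: {{MetastoreLookupSketch}}, {{lookup}}, and the {{loader}}
callback are hypothetical names, the cached value is a plain {{String}} rather
than real metastore data, and it assumes Java 8 or later.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical all-static helper (assumption: not the real InitializeInput).
final class MetastoreLookupSketch {
    // Cache keyed by the (db, table, filter) triple.
    private static final ConcurrentMap<List<String>, String> CACHE =
            new ConcurrentHashMap<>();

    // Private no-arg constructor: the class only has static methods,
    // so nobody should be able to create an instance.
    private MetastoreLookupSketch() {
        throw new AssertionError("no instances");
    }

    // Returns the cached result for db+table+filter, calling the loader
    // (standing in for the metastore query) only on the first request.
    static String lookup(String db, String table, String filter,
                         Function<List<String>, String> loader) {
        // Assumption: a null filter is treated the same as an empty one.
        List<String> key = Arrays.asList(db, table, filter == null ? "" : filter);
        return CACHE.computeIfAbsent(key, loader);
    }
}
```

A cache like this never expires entries, so a real implementation would also
need to decide how stale metastore results are allowed to become.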