HCatOutputFormat schema issues

Charles Menguy Mon, 31 Oct 2011 14:55:23 -0700

Hi,

I've been playing with HCatalog for the past couple weeks now, and I have a
few questions regarding schemas in MR jobs.


From what I read in the documentation, schemas are optional, and if one is
not specified it defaults to the table-level schema. Here are some extracts
from the documentation:
You can use the setOutputSchema method to include a projection schema, to
specify specific output fields. If a schema is not specified, this default
to the table level schema.
The schema for the data being written out is specified by the setSchema method.
If this is not called on the HCatOutputFormat, then by default it is
assumed that the the partition has the same schema as the current table
level schema

Now, when I omit the schema for HCatInputFormat, it works fine and the
default is assumed.
But when I omit the schema for HCatOutputFormat, I get the following
error: org.apache.hcatalog.common.HCatException : 9001 : Exception occurred
while processing HCat request : It seems that setSchema() is not called on
HCatOutputFormat. Please make sure that method is called.
From what I can tell, it expects me to explicitly define the schema with
HCatOutputFormat.setSchema(...), but this is exactly the call I would like to
omit so that the defaults are assumed.

This matters because, to define the schema, you have to know the order of
the table's columns so that you can list them in the same order in your
List<HCatFieldSchema>, which may not always be obvious.

Here is how I create my output table in Hive; writing to it works fine when
I specify the schema explicitly:
hive> create table inventory(word STRING, author STRING, frequency INT)
stored as RCFILE;
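
To make the column-order concern concrete, here is a minimal sketch in plain
Java. It does not call any HCatalog API: the class ColumnOrderSketch and the
method toPositional are hypothetical names made up for illustration, and the
sample values are invented. It only shows that a positional row has to follow
the column order declared for the inventory table (word, author, frequency),
so whoever builds the row must know that order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; no HCatalog classes are used here.
public class ColumnOrderSketch {

    // Reorders a name-keyed record into the column order declared by the table.
    public static List<Object> toPositional(List<String> tableColumns,
                                            Map<String, Object> record) {
        List<Object> row = new ArrayList<>();
        for (String column : tableColumns) {
            if (!record.containsKey(column)) {
                throw new IllegalArgumentException("Missing value for column: " + column);
            }
            row.add(record.get(column));
        }
        return row;
    }

    public static void main(String[] args) {
        // Column order copied from the CREATE TABLE statement for "inventory".
        List<String> tableColumns = Arrays.asList("word", "author", "frequency");

        // The record is filled in an arbitrary order on purpose.
        Map<String, Object> record = new HashMap<>();
        record.put("frequency", 42);
        record.put("word", "whale");
        record.put("author", "melville");

        // The positional row follows the table's order, not the insertion order.
        System.out.println(toPositional(tableColumns, record));
    }
}
```

If the table-level schema were picked up by default, this ordering would not
have to be repeated by hand in the job.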

I would like to know if I'm doing something wrong, or if this is simply
something not yet implemented in 0.2? Any thoughts would be useful.

Thanks,

Charles
