Yeah. Created https://issues.apache.org/jira/browse/HCATALOG-150
Ashutosh

On Tue, Nov 1, 2011 at 10:42, Thomas Weise <[email protected]> wrote:

> We should fix the documentation then?
>
> http://incubator.apache.org/hcatalog/docs/r0.2.0/inputoutput.html
>
> On 11/1/11 9:13 AM, "Ashutosh Chauhan" <[email protected]> wrote:
>
> Hey Charles,
>
> After you have called HCatOutputFormat.setOutput(), you can call
> HCatOutputFormat.getTableSchema(), which returns the schema of the
> table. You can then use that schema directly instead of constructing
> one manually.
>
> Hope it helps,
> Ashutosh
>
> On Mon, Oct 31, 2011 at 20:18, Charles Menguy <[email protected]> wrote:
>
> Hi Ashutosh,
>
> Thank you very much for your answer.
>
> I can certainly understand your argument. Is there, however, a way to
> get the schema from the output table, so we could create a dynamic
> mapping between the fields we want to write and the actual schema? If
> not, is there a standard way to accomplish what I described, other
> than hardcoding the positions of the columns in the code (bad for code
> reusability)? Any alternative would be helpful as well.
>
> Thanks in advance!
>
> Charles
>
> On Mon, Oct 31, 2011 at 8:37 PM, Ashutosh Chauhan <[email protected]> wrote:
>
> Hey Charles,
>
> Yeah, you need to call setSchema() on HCatOutputFormat explicitly.
> Though we could assume defaults, we don't, for the following reason.
> The rows being written may or may not contain partition columns.
> HCatOutputFormat will transparently weed out partition columns if they
> are present in the row. If we assumed defaults, we would have to assume
> that the data does not contain partition columns (we don't store
> partition columns in the data), which is a dangerous assumption that
> would screw things up when we read the data back. So, instead, we ask
> the user to set the schema. You are also correct that the order of
> columns should be the same as the one you declared when creating the
> table.
>
> Hope it helps,
> Ashutosh
>
> On Mon, Oct 31, 2011 at 14:54, Charles Menguy <[email protected]> wrote:
>
> Hi,
>
> I've been playing with HCatalog for the past couple of weeks, and I
> have a few questions regarding schemas in MR jobs.
>
> From what I read in the documentation, schemas are optional, and if
> one is not specified it defaults to the table-level schema. Here are
> some extracts from the documentation:
>
> You can use the setOutputSchema method to include a projection schema,
> to specify specific output fields. If a schema is not specified, this
> defaults to the table level schema.
>
> The schema for the data being written out is specified by the
> setSchema method. If this is not called on the HCatOutputFormat, then
> by default it is assumed that the partition has the same schema as the
> current table level schema.
>
> Now when I omit the schema for HCatInputFormat, it works fine and
> assumes the default. But when I omit the schema for HCatOutputFormat,
> I get the following error:
>
> org.apache.hcatalog.common.HCatException : 9001 : Exception occurred
> while processing HCat request : It seems that setSchema() is not
> called on HCatOutputFormat. Please make sure that method is called.
>
> From what I read, it expects me to explicitly define the schema with
> HCatOutputFormat.setSchema(...), but this is exactly what I would like
> to omit in order to get the defaults.
>
> This is actually important because, to define the schema, you have to
> know the order of your table columns when you build your
> List<HCatFieldSchema>, and that order may not always be obvious.
>
> Here is how I create my output table in Hive, which works fine when I
> write to it while specifying the schema:
>
> hive> create table inventory(word STRING, author STRING, frequency INT)
>     > stored as RCFILE;
>
> I would like to know if I'm doing something wrong, or if this is
> simply something not yet implemented in 0.2. Any thoughts would be
> useful.
>
> Thanks,
>
> Charles
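
For readers who land on this thread with the same question, the dynamic mapping Charles asks about can be sketched in plain Java. This is only an illustration and not HCatalog code: the class ColumnOrderSketch and its methods are hypothetical, and the column names are hardcoded here, whereas in a real job they would presumably be read from the schema returned by HCatOutputFormat.getTableSchema() after setOutput() has been called. Lower-casing the names before lookup is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of HCatalog: arranges named values into the
// positional order declared by a table, so column positions are not hardcoded.
final class ColumnOrderSketch {

    // Builds a lookup from lower-cased column name to its position in the table.
    static Map<String, Integer> indexColumns(List<String> columnNames) {
        Map<String, Integer> positions = new HashMap<>();
        for (int i = 0; i < columnNames.size(); i++) {
            positions.put(columnNames.get(i).toLowerCase(Locale.ROOT), i);
        }
        return positions;
    }

    // Places each named value at its column position; columns with no value stay null.
    static List<Object> toRow(List<String> columnNames, Map<String, Object> values) {
        Map<String, Integer> positions = indexColumns(columnNames);
        List<Object> row =
                new ArrayList<>(Collections.<Object>nCopies(columnNames.size(), null));
        for (Map.Entry<String, Object> entry : values.entrySet()) {
            Integer position = positions.get(entry.getKey().toLowerCase(Locale.ROOT));
            if (position == null) {
                throw new IllegalArgumentException("Unknown column: " + entry.getKey());
            }
            row.set(position, entry.getValue());
        }
        return row;
    }

    public static void main(String[] args) {
        // Assumption: these names stand in for the fields of the table schema
        // (the inventory table from the thread); they are hardcoded here.
        List<String> columns = Arrays.asList("word", "author", "frequency");

        Map<String, Object> values = new HashMap<>();
        values.put("frequency", 42);
        values.put("word", "whale");
        values.put("author", "Melville");

        System.out.println(toRow(columns, values)); // prints [whale, Melville, 42]
    }
}
```

The values are supplied by name in arbitrary order and come out in the table's declared order, which is the order the thread says HCatOutputFormat expects. An unknown column name fails fast rather than being silently dropped.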
